David Chinner <dgc@xxxxxxx> writes:
> Nope. To do that, we'd need to implement some type of Reed-Solomon
> coding and would need to use more bits on disk to store the ECC
> data. That would have a much bigger impact on log throughput than a
> table based CRC on a chunk of data that is hot in the CPU cache.
Processing or rewriting cache-hot data shouldn't differ significantly
in cost between the two (assuming the basic CPU usage of the
algorithms is comparable); the cache lines just need to already be in
exclusive state, which is likely the case for log buffers.
> And we'd have to write the code as well. ;)
Modern kernels already have Reed-Solomon routines in lib/reed_solomon;
they are used by some of the flash file systems. I haven't checked how
their performance compares to a standard CRC, though.
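For reference, the in-kernel interface (include/linux/rslib.h) is used
roughly like this. A kernel-space sketch only, not runnable in userspace;
the codec parameters (generator polynomial 0x187, fcr, prim) and the
512-byte chunk size are illustrative, not anything a file system actually
uses:

```c
#include <linux/rslib.h>

/* Sketch: protect a 512-byte buffer with 8 parity symbols over GF(2^8)
 * using lib/reed_solomon.  Returns the number of corrected symbols,
 * or a negative value if the data is uncorrectable. */
static int rs_roundtrip(uint8_t *data, uint16_t *par)
{
	struct rs_control *rs = init_rs(8, 0x187, 0, 1, 8);
	int corrected;

	if (!rs)
		return -ENOMEM;
	encode_rs8(rs, data, 512, par, 0);	/* on write: compute parity */
	corrected = decode_rs8(rs, data, par, 512,
			       NULL, 0, NULL, 0, NULL);
	free_rs(rs);
	return corrected;
}
```

The extra parity symbols are exactly the "more bits on disk" cost
mentioned above: 8 parity symbols per 512 data bytes here, versus 4
bytes for a CRC32 that can only detect, not correct.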
> However, I'm not convinced that this sort of error correction is the
> best thing to do at a high level as all the low level storage
> already does Reed-Solomon based bit error correction. I'd much
> prefer to use a different method of redundancy in the filesystem so
> the error detection and correction schemes at different levels don't
> have the same weaknesses.
Agreed. At the file system level the best way to handle this is
likely to duplicate the data in different blocks.
> That means the filesystem needs strong enough CRCs to detect bit
> errors and sufficient structure validity checking to detect gross
> errors. XFS already does pretty good structure checking; we don't
The trouble is that it tends to take overly drastic measures (a full
shutdown) whenever it detects any inconsistency.