>> There is one vital detail here: the XFS design in effect makes
>> two assumptions:
>> * The block layer is error free. By and large XFS does not even
>> check that the block layer behaves perfectly. It is the sysadm
>> responsibility to ensure that.
> Now, that's not quite accurate. XFS is -very- good at handling
> IO errors in general, and at detecting & handling metadata
But what about recovering from bad blocks? For example bad blocks
that happen in the middle of a chunk of metadata? IIRC one cannot
even pass a list of bad blocks to 'mkfs.xfs' (JFS can handle that
statically, 'ext3' semi-dynamically). If one is lucky XFS will
have in some cases enough redundancy in the metadata to allow a
repair, but it does not seem to me that this has been a design
goal, just happenstance and lots of work in 'xfs_repair'.
> (potentially coming up from bad hardware) at runtime.... (where
> "handling" may mean "detecting and shutting down gracefully")
Without data loss? Because that is what matters to quite a few
people. Handling data loss by acknowledging it is a bit of an
optimistic usage of "handling", even if ignoring errors is even
worse. My impression is that XFS assumes that the block layer
handles all data loss (and maybe corruption) issues.
> XFS -does- expect that when the hardware says an IO is complete,
> it is complete and safe on disk.
As to this, it is actually one of the few areas where XFS is to
be praised, as it does try to check whether the block layer
claims to do that.
> If that's what you refer to then we're in agreement.
I was not saying that XFS ignores errors; I am referring to making
the file system detect and work around block layer failures
without data loss or at least usability loss.
Some file systems add metadata to each block or extent to
facilitate data reconstruction when bad blocks arise, whether or
not the block layer detects them; some add it to facilitate
metadata index reconstruction (e.g. accidentally VFAT, by design
Reiser).
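
To make the per-block metadata idea concrete, here is a minimal
sketch. It is not how any of the file systems named above lays out
its blocks: the 20-byte header (CRC32, owner id, logical block
number) and the names seal_block, open_block and rebuild_index are
made up purely for illustration. The checksum catches a damaged
block even when the block layer reports success, and the owner tag
lets a scan rebuild the index that maps files to physical blocks.

```python
import struct
import zlib

BLOCK_SIZE = 4096
# Hypothetical on-disk header: crc32, owner id, logical block number.
HEADER = struct.Struct("<IQQ")
PAYLOAD_SIZE = BLOCK_SIZE - HEADER.size


def seal_block(owner, logical, payload):
    """Return a full block: checksum, owner tag, zero-padded payload."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one block")
    body = payload.ljust(PAYLOAD_SIZE, b"\0")
    tag = struct.pack("<QQ", owner, logical)
    # The checksum covers the owner tag as well as the data.
    crc = zlib.crc32(tag + body)
    return struct.pack("<I", crc) + tag + body


def open_block(raw):
    """Return (owner, logical, payload), or None if the block is damaged."""
    if len(raw) != BLOCK_SIZE:
        return None
    crc, owner, logical = HEADER.unpack_from(raw)
    if zlib.crc32(raw[4:]) != crc:
        return None
    return owner, logical, raw[HEADER.size:]


def rebuild_index(blocks):
    """Scan every block and map (owner, logical) to its physical position."""
    index = {}
    for physical, raw in enumerate(blocks):
        opened = open_block(raw)
        if opened is None:
            continue  # damaged or foreign block: skip it, keep scanning
        owner, logical, _ = opened
        index[(owner, logical)] = physical
    return index
```

Note that this only detects the damage and recovers the index; getting
the lost payload back still needs redundancy from somewhere else.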
Personally I think that designing XFS to leave all data loss issues
to the block device layer was the right decision given its likely
goals. I am more of a believer in end-to-end reliability checks
and/or block layer redundancy.
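
By an end-to-end check I mean something along these lines: the
application records a digest when it writes and verifies it when it
reads, so damage introduced anywhere below it (file system, block
layer, hardware) gets noticed. A minimal sketch, where the sidecar
".sha256" file and the function names are my own assumptions and not
any existing tool's convention:

```python
import hashlib
import os


def write_with_digest(path, data):
    """Write data, then store its SHA-256 digest in a sidecar file."""
    digest = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask for the data to reach stable storage
    with open(path + ".sha256", "w") as f:
        f.write(digest)
        f.flush()
        os.fsync(f.fileno())
    return digest


def read_verified(path):
    """Read data back and raise OSError if it no longer matches its digest."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path + ".sha256") as f:
        expected = f.read().strip()
    if hashlib.sha256(data).hexdigest() != expected:
        raise OSError("digest mismatch for " + path)
    return data
```

As with the file system shutting down gracefully, this detects loss
rather than preventing it; recovery still needs a second copy.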