Here are some things I stumbled over during XFS stress testing.
I don't have time to fix them, and they're not that critical (except perhaps
for the last one, which is admittedly a bit vague), but I thought I would
just note them.
- When log replay fails, not all objects in the XFS zones are freed.
As a result, one of the zones is not freed when the module is unloaded
afterwards, and when the XFS module is loaded again it hits a BUG() in
slab.c that checks for duplicate zones.
- The pagebuf layer will certainly not run on sparc32: it passes the
_irqsave interrupt mask between functions, which causes quick crashes there.
On that architecture the interrupt mask includes the register window
pointer, so it can only be restored in the same function that saved it
(e.g. _pagebuf_free_lockable_buffer violates that).
[I personally do not care about sparc32, I just thought I would note it.
It is probably fairly easy to fix if anyone on the list is motivated]
- When a filesystem shutdown occurs, there seem to be some problems in the
error cleanup paths. For example, page locks appear to get leaked. In one
case I had a whole bunch of processes stuck in lock_page and wait_on_page
after a shutdown.
- For some time I had a buffer leak in my version of XFS that eventually
caused it to run out of memory because pagecache pages were not freed
(no other corruption, though). When that happened under heavy fsstress
load I usually pressed reset because user space was dead. I don't think
the page leak itself caused any corruption. However, in several cases I
got a corrupted filesystem afterwards that needed an xfs_repair; without
it, even after a remount with log replay, XFS would shut down the
filesystem (from various places) while accessing the fsstress test
directory. In one case I got a corrupted log with garbage in it that
caused log replay to fail.