On Fri, Sep 21, 2007 at 11:39:28PM -0500, Eric Sandeen wrote:
> if I do:
> for I in 173 174 178; do ./check $I; done
> it's not terribly interesting, things seem to go ok, just normal
> filestreams failures ;-)
> if I do:
> ./check 173 174 178
> things go very badly; the very first repair in 178 finds a horribly
> corrupted filesystem, and repair tips over (memory appears corrupted, as
> witnessed by):
Well, I get:
budgie:~/dgc/xfstests # ./check -l 173 174 178
FSTYP -- xfs (debug)
PLATFORM -- Linux/ia64 budgie 2.6.23-rc4-dgc-xfs
MKFS_OPTIONS -- -f -bsize=4096 /dev/sdb9
MOUNT_OPTIONS -- /dev/sdb9 /mnt/scratch
173 75s ...
174 16s ...
178 *** glibc detected *** /sbin/xfs_repair: double free or corruption (!prev): 0x600000000000ebc0 ***
======= Backtrace: =========
======= Memory map: ========
Just running ./check -l 174 178 isn't sufficient, but
./check -l 172 174 178 triggers it. 172, 173, 178 does not
trigger it, so it's something to do with test 174 running after
another filestreams test but before 178.
Well, what does test 178 do? Oh, it mkfs's a new filesystem on
the scratch device and then hoses the superblock and tries to
use secondary superblocks to reconstruct it successfully.
I'm guessing that it is finding a superblock from a previous test
and incorrectly using that, finding stuff all nasty and inconsistent
due to the more recent mkfs....
Given this error:
bad length 156382 for agf 0, should be LENGTH
bad length # 156382 for agi 0, should be LENGTH
I think that is what is happening - those messages only come up
when the agf/agi lengths don't match the superblock, and that points
to using the wrong superblock for recovery.
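To illustrate where those messages come from: the check compares the length recorded in each AG header with the length the superblock in use says that AG should have. What follows is a minimal Python sketch of that comparison, not xfs_repair code. It assumes every AG except the last is the superblock's AG size and the last AG takes the remainder. `SuperblockGeometry`, `expected_ag_length` and `check_agf_length` are hypothetical names. The example values are the ones Eric reported (AGF says 131072, superblock says 4096).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuperblockGeometry:
    # Hypothetical subset of the superblock fields that describe AG layout.
    dblocks: int   # total data blocks in the filesystem
    agblocks: int  # blocks per AG (every AG except possibly the last)
    agcount: int   # number of allocation groups


def expected_ag_length(sb: SuperblockGeometry, agno: int) -> int:
    """Length in blocks that AG `agno` should have according to `sb`."""
    if agno == sb.agcount - 1:
        # Assumption: the last AG holds the remainder and may be shorter.
        return sb.dblocks - sb.agblocks * agno
    return sb.agblocks


def check_agf_length(sb: SuperblockGeometry, agno: int,
                     agf_length: int) -> Optional[str]:
    """Return a repair-style warning if the AGF length disagrees with `sb`."""
    expected = expected_ag_length(sb, agno)
    if agf_length != expected:
        return f"bad length {agf_length} for agf {agno}, should be {expected}"
    return None


# A superblock describing 4096-block AGs, checked against an AGF that
# records 131072 blocks.
stale = SuperblockGeometry(dblocks=64 * 4096, agblocks=4096, agcount=64)
print(check_agf_length(stale, 0, 131072))
# bad length 131072 for agf 0, should be 4096
```

The point is that the AGF itself can be perfectly fine. The warning only says it disagrees with whichever superblock repair decided to trust.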
# mkfs.xfs -f /dev/sdb9
meta-data=/dev/sdb9 isize=256 agcount=8, agsize=156382 blks
= sectsz=512 attr=0
data = bsize=4096 blocks=1251056, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal log bsize=4096 blocks=2560, version=1
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
An AG length of 156382 is correct.
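That follows from simple arithmetic on the mkfs output: the AG count times the AG size equals the data block count exactly. So all eight AGs, including AG 0, are 156382 blocks long. The constant names below are only for illustration.

```python
# Geometry copied from the mkfs.xfs output above.
AGCOUNT = 8
AGSIZE_BLOCKS = 156382
DATA_BLOCKS = 1251056
BLOCK_SIZE = 4096

# If the AGs tile the data section exactly, every AG (including the last)
# is full-sized and each AGF/AGI should record 156382 blocks.
full_ags_fill_device = AGCOUNT * AGSIZE_BLOCKS == DATA_BLOCKS
ag_bytes = AGSIZE_BLOCKS * BLOCK_SIZE

print(full_ags_fill_device)  # True
print(ag_bytes)              # 640540672
```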
Hmmm - just a plain:
# ./check -l 172 174 ; mkfs.xfs -f /dev/sdb9; dd if=/dev/zero of=/dev/sdb9 bs=512 count=1 ; xfs_repair /dev/sdb9
reproduces the problem.
Barry - I think xfs_repair might be finding the incorrect superblock
for the repair. Tests 172, 173 and 174 use less than the whole disk,
so there are going to be stale superblocks all over the place....
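A rough sketch of why the stale copies are still there: XFS keeps a superblock copy at the start of every AG. A later mkfs with a different AG size therefore lands its own copies at different offsets and leaves most of the old ones in place. This sketch makes a loud simplifying assumption: the new mkfs only rewrites the superblock sector at each of its own AG starts. The log, btree roots and everything else mkfs writes are ignored. The old geometry (64 AGs of 4096 4k blocks) is just an illustrative 1GB filesystem. `sb_offsets` and `surviving_stale_sbs` are hypothetical helpers.

```python
def sb_offsets(agcount: int, agblocks: int, blocksize: int) -> list[int]:
    """Byte offset of every superblock copy: one at the start of each AG."""
    return [agno * agblocks * blocksize for agno in range(agcount)]


def surviving_stale_sbs(old: tuple[int, int, int],
                        new: tuple[int, int, int]) -> list[int]:
    """Offsets of old superblock copies not overwritten by the new mkfs.

    Simplifying assumption: the new mkfs only rewrites the superblock
    sector at each of its own AG starts; all other writes are ignored.
    """
    rewritten = set(sb_offsets(*new))
    return [off for off in sb_offsets(*old) if off not in rewritten]


old_fs = (64, 4096, 4096)   # illustrative 1GB fs: 64 AGs of 4096 x 4k blocks
new_fs = (8, 156382, 4096)  # geometry from the mkfs.xfs output above
stale = surviving_stale_sbs(old_fs, new_fs)
print(len(stale))  # 63
print(stale[0])    # 16777216
```

Only the primary at offset 0 gets overwritten here. For AGs 1 to 7, 156382 * 4096 is never a multiple of 16MB, so none of the new AG starts coincide with an old secondary. That is consistent with the stale-superblock theory.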
> hm, no zone name, length of 0x22222274?
> I already provided a metadump image to Barry, but I wonder why the
> timing(?) seems to make a difference here... first sign of things going
> awry in repair is:
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> bad length 131072 for agf 0, should be 4096
> bad length # 131072 for agi 0, should be 4096
Yes - test 173 uses a 1GB filesystem with 64 x 16MB AGs (4096 blocks * 4k
block size = 16MB per AG). That definitely looks like a stale superblock being used.
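The same arithmetic for test 173's geometry, for anyone who wants to double-check it. A superblock from that filesystem would describe 4096-block AGs, which matches the "should be 4096" in the quoted repair output. The constants are illustrative.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB

BLOCK_SIZE = 4096  # 4k filesystem blocks
AG_BLOCKS = 4096   # blocks per AG in test 173's filesystem
AG_COUNT = 64      # number of AGs in test 173's filesystem

ag_bytes = AG_BLOCKS * BLOCK_SIZE
fs_bytes = AG_COUNT * ag_bytes

print(ag_bytes // MiB)  # 16
print(fs_bytes // GiB)  # 1
```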
Barry, I think that the secondary superblock needs better verification
(e.g. that there really are AG headers where the sb says there
are supposed to be and all the lengths match up).
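One possible shape for that verification, sketched in Python under stated assumptions. It is not how xfs_repair currently selects a superblock. A candidate superblock is accepted only if its AGs exactly tile the data section and, at every AG start it implies, there is an AGF with the right magic, the right AG number and the expected length. `Geometry`, `AGHeader`, `candidate_is_consistent` and the `read_agf` callback are all hypothetical. The fake "disk" is a dict keyed by AG start block, laid out like the mkfs output above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Geometry:
    # Hypothetical subset of a candidate (secondary) superblock.
    dblocks: int   # total data blocks
    agblocks: int  # blocks per AG
    agcount: int   # number of AGs


@dataclass
class AGHeader:
    magic: bytes  # b"XAGF" for an AGF header
    seqno: int    # AG number recorded in the header
    length: int   # AG length in blocks recorded in the header


# Hypothetical reader: returns the AGF of the AG starting at block `agbno`,
# or None if no header is found there.
AGFReader = Callable[[int], Optional[AGHeader]]


def candidate_is_consistent(sb: Geometry, read_agf: AGFReader) -> bool:
    """Accept a candidate superblock only if its AG layout matches the disk."""
    if sb.agcount < 1 or sb.agblocks < 1:
        return False
    # The AGs must tile the data section: the last may be short, not empty.
    last = sb.dblocks - sb.agblocks * (sb.agcount - 1)
    if not 0 < last <= sb.agblocks:
        return False
    for agno in range(sb.agcount):
        agf = read_agf(agno * sb.agblocks)
        expected = last if agno == sb.agcount - 1 else sb.agblocks
        if (agf is None or agf.magic != b"XAGF"
                or agf.seqno != agno or agf.length != expected):
            return False
    return True


# Fake device laid out like the mkfs.xfs output: 8 AGs of 156382 blocks.
disk = {agno * 156382: AGHeader(b"XAGF", agno, 156382) for agno in range(8)}
current = Geometry(dblocks=1251056, agblocks=156382, agcount=8)
stale = Geometry(dblocks=64 * 4096, agblocks=4096, agcount=64)
print(candidate_is_consistent(current, disk.get))  # True
print(candidate_is_consistent(stale, disk.get))    # False
```

The stale candidate is rejected at AG 0 because the header there records 156382 blocks, not 4096. A real implementation would also want to cross-check the AGIs.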
Eric - you can relax. Filestreams is not hosing your filesystem; xfs_repair is.
SGI Australian Software Group