
Re: Xfs Access to block zero exception and system crash

To: xfs@xxxxxxxxxxx
Subject: Re: Xfs Access to block zero exception and system crash
From: Sagar Borikar <sagar_borikar@xxxxxxxxxxxxxx>
Date: Mon, 30 Jun 2008 11:37:23 +0530
In-reply-to: <20080630034112.055CF18904C4@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Organization: PMC Sierra Inc
References: <340C71CD25A7EB49BFA81AE8C839266701323BD8@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20080625084931.GI16257@xxxxxxxxxxxxxxxxxxxxx> <340C71CD25A7EB49BFA81AE8C839266701323BE8@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20080626070215.GI11558@disturbed> <4864BD5D.1050202@xxxxxxxxxxxxxx> <4864C001.2010308@xxxxxxxxxxxxxx> <20080628000516.GD29319@disturbed> <340C71CD25A7EB49BFA81AE8C8392667028A1CA7@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20080629215647.GJ29319@disturbed> <20080630034112.055CF18904C4@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird (X11/20080421)

Sagar Borikar wrote:
Dave Chinner wrote:
On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote:
   Device Boot    Start      End   Blocks  Id  System
/dev/scsibd1        126      286    20608  83  Linux
/dev/scsibd2        287     1023    94336  83  Linux
/dev/scsibd3       1149     1309    20608  83  Linux
/dev/scsibd4       1310     2046    94336  83  Linux

I'd have to assume that's a flash-based root drive, right?

That's right,
Disk /dev/md0: 251.0 GB, 251000160256 bytes
2 heads, 4 sectors/track, 61279336 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Neither of these tell me what /dev/RAIDA/vol is....
It is the device node to which /mnt/RAIDA/vol is mapped. It's a JBOD of 233 GB.
But the question remains: why doesn't it happen every time, and under less stress?

I am surprised to see this happen immediately once the number of
subdirectories grows past 30. Otherwise it degrades slowly.

So it happens when you get more than 30 entries in a directory
under a certain load? That might be an extent->btree format
conversion bug or vice versa. I'd suggest setting up a test based
around this to try to narrow down the problem.
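Dave's suggestion above can be sketched as a small stress script: several concurrent writers each pushing a directory past ~30 entries, to exercise the extent-to-btree format conversion. This is a minimal, hypothetical sketch, not anything from the thread; TESTDIR is an assumed scratch directory and should point at the XFS filesystem under test.

```shell
#!/bin/sh
# Hypothetical repro sketch for the ">30 directory entries under load" case.
# TESTDIR is an assumption -- point it at a scratch dir on the XFS mount.
TESTDIR=${TESTDIR:-$(mktemp -d)}

mkdir -p "$TESTDIR"
for i in $(seq 1 4); do          # four concurrent writers
    (
        d="$TESTDIR/writer$i"
        mkdir -p "$d"
        for j in $(seq 1 40); do # push past ~30 entries per directory
            mkdir "$d/sub$j"
            dd if=/dev/zero of="$d/sub$j/file" bs=4k count=1 2>/dev/null
        done
    ) &
done
wait

# Each writer directory should now hold 40 subdirectories.
ls "$TESTDIR/writer1" | wc -l
```

Running this in a loop on the real array (rather than a tmpfs) and watching for the bad-extent report would narrow down whether the directory format conversion is the trigger.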


Thanks for all your help. Shall keep you posted with the progress on debugging.


Sorry if I was not clear. As I mentioned, the frequency of hitting bad extents is much higher when I increase the number of simultaneous transactions to 30 (the problem appears within, say, 5 minutes), but if I run only two copies in an infinite loop, the issue crops up in roughly 2-3 hours. In both cases, all the copy processes plus pdflush stay in uninterruptible sleep continuously; and it is plain uninterruptible sleep (D), not uninterruptible sleep with wait (DW).
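For reference, the stuck tasks described above can be confirmed from userspace. This is a generic sketch, not a command from the thread:

```shell
# List tasks whose state field begins with D (uninterruptible sleep).
# On a hung XFS workload this should show the copy processes and pdflush.
ps -eo pid,stat,comm | awk 'NR == 1 || $2 ~ /^D/'

# Kernel-side stacks of the blocked tasks (requires root): SysRq-w
# dumps all D-state task backtraces to dmesg, which is usually what
# the XFS developers ask for in hang reports.
# echo w > /proc/sysrq-trigger
```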
