Re: XFS recovery issues

To: Ash <my_qa2004@xxxxxxxxx>
Subject: Re: XFS recovery issues
From: Nathan Scott <nathans@xxxxxxx>
Date: Tue, 31 Aug 2004 11:29:22 +1000
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20040830073422.28472.qmail@xxxxxxxxxxxxxxxxxxxxxxx>
References: <20040830073422.28472.qmail@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.3i

On Mon, Aug 30, 2004 at 12:34:22AM -0700, Ash wrote:
> Hi
> I was running a kind of crash test on an XFS
> filesystem to check recovery/corruptions from unclean
> shutdowns.
> ...
> logged the dmesg outputs for each reboot cycle
> and all of them showed that XFS recovery did not face
> any problems. The message seen in each dmesg log was
> Starting XFS recovery on filesystem: cciss/c0d0p8
> (dev: cciss/c0d0p8)
> Ending XFS recovery on filesystem: cciss/c0d0p8 (dev:
> cciss/c0d0p8

Hmm, we should fix that dup'd device name (looks like
you're running a debug version of XFS here...?)
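
For the per-reboot dmesg logs mentioned above, a quick way to confirm that
every recovery that started also finished could look roughly like the sketch
below. It assumes the two message formats quoted above ("Starting XFS recovery
on filesystem: ..." and "Ending XFS recovery on filesystem: ..."); other
kernel versions may word them differently. The function name
unfinished_recoveries is made up for illustration.

```python
import re

# Patterns follow the two messages quoted in this thread; this is an
# assumption about the log format, not a stable kernel interface.
START = re.compile(r"Starting XFS recovery on filesystem: (\S+)")
END = re.compile(r"Ending XFS recovery on filesystem: (\S+)")


def unfinished_recoveries(dmesg_text):
    """Return filesystems whose recovery started but never logged an end."""
    pending = []
    for line in dmesg_text.splitlines():
        started = START.search(line)
        if started:
            pending.append(started.group(1))
            continue
        ended = END.search(line)
        if ended and ended.group(1) in pending:
            pending.remove(ended.group(1))
    return pending
```

Run over each saved log, an empty list means every start message had a
matching end message. That only says recovery reported completion, not that
the filesystem is free of corruption.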

> Here, in the "rm -rf" command for one of the
> directories, I noticed a hang.

A kdb backtrace at this point would have been useful (in
case you see it again).

> After some time of inactivity, I rebooted the system (a
> clean reboot) and noticed
> that XFS recovery failed. The relevant sections of the
> boot messages are attached in xfs_bootup_failure.txt
> Next, I tried xfs_check. It basically printed a lot of
> "block 12/232064 type unknown not expected" messages
> and stopped responding too. I noticed a defunct xfs_db
> process on the system at this point.

That would be due to not yet running log recovery.  More
recent versions of xfs_check now act like repair, and won't
run on a filesystem with a dirty log.

> xfs_repair with -L also results in a hang after this
> point.
> Any ideas what's going wrong?
> Basically, it's looking like my filesystem is
> inaccessible now.
> I am unable to mount it or run any repair on it.

If you can't even repair, it looks like the device has got
into a funny state (repair talks directly to the device).
I'd reboot to try to clear that up, then run repair with -L
again and see if that resolves it.

If repair still hangs, kdb will be of use - get a backtrace
on the hung repair process.

> Unable to handle kernel NULL pointer dereference at virtual address 000002f2
>  printing eip:
> c026447f
> *pde = 00000000
> Oops: 0000 [#1]
> Modules linked in: usbcore
> CPU:    0
> EIP:    0060:[<c026447f>]    Not tainted
> EFLAGS: 00010282   (2.6.7-mirahp1compiled30jul)
> EIP is at xfs_trans_brelse+0x1f/0x100

There are a couple of known use-after-free bugs related to forced
filesystem shutdown; I suspect that's what you're hitting here
where it oops'd.
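
When collecting several of these oopses for comparison, the fault address and
the "symbol+offset/size" part of the EIP line are the bits worth pulling out.
A minimal sketch, assuming the 2.6 i386 oops format quoted above (the helper
name parse_oops is made up for illustration):

```python
import re

# Formats taken from the oops quoted in this thread; other architectures
# and kernel versions print these lines differently.
FAULT_ADDR = re.compile(r"at virtual address ([0-9a-fA-F]+)")
EIP_LINE = re.compile(r"EIP is at (\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)")


def parse_oops(oops_text):
    """Return (fault_address, symbol, offset, size), or None if not found."""
    addr = FAULT_ADDR.search(oops_text)
    eip = EIP_LINE.search(oops_text)
    if addr is None or eip is None:
        return None
    return (
        int(addr.group(1), 16),   # faulting virtual address
        eip.group(1),             # function containing EIP
        int(eip.group(2), 16),    # byte offset of EIP into that function
        int(eip.group(3), 16),    # size of that function in bytes
    )


oops = """\
> Unable to handle kernel NULL pointer dereference at virtual address 000002f2
> EIP is at xfs_trans_brelse+0x1f/0x100
"""
print(parse_oops(oops))  # (754, 'xfs_trans_brelse', 31, 256)
```

For the trace above this gives a fault at address 0x2f2, 0x1f bytes into
xfs_trans_brelse, which is 0x100 bytes long.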
