
Re: kernel oops on debian , 2.6.18-5

To: "Yann Dupont" <Yann.Dupont@xxxxxxxxxxxxxx>, "David Chinner" <dgc@xxxxxxx>
Subject: Re: kernel oops on debian , 2.6.18-5
From: "Barry Naujok" <bnaujok@xxxxxxx>
Date: Wed, 19 Dec 2007 11:38:52 +1100
Cc: xfs@xxxxxxxxxxx, "Jacky Carimalo" <jacky.carimalo@xxxxxxxxxxxxxx>
In-reply-to: <4767DC20.1080406@xxxxxxxxxxxxxx>
Organization: SGI
References: <476790D5.6040205@xxxxxxxxxxxxxx> <20071218123259.GL4396912@xxxxxxx> <4767DC20.1080406@xxxxxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Opera Mail/9.24 (Win32)
On Wed, 19 Dec 2007 01:41:36 +1100, Yann Dupont <Yann.Dupont@xxxxxxxxxxxxxx> wrote:

David Chinner wrote:
On Tue, Dec 18, 2007 at 10:20:21AM +0100, Yann Dupont wrote:

Hello, we got a kernel oops, probably in XFS, on a Debian kernel.

This volume is on SAN + device mapper.
It is a 1 TB volume and has been in service for 2 or 3 years.
There is a high number of files on it, as this volume serves an
rsyncd to which 200+ servers sync their root filesystems every day.

Here is the oops:

Dec 16 23:27:32 inchgower kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1561 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff881857b7
Dec 16 23:27:32 inchgower kernel:
Dec 16 23:27:32 inchgower kernel: Call Trace:
Dec 16 23:27:32 inchgower kernel:  [<ffffffff88183ec0>] :xfs:xfs_free_ag_extent+0x19f/0x67f


Corrupted free space btree. What does xfs_check tell you about the
filesystem on dm-3?


xfs_check tells me to run xfs_repair -L, because the attempts to mount
the FS to clear the log end in a kernel oops.

[snip]

Phase 4 - check for duplicate blocks...
         - setting up duplicate extent list...
         - check for inodes claiming duplicate blocks...
         - agno = 0


And now the process seems stuck.
There is no activity on the SAN disk.

ps shows this:

root 7885 6466  7885 0 6 1447133 5660020 6 09:55 pts/0 00:00:19 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17190 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17191 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17192 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17193 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17194 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2


and strace shows this:
inchgower:~# strace -fp 7885
Process 17194 attached with 6 threads - interrupt to quit
[pid 17191] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17192] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17193] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17194] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17190] futex(0x67e4f8, FUTEX_WAIT, 2, NULL

Can I stop the process and start another version without risking problems?

Yes, you can stop it and restart. In your scenario, run xfs_repair with
-P to disable prefetch, which is the part that is getting stuck.
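
Roughly, the restart could look like the sketch below. The device path
and PID are taken from the ps output quoted above. The variable names and
the RUN switch are only illustrative, and keeping -L from the original
invocation is an assumption (drop it if the log has already been zeroed).
By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: prints the commands unless RUN=1 is set in the environment.
DEV="/dev/evms/DATAXFS2"   # device from the ps output in this thread
OLD_PID=7885               # the stuck xfs_repair process from the ps output

# -P disables prefetching of inode and directory blocks.
# -L (zero the log) is carried over from the original command line;
# that is an assumption, not part of the advice above.
REPAIR_CMD="xfs_repair -P -L $DEV"

if [ "${RUN:-0}" = "1" ]; then
    # Stop the hung repair, then start again without prefetch.
    kill "$OLD_PID"
    $REPAIR_CMD
else
    echo "would run: kill $OLD_PID"
    echo "would run: $REPAIR_CMD"
fi
```

Run it once as-is to check the printed commands, then again with RUN=1
once the old process is confirmed to be the stuck one.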

Barry.

