Date: Wed, 19 Dec 2007 11:38:52 +1100
To: "Yann Dupont", "David Chinner"
Subject: Re: kernel oops on debian , 2.6.18-5
From: "Barry Naujok"
Organization: SGI
Cc: xfs@oss.sgi.com, "Jacky Carimalo"
References: <476790D5.6040205@univ-nantes.fr> <20071218123259.GL4396912@sgi.com> <4767DC20.1080406@univ-nantes.fr>
In-Reply-To: <4767DC20.1080406@univ-nantes.fr>

On Wed, 19 Dec 2007 01:41:36 +1100, Yann Dupont wrote:

> David Chinner wrote:
>> On Tue, Dec 18, 2007 at 10:20:21AM +0100, Yann Dupont wrote:
>>
>>> Hello, we got a kernel oops, probably in xfs on a debian kernel.
>>>
>>> This volume is on SAN + device mapper.
>>> this is a 1 TB volume. It was in service for more than 2 or 3 years.
>>> There is a high number of files on it, as this volume serves for a
>>> rsyncd, where 200+ servers sync their root filesystem on it every day.
>>>
>>> here is the oops :
>>>
>>> Dec 16 23:27:32 inchgower kernel: XFS internal error
>>> XFS_WANT_CORRUPTED_GOTO at line 1561 of file fs/xfs/xfs_alloc.c.
>>> Caller
>>> 0xffffffff881857b7
>>> Dec 16 23:27:32 inchgower kernel:
>>> Dec 16 23:27:32 inchgower kernel: Call Trace:
>>> Dec 16 23:27:32 inchgower kernel: []
>>> :xfs:xfs_free_ag_extent+0x19f/0x67f
>>>
>>
>> corrupted freespace btree. what does xfs_check tell you about the
>> filesystem on dm-3?
>>
>>
> xfs_check tells me to run xfs_repair -L, the attempts to mount the FS
> to clear the logs ending in kernel oops.

[snip]

> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>
> )
>
> And now the process seems stuck.
> There is no activity on the san disk ;
>
> a ps show this :
>
> root 7885 6466 7885 0 6 1447133 5660020 6 09:55 pts/0
> 00:00:19 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17190 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17191 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17192 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17193 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17194 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
>
>
> and a strace this :
> inchgower:~# strace -fp 7885
> Process 17194 attached with 6 threads - interrupt to quit
> [pid 17191] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL
> [pid 17192] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL
> [pid 17193] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL
> [pid 17194] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL
> [pid 17190] futex(0x67e4f8, FUTEX_WAIT, 2, NULL
>
> Can I stop the process and start another version without risking
> problems ?

Yes, you can stop and restart. In your scenario, run xfs_repair -P to
disable prefetch, which is what is getting stuck.

Barry.