Date: Tue, 29 Apr 2008 10:48:27 +1000
From: "Barry Naujok"
Organization: SGI
To: "Daniel Bast", xfs@oss.sgi.com
Subject: Re: xfs_admin -c 1 + xfs_repair problem

On Tue, 29 Apr 2008 04:30:56 +1000, Daniel Bast wrote:

> Hi,
>
> I tried to enable lazy counts with "xfs_admin -c 1 device" with
> xfs_admin from xfsprogs 2.9.8. Unfortunately that process got stuck
> without any message. After several hours without any IO or CPU workload
> I killed the process and started xfs_repair, but that also got stuck (in
> "Phase 6") without any IO or CPU workload or any extra message. The
> xfs_repair being stuck in "Phase 6" is reproducible with a
> metadump-image of the filesystem.
>
> I was able to mount the device but don't want to use it because I'm not
> sure if everything is ok.

"xfs_admin -c 1" internally runs xfs_repair, which is why it got stuck
too. Your filesystem is fine: the only changes made to enable lazy
counters happen in Phase 5, but they may not have been written to disk.

> How can I resolve that problem? What information do you need? I can
> provide the metadump image (bzip compressed: 28MB) if necessary.

Run xfs_repair -P to disable prefetch.

The metadump would be very useful in finding out why xfs_repair got
stuck.

Regards,
Barry.

> Here is some information that may be useful:
>
> xfs_repair -v /dev/sda7
> Phase 1 - find and verify superblock...
>         - block cache size set to 11472 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 2 tail block 2
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> Phase 5 - rebuild AG headers and trees...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - agno = 0
>
>
> After killing the xfs_admin -c 1 and xfs_repair processes:
> xfs_info /dev/sda7
> meta-data=/dev/sda7              isize=256    agcount=4, agsize=24719013 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=98876050, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096
> log      =internal               bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=65536  blocks=0, rtextents=0
>
>
> A new 'xfs_repair -v /dev/sda7', straced:
> strace -ff -p 6364
> Process 6409 attached with 6 threads - interrupt to quit
> [pid 6364] futex(0x851e2cc, FUTEX_WAIT, 2, NULL
> [pid 6405] futex(0xb146e3d8, FUTEX_WAIT, 0, NULL
> [pid 6406] futex(0xb146e358, FUTEX_WAIT, 1, NULL
> [pid 6407] futex(0xb146e358, FUTEX_WAIT, 2, NULL
> [pid 6408] futex(0xb146e358, FUTEX_WAIT, 3, NULL
> [pid 6409] futex(0xb146e358, FUTEX_WAIT, 4, NULL
> [pid 6406] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 6407] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 6408] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 6406] futex(0xb146e358, FUTEX_WAIT, 4, NULL
> [pid 6407] futex(0xb146e358, FUTEX_WAIT, 4, NULL
> [pid 6408] futex(0xb146e358, FUTEX_WAIT, 4, NULL
>
>
> Thanks
> Daniel
>
> P.S. Please CC me, because I'm not subscribed to the list.
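
The xfs_info output quoted above reports lazy-count=1 in its log section, which is one way to see whether the feature flag is set after an interrupted run. The following is only a minimal Python sketch of that check, not part of xfsprogs: the helper name lazy_count_enabled and the embedded sample text (the log section copied from the quoted xfs_info output) are illustrative assumptions, and a real check would read live xfs_info output instead.

```python
import re

# Sample text: the log section of the xfs_info output quoted in this thread.
XFS_INFO_SAMPLE = """\
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
"""


def lazy_count_enabled(xfs_info_text):
    """Hypothetical helper: report the lazy-count field of xfs_info output.

    Returns True for lazy-count=1, False for lazy-count=0, and None when
    the field does not appear in the text at all.
    """
    match = re.search(r"lazy-count=(\d+)", xfs_info_text)
    if match is None:
        return None
    return match.group(1) == "1"


if __name__ == "__main__":
    print(lazy_count_enabled(XFS_INFO_SAMPLE))  # prints: True
```

Returning None rather than False when the field is missing keeps "feature disabled" distinct from "this xfs_info output does not report the field".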