Date: Mon, 24 Sep 2007 15:50:57 +1000
To: "David Chinner", "Eric Sandeen"
Subject: Re: something very strange w/ filestreams...
From: "Barry Naujok"
Organization: SGI
Cc: xfs-oss
References: <46F49C80.60007@sandeen.net> <20070923092444.GQ995458@sgi.com>
In-Reply-To: <20070923092444.GQ995458@sgi.com>

On Sun, 23 Sep 2007 19:24:44 +1000, David Chinner wrote:

> Barry - I think xfs_repair might be finding the incorrect superblock
> for the repair. Tests 172, 173 and 174 use less than the whole disk,
> so there are going to be stale superblocks all over the place....
>
>> hm, no zone name, length of 0x22222274?
>>
>> I already provided a metadump image to Barry, but I wonder why the
>> timing(?) seems to make a difference here... first sign of things going
>> awry in repair is:
>>
>> Phase 2 - using internal log
>>         - zero log...
>>         - scan filesystem freespace and inode maps...
>> bad length 131072 for agf 0, should be 4096
>> bad length # 131072 for agi 0, should be 4096
>
> Yes - test 173 uses 1GB filesystem with 64x16MB AGs - 4096 * 4k block
> size = 16MB AG. definitely looks like a stale superblock being
> found.
>
> Barry, I think that the secondary superblock needs better verification
> (e.g. that there really are AG headers where the sb says there
> are supposed to be and all the lengths match up).
>
> Eric - you can relax. Filestreams is not hosing your filesystem;
> xfs_repair is....

Test 178 is designed to test mkfs.xfs in
http://oss.sgi.com/archives/xfs/2007-07/msg00139.html
and will still make xfs_repair go bananas if there are other old AG
headers around.

So, before running this test, make sure your test partitions are
completely zeroed, to clear out anything left by mkfs runs from before
that recent version of mkfs.xfs was installed.

I tried on my test box and sure enough, xfs_repair barfed. After
zeroing the devices, the 172, 174 & 178 sequence succeeded.

If you still have failures after zeroing and using ONLY the latest
mkfs.xfs, then something else is wrong.

Also, xfs_copy/xfs_mdrestore of different images could still trigger
the problem. There is a TODO to improve xfs_repair's handling of this
scenario.

Barry.
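
For reference, the kind of length check David describes above (do the AG headers agree with what the superblock claims?) can be sketched roughly as follows. This is only an illustration in Python, not how xfs_repair does or will do it: the CandidateSuperblock class and the mismatched_ags helper are made-up names, and the numbers are the ones from the quoted repair output (4096 blocks of 4k per AG, 64 AGs, versus a length of 131072 found in the AG 0 headers).

```python
from dataclasses import dataclass


@dataclass
class CandidateSuperblock:
    """Hypothetical subset of superblock geometry; not the on-disk layout."""
    block_size: int  # bytes per filesystem block
    ag_blocks: int   # blocks per allocation group (AG)
    ag_count: int    # number of AGs


def ag_size_bytes(sb: CandidateSuperblock) -> int:
    """Size of one full AG in bytes."""
    return sb.block_size * sb.ag_blocks


def mismatched_ags(sb, header_lengths):
    """Return the AG numbers whose header length disagrees with the superblock.

    header_lengths maps AG number -> length in blocks as read from that AG's
    header. The last AG may legitimately be shorter than ag_blocks, so it is
    only flagged when it is empty or longer than a full AG.
    """
    bad = []
    for agno, length in sorted(header_lengths.items()):
        is_last = agno == sb.ag_count - 1
        if is_last:
            ok = 0 < length <= sb.ag_blocks
        else:
            ok = length == sb.ag_blocks
        if not ok:
            bad.append(agno)
    return bad


# Geometry quoted for test 173: 4096 blocks * 4k = 16MB per AG, 64 AGs.
sb = CandidateSuperblock(block_size=4096, ag_blocks=4096, ag_count=64)
print(ag_size_bytes(sb) // (1024 * 1024), "MB per AG")

# AG 0's header reports 131072 blocks, as in the repair output above.
print(mismatched_ags(sb, {0: 131072}))
```

A candidate superblock that fails a check like this for AG 0 is at least a hint that the superblock and the AG headers come from different mkfs runs, which is the stale-superblock situation discussed above.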