From: "Ian Westmacott"
To: "James Foris"
Subject: RE: linux software RAID, 2.6.6, XFS, Postgres: corrupt files
Date: Thu, 14 Apr 2005 00:53:05 -0400
Message-ID: <000401c540ad$d865d780$76d81e42@hsd1.ma.comcast.net>
In-Reply-To: <425DD4CD.1090108@wi.rr.com>
X-list: linux-xfs

Well, I can provide a bit more information.

-- We have a number of these hardware systems. As I said, it is
very easy to reproduce, at some of them. As it goes, it is at our
Beta sites where it is easy to reproduce, and in our lab where it
is tough to reproduce. We are looking into why.

-- I was unable to try the sunit=0 & swidth=0 experiment: no matter
what parameters I give to mkfs.xfs (sunit, swidth, su, sw, various
args), or what options I use in mount, the filesystem is always
created/mounted with the geometry read from the RAID.
(perhaps this is a known issue)

-- we are currently verifying a workaround: we added a pseudo-service
during shutdown that does

    dd if=/dev/zero of=/xfs_filesystem/junk bs=64k count=8k

(and removes junk on startup). On a system where this was nearly
100% repeatable, we have now gone through 10 reboot cycles without a
problem (tests continue -- tough at a beta site).

-- The problem remains unchanged if Linux Software RAID is removed
from the equation. I stopped the RAID, formatted one of the disks
as XFS (installed Postgres, etc.), and got the corruption on the
first reboot.

Is there any definitive information known about what hardware
configurations are susceptible?

Thanks,

    --Ian

> -----Original Message-----
> Yes, this does sound like it might be the problem we are working on.
>
> The definitive test is to do an unmount/mount cycle instead of a
> reboot; if data corruption is found, then we are looking at the
> same thing.
>
> BTW, we have now duplicated this problem on the current Ubuntu Linux
> release, and on SUSE 9.2 (we will be checking 9.3 as soon as we get
> a copy... but I think we can recreate it there, too).
>
> Fortunately, it seems to be very hard to hit; it seems to be very
> hardware configuration dependent.
>
> Jim Foris
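
A minimal sketch of the shutdown/startup workaround described in the message, for anyone who wants to try it. Only the dd command and the /xfs_filesystem/junk path come from the message; the function names, the XFS_DIR and JUNK_BLOCKS variables, and the start/stop arguments are hypothetical, and how the script is hooked into the init system is not stated in the thread.

```shell
#!/bin/sh
# Sketch only: names and arguments below are assumptions, not the
# actual pseudo-service used at the beta sites.

# Directory on the XFS filesystem; /xfs_filesystem is the placeholder
# path used in the message.
XFS_DIR="${XFS_DIR:-/xfs_filesystem}"

# 8192 blocks of 64k each = 512 MiB, the same amount as
# "bs=64k count=8k" in the message.
JUNK_BLOCKS="${JUNK_BLOCKS:-8192}"

# Shutdown side: write zeros to a scratch file on the filesystem.
write_junk() {
    dd if=/dev/zero of="$XFS_DIR/junk" bs=64k count="$JUNK_BLOCKS" 2>/dev/null
}

# Startup side: remove the scratch file again.
remove_junk() {
    rm -f "$XFS_DIR/junk"
}

case "${1:-}" in
    stop)  write_junk ;;
    start) remove_junk ;;
esac
```

Run with "stop" from a shutdown hook and with "start" at boot. The thread does not explain why the extra write avoids the corruption, so this should be treated as an experiment rather than a fix.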