RE: linux software RAID, 2.6.6, XFS, Postgres: corrupt files

To: "James Foris" <jforis@xxxxxxxxx>, <linux-xfs@xxxxxxxxxxx>
Subject: RE: linux software RAID, 2.6.6, XFS, Postgres: corrupt files
From: "Ian Westmacott" <ianw@xxxxxxxxxxxxxx>
Date: Thu, 14 Apr 2005 00:53:05 -0400
Cc: <ianw@xxxxxxxxxxxxxx>
Importance: Normal
In-reply-to: <425DD4CD.1090108@xxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx

Well, I can provide a bit more information.

-- We have a number of these hardware systems.  As I said, the problem
   is very easy to reproduce on some of them.  As it happens, it is
   easy to reproduce at our Beta sites and tough to reproduce in our
   lab.  We are looking into why.

-- I was unable to try the sunit=0 & swidth=0 experiment: no matter
   what parameters I give to mkfs.xfs (sunit, swidth, su, sw, various
   args), or what options I use in mount, the filesystem is always
   created/mounted with the geometry read from the RAID.  (perhaps this
   is a known issue)

-- We are currently verifying a workaround:  we added a pseudo-service
   during shutdown that does

   dd if=/dev/zero of=/xfs_filesystem/junk bs=64k count=8k

   (and removes junk on startup).  On a system where the corruption was
   nearly 100% repeatable, we have now gone through 10 reboot cycles
   without a problem (tests continue -- tough at a beta site).

-- The problem remains unchanged if Linux Software RAID is removed
   from the equation.  I stopped the RAID, formatted one of the disks
   as XFS (installed Postgres, etc.), and got the corruption on the
   first reboot.
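
For reference, the shutdown/startup workaround described above could be wired up roughly as follows.  This is only a sketch: the function names and the JUNK, BS and COUNT variables are made up for illustration, and hooking the script into the init system is left out.

```shell
#!/bin/sh
# Sketch of the pseudo-service described above (all names are hypothetical).
JUNK="${JUNK:-/xfs_filesystem/junk}"   # file on the affected XFS filesystem
BS="${BS:-64k}"                        # same block size as the dd above
COUNT="${COUNT:-8k}"                   # 8k blocks of 64k = 512 MiB

# Run during shutdown: write a file of zeros onto the filesystem.
write_junk() {
    dd if=/dev/zero of="$JUNK" bs="$BS" count="$COUNT" 2>/dev/null
}

# Run during startup: remove the file again.
remove_junk() {
    rm -f "$JUNK"
}

# Dispatch on the usual init-script arguments; do nothing otherwise.
case "${1:-}" in
    stop)  write_junk ;;
    start) remove_junk ;;
esac
```

Called with "stop" it writes the file, with "start" it removes it; whether this actually avoids the corruption is exactly what is still being verified.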

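To double-check which stripe geometry a filesystem actually ended up with, the sunit/swidth values can be pulled out of the xfs_info output.  A small sketch, assuming the data section reports them together on one line in the form "sunit=N swidth=M blks" (worth verifying against the local xfsprogs version); the function name is made up.

```shell
#!/bin/sh
# Read xfs_info output on stdin and print the data-section stripe geometry.
# The log section has its own sunit= field but no swidth=, so requiring
# both fields on one line picks out the data section only.
data_geometry() {
    sed -n 's/.*sunit=\([0-9][0-9]*\) *swidth=\([0-9][0-9]*\).*/sunit=\1 swidth=\2/p'
}

# Usage (mount point is an example):
#   xfs_info /xfs_filesystem | data_geometry
```
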
Is there any definitive information known about what hardware
configurations are susceptible?



> -----Original Message-----
> Yes, this does sound like it might be the problem we are working on.
> The definitive test is to do an unmount/mount cycle instead of a reboot; if
> data corruption is found, then we are looking at the same thing.
> BTW, we have now duplicated this problem on the current Ubuntu Linux
> release, and on SUSE 9.2 (we will be checking 9.3 as soon as we get a
> copy... but I think we can recreate it there, too).
> Fortunately, it seems to be very hard to hit; it seems to be very
> hardware configuration dependent.
> Jim Foris
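
The unmount/mount test suggested in the quoted message can be scripted along these lines.  This is a rough sketch, not a tested procedure: MNT and the function names are hypothetical, md5sum is assumed to be available, and the remount step assumes root privileges and an fstab entry for the mount point.  Anything holding files open on the filesystem (Postgres, for instance) would have to be stopped first.

```shell
#!/bin/sh
# Sketch: detect file changes across an unmount/mount cycle.
# MNT and all function names are hypothetical.
MNT="${MNT:-/xfs_filesystem}"

# Print "checksum  path" for every regular file under $1, sorted by path.
snapshot() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort -k 2)
}

# Needs root, an fstab entry for $MNT, and no open files on the filesystem.
remount() {
    umount "$MNT" && mount "$MNT"
}

# Returns 0 if nothing changed, 1 if checksums differ, 2 on setup failure.
check_cycle() {
    before=$(mktemp) || return 2
    after=$(mktemp) || return 2
    snapshot "$MNT" > "$before"
    remount || return 2
    snapshot "$MNT" > "$after"
    if cmp -s "$before" "$after"; then
        echo "no differences after remount"
        status=0
    else
        diff "$before" "$after"     # lists files whose checksum changed
        status=1
    fi
    rm -f "$before" "$after"
    return "$status"
}
```

A non-zero result from check_cycle without a reboot in between would point at the same problem described in the quoted message.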
