From: Simon Matter
Organization: Sauter AG, Basel
To: Marcus Hast
Cc: linux-xfs@oss.sgi.com
Date: Tue, 13 Nov 2001 15:58:52 +0100
Subject: Re: Harddrive error and XFS corruption

Marcus Hast wrote:
>
> Hi all,
> I have 3 disks in an LVM volume with XFS on it. After a recent power failure it
> no longer came up. At first it would try to do a recovery and get a lot of:
>
> hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=134840697,
> sector=134840696
> end_request: I/O error, dev 22:01 (hdg), sector 134840696

It means you have a disk with bad sectors. Some people can keep working with
such a drive for a long time, until the OS tries to access the area with the
bad sectors.
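The status= and error= values in these kernel messages are bit fields from the ATA status and error registers, and the words in braces are just the names of the bits that are set. A minimal sketch of a bit-by-bit decoder follows; the labels are modeled on what the Linux IDE driver prints, the table and function names are hypothetical, and it is not a copy of the kernel's own printing code (for example, the kernel prints only "Busy" when that bit is set, which this sketch does not replicate).

```python
"""Decode the ATA status and error bytes shown in Linux IDE error messages."""

# Status register bits, labelled with the names the IDE driver prints
# inside "status=0x.. { ... }".
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),     # DRQ: drive is ready to transfer data
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),           # ERR: details are in the error register
]

# A subset of the error register bits; only meaningful when ERR is set.
ERROR_BITS = [
    (0x40, "UncorrectableError"),  # data could not be read, even with ECC
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),    # command aborted
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value: int, table: list[tuple[int, str]]) -> list[str]:
    """Return the names of every bit set in `value`, in table order."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    for label, value, table in [
        ("dma_intr status", 0x51, STATUS_BITS),
        ("read_intr status", 0x59, STATUS_BITS),
        ("error", 0x40, ERROR_BITS),
    ]:
        print(f"{label} 0x{value:02x}: {' '.join(decode(value, table))}")
```

Decoded this way, the read_intr status 0x59 quoted later in the thread differs from 0x51 only in bit 0x08 (DataRequest, the DRQ bit); "Error" is the separate ERR bit, and the error register reads 0x40 (UncorrectableError) in both cases.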

If you are using software RAID, you will not be able to use such a drive,
because the problem shows up as soon as the array is resynced, since the
resync runs over the whole disk. This error has always been critical for me.

>
> As I go through the log now, however, I see some new errors:
>
> hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> hdg: read_intr: error=0x40 { UncorrectableError }, LBAsect=134840697,
> sector=134840696
> end_request: I/O error, dev 22:01 (hdg), sector 134840696

I don't know what "DataRequest Error" means. Both messages report the same
sector, 134840696, so I guess the root of the problem is the same as above.

>
> I take it that this means it has got worse (a read_intr error instead of
> dma_intr, which I have seen is quite common). This is on an LVM volume with
> 220G of data.

I have felt the same pain several times...

>
> So I have a few questions:
> Is there any way of getting the data on the other disks back? From what I've
> seen of the logs, it's hdg that's bad.
>
> Is there any way of getting warned about this before it happens? I did get a
> lot of dma_intr errors first, but it seemed to me then that a lot of other
> people were getting them and safely (?) ignoring them. (From the kernel and
> LVM lists.)
>
> Is there any way I can be "proactive" in avoiding this? By storing metadata
> redundantly, for instance? (I assume that in this particular case it's those
> parts of the drive that have gone, which is why I'm left with an unmountable
> and unrecoverable system.)
>
> Would a check with, for instance, Bonnie catch a problem like this before it
> gets bad?

My answer: use RAID! All those cheap, big IDE drives are not very reliable,
and software RAID is very good on Linux. Use it. Sometimes you buy 8 identical
drives and 3 or 4 of them fail after a few hours of stress.

S.M.A.R.T. should also tell you when things go bad with your drive, but I
don't bother with it.

-Simon

>
> I've seen this in a couple of places now; perhaps it would be a good idea to
> put it in the FAQ or some documents?
>
> Marcus Hast, Lund, Sweden, Earth.
> Living long and prosperous.
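Marcus mentions getting a lot of dma_intr errors before the drive gave out, and Simon's point is that these uncorrectable errors mean bad sectors. A minimal sketch of a log watcher that counts uncorrectable-error reports per IDE drive, so repeats get flagged instead of ignored. It assumes the kernel messages appear in syslog on one line each (the mail above wraps them); the regex, the name `scan`, the `threshold` default and the log path in the usage comment are all hypothetical choices, not an existing tool. Drive-side S.M.A.R.T. monitoring is the alternative Simon mentions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern for one-line IDE error reports such as
# "hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=134840697, sector=134840696"
# (syslog may prepend a timestamp and "kernel:", hence search() below).
ERROR_RE = re.compile(
    r"\b(?P<dev>hd[a-z]+): (?P<handler>\w+_intr): error=0x(?P<err>[0-9a-fA-F]+)"
    r".*?LBAsect=(?P<lba>\d+)"
)

UNCORRECTABLE = 0x40  # error-register bit for an uncorrectable data error


def scan(lines, threshold=3):
    """Flag drives with repeated uncorrectable-error reports.

    Returns {device: (reports, distinct_sectors)} for every drive whose
    report count reaches `threshold` (an arbitrary, hypothetical cut-off).
    """
    reports = Counter()
    sectors = defaultdict(set)
    for line in lines:
        m = ERROR_RE.search(line)
        if m is None or not int(m.group("err"), 16) & UNCORRECTABLE:
            continue  # not an error line, or not an uncorrectable error
        dev = m.group("dev")
        reports[dev] += 1
        sectors[dev].add(int(m.group("lba")))
    return {
        dev: (count, len(sectors[dev]))
        for dev, count in reports.items()
        if count >= threshold
    }


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python scan_ide_errors.py < /var/log/messages
    for dev, (count, distinct) in sorted(scan(sys.stdin).items()):
        print(f"{dev}: {count} uncorrectable-error reports on {distinct} sector(s)")
```

Counting distinct sectors alongside the raw report count separates one bad spot that keeps being retried (as with sector 134840696 in this thread) from a surface that is degrading in many places.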