Re: Harddrive error and XFS corruption

To: Marcus Hast <hast@xxxxxxxxx>
Subject: Re: Harddrive error and XFS corruption
From: Simon Matter <simon.matter@xxxxxxxxxxxxxxxx>
Date: Tue, 13 Nov 2001 15:58:52 +0100
>received: from mobile.sauter-bc.com (unknown [10.1.6.21]) by basel1.sauter-bc.com (Postfix) with ESMTP id A5E2D57306; Tue, 13 Nov 2001 15:58:52 +0100 (CET)
Cc: linux-xfs@xxxxxxxxxxx
Organization: Sauter AG, Basel
References: <20011113143336109.AAA296.51@xxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
Marcus Hast wrote:
> 
> Hi all,
> I have 3 disks in an LVM volume with XFS on it. After a recent power failure it
> no longer came up. At first it would try to do a recovery and get a lot of:
> 
> hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=134840697,
> sector=134840696
> end_request: I/O error, dev 22:01 (hdg), sector 134840696

It means you have a disk with bad sectors. Some people can work with such
a drive for a long time, until the OS needs to access the area with the
bad sectors. If you are using software RAID you will not be able to use
the drive, because the problem will show up immediately when the array
resyncs. For me this error has always been critical.

> 
> As I go through the log now however I see some new errors:
> 
> hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> hdg: read_intr: error=0x40 { UncorrectableError }, LBAsect=134840697,
> sector=134840696
> end_request: I/O error, dev 22:01 (hdg), sector 134840696

I don't know what the DataRequest flag means here. I guess the root of
the problem is the same as above.
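
The names in braces appear to be a bit-by-bit decoding of the status
byte, so the two messages can be compared directly. Here is a minimal
sketch, assuming the standard ATA status register layout; the table and
the function name `decode_status` are my own illustration and are not
copied from the driver source:

```python
# Assumed mapping of ATA status register bits to the labels seen in the
# kernel messages quoted in this thread (hypothetical table, not taken
# from the IDE driver source).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the names of all flags set in an 8-bit status value."""
    return [name for mask, name in STATUS_BITS if value & mask]


# The two status values from the quoted log lines.
print(decode_status(0x51))
print(decode_status(0x59))
```

Under that assumption, 0x59 differs from 0x51 only in the 0x08 bit, while
the error byte and the failing sector are identical in both messages.
That fits the guess that the underlying problem is the same.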

> 
> I take it that this means it has gone worse. (read_intr error instead of
> dma_intr which I have seen is quite common.) This is on a LVM volume with 220G
> of data.

I have felt the same pain several times...

> 
> So I have a few questions:
> Is there any way of getting the data on the other disks back? From what I've
> seen of the logs it's hdg that's bad.
> 
> Is there any way of getting warned about this before it happens? I did get a
> lot of dma_intr errors first, but it seemed to me then that a lot of other
> people were getting them and safely (?) ignoring them. (From the kernel and
> LVM lists.)
> 
> Is there any way I can be "proactive" in avoiding this? By storing metadata
> redundantly for instance? (I assume that in this particular case it's those
> parts of the drive which has gone, which is why I'm left with an unmount and
> unrecoverable system.)
> 
> Would a check with for instance Bonnie catch a problem like this before it
> gets bad?

My answer: Use RAID! All those cheap, big IDE drives have the problem of
not being very reliable, and SoftRAID is very good on Linux. Use it.
Sometimes you buy 8 identical drives and 3 or 4 of them fail after
some hours of stress. S.M.A.R.T. should also tell you when things go bad
with your drive, but I don't bother with that.
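
As for getting warned earlier, one cheap option is to watch the kernel
log for I/O errors that keep hitting the same device. Here is a minimal
sketch, assuming the log lines look like the ones quoted in this thread;
the function name `bad_sectors_by_device` and the regular expression are
illustrative only and not part of any existing tool:

```python
import re
from collections import defaultdict

# Matches kernel lines of the form quoted in this thread, e.g.
#   end_request: I/O error, dev 22:01 (hdg), sector 134840696
IO_ERROR = re.compile(
    r"end_request: I/O error, dev \S+ \((\w+)\), sector (\d+)"
)


def bad_sectors_by_device(lines):
    """Collect the distinct failing sectors reported for each device."""
    found = defaultdict(set)
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            device, sector = match.group(1), int(match.group(2))
            found[device].add(sector)
    return {device: sorted(sectors) for device, sectors in found.items()}


# Sample input built from the messages in the original report.
sample = [
    "hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
    "end_request: I/O error, dev 22:01 (hdg), sector 134840696",
    "end_request: I/O error, dev 22:01 (hdg), sector 134840696",
]
print(bad_sectors_by_device(sample))
```

Any device that shows up in the result deserves a closer look before
you trust it with more data.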

-Simon

> 
> I've seen this in a couple of places now, perhaps it would be a good idea to
> put it in the FAQ or some documents?
> 
> Marcus Hast, Lund, Sweden, Earth.
> Living long and prosperous.


