Re: Segfault of xfs_repair during repair of a xfs filesystem

To: Rainer Krienke <krienke@xxxxxxxxxxxxxx>
Subject: Re: Segfault of xfs_repair during repair of a xfs filesystem
From: Eric Sandeen <sandeen@xxxxxxx>
Date: Tue, 6 Jan 2004 08:40:56 -0600 (CST)
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <200401060905.44078.krienke@xxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Tue, 6 Jan 2004, Rainer Krienke wrote:

> I think originally, after I had started the machines (from power off),
> the filesystem on server1 (the one that could later be repaired) was
> mounted but not accessible. When I tried to list the directory, ls
> reported an I/O error.

Maybe the filesystem had encountered an error and shut down at this
point.  Anything in the logs?
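
Something like this should turn it up (a rough sketch; the exact log
paths depend on your syslog setup):

    # look for an XFS shutdown in the kernel ring buffer
    dmesg | grep -i xfs

    # and in the syslog archive
    grep -i xfs /var/log/messages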

> So I unexported it, unmounted it, and then tried to run xfs_repair,
> which reported that there was still a log on the filesystem that
> should be replayed by mounting the filesystem again or by using
> xfs_repair -L. So I tried to mount it again, and now mount said that
> there were either too many mounted filesystems, a wrong filesystem
> type, or an invalid superblock.

At that point you need to look for the specific failure message from
the kernel, either dmesg or /var/log/messages, to know what really
happened.
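
For reference, the usual sequence for a dirty log looks roughly like
this (the device name is only a placeholder for your volume):

    # replay the log by mounting once, then unmounting
    mount /dev/sdXn /mnt
    umount /mnt

    # repair should then see a clean log
    xfs_repair /dev/sdXn

    # last resort only: zeroing the log can lose recent metadata
    # xfs_repair -L /dev/sdXn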

> > Sounds like the filesystem shut down due to some error, can you check
> > your logs?  In fact checking your logs in general might be useful
> > here, I wonder if there is anything else going on.
> 
> On the first machine (server1) I found a sequence of messages like
> the log attached to this mail. But these messages were generated upon
> startup after the power failure, not before. Before the power failure
> there is nothing XFS-related in the logs.

Ok, I'll take a look...

> > Can you convince it to dump a corefile?  We could then have a better
> > idea of what's going wrong.
> 
> Yes, I ran xfs_repair once again (perhaps the 5th time) on the
> corrupted filesystem on server2, with the ulimit for core dumps
> raised. It produced one.
> You can download it under
> 
> http://www.uni-koblenz.de/~krienke/core-xfs_repair.gz

Can you put your xfs_repair binary there as well?
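
With the core plus the exact binary that produced it, we can pull a
backtrace, roughly like this (the binary path is an assumption;
adjust to wherever xfs_repair lives on your system):

    # open the core file against the matching binary
    gdb /sbin/xfs_repair core

    # at the gdb prompt, show where it crashed
    (gdb) bt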

> Thanks for these explanations. Perhaps the buffers in the hardware
> RAIDs that are used as the basis for data storage are to blame (IFT
> 7250 RAID (level 5) with 12 160 GB IDE disks inside). I'll try to
> find out whether this cache is read-only or read/write.

Ah, and if the hardware raid does write caching, and it's not battery
backed, then you -could- have inconsistency problems.  If the raid
claims that something is written when in fact it is only cached, then
there is a chance that the fs+log is inconsistent in the case of a
power failure.
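
As an aside (hedged, since your disks sit behind the IFT controller
and may not be directly visible to the host), on a plain IDE disk the
drive's own write cache can be queried and turned off with hdparm;
the controller's cache mode itself would have to be changed in the
IFT firmware utility:

    # show the current write-cache setting of a directly attached disk
    hdparm -W /dev/hda

    # disable the drive's write cache
    hdparm -W0 /dev/hda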

-Eric

