Re: xfs_repair stops on "traversing filesystem..."

To: xfs@xxxxxxxxxxx
Subject: Re: xfs_repair stops on "traversing filesystem..."
From: Tomek Kruszona <bloodyscarion@xxxxxxxxx>
Date: Fri, 10 Jul 2009 23:02:33 +0200
In-reply-to: <4A57A1C4.40004@xxxxxxxxxxx>
References: <4A55FAF7.5040908@xxxxxxxxx> <4A56D176.9010702@xxxxxxxxxxx> <4A56ED5F.10400@xxxxxxxxx> <4A57A1C4.40004@xxxxxxxxxxx>
User-agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103)
Eric Sandeen wrote:
> This looks like some of the caching that xfs_repair does is mis-sized,
> and it gets stuck when it's unable to find a slot for a new node to
> cache.  IMHO that's still a bug that I'd like to work out.  If it gets
> stuck this way, it'd probably be better to exit, and suggest a larger
> hash size.
> 
> But anyway, I forced a bigger hash size:
> 
> xfs_repair -P -o bhash=1024 <blah>
> 
> and it did complete.  1024 is probably over the top, but it worked for
> me on a 4G machine w/ some swap.
:D

Is it safe to use xfs_repair without these options after the FS has been
repaired? Or should I use them every time I have a similar problem?

> I'd strongly suggest doing a non-obfuscated xfs_metadump, do
> xfs_mdrestore of that to some temp.img, run xfs_repair <blah> on that
> temp.img, mount it, and see what you're left with; that way you'll know
> what you're getting into w/ repair.
> I ended up w/ about 5000 files in lost+found just FWIW...
It doesn't matter. There are a lot of small files on this filesystem. They are
image sequences used for video composition. It's a backup machine, so if
they're gone from the filesystem they will be copied back from the original
machine. No stress :)

I'm running xfs_repair on the image now. It's in Phase 4, and so far the list
of files looks very similar to the list I saw when running xfs_repair without
the options you suggested.
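
For reference, the trial run on an image that Eric describes above could be
sketched roughly as follows. This is only a sketch under assumptions: DEV,
DUMP, IMG and MNT are hypothetical placeholders (not paths from this thread),
the source filesystem is assumed to be unmounted or mounted read-only while
the metadump is taken, and the -f flag is added because the repair target here
is a regular file rather than a block device. By default the function only
prints the commands.

```shell
#!/bin/sh
# Hypothetical placeholders; adjust to the real device and scratch paths.
DEV=${DEV:-/dev/VG/LV}
DUMP=${DUMP:-/tmp/fs.metadump}
IMG=${IMG:-/tmp/temp.img}
MNT=${MNT:-/mnt/repair-test}

# Print each command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Repair a restored copy of the metadata first, then inspect the result.
# The real run needs root and the xfsprogs tools installed.
repair_trial() {
    # -o: do not obfuscate file names in the dump
    run xfs_metadump -o "$DEV" "$DUMP" &&
    # restore the metadata dump into an image file
    run xfs_mdrestore "$DUMP" "$IMG" &&
    # -f: target is a regular file; -P and bhash as suggested above
    run xfs_repair -f -P -o bhash=1024 "$IMG" &&
    run mkdir -p "$MNT" &&
    run mount -o loop "$IMG" "$MNT" &&
    # see what ended up in lost+found
    run ls "$MNT/lost+found"
}
```

Calling repair_trial on its own just lists the steps; with RUN=1 set it
executes them. The restored image holds metadata only, so it shows which files
would land in lost+found, not their contents.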

> Out of curiosity, do you know how the fs was damaged?
I'm not sure. I see a few possibilities. First, I played with the write cache
options on the RAID controller while the FS was mounted and running.
Maybe something went wrong then... A second possible reason is that we had a
power loss recently and this machine went down :/
The last one is that I have had some problems with XFS filesystems on LVM2. In
kernels <2.6.30, barriers are automatically disabled when the underlying
device is a dm device. Since I'm using RAID controllers, I should have had the
write cache disabled. After the upgrade to 2.6.30 the message about disabled
barriers disappeared, and it was safe to enable the write cache again.
Somewhere in the meantime I wanted to check that everything was OK with the
filesystem, and that's when the problem started: I couldn't finish
xfs_repair. The power loss was, IIRC, after my troubles with xfs_repair,
so the filesystem wasn't totally clean when the power failed. Maybe this is
the reason for this mess ;)

Best regards
Tomasz Kruszona