On Fri, Jul 30, 2010 at 06:27:46AM -0400, Christoph Hellwig wrote:
> On Fri, Jul 30, 2010 at 08:55:46AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > Under heavy load parallel metadata loads (e.g. dbench), we can fail
> > to mark all the inodes in a cluster being freed as XFS_ISTALE as we
> > skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on.
> > When this happens and the inode cluster buffer has already been
> > marked stale and freed, inode reclaim can try to write the inode out
> > as it is dirty and not marked stale. This can result in writing th
> > metadata to an freed extent, or in the case it has already
> > been overwritten trigger a magic number check failure and return an
> > EUCLEAN error such as:
> > Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117
> > Fix this by ensuring that we hoover up all in memory inodes in the
> > cluster and mark them XFS_ISTALE when freeing the cluster.
> Why do you move the loop over the log items around? From all that
> I can see the original place is much better as we just have to loop
> over the items once. Then once we look up the inodes in memory
> we skip over the inodes that already are stale, so the behaviour
> should be the same.
You are right - it is doing the same as the old code where it is
marking them stale first. I rearranged some code when trying
a couple of crazy ideas, but forgot to move it back when I
had somethign that fixed the bug. I'll move it back - that shoul
dmake the diff lots smaller.
> Also instead of the i-- and continue for the
> lock failure an explicit goto retry would make it a lot more obvious.
Good point. I fix it up and test it again.