On Wed, Aug 23, 2006 at 02:02:18PM +1000, David Chinner wrote:
> On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
> > I'm seeing what appears to be an infinite loop in xfssyncd. It is
> > triggered when writing to a file system that is full or nearly full. I
> > have pinpointed the change that introduced this problem: it's
> > "TAKE 947395 - Fixing potential deadlock in space allocation and
> > freeing due to ENOSPC"
> > git commit d210a28cd851082cec9b282443f8cc0e6fc09830.
> Thanks for tracking that down - I've been trying to isolate a test case
> for another report of this looping in xfssyncd.
> [Luciano - this is the same problem we've been trying to track down.]
> > I hope you XFS experts see what might be wrong with that bug fix. It's
> > ironic but for me, this (apparent) infinite loop seems much easier to hit
> > than the out-of-order locking problem that the commit in question was
> > supposed to fix. Let me know if I can get you any more info.
> Now we know what patch introduces the problem, we know where to look.
> Stay tuned...
I've had a quick look at the above commit. I'm not yet certain that
everything is correct in terms of the semantics laid down in the
change or that enough blocks are reserved for btree splits, but I
can see a hole in the implementation on multiprocessor machines.
Stephane/Luciano - can you test the following patch (note: compile
tested only) and see if it fixes the problem?
SGI Australian Software Group
fs/xfs/xfs_mount.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2006-08-18 15:29:28.000000000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2006-08-23 14:28:18.059385018 +1000
@@ -2108,11 +2108,11 @@ again:
 	BUG_ON((mp->m_resblks - mp->m_resblks_avail) != 0);
-	lcounter = icsbp->icsb_fdblocks;
+	lcounter = icsbp->icsb_fdblocks - SET_ASIDE_BLOCKS;
 	lcounter += delta;
 	if (unlikely(lcounter < 0))
-	icsbp->icsb_fdblocks = lcounter;
+	icsbp->icsb_fdblocks = lcounter + SET_ASIDE_BLOCKS;
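
For anyone who wants to see the intent of the change outside the kernel, here is a rough userspace sketch of the check the patch is aiming for. It is not the XFS code: struct pcp_counter, mod_fdblocks_fast() and the value 8 for SET_ASIDE_BLOCKS are made up for illustration, and the -1 return only stands in for "the fast path cannot satisfy this request". The idea is that the per-cpu fast path treats the set-aside blocks as unavailable, so the counter may never drop below that floor.

```c
#include <stdint.h>

/* Hypothetical value; the real SET_ASIDE_BLOCKS definition is not shown here. */
#define SET_ASIDE_BLOCKS 8

/* Hypothetical stand-in for one per-cpu free block counter. */
struct pcp_counter {
	int64_t fdblocks;
};

/*
 * Apply delta to the counter while keeping SET_ASIDE_BLOCKS in reserve.
 * Returns 0 on success, or -1 (counter left untouched) when the request
 * would eat into the set-aside blocks.
 */
static int mod_fdblocks_fast(struct pcp_counter *c, int64_t delta)
{
	/* Only blocks above the set-aside floor are available. */
	int64_t lcounter = c->fdblocks - SET_ASIDE_BLOCKS;

	lcounter += delta;
	if (lcounter < 0)
		return -1;

	/* Add the floor back before storing the new value. */
	c->fdblocks = lcounter + SET_ASIDE_BLOCKS;
	return 0;
}
```

With a counter at 10 and the assumed floor of 8, a delta of -2 succeeds and leaves 8, and any further negative delta is refused. Without the subtraction, the same counter could be run all the way down to 0.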