xfs
[Top] [All Lists]

[PATCH 5/5] xfs: fix race condition in AIL push trigger

To: xfs@xxxxxxxxxxx
Subject: [PATCH 5/5] xfs: fix race condition in AIL push trigger
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 6 May 2011 12:54:08 +1000
In-reply-to: <1304650448-28438-1-git-send-email-david@xxxxxxxxxxxxx>
References: <1304650448-28438-1-git-send-email-david@xxxxxxxxxxxxx>
From: Dave Chinner <dchinner@xxxxxxxxxx>

The recent conversion of the xfsaild functionality to a work queue
introduced a hard-to-hit log space grant hang. One is caused by a
race condition in determining whether there is a psh in progress or
not.

The XFS_AIL_PUSHING_BIT is used to determine whether a push is
currently in progress.  When the AIL push work completes, it checked
whether the target changed and cleared the PUSHING bit to allow a
new push to be requeued. The race condition is as follows:

        Thread 1                push work

        smp_wmb()
                                smp_rmb()
                                check ailp->xa_target unchanged
        update ailp->xa_target
        test/set PUSHING bit
        does not queue
                                clear PUSHING bit
                                does not requeue

Now that the push target is updated, new attempts to push the AIL
will not trigger as the push target will be the same, and hence
despite trying to push the AIL we won't ever wake it again.

The fix is to ensure that the AIL push work clears the PUSHING bit
before it checks if the target is unchanged.

As a result, both push triggers operate on the same test/set bit
criteria, so even if we race in the push work and miss the target
update, the thread requesting the push will still set the PUSHING
bit and queue the push work to occur. For safety sake, the same
queue check is done if the push work detects the target change,
though only one of the two will will queue new work due to the use
of test_and_set_bit() checks.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/xfs_trans_ail.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index d7eebbf..5fc2380 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -487,15 +487,19 @@ out_done:
                ailp->xa_last_pushed_lsn = 0;
 
                /*
-                * Check for an updated push target before clearing the
-                * XFS_AIL_PUSHING_BIT. If the target changed, we've got more
-                * work to do. Wait a bit longer before starting that work.
+                * We clear the XFS_AIL_PUSHING_BIT first before checking
+                * whether the target has changed. If the target has changed,
+                * this pushes the requeue race directly onto the result of the
+                * atomic test/set bit, so we are guaranteed that either the
+                * the pusher that changed the target or ourselves will requeue
+                * the work (but not both).
                 */
+               clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
                smp_rmb();
-               if (XFS_LSN_CMP(ailp->xa_target, target) == 0) {
-                       clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
+               if (XFS_LSN_CMP(ailp->xa_target, target) == 0 ||
+                   test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))
                        return;
-               }
+
                tout = 50;
        } else if (XFS_LSN_CMP(lsn, target) >= 0) {
                /*
-- 
1.7.4.4

<Prev in Thread] Current Thread [Next in Thread>