Date: Fri, 6 May 2011 11:49:06 +1000
From: Dave Chinner
To: Christoph Hellwig
Cc: linux-kernel@vger.kernel.org, Markus Trippelsdorf, Bruno Prémont,
    xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Alex Elder, Dave Chinner
Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
Message-ID: <20110506014906.GF26837@dastard>
In-Reply-To: <20110505123959.GA21098@infradead.org>

On Thu, May 05, 2011 at 08:39:59AM -0400, Christoph Hellwig wrote:
> > The third problem is that updating the push target is not safe on 32
> > bit machines. We cannot copy a 64 bit LSN without the possibility of
> > corrupting the result when racing with another updating thread. We
> > have function to do this update safely without needing to care about
> > 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
> > updating the AIL push target.
>
> But reading xa_target without xa_lock isn't safe on 32-bit either, is it?

Not sure - I think it depends on the platform. I don't think we
protect LSN reads in any specific way on 32 bit platforms. In this
case, I don't think it matters so much on read, because if we get a
race with a write that mixes upper/lower words of the target we will
eventually hit the stop condition and we won't get a match. That
will trigger the requeue code and we'll start the push again.
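To make the upper/lower word mixing concrete, here is a minimal userspace sketch, not the kernel code. It assumes an LSN packs the cycle number in the high 32 bits and the block number in the low 32 bits. All names in it (lsn_t, lsn_make, lsn_cmp, lsn_torn, struct ail_sketch, ail_copy_lsn_sketch) are hypothetical stand-ins, and the C11 atomic_flag spinlock only stands in for xa_lock. lsn_cmp compares cycle first and then block, in the spirit of XFS_LSN_CMP.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for xfs_lsn_t: cycle in the high word, block in the low word. */
typedef int64_t lsn_t;

static inline uint32_t lsn_cycle(lsn_t lsn) { return (uint32_t)((uint64_t)lsn >> 32); }
static inline uint32_t lsn_block(lsn_t lsn) { return (uint32_t)lsn; }

static inline lsn_t lsn_make(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Compare cycle first, then block: returns <0, 0 or >0. */
static inline int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/*
 * Model of a torn 64-bit store on a 32-bit machine: the high word of
 * the new value has landed, but the low word still holds the old value.
 */
static inline lsn_t lsn_torn(lsn_t old_val, lsn_t new_val)
{
	return lsn_make(lsn_cycle(new_val), lsn_block(old_val));
}

/* Hypothetical AIL stand-in: a lock plus the push target it protects. */
struct ail_sketch {
	atomic_flag	lock;
	lsn_t		target;
};

/* Copy an LSN under the lock so no other lock holder sees half an update. */
static inline void ail_copy_lsn_sketch(struct ail_sketch *ailp,
				       lsn_t *dst, const lsn_t *src)
{
	while (atomic_flag_test_and_set_explicit(&ailp->lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	assert(sizeof(lsn_t) == 8);
	*dst = *src;
	atomic_flag_clear_explicit(&ailp->lock, memory_order_release);
}
```

With an old target of cycle 7/block 100, a new target of cycle 8/block 20 and a log head at cycle 8/block 50, the torn value comes out as cycle 8/block 100. The new target compares below the head, but the torn one compares above it, which is the "beyond the current head" case on the write side.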

The problem with getting such a race on the target write is that we
could get a cycle/block pair that is beyond the current head of the
log and we'd never be able to push the AIL again as all push
thresholds are truncated to the current head LSN on disk...

> For the first read it can trivially be moved into the critical
> section a few lines below, and the second one should probably use
> XFS_LSN_CMP.
>
> > @@ -482,19 +481,24 @@ xfs_ail_worker(
> >  	/* assume we have more work to do in a short while */
> >  	tout = 10;
> >  	if (!count) {
> > +out_done:
>
> Jumping into conditionals is really ugly. By initializing count a bit
> earlier you can just jump in front of the if/else clauses. And while
> you're there maybe moving the tout = 10; into an else clause would
> also make the code more readable.
> an uninitialied used of tout.

Ok, I'll rework that.

> > +	if (ailp->xa_target == target ||
> > +	    (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags)))
>
> no need for braces around the test_and_set_bit call.

*nod*. Left over from developing the fix...

I'll split all these and post them to the xfs-list for review...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com