Date: Fri, 30 Jul 2010 00:05:46 +1000
From: Nick Piggin
To: Dave Chinner
Cc: Nick Piggin, xfs@oss.sgi.com
Subject: Re: XFS hang in xlog_grant_log_space
Message-ID: <20100729140546.GB7217@amd>
References: <20100722190100.GA22269@amd> <20100723135514.GJ32635@dastard> <20100727070538.GA2893@amd> <20100727080632.GA4958@amd> <20100727113626.GA2884@amd> <20100727133038.GP7362@dastard> <20100727145808.GQ7362@dastard> <20100728131744.GS7362@dastard>
In-Reply-To: <20100728131744.GS7362@dastard>
User-Agent: Mutt/1.5.20 (2009-06-14)

On Wed, Jul 28, 2010 at 11:17:44PM +1000, Dave Chinner wrote:
> Something very strange is happening, and to make matters worse I
> cannot reproduce it with a debug kernel (ran for 3 hours without
> failing). Hence it smells like a race condition somewhere.
>
> I've reproduced it without delayed logging, so it is not directly
> related to that functionality.
>
> I've seen this warning:
>
> Filesystem "ram0": inode 0x704680 background reclaim flush failed with 117
>
> Which indicates we failed to mark an inode stale when freeing an
> inode cluster, but I think I've fixed that and the problem still
> shows up. It's possible the last version didn't fix it, but....

I've seen that one a couple of times too. Keeps coming back each time
you echo 3 > /proc/sys/vm/drop_caches :)

> Now I've got the ag iterator rotor patch in place as well and
> possibly a different version of the cluster free fix to what I
> previously tested and it's now been running for almost half an hour.
> I can't say yet whether I've fixed the bug or just changed the
> timing enough to avoid it. I'll leave this test running overnight
> and redo individual patch testing tomorrow.

I reproduced it with fs_stress now too. Any patches I could test for
you just let me know.