From: Carsten Aulbert
Date: Thu, 19 Feb 2009 11:13:05 +0100
To: "xfs@oss.sgi.com", linux-kernel@vger.kernel.org, npiggin@suse.de
Subject: Re: xfs problems (possibly after upgrading from linux kernel 2.6.27.10 to .14)
Message-ID: <499D30B1.30802@aei.mpg.de>
References: <499ACE6C.4060304@aei.mpg.de> <20090218091935.GD8830@disturbed> <499BD6BB.2000406@aei.mpg.de>
    <20090219061925.GE8830@disturbed>
In-Reply-To: <20090219061925.GE8830@disturbed>

Hi again,

Dave Chinner wrote:
>> I can try doing that on a few machines, would a metadump help on a
>> machine where this corruption occurred some time ago and is still in
>> this state?
>
> If you unmount the filesystem, mount it again and then touch a new
> file and it reports the error again, then yes, a metadump would be
> great.
>
> If the error doesn't show up after an unmount/mount, then I
> can't use a metadump image to reproduce the problem.
>

I've done it on two nodes so far and the result is not good (metadump-wise):

[1344887.778232] Filesystem "sda6": xfs_log_force: error 5 returned.
[1344887.778432] xfs_force_shutdown(sda6,0x1) called from line 420 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff8031dd7e
[1344889.579836] Filesystem "sda6": xfs_log_force: error 5 returned.
[1344889.580044] Filesystem "sda6": xfs_log_force: error 5 returned.
[1344889.580257] Filesystem "sda6": xfs_log_force: error 5 returned.
[1344889.580450] Filesystem "sda6": xfs_log_force: error 5 returned.
[1344889.624774] Filesystem "sda6": xfs_log_force: error 5 returned.
[1344915.783844] XFS mounting filesystem sda6
[1344915.872333] Starting XFS recovery on filesystem: sda6 (logdev: internal)
[1344917.399834] Ending XFS recovery on filesystem: sda6 (logdev: internal)

After that I can touch/create all the files I want on the fs again.

> I suspect so. We've already had XFS trigger one bug in the new
> lockless pagecache code, and the fix for that went in 2.6.27.11 -
> between the good version and the version that you've been seeing
> these memory corruptions on. I'm wondering if that fix exposed or
> introduced another bug that you've hit....
>
> Nick?

If it was triggered by a user job, the bug might have been in the kernel
for longer and the user just did not run that job for a few weeks. I'll
try to gather more information.

Cheers

Carsten