In-Reply-To: <20110223162316.45a49880@harpe.intellique.com>
References: <20110223154651.54f0a8dc@harpe.intellique.com> <20110223162316.45a49880@harpe.intellique.com>
Date: Thu, 24 Feb 2011 11:20:26 +0100
Subject: Re: XFS corruption on 3ware RAID6-volume
From: Erik Gulliksson
To: Emmanuel Florac
Cc: xfs@oss.sgi.com

Thanks for your comments, Emmanuel.

> So the RAID array looks OK, the RAID controller doesn't report any
> particular problem. You said it was reported as 0 K. Where did you see
> 0 K reported?

No, I meant it is "OK" with an "O" :)

> What gives "dmesg | grep 3w-9xxx" ? and "tw_cli alarms" ? Was the
> filesystem under heavy write when the problem occurred ?

The server has been restarted since the problems started, so there is
nothing notable in "tw_cli alarms" or dmesg. The controller was
performing a rebuild on the other unit when it happened; however, I
don't think the actual XFS filesystem was particularly loaded.

>
> I'd start with launching a RAID verify, to detect and correct possible
> on-disk coherency problems (it can't hurt anyway):
>
> tw_cli /c0/u0 start verify
>
> Then "tail -f /var/log/messages | grep 3w-9xxx" ...

I will try this overnight and see if something is reported.

> I suppose that there are no problems to be discovered. Most probably
> IOs to the array were lost because of the bus reset.

That's what I am afraid of too.