Subject: Re: [xfs-masters] xfs deadlock in stable kernel 3.0.4
From: Alex Elder
To: Christoph Hellwig
CC: Stefan Priebe - Profihost AG, "xfs-masters@oss.sgi.com", "xfs@oss.sgi.com"
Date: Tue, 13 Sep 2011 16:58:13 -0500
Message-ID: <1315951093.2159.92.camel@doink>
In-Reply-To: <1315950742.2159.89.camel@doink>

On Tue, 2011-09-13 at 16:52 -0500, Alex Elder wrote:
> On Tue, 2011-09-13 at 16:50 -0400, Christoph Hellwig wrote:
> > On Tue, Sep 13, 2011 at 08:04:36AM +0200, Stefan Priebe - Profihost AG wrote:
> > > I just reported it to the SCSI list because I didn't know where the
> > > problem is. But then some people told me it must be an XFS problem.
> > >
> > > Some more information:
> > > 1.) It's running with 2.6.32 and 2.6.38
> > > 2.) I can also write to another ext2 partition on the same disk
> > > array (aacraid driver) while XFS is stuck, so I think it must be an
> > > XFS problem
> >
> > That points a bit more towards XFS, although we've seen storage setups
> > create issues depending on the exact workload. The prime culprit for
> > that used to be the md software RAID driver, though.
> >
> > > 3.) I've also tried running 3.1-rc5, but then I'm seeing this error:
> > >
> > > BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
> > > IP: [] inode_dio_done+0x4/0x25
> >
> > Oops, that's a bug that I actually introduced myself. Fix below:
>
> Yikes. I'll prepare that one to send to Linus for 3.1.
> I'll wait for your formal signoff, though, Christoph.
>
> Reviewed-by: Alex Elder

Never mind: the latest code doesn't look quite like that and doesn't
suffer from the same problem. Christoph, will you please make sure the
fix gets to the stable folks, though? You have my review for the change.

-Alex