Subject: Re: block device issues (fwd)
From: Eric Sandeen
To: "Brandon D. Valentine"
Cc: linux-xfs@thebarn.com
Date: 05 Apr 2002 15:24:33 -0600
Message-Id: <1018041873.7110.36.camel@stout.americas.sgi.com>

Hi Brandon -

On Fri, 2002-04-05 at 15:02, Brandon D. Valentine wrote:
> [Not subscribed so please keep me in the Cc list]
>
> Greetings,
>
> This past weekend one of our Linux 2.4/XFS fileservers crashed pretty
> badly. I am attempting to diagnose the cause of the crash so that I may
> prevent it from recurring. My analysis so far follows. I am hoping
> that a few of you out there might have seen this before or have ideas on
> its cause.

> void ll_rw_block(int rw, int nr, struct buffer_head * bhs[])
> {
>     ...
>     if (buffer_delay(bh) || !buffer_mapped(bh))
>         BUG();
>     ...
> }

> presently running RedHat 7.1 XFS (using SGI's install ISO) and the
> kernel is a known good copy of 2.4.7/XFS pulled from SGI's CVS at the
> time that this fileserver was setup

The reason that BUG() is there is that if we get to ll_rw_block, ready
to send a buffer to disk, but we have no place to put it (i.e. it's a
delalloc buffer, or it's not mapped) then we're in trouble.

How you got here, I'm not certain, but going back to debug a 2.4.7
kernel is going to be rough - there have been so many changes since
then.

We are working on a release for XFS 1.1 (yours was 1.0 or 1.0.1, I
think?) and if possible, I would suggest that you upgrade a box or two
and see how that goes. If nothing else, the updated kernels based on
Red Hat code have some security issues fixed. :)

-Eric

-- 
Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
sandeen@sgi.com   SGI, Inc.
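
For anyone reading this thread later, the condition behind that BUG() can be modeled in userspace. This is only a rough sketch and not kernel code: struct toy_buffer and both toy_* functions are made-up names for illustration, and where the real ll_rw_block calls BUG(), the sketch just reports which buffer has nowhere to go on disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer head; NOT the kernel's struct buffer_head. */
struct toy_buffer {
    bool mapped; /* a disk block has been assigned to this buffer */
    bool delay;  /* delayed allocation: space reserved, block not yet chosen */
};

/* Mirrors the quoted condition: buffer_delay(bh) || !buffer_mapped(bh). */
static bool toy_buffer_unwritable(const struct toy_buffer *bh)
{
    return bh->delay || !bh->mapped;
}

/*
 * Returns the index of the first buffer that has no on-disk destination,
 * or -1 if every buffer could be sent to disk. The kernel code quoted
 * above calls BUG() at that point instead of returning.
 */
static int toy_first_unwritable(int nr, struct toy_buffer *bhs[])
{
    assert(nr == 0 || bhs != NULL);
    for (int i = 0; i < nr; i++) {
        if (toy_buffer_unwritable(bhs[i]))
            return i;
    }
    return -1;
}
```

With a list of three buffers where the second is delalloc, toy_first_unwritable returns 1; with only mapped, non-delalloc buffers it returns -1.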