Subject: RE: nfsd lockups with xfs during SPEC SFS testing
From: Steve Lord
To: "HABBINGA,ERIK (HP-Loveland,ex1)"
Cc: linux-xfs@oss.sgi.com
Date: 01 Feb 2002 11:34:52 -0600
Message-Id: <1012584892.26363.391.camel@jen.americas.sgi.com>

On Thu, 2002-01-31 at 16:38, HABBINGA,ERIK (HP-Loveland,ex1) wrote:
> Steve,
> A co-worker of mine sent me a patch containing the CVS bits at 3:21pm
> Mountain time 1/30/02; he started the CVS download at 2:37pm Mountain time
> 1/30/02 and ended it at 2:38pm Mountain time 1/30/02. I've been working on
> a patch to remove the BKL from the nfsd process, and have been seeing
> lockups in the xfs code. I saw these lockups with XFS CVS downloads from
> 1/10/02 and 1/18/02.
> I finally started running SPEC tests without my
> nfsd-BKL-removal patch and still got lockups in the XFS code. So, I don't
> think this is a regression.
>
> I ran another test this morning, and got a different profile of lockups.
> I've attached the decoded output from alt-sysrq. kupdated is locked up in
> xlog_grant_log_space, and all the nfsd processes are locked up either in:
>
> - fh_lock (all the nfsd3_proc_create->stext_lock->__down_failed->__down
>   cases)
> - nfsd_sync (the nfsd_commit->stext_lock->__down_failed->__down cases)
> - pagebuf_grab_lock (the
>   _pagebuf_find_lockable_buffer->stext_lock->__down_failed->__down cases)
> - lock_wait (the xfs_access->mraccessf->lock_wait cases)
> - xlog_grant_log_space
>
> I'll help any way I can to track this problem down.
>
> Erik
>

Are you using anything unusual as the block device here? I would suspect
that it is not returning I/O completions. Basically, every place these
threads are waiting is a place where we block until a read or a write
completes.

If you have other filesystems on the same device, can you do I/O to them?

Steve

-- 
Steve Lord                                voice: +1-651-683-3511
Principal Engineer, Filesystem Software   email: lord@sgi.com