Date: Thu, 6 Jan 2011 16:00:57 +1100
From: Dave Chinner
To: yuji_touya@yokogawa-digital.com
Cc: xfs@oss.sgi.com
Subject: Re: 2.6.27.30 fc10, some processes stuck in D state
Message-ID: <20110106050057.GF8322@dastard>
References: <8529A87D856C184491994079B5F87B68C1A8289FCC@EXMAIL03.jp.ykgw.net>
In-Reply-To: <8529A87D856C184491994079B5F87B68C1A8289FCC@EXMAIL03.jp.ykgw.net>

On Thu, Jan 06, 2011 at 01:18:27PM +0900, yuji_touya@yokogawa-digital.com wrote:
> Hello folks,
>
> We need to save a bunch of transport-stream(TS) data(4MB/sec, 300GB/day), and
> are using xfs formatted hardware RAID system to save TS data.
> Some processes (pdflush, kswapd, our own services etc) stuck in D-state and
> our system stops saving and down-converting TS data.

Everything is waiting for log space to be freed. Typically a sign
that metadata has not been flushed or that IO completion has not
occurred so the tail is not moving forward.

> It rarely happens (3 times in recent 3 months), but it's quite serious for us.
> How can we avoid this?

What did you change 3 months ago? Or did this always happen?

> One more thing, in that situation when I run "ls /mnt/raid/foo" command,
> all stuck processes suddenly wake up and continue running. Very strange...
> (/mnt/raid is where we mount xfs)

So doing new read IOs starts stuff moving again? That sounds like an
IO completion has not arrived from the lower layers until a new IO
is issued and completes. Perhaps the hardware RAID is not issuing an
interrupt when it should?

What type of RAID controller/storage hardware are you using? Is it
all running the latest firmware, appropriate drivers, etc?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
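
When the hang next occurs, it may help to record which tasks are in D state and what they are sleeping on before running anything that touches the filesystem. The following is a minimal sketch and not something posted in the thread: it assumes a Linux /proc with per-process `stat` and `wchan` files, and the helper names `parse_stat` and `d_state_tasks` are made up for illustration. It only reads /proc, so it should not itself issue IO to the XFS mount.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition("(")
    return int(pid), comm, tail.split()[0]

def d_state_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for processes in uninterruptible sleep (D)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state != "D":
            continue
        try:
            # wchan names the kernel function the task is sleeping in,
            # when the kernel exposes it (assumption: symbols are available).
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip() or "-"
        except OSError:
            wchan = "?"
        yield pid, comm, wchan

if __name__ == "__main__":
    for pid, comm, wchan in d_state_tasks():
        print(f"{pid:>7}  {comm:<16}  {wchan}")
```

Run as root while the processes are stuck and keep the output. If most entries show a wait channel in the XFS log code, that would be consistent with everything waiting for log space, though the names reported depend on the kernel build.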