Re: I/O hang, possibly XFS, possibly general

To: Paul Anderson <pha@xxxxxxxxx>
Subject: Re: I/O hang, possibly XFS, possibly general
From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Thu, 2 Jun 2011 20:42:47 -0400
Cc: xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <BANLkTim_BCiKeqi5gY_gXAcmg7JgrgJCxQ@xxxxxxxxxxxxxx>
References: <BANLkTim_BCiKeqi5gY_gXAcmg7JgrgJCxQ@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Jun 02, 2011 at 10:42:46AM -0400, Paul Anderson wrote:
> This morning, I had a symptom of an I/O throughput problem in which
> dirty pages appeared to be taking a long time to write to disk.
> The system is a large x64 192GiB dell 810 server running from
> kernel.org - the basic workload was data intensive - concurrent large
> NFS (with high metadata/low filesize), rsync/lftp (with low
> metadata/high file size) all working in a 200TiB XFS volume on a
> software MD raid0 on top of 7 software MD raid6, each w/18 drives.  I
> had mounted the filesystem with inode64,largeio,logbufs=8,noatime.

A few comments on the setup before trying to analyze what's going on in
detail.  I'd absolutely recommend an external log device for this setup,
that is, buy another two fast but small disks, or take two existing ones,
and use them as a RAID 1 for the external log device.  This will speed up
anything log intensive, which both the NFS and rsync workloads are.

Second, if you can, split the workloads into multiple volumes, given that
you have two such different workloads, so that they don't interfere with
each other.

Third, a RAID0 on top of RAID6 volumes sounds like pretty much the worst
case for almost any type of I/O.  You end up doing even relatively small
I/O to all of the disks in the worst case.  I think you'd be much better
off with a simple linear concatenation of the RAID6 devices, even if you
can split them into multiple filesystems.
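
To illustrate the striping point, here is a minimal sketch that counts how
many of the 7 RAID6 arrays a single contiguous I/O touches under RAID0
striping versus a linear concatenation.  The function names, the 512 KiB
RAID0 chunk size and the 28 TiB per-array size are assumptions made for
the example, not values taken from the report.

```python
# Hypothetical model: count how many RAID6 member arrays one contiguous
# I/O touches under RAID0 striping versus linear concatenation.
# The chunk size and per-array size are assumptions, not values
# from the original report.

KIB = 1024
MIB = 1024 * KIB
TIB = 1024 * 1024 * MIB

N_ARRAYS = 7                 # from the report: 7 RAID6 arrays
RAID0_CHUNK = 512 * KIB      # assumed RAID0 chunk size
ARRAY_SIZE = 28 * TIB        # assumed usable size of each RAID6 array


def arrays_touched_raid0(offset, length, chunk=RAID0_CHUNK, n_arrays=N_ARRAYS):
    """Number of member arrays hit when chunks rotate round-robin."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_chunk = offset // chunk
    last_chunk = (offset + length - 1) // chunk
    # Consecutive chunks land on consecutive arrays, so the count is
    # capped by the number of arrays.
    return min(last_chunk - first_chunk + 1, n_arrays)


def arrays_touched_linear(offset, length, array_size=ARRAY_SIZE):
    """Number of member arrays hit when arrays are laid end to end."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_array = offset // array_size
    last_array = (offset + length - 1) // array_size
    return last_array - first_array + 1


if __name__ == "__main__":
    for size in (256 * KIB, 1 * MIB, 4 * MIB):
        print(
            size // KIB, "KiB write at offset 0:",
            "raid0 ->", arrays_touched_raid0(0, size), "arrays,",
            "linear ->", arrays_touched_linear(0, size), "arrays",
        )
```

The model only counts arrays.  It ignores the parity read-modify-write
that each touched RAID6 array may additionally perform for a partial
stripe, so the real cost on the striped layout is likely higher than the
count suggests.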

> The specific symptom was that 'sync' hung, a dpkg command hung
> (presumably trying to issue fsync), and experimenting with "killall
> -STOP" or "kill -STOP" of the workload jobs didn't let the system
> drain I/O enough to finish the sync.  I probably did not wait long
> enough, however.

It really sounds like you're simply killing the MD setup with a
lot of log I/O that goes to all the devices.