On Thu, Apr 19, 2007 at 03:36:58PM +0100, Burbidge, Simon A wrote:
> Hi Dave,
> Thanks for the response.
> No I/O errors reported in the message log or on the RAID box.
> It's an Infortrend SATA RAID5 array, with a fibre channel connection to
> the server.
> The filesystem is built on an LVM volume.
> Kernel is 2.6.13-15-smp running on an x86_64 dual CPU Xeon server with
> hyper-threading enabled.
That's a relatively old kernel. It's possible that what you are seeing
has been fixed since that kernel was released.
> The most significant feature of the load is that it is part of an HPC
> cluster, and has a large number of nodes NFS mounting the filesystem
> across Gigabit ethernet.
Not uncommon - we do that all the time ;)
> I did notice that in the first incident, a user had a directory with
> 700000 files in it, and xfs_repair found fault with that directory. The
> user has revised their workflow since and removed the files.
> Very difficult to spot common traits in the workload between the 2
Ok, so that makes it kind of hard to start tracking this down. If it
keeps occurring and you can't isolate the workload that is causing
the problem, you might want to upgrade to a more recent kernel and
see if that helps.....
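
Since the first incident involved a directory holding 700000 files, one
way to look for a common trait is to scan the filesystem for unusually
large directories before the next failure. This is only a rough sketch:
the function name, the threshold and the mount point in the usage note
are illustrative assumptions, and nothing here is XFS-specific.

```python
import os

def find_large_dirs(root, threshold=100000):
    """Return (path, entry_count) pairs for directories under root
    that hold at least `threshold` entries, largest first."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Count both subdirectories and files directly inside dirpath.
        count = len(dirnames) + len(filenames)
        if count >= threshold:
            results.append((dirpath, count))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Calling find_large_dirs("/path/to/mount") on the server itself (not over
NFS) would list candidate directories to compare against whatever
xfs_repair flags. Note that the walk itself puts a heavy readdir load on
the filesystem, so it is best run at a quiet time.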
SGI Australian Software Group