I'm running 2.4.18-xfs on a dual 1.2GHz Athlon with 1GB of RAM. The machine
has a 3ware 7850 with 8 160G drives (RAID5) carrying an XFS filesystem
(1.0TB). It also has a SCSI software-RAID root volume and an IDE scratch
disk, both ext3. The XFS filesystem is ~40% full with an average file
size of ~4-5G.

The 1TB array is a recent addition to a previously stable system. The RAID
volume seemed fine in my initial burn-in and stress testing, but now that I
have live data on the array I've been having stability problems. In less
than two months I've had three crashes...

The first crash was a strange OOM-type problem after the box had been up
for a few weeks. It started with a series of 'eth0: memory shortage'
messages on a box that was mostly doing NFS, followed by several 'Unable to
handle kernel NULL pointer dereference at virtual address 00000030' errors,
first in kswapd, then in klogd; shortly after that it oopsed and wedged.

In the last two weeks I've had it die twice, both times within a minute of
starting to mv ~800M from an SGI O2 to the Linux box via NFSv3. The O2
reported NFS timeout errors; the Linux box still responded to pings, and
some things that didn't depend on disk I/O continued to work. dmesg and df
would hang and couldn't be interrupted. In both cases I was forced to reset.

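For reference, the transfer that triggers the hang is roughly the following, run from the O2 side (the hostname, export, and paths here are made up for illustration; the real ones don't matter for the symptom):

```shell
# On the SGI O2 (IRIX), mount the Linux box's XFS export over NFSv3.
# 'linuxbox', '/raid', and the file paths are hypothetical placeholders.
mount -o vers=3 linuxbox:/raid /mnt/raid

# Move ~800M of data onto the XFS volume; the server wedges within a
# minute of the transfer starting.
mv /scratch/data/* /mnt/raid/incoming/
```
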
I'm not sure whether the problems I'm seeing have anything to do with XFS,
but my two most recent crashes occurred shortly after I started moving data
to the XFS volume on an otherwise idle system.

Any ideas / debugging tips?