On 12/10/03 14:58, Norman Zhang wrote:
I'm seeing some irregular halts on one of my XFS volumes (/srv). I
can only use umount -l to dismount the volume or do a hot reboot.
Dec 9 15:47:25 smbserver kernel: xfs_force_shutdown(md(9,5),0x8)
called from line 1039 of file xfs_trans.c. Return address =
Dec 9 15:47:25 smbserver kernel: Corruption of in-memory data
detected. Shutting down filesystem: md(9,5)
Dec 9 15:47:25 smbserver kernel: Please umount the filesystem, and
rectify the problem(s)
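
To see how often the filesystem actually shuts down, the xfs_force_shutdown lines in a saved syslog can be tallied. The following is only a minimal sketch, assuming the log lines look like the excerpt above; the function names find_shutdowns and gaps are hypothetical, and the year is passed in as an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches the first line of each shutdown report, e.g.
# "Dec 9 15:47:25 smbserver kernel: xfs_force_shutdown(md(9,5),0x8)"
SHUTDOWN_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"xfs_force_shutdown\((?P<dev>.+),(?P<flags>0x[0-9a-fA-F]+)\)"
)


def find_shutdowns(lines, year=2003):
    """Return (timestamp, device, flags) for every shutdown line found.

    The year is an assumption supplied by the caller, since the
    syslog format shown in the thread omits it.
    """
    events = []
    for line in lines:
        m = SHUTDOWN_RE.match(line)
        if not m:
            continue
        # Collapse repeated spaces (syslog pads single-digit days).
        stamp = " ".join(m.group("stamp").split())
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, m.group("dev"), m.group("flags")))
    return events


def gaps(events):
    """Return the time deltas between consecutive shutdowns."""
    times = sorted(e[0] for e in events)
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    sample = [
        "Dec 9 15:47:25 smbserver kernel: xfs_force_shutdown(md(9,5),0x8)",
        "Dec 9 15:47:25 smbserver kernel: Corruption of in-memory data",
    ]
    for when, dev, flags in find_shutdowns(sample):
        print(when.isoformat(), dev, flags)
```

Feeding the whole log file to find_shutdowns and then calling gaps on the result gives the intervals between shutdowns, which can be compared against load, backups, or other scheduled activity.
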
I'm not sure if the disk has problems, but during boot up no errors
are found by fsck. The stall sometimes occurs weeks apart and
sometimes a few times per day, so I really doubt this is a disk
problem. Is there any way I can trace or perhaps fix this? BTW, if
I want to manually force a disk check
I think you might have a memory problem. Try memtest86. Some people
don't see such problems with other filesystems; I have already seen a
number of cases where bad memory only showed up with XFS.
I did try memtest86, but found no problem. I even swapped in brand new
RAM. Is there more info I can provide?
How long did you let memtest86 run (# of passes, hours)? It could
also be a CPU cache problem, or even the motherboard going flaky.
I've also seen some cases of power supply problems showing up as
memory issues (insufficient or unstable power delivery, spikes,
fluctuations outside spec, etc.).
My last memtest86 run took a good 1/2 to 1 hour; I think it did 3 cycles. I
Three tests or three passes? A test is not the same as a pass in
memtest86. Unless you have an exceedingly fast box, or very little RAM,
I'd be really surprised if you completed three passes in under an hour.
Normally it takes roughly 2-3 hours for a single pass on a very fast
box with 2GB of RAM.
hope it is not HW, as I have 2 other boxes that run the exact same config
except with HW RAID. I'm running an Intel SE7500WV2S motherboard with dual
P4 1.8GHz and 2x256MB PC2100 DDR ECC Reg. I ran top on the halted system, but
found little CPU or swap usage. The system also runs on dual hot-swap 400W
power supplies on a UPS.
Well, that box is moderately fast, and doesn't have much memory, so
perhaps memtest86 finished. It's still possible that you've got bad
cache on a CPU, or even the mobo. And a bad power supply could still
introduce odd behavior.
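
As a rough sanity check on the timing, the 2-3 hours per pass quoted for 2GB can be scaled down to this box's 2x256MB. This is only back-of-the-envelope arithmetic: it assumes pass time scales linearly with RAM size and that the box is about as fast as the reference machine, neither of which the thread confirms. The function name scaled_pass_hours is hypothetical.

```python
def scaled_pass_hours(ref_hours, ref_ram_mb, ram_mb):
    """Scale a reference pass time linearly by RAM size (an assumption)."""
    return ref_hours * ram_mb / ref_ram_mb


ram_mb = 2 * 256  # 2x256MB, as stated in the thread

# Reference figures from the thread: 2-3 hours per pass with 2GB.
low = scaled_pass_hours(2.0, 2048, ram_mb)
high = scaled_pass_hours(3.0, 2048, ram_mb)

print(f"one pass: {low * 60:.0f}-{high * 60:.0f} min")
print(f"three passes: {3 * low:.2f}-{3 * high:.2f} h")
```

Under those assumptions a single pass would fit in the reported 1/2 to 1 hour, but three full passes would not, so the "3 cycles" were more likely individual tests than passes.
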
L. Friedman netllama@xxxxxxxxxxxxx
Linux Step-by-step & TyGeMo: http://netllama.ipfox.com