On Tue, 2001-11-13 at 09:54, Jim Eshleman wrote:
FWIW me too, on an 8-way 8.5GB (64GB HIGHMEM enabled) IBM Netfinity
x370 (8500R) which functions as a production mail server. I
currently run 2.4.9 with XFS and it stays up for about a week under
heavy load. 2.4.13 lasted about 4 hours under light load until all
memory was consumed by cache, and then it became unresponsive.
2.4.13 on a 2-way 1GB (64GB HIGHMEM enabled) Netfinity x350 test
box with the same kernel config and XFS works fine even under
stress, so perhaps our problem is similar to the discussion on l-k
"Google's mm problems"...
Update: I'm unable to make 2.4.14 fail on the test box (running
Cerberus, bonnie++ against two XFS volumes, and LTP simultaneously)
but it melts down just as 2.4.13 does on the big production box. A
short time after all memory is eaten by file cache, and under light
load, the machine becomes unresponsive. It took about five minutes
to log in at the console. No error messages on the console or in
syslog. Here's some info; it's obvious in the vmstat output where
the meltdown occurs:
kernel config: http://www.lehigh.edu/~jce0/2.4.14-config
bootup messages: http://www.lehigh.edu/~jce0/2.4.14-messages
vmstat 60 output: http://www.lehigh.edu/~jce0/2.4.14-vmstat
ver_linux output: http://www.lehigh.edu/~jce0/ver_linux.out
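For anyone wanting to watch for the same "all memory eaten by cache" pattern on their own box, here is a minimal sketch that reads /proc/meminfo-style text and reports what fraction of memory is cache plus buffers. It is not how the vmstat output above was produced. It assumes the "Name: <number> kB" line format with MemTotal, Buffers and Cached fields; the function names and the SAMPLE data are made up for illustration.

```python
def parse_meminfo(text):
    """Return {field: value} for lines of the form 'Name:   12345 kB'."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not 'Name: <integer> [unit]'.
        if not sep or not parts or not parts[0].isdigit():
            continue
        fields[name.strip()] = int(parts[0])
    return fields


def cache_fraction(fields):
    """Fraction of total memory held by page cache plus buffers."""
    cached = fields.get("Cached", 0) + fields.get("Buffers", 0)
    return cached / fields["MemTotal"]


# Made-up sample so the sketch also runs where /proc/meminfo is missing.
SAMPLE = """MemTotal:  1000000 kB
MemFree:     50000 kB
Buffers:    100000 kB
Cached:     800000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    frac = cache_fraction(parse_meminfo(text))
    print(f"{frac:.0%} of memory is cache/buffers")
```

Run from cron or a loop alongside vmstat, a figure that climbs toward 100% and stays there shortly before the box stops responding would match the behaviour described above.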
This is Linus 2.4.14 patched with linux-2.4.14-xfs-2001-11-06.patch
and LVM 0.9.1_beta6, compiled with egcs-2.91.66. It's a RH 7.1 system.
I know Andrea and Marcelo? were testing and fixing some HIGHMEM
things. Were there any patches and did they make it into the Linus
Any assistance greatly appreciated.

Going through my old email - I think I just fixed this. There was a bug
in the delayed allocation handling in XFS which leaked buffer_head
reference counts, and that in turn leaked memory. The latest CVS tree
(2.4.16 based) has the fix in it.
This bug was introduced around the time the new VM showed up in 2.4.10.
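
As a toy illustration of why a leaked reference count pins memory: this is not the XFS code, and the early-return path below is only an assumed example of how a reference can go missing. The Buffer class and function names are hypothetical stand-ins for a reference-counted buffer_head.

```python
class Buffer:
    """Toy stand-in for a reference-counted kernel buffer (hypothetical)."""

    def __init__(self):
        self.refcount = 1    # reference held by whoever created the buffer
        self.freed = False

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True    # memory can only be reclaimed at zero


def write_balanced(buf):
    buf.get()
    # ... do the I/O ...
    buf.put()


def write_leaky(buf, error):
    buf.get()
    if error:
        return    # bug: early exit skips the matching put()
    buf.put()


def release_after(write, *args):
    """Create a buffer, run one write path, then drop the creator's reference."""
    buf = Buffer()
    write(buf, *args)
    buf.put()
    return buf


print(release_after(write_balanced).freed)       # True
print(release_after(write_leaky, True).freed)    # False: one reference never dropped
```

Every buffer that goes through the leaky path stays allocated forever, which is the kind of slow growth that only shows up after hours or days of load.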