Message-ID: <3C1E1538.4000609@Lehigh.EDU>
Date: Mon, 17 Dec 2001 10:54:32 -0500
From: Jim Eshleman
To: Steve Lord
CC: Jason Allen, linux-xfs@oss.sgi.com
Subject: Re: 2.4.13 Mem Related Hangs
References: <3BE6C909.6070308@Lehigh.EDU> <3BF14253.1060008@Lehigh.EDU> <1008256442.14210.0.camel@jen.americas.sgi.com>

Steve Lord wrote:

> On Tue, 2001-11-13 at 09:54, Jim Eshleman wrote:
>
>>> FWIW me too, on an 8-way 8.5GB (64GB HIGHMEM enabled) IBM Netfinity
>>>x370 (8500R) which functions as a production mail server. I currently
>>>run 2.4.9 with XFS and it stays up for about a week under heavy load.
>>>2.4.13 lasted about 4 hours under light load until all memory was
>>>consumed by cache then it became unresponsive.
>>>
>>> 2.4.13 on a 2-way 1GB (64GB HIGHMEM enabled) Netfinity x350 test box
>>>with the same kernel config and XFS works fine even under stress, so
>>>perhaps our problem is similar to the discussion on l-k "Google's mm
>>>problems"...
>>>
>>
>> Update: I'm unable to make 2.4.14 fail on the test box (running
>>Cerberus, bonnie++ against two XFS volumes, and LTP simultaneously) but
>>it melts-down just as 2.4.13 does on the big production box. A short
>>time after all memory is eaten by file cache, and under light load, the
>>machine becomes unresponsive. It took about five minutes to login at
>>the console. No error messages on the console or in syslog. Here's
>>some info, it's obvious in the vmstat output where the melt-down occurs:
>>
>> kernel config: http://www.lehigh.edu/~jce0/2.4.14-config
>> bootup messages: http://www.lehigh.edu/~jce0/2.4.14-messages
>> vmstat 60 output: http://www.lehigh.edu/~jce0/2.4.14-vmstat
>> ver_linux output: http://www.lehigh.edu/~jce0/ver_linux.out
>>
>> This is linus 2.4.14 patched with linux-2.4.14-xfs-2001-11-06.patch
>>and LVM 0.9.1_beta6, compiled with egcs-2.91.66. It's a RH 7.1 system.
>>
>> I know Andrea and Marcelo? were testing and fixing some HIGHMEM
>>things. Were there any patches and did they make it into the Linus tree?
>>
>> Any assistance greatly appreciated.
>>
>>Jim
>>
>>
>
> Going through my old email - I think I just fixed this - there was a bug
> in the delayed allocation handling in XFS which caused a memory leak
> due to a buffer_head reference count leak. The latest cvs tree (2.4.16
> based) has the fix in it.
>
> This bug was introduced around the time the new VM showed up in 2.4.10.
>
> Steve

This would make my millennium. I shall test as soon as a new 2.4.16
patch set is available. Or 2.4.17, whichever comes first :-)

Thanks Steve.

Jim