
Re: XFS/NFS server oops ..... any ideas.

To: "Ian D. Hardy" <i.d.hardy@xxxxxxxxxxx>
Subject: Re: XFS/NFS server oops ..... any ideas.
From: Austin Gonyou <austin@xxxxxxxxxxxxxxx>
Date: 16 Jan 2002 14:39:10 -0600
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <3C45E220.B8CE3257@xxxxxxxxxxx>
References: <3C45E220.B8CE3257@xxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
What disks are you using? What is your subsystem like?

e.g. MegaRAID 500 attached to 12 SCSI disks. 5 LVM Volumes + XFS. 

And what applications is your system running? Something sounds odd
here. 

On Wed, 2002-01-16 at 14:27, Ian D. Hardy wrote:
> Hi,
> 
> I've been looking at this further today.
> 
> If Steve Lord is correct in his assessment that my Oops was due to an
> 'out of memory condition' (I'm sure he is, looks sensible) and I've
> correctly interpreted memory usage (via 'vmstat') then it would appear
> that the kernel is running out of available memory, with resultant
> problems for XFS because all of the memory is being used, most of it to
> cache filesystem data. Currently 'top' is showing:
> 
>   7:03pm  up  7:19,  2 users,  load average: 0.00, 0.00, 0.00
> 108 processes: 107 sleeping, 1 running, 0 zombie, 0 stopped
> CPU states:  0.1% user,  0.5% system,  0.0% nice, 99.2% idle
> Mem:   898848K av,  895792K used,    3056K free,       0K shrd,   13392K buff
> Swap: 1028120K av,    4780K used, 1023340K free                  820396K cached
> 
> Confirming that the majority of the memory is being used for cache and
> that there is currently just under 3Mbytes of RAM free, so I can see how
> a sudden kernel demand for memory may result in a failure before 'kswapd'
> has a chance to free some memory up.
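
As a rough cross-check of the 'top' figures quoted above, the arithmetic can be sketched in Python. The numbers are copied from that output; the variable names are only illustrative and are not taken from any tool.

```python
# Figures copied from the 'top' output quoted above (all in KiB).
mem_total = 898848
mem_used = 895792
mem_free = 3056
buffers = 13392
cached = 820396

# Share of physical memory holding page cache, with and without buffers.
cached_fraction = cached / mem_total
cache_and_buffers_fraction = (cached + buffers) / mem_total

# Memory counted as "used" that is neither buffers nor cache.
used_excluding_cache = mem_used - buffers - cached

print(f"cached:           {cached_fraction:.1%} of RAM")
print(f"cached + buffers: {cache_and_buffers_fraction:.1%} of RAM")
print(f"used, non-cache:  {used_excluding_cache} KiB")
print(f"free:             {mem_free / 1024:.2f} MiB")
```

On these figures a little over nine tenths of RAM is cache, and only about 60 Mbytes is "used" in any other sense, which fits the description above of a machine whose memory is almost entirely filesystem cache.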
> 
> Isn't this a common problem? Wouldn't this be the normal state for
> a fileserver with more 'active' data than memory - filesystem cache will
> (under Linux) grow to use as much memory as possible, hence 'free'
> memory will be at a minimum? From this reasoning adding more memory is
> unlikely to help - as the system will just use more cache until there is
> the same minimum amount of free memory?
> 
> It would appear that XFS has the potential to make heavier demands
> on kernel memory and/or does not check that memory was allocated (I
> assume that its memory allocation calls are such that it expects memory
> to be allocated from immediately available physical memory?), hence
> this problem (I believe a number of people have reported problems that
> have been attributed to memory allocation errors?).
> 
> ..... have I missed something?
> 
> I then looked to try to find out how to decrease the maximum amount of
> memory that the kernel would use as FS cache (or increase the minimum
> amount of memory that the kernel would try to keep free). Thanks to 
> harri.haataja@xxxxxxxxxxxxxx for pointing me in the direction of
> 'Documentation/sysctl/vm.txt' in the kernel. Though, as was noted, this
> document is out of date and refers to the 2.2 series kernels. It does
> however point to '/proc/sys/vm/freepages' as giving the number of pages
> at which 'kswapd' will start to free pages. This seems to exist in the
> 2.4 series kernels up to 2.4.9 (it may be 10 or 11?); after that the VM
> changed and this parameter went away (though the documentation hasn't
> changed...... So I'm completely lost with the later kernels!). Anyway
> back to 2.4.9:
> 
>             # cat /proc/sys/vm/freepages
>             383     766     1149
> 
> Which means that the kernel will try to maintain 1149 pages (~4.5Mbytes)
> free memory, will try even harder to free memory at 766 pages and will
> stop allocating memory other than to root if free memory falls below
> 383 pages (~1.5Mbytes). This would seem to agree with 'vmstat'/'top'
> which tend to show ~3Mbytes free memory. I then tried to increase
> these values, however these appear to be read only values (tried by
> writing directly to the file and using 'sysctl'). Indeed a comment
> in the kernel file 'mm/page_alloc.c' confirms that the 'freepages'
> array is not writable due to potential conflicts with different memory
> zones. So I'm stuck again. I've not been able to find any obvious
> place in the kernel source to change these values at kernel compilation
> time either.
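
The page-to-Mbyte conversions quoted above can be checked with a short Python sketch. It assumes 4 KiB pages (the usual i386 page size, and consistent with the ~4.5Mbytes figure given for 1149 pages). The labels min/low/high follow the naming in the 2.2-era vm.txt; the helper functions are hypothetical and exist only for this illustration.

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages, as on i386


def parse_freepages(text):
    """Split the three whitespace-separated values read from freepages."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 values, got {len(fields)}")
    return tuple(int(field) for field in fields)


def pages_to_kib(pages, page_size_kib=PAGE_SIZE_KIB):
    """Convert a page count to KiB."""
    return pages * page_size_kib


# Values copied from the 'cat /proc/sys/vm/freepages' output quoted above.
fp_min, fp_low, fp_high = parse_freepages("383     766     1149")

for label, pages in (("min", fp_min), ("low", fp_low), ("high", fp_high)):
    kib = pages_to_kib(pages)
    print(f"{label:>4}: {pages:5d} pages = {kib:5d} KiB = {kib / 1024:.2f} MiB")
```

Under that page-size assumption the middle value, 766 pages, comes to 3064 KiB, which is within a few KiB of the 3056K free that 'top' reported earlier in this message.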
> 
> Any ideas? (Either for 2.4.9 or ideally for later/current
> kernels. I have reproduced what looked like a similar failure with
> 2.4.16 and 2.4.17 but unfortunately did not get the Oops details to
> confirm that it was the same problem. I'll try to set up a test to
> get this info; is there any more info that would help?) FYI: the test
> environment is a server and a number (~6) of client machines running
> a mixture of 'bonnie' runs and back-to-back tars copying the local
> /usr to the shared filesystem from the server (last time I looked
> at this it lasted ~24 hours).
> 
> Many thanks for your time.
> 
> Ian
> 
>   
> 
> "Ian D. Hardy" wrote:
> > 
> > >
> > > On Tue, 2002-01-15 at 13:33, Ian D. Hardy wrote:
> > > > Hi,
> > > >
> > > > For some time we've been having problems with a server, which is
> > > > acting as a master/control node and NFS server for a computational
> > > > cluster (~180 client nodes). The server will crash after anywhere
> > > > between a few hours and 10 days operation. We've tried various
> > > > kernels and XFS patch versions from 2.4.9 kernel with XFS
> > > > patch-2.4.9-xfs-2001-08-17 up to and including 2.4.16 kernel with
> > > > the xfs-2.4.16-all-i386 patch, if anything the 2.4.9 kernel has
> > > > proved the most reliable (it normally lasts between 4 and 10 days! -
> > > > 2.4.16 lasted less than 24hrs).
> >      .... more details deleted
> > > >
> > >
> > > > Almost certainly this is an out of memory condition, just from
> > > > looking at the code in the function you oopsed in. Would you say
> > > > your system is stressed when it comes to memory?
> > >
> > > Steve
> > >
> > > > Steve Lord                                      voice: +1-651-683-3511
> > > > Principal Engineer, Filesystem Software         email: lord@xxxxxxx
> > >
> > 
> > I'd not regard the server as short of memory as it's using ~660Mbytes
> > as file system cache, though interestingly it does appear to be using
> > some swap space. Is it possible that XFS is having problems when there
> > is not memory immediately available? I've included some 'vmstat' output:
> > 
> >  vmstat 10 10
> >    procs                      memory    swap          io     system         cpu
> >  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
> >  0  1 28  11116   3784  42300 674656   0   0    37    35   19     3   0   2  23
> >  0  1 28  11116   3480  42300 674960   0   0     0     0  167   194   0   0 100
> >  0  1 28  11116   3176  42300 675264   0   0     0     0  139   107   0   0 100
> >  0  1 28  11116   3056  42300 675380   0   0     0     2  152   142   0   0 100
> > 
> > In Irix I'd tune the kernel parameters 'min_free_pages'... to ensure
> > that there was always physical memory available. Is there any equivalent
> > in Linux? (Sorry if this is a silly/obvious question.)
> > 
> > Many thanks.
> > 
> > Ian
> > 
> > --
> > 
> > ////////////////////////////////////////////////////////////////////////////
> > Ian Hardy                                   Tel: 023 80593577
> > Research Services                           Fax: 023 80593131
> > Computing Services                          email: idh@xxxxxxxxxxx
> > Southampton University                             i.d.hardy@xxxxxxxxxxx
> > Southampton  S017 1BJ, UK.
> > \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
> 
> -- 
> 
> /////////////Technical Coordination, Research Services////////////////////
> Ian Hardy                                   Tel: 023 80 593577
> Computing Services                          Mobile: 0709 2127503    
> Southampton University                      email: idh@xxxxxxxxxxx
> Southampton  S017 1BJ, UK.                         i.d.hardy@xxxxxxxxxxx
> \\'BUGS: The notion of errors is ill-defined' (IRIX man page for netstat)\
-- 
Austin Gonyou
Systems Architect, CCNA
Coremetrics, Inc.
Phone: 512-698-7250
email: austin@xxxxxxxxxxxxxxx

"It is the part of a good shepherd to shear his flock, not to skin it."
Latin Proverb
