
Re: hangs running dbench with 16 clients

To: Steve Lord <lord@xxxxxxx>
Subject: Re: hangs running dbench with 16 clients
From: Steven Pratt <slpratt@xxxxxxxxxxxxxx>
Date: Thu, 19 Jun 2003 10:28:42 -0500
Cc: linux-xfs@xxxxxxxxxxx
References: <3EF1C932.4080706@austin.ibm.com> <1056034152.1772.92.camel@jen.americas.sgi.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01

Steve Lord wrote:

On Thu, 2003-06-19 at 09:31, Steven Pratt wrote:


While trying to do performance regression testing on 2.5.xx kernels, we are quite often seeing hangs when running XFS with 16 clients. There are no error messages; dbench simply does not complete. XFS is the only filesystem that causes this behavior with dbench, and it is very reproducible on the 2.5.7x kernels. Has anyone else seen this? What kind of debug information would you need?

Steve



Hi Steve,

Are these the performance regression tests which get run on 2.5 kernels?
I have seen the results, but never a comment about hangs before.

Yes, these are the nightly regression runs that we have been doing on the 2.5 kernel series. Up to this point we have been more worried about changes in performance than about functional issues (that is our mission). Since I am running multiple runs of dbench and averaging the results, having one run die still allows us to get numbers. I just looked over the results again and I see problems on both single-client and 16-client runs. I seem to remember that we had problems more often on 64-client runs, but we dropped that configuration for lack of runtime. I should also note that this may not be as reproducible as I first mentioned; it seems more like 1 out of 8 runs or so dies.
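
To illustrate why a dead run does not stop the numbers from coming out, here is a minimal sketch of averaging only the runs that completed while still counting the ones that hung. This is not the actual test harness: the function name, the use of None to mark a hung run, and the throughput figures are all hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_runs(throughputs: Sequence[Optional[float]]) -> dict:
    """Average the completed runs; None marks a run that never finished."""
    completed = [t for t in throughputs if t is not None]
    hung = len(throughputs) - len(completed)
    return {
        "runs": len(throughputs),
        "hung": hung,
        # Fraction of runs that hung (0.0 when there were no runs at all).
        "hang_rate": hung / len(throughputs) if throughputs else 0.0,
        # Mean over completed runs only; None if every run hung.
        "mean_throughput": mean(completed) if completed else None,
    }


# Eight hypothetical runs (made-up MB/sec values), one of which hung.
example = summarize_runs([100.0, 104.0, None, 98.0, 102.0, 96.0, 100.0, 100.0])
```

Reporting the hang count next to the average keeps a functional problem like this one from being hidden by the averaging.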

Unfortunately, we have been somewhat tied up with 2.4 things and
2.5 is not getting as much air time - and kdb does not work which
makes debugging much more interesting. Let me try it out and see
what happens, and details about the setup you see this in would
be appreciated.

Understood about the 2.5 work. Here is a link to the test machine setup information:
http://ltcperf.ncsa.uiuc.edu/data/2.5.71/2.4.20-vs-2.5.71/HardwareSoftware.html


We also have a bunch of code with fixes here we need
to sync to Linus, it is always possible you are hitting one of
those, did you ever try pulling from the cvs tree on oss.sgi.com?
The xfs in there is more up to date; if that makes the problem
go away, then I need to get an update to Linus sooner rather than
later.


No, we are not testing anything except standard trees at the moment. This is 99% automated, so anything like CVS does not fit well. If you had a patch against any recent version of 2.5, I could give that a go.

Steve

