
Re: XFS performance tracking and regression monitoring

To: Mark Goodwin <markgw@xxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>
Subject: Re: XFS performance tracking and regression monitoring
From: Mark Goodwin <markgw@xxxxxxx>
Date: Fri, 24 Oct 2008 17:12:13 +1000
In-reply-to: <20081024035411.GH18495@disturbed>
Organization: SGI Engineering
References: <490108E6.7060502@xxxxxxx> <20081024035411.GH18495@disturbed>
Reply-to: markgw@xxxxxxx
User-agent: Thunderbird (Windows/20080914)

Dave Chinner wrote:
On Fri, Oct 24, 2008 at 09:29:42AM +1000, Mark Goodwin wrote:
We're about to deploy a system+jbod dedicated for performance
regression tracking. The idea is to build the XFS dev branch
nightly, run a bunch of self contained benchmarks, and generate
a progressive daily report - date on the X-axis, with (perhaps)
wallclock runtime on the y-axis.

wallclock runtime is not indicative of relative performance
for many benchmarks. e.g. dbench runs for a fixed time and
then gives a throughput number as its output. It's the throughput
you want to compare.....

either, or. Both are differential. I want to keep this really simple,
just provide high level tracking on *when* a performance regression
may have been introduced but only with broad indicators. I don't
think anyone is regularly tracking this for XFS and we should be.
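
As a rough illustration of how simple the daily collection step could be, here is a minimal Python sketch. It assumes each workload is one command line and that results go into a flat CSV file with one row per run. The function names, the column layout and the file name are all hypothetical, nothing here is an existing harness.

```python
import csv
import subprocess
import time
from datetime import date
from pathlib import Path


def time_command(argv):
    """Run argv to completion and return its wallclock runtime in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def append_sample(csv_path, workload, seconds, day=None):
    """Append one (date, workload, seconds) row; write a header on first use."""
    path = Path(csv_path)
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "workload", "seconds"])
        stamp = (day or date.today()).isoformat()
        writer.writerow([stamp, workload, f"{seconds:.3f}"])
```

A nightly job would call time_command() once per workload and pass the result to append_sample(); the report generator then only has to plot the CSV by date. For benchmarks that report throughput, the same row format works if the parsed throughput number is stored in place of the runtime.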

The aim is to track relative XFS performance on a daily basis
for various workloads on identical h/w. If each workload runs for
approx the same duration, the reports can all share the same
generic y-axis. The long term trend should have a positive

If you are measuring walltime, then you should see a negative
gradient as an indication of improvement....

yes :)  what I meant, but was thinking "positively"

Regressions can be date correlated with commits.
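
One possible way to turn the daily numbers into a heads-up is sketched below in Python. It assumes wallclock samples in date order and flags any day that is slower than the median of the preceding runs by more than a fixed fraction. The function name, the 7-run window and the 10% threshold are arbitrary illustrative choices, not something agreed in this thread.

```python
from statistics import median


def flag_regressions(samples, window=7, threshold=0.10):
    """Return the dates whose runtime exceeds the trailing median.

    samples is a list of (date_string, seconds) tuples in date order.
    A day is flagged when its runtime is more than `threshold` (as a
    fraction) above the median of the previous `window` runs.
    """
    flagged = []
    for i in range(window, len(samples)):
        baseline = median(s for _, s in samples[i - window:i])
        day, seconds = samples[i]
        if seconds > baseline * (1 + threshold):
            flagged.append(day)
    return flagged
```

The flagged dates are what would then be matched against the commit log. For throughput numbers the comparison has to be inverted, since there a drop is the regression.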

For the benchmarks to be useful as regression tests, then the
harness really needs to be profiling and gathering statistics at the
same time so that we might be able to determine what caused the

I would regard that as follow-up once an issue has been identified.
My proposal is too simple to be useful for diagnosis, but it should
be enough to provide heads-up. That's the aim to start with. The same
h/w can also be set up for more sophisticated measurements in the
longer term.

Comments, benchmark suggestions?

The usual set - bonnie++, postmark, ffsb, fio, sio, etc.

Then some artificial tests that stress scalability like speed of
creating 1m small files with long names in a directory, the speed of
a cold cache read of the directory, the speed of a hot-cache read of
the directory, time to stat all the files (cold and hot cache),
time to remove all the files, etc. And then how well it scales
as you do this with more threads and directories in parallel...
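
A single-threaded version of that kind of artificial test might look like the Python sketch below. It times create, directory read, stat and remove for a set of empty files with long names in one directory. The function names and defaults are hypothetical. It only covers the hot-cache case: a cold-cache run would need the caches dropped or the filesystem remounted between phases, which is not shown, and the parallel scaling part is left out as well.

```python
import os
import time


def timed(fn):
    """Call fn() and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


def metadata_benchmark(directory, count=1000, name_len=200):
    """Time create / readdir / stat / unlink of `count` empty files."""
    os.makedirs(directory, exist_ok=True)
    # Zero-padded index plus filler characters gives long, unique names.
    names = [f"{i:08d}".ljust(name_len, "x") for i in range(count)]
    paths = [os.path.join(directory, n) for n in names]

    def create():
        for p in paths:
            with open(p, "w"):
                pass

    results = {}
    results["create"], _ = timed(create)
    results["readdir"], listing = timed(lambda: os.listdir(directory))
    results["stat"], _ = timed(lambda: [os.stat(p) for p in paths])
    results["unlink"], _ = timed(lambda: [os.unlink(p) for p in paths])
    results["files_seen"] = len(listing)
    return results
```

Calling metadata_benchmark() on a directory in the filesystem under test, with count raised towards the 1m mentioned above, returns a dict of per-phase timings that can be logged once per day.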

yeah OK, bits and pieces of the above, enough to provide broad

Anyone already running this?
Know of a test harness and/or report generator?

Perhaps you might want to look more closely at FFSB - it has a
fairly interesting automated test harness. e.g. it was used to
produce these:


And you can probably set up custom workloads to cover all the things
that the standard benchmarks do.....

I'll poke around on those pages for some ideas.

Thanks for the reply.
