
To: Emmanuel Florac <eflorac@xxxxxxxxxxxxxx>
Subject: Re: LWN.net article: creating 1 billion files -> XFS loses
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 7 Sep 2010 08:04:10 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20100906154254.5542426c@xxxxxxxxxxxxxxxxxxxx>
References: <201008191312.49346@xxxxxx> <20100906154254.5542426c@xxxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Mon, Sep 06, 2010 at 03:42:54PM +0200, Emmanuel Florac wrote:
> On Thu, 19 Aug 2010 13:12:45 +0200,
> Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> > The subject is a bit harsh, but overall the article says:
> > XFS is slowest on creating and deleting a billion files
> > XFS fsck needs 30GB RAM to fsck that 100TB filesystem.
> 
> Just to follow up on this subject: a colleague (following my
> suggestion :) tried to create 1 billion files in a single XFS
> directory. Unfortunately, directories themselves don't scale well
> that far: after 1 million files in the first 30 minutes, file
> creation slowed down gradually, so after 100 hours we had about
> 230 million files. The directory size at that point was 5.3 GB.

Oh, that's larger than I've ever run before ;)

Try using:

# mkfs.xfs -n size=64k <device>

That sets the directory block size to 64k, which will speed up large
directory operations by at least an order of magnitude.

> Now we're starting afresh with 1000 directories with 1 million files
> each :)

Which is exactly the test that was used to generate the numbers that
were published.
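
The many-directories layout described above can be sketched as a small
shell harness. This is only a toy-scale illustration of the pattern, not
the actual test harness: the directory and file counts below are
placeholders (the real test used 1000 directories of 1 million files
each), and the path names are made up.

```shell
#!/bin/sh
# Toy-scale sketch: DIRS directories, each holding FILES_PER_DIR empty
# files, so no single directory grows unboundedly. The real run used
# 1000 x 1,000,000; these numbers are illustrative only.
BASE=$(mktemp -d)
DIRS=4
FILES_PER_DIR=10
i=0
while [ "$i" -lt "$DIRS" ]; do
    mkdir "$BASE/dir$i"
    j=0
    while [ "$j" -lt "$FILES_PER_DIR" ]; do
        : > "$BASE/dir$i/file$j"   # create an empty file
        j=$((j + 1))
    done
    i=$((i + 1))
done
COUNT=$(find "$BASE" -type f | wc -l)
echo "created $COUNT files in $DIRS directories"
rm -rf "$BASE"
```

Scaling this up is just a matter of raising DIRS and FILES_PER_DIR and
pointing BASE at the filesystem under test.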

> (Kernel version used : vanilla 2.6.32.11 x86_64 smp)

Not much point in testing that kernel - delayed logging is where the
future lies for this sort of workload, and that is what I'm testing.

FWIW, I'm able to create 50 million inodes in under 14 minutes with
delayed logging and 8 threads using directories of 100k entries.
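
A multi-threaded create workload of that shape can be approximated with
shell background jobs. This is a hedged, toy-scale sketch of the pattern
only (concurrent writers, each filling its own directory), not the
actual benchmark tool; the thread and file counts are placeholders for
the 8 threads and 100k-entry directories mentioned above.

```shell
#!/bin/sh
# Hedged sketch: T concurrent writers, each creating PER empty files
# in a private directory. Toy numbers; not the real benchmark harness.
BASE=$(mktemp -d)
T=4
PER=25
t=0
while [ "$t" -lt "$T" ]; do
    (
        mkdir "$BASE/t$t"
        j=0
        while [ "$j" -lt "$PER" ]; do
            : > "$BASE/t$t/f$j"
            j=$((j + 1))
        done
    ) &    # run each writer as a background job
    t=$((t + 1))
done
wait       # block until all writers have finished
TOTAL=$(find "$BASE" -type f | wc -l)
echo "total files: $TOTAL"
rm -rf "$BASE"
```

Keeping each writer in its own directory avoids serialising all the
creates on one directory's locks, which is the point of the layout.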

The run to 1 billion inodes that I started late last night (10 hours
in) has just passed 700M inodes on a 16TB filesystem.  It's running
at about 25,000 creates/s, but it is limited by bad shrinker
behaviour that completely trashes the dentry cache, forcing ~3000
read IOPS to reload dentries that are still needed for operation.
It should be running about 3-4x faster than that.

FYI, the reason I'm taking a while to get numbers is that parallel
create workloads at this scale are showing significant problems (VM
livelocks, shrinker misbehaviour, lock contention in IO completion
processing, buffer cache hash scaling issues, etc.) and I'm trying to
fix them as I go - these metadata workloads are completely
unexplored territory...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
