Google is currently in the middle of upgrading from ext2 to a more up
to date file system. We ended up choosing ext4. This thread touches
upon many of the issues we wrestled with, so I thought it would be
interesting to share. We should be sending out more details soon.
The driving performance reason to upgrade is that while ext2 had been "good
enough" for a very long time the metadata arrangement on a stale file
system was leading to what we call "read inflation". This is where we
end up doing many seeks to read one block of data. In general latency
from poor block allocation was causing performance hiccups.
We spent a lot of time with unix standard benchmarks (dbench, compile
bench, et al) on xfs, ext4, jfs to try to see which one was going to
perform the best. In the end we mostly ended up using the benchmarks
to validate our assumptions and do functional testing. Larry is
completely right IMHO. These benchmarks were instrumental in helping
us understand how the file systems worked in controlled situations and
gain confidence from our customers.
For our workloads we saw ext4 and xfs as "close enough" in performance
in the areas we cared about. The fact that we had a much smoother
upgrade path with ext4 clinched the deal. The only upgrade option we
have is online. ext4 is already moving the bottleneck away from the
storage stack for some of our most intensive applications.
It was not until we moved from benchmarks to customer workload that we
were able to make detailed performance comparisons and find bugs in
"Iterate often" seems to be the winning strategy for SW dev. But when
it involves rebooting a cloud of systems and making a one way
conversion of their data it can get messy. That said I see benchmarks
as tools to build confidence before running traffic on redundant live
PS for some reason "dbench" holds mythical power over many folks I
have met. They just believe it's the most trusted and standard
benchmark for file systems. In my experience it often acts as a random
number generator. It has found some bugs in our code as it exercises
the VFS layer very well.