On Wed, 2010-09-22 at 16:44 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> Under heavy multi-way parallel create workloads, the VFS struggles
> to write back all the inodes that have been changed in age order.
> The bdi flusher thread becomes CPU bound, spending 85% of it's time
> in the VFS code, mostly traversing the superblock dirty inode list
> to separate dirty inodes old enough to flush.
> We already keep an index of all metadata changes in age order - in
> the AIL - and continued log pressure will do age ordered writeback
> without any extra overhead at all. If there is no pressure on the
> log, the xfssyncd will periodically write back metadata in ascending
> disk address offset order so will be very efficient.
> Hence we can stop marking VFS inodes dirty during transaction commit
> or when changing timestamps during transactions. This will keep the
> inodes in the superblock dirty list to those containing data or
> unlogged metadata changes.
This looks good. There is a minor typo I'll highlight
below in case you want to fix it.
> However, the timstamp changes are slightly more complex than this -
> there are a couple of places that do unlogged updates of the
> timestamps, and the VFS need to be informed of these. Hence add a
> new function xfs_trans_inode_chgtime() for transactional changes,
You actually used the name "xfs_trans_ichgtime".
> and leave xfs_ichgtime() for the non-transactional changes.
I haven't updated my cscope database, but it looks to me like
this leaves just one spot where xfs_ichgtime() is still used.
Namely, xfs_setattr(), when truncating a zero-length file.
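For anyone following along, the split described above can be modelled roughly as follows. This is only a toy userspace sketch, not the code from the patch: the toy_* names, the TOY_CHG_* flags, the struct layout and the explicit "now" argument are all made up for illustration, and the assumption that the shared helper returns whether anything changed is inferred from the "if (xfs_ichgtime_int(ip, flags))" line in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical flags; the real flag names are not shown in this excerpt. */
#define TOY_CHG_MOD 0x1 /* update modification time */
#define TOY_CHG_CHG 0x2 /* update change time */

/* Toy stand-in for an inode; not the kernel structure. */
struct toy_inode {
	time_t mtime;
	time_t ctime;
	bool vfs_dirty; /* would sit on the superblock dirty inode list */
	bool logged;    /* would be tracked by the transaction and the AIL */
};

/* Shared helper: apply the timestamps, report whether anything changed. */
static bool toy_ichgtime_int(struct toy_inode *ip, int flags, time_t now)
{
	bool changed = false;

	assert(ip != NULL);
	if ((flags & TOY_CHG_MOD) && ip->mtime != now) {
		ip->mtime = now;
		changed = true;
	}
	if ((flags & TOY_CHG_CHG) && ip->ctime != now) {
		ip->ctime = now;
		changed = true;
	}
	return changed;
}

/* Non-transactional update: nothing is logged, so the VFS must be told. */
static void toy_ichgtime(struct toy_inode *ip, int flags, time_t now)
{
	if (toy_ichgtime_int(ip, flags, now))
		ip->vfs_dirty = true;
}

/* Transactional update: the log tracks the change, the VFS is not told. */
static void toy_trans_ichgtime(struct toy_inode *ip, int flags, time_t now)
{
	if (toy_ichgtime_int(ip, flags, now))
		ip->logged = true;
}
```

In this model only toy_ichgtime() ever puts an inode on the (simulated) VFS dirty list, which is the property the patch description is after: inodes touched only inside transactions stay off that list and are written back through the log instead.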
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Alex Elder <aelder@xxxxxxx>
. . .
> diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
> index b1fc2a6..37918f4 100644
> --- a/fs/xfs/linux-2.6/xfs_iops.c
> +++ b/fs/xfs/linux-2.6/xfs_iops.c
> @@ -96,40 +96,63 @@ xfs_mark_inode_dirty(
. . .
> + if (xfs_ichgtime_int(ip, flags))
> + * Transactional inode timestamp update. requires inod to be locked and
s/inod /inode /
> + * to the transaction supplied. Relies on the transaction subsystem to track
. . .