xfs
[Top] [All Lists]

Re: [PATCH 5/5] xfs: log file size updates at I/O completion time

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: [PATCH 5/5] xfs: log file size updates at I/O completion time
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 18 Nov 2011 14:36:56 +1100
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20111117192548.GI29840@xxxxxxx>
References: <20111115201407.038216766@xxxxxxxxxxxxxxxxxxxxxx> <20111115201427.299361653@xxxxxxxxxxxxxxxxxxxxxx> <20111117192548.GI29840@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Nov 17, 2011 at 01:25:48PM -0600, Ben Myers wrote:
> Hey Christoph,
> 
> On Tue, Nov 15, 2011 at 03:14:12PM -0500, Christoph Hellwig wrote:
> > Do not use unlogged metadata updates and the VFS dirty bit for updating
> > the file size after writeback.  In addition to causing various problems
> > with updates getting delayed for far too log this also drags in the
> > unscalable VFS dirty tracking, and is one of the few remaining unlogged
> > metadata updates.
> > 
> > Note that we allocate a new transaction from the I/O completion handler.
> > While this sounds fairly dangerous it isn't an issue in practice given
> > that any appending write alreay had to start a transaction in writepages
> > to allocate blocks, and we'll start throtteling there if we run low on
> > log space or memory.
> > 
> > We could still occasionally stall in the completion handler, but given
> > that we have per-filesystems workqueues for the I/O completions,
> > and completions that do not have to either convert unwritten extents
> > or update the file size are processed from interrupt context we do not
> > have to worry about this stalling a system to death.
> > 
> > In addition to that implementing log reservations from ->writepage that
> > are only released by a different thread requires a lot of work, and even
> > with that wasn't quite doable in a deadlock-free manner.
> > 
> > Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> > 
> > ---
> >  fs/xfs/xfs_aops.c |   49 ++++++++++++++++++++++++++++---------------------
> >  fs/xfs/xfs_file.c |   36 +++++++++++++++++++++++++++++++++++-
> >  2 files changed, 63 insertions(+), 22 deletions(-)
> > 
> > Index: linux-2.6/fs/xfs/xfs_aops.c
> > ===================================================================
> > --- linux-2.6.orig/fs/xfs/xfs_aops.c        2011-11-15 18:43:04.183113001 
> > +0100
> > +++ linux-2.6/fs/xfs/xfs_aops.c     2011-11-15 18:43:05.059779662 +0100
> > @@ -26,6 +26,7 @@
> >  #include "xfs_bmap_btree.h"
> >  #include "xfs_dinode.h"
> >  #include "xfs_inode.h"
> > +#include "xfs_inode_item.h"
> >  #include "xfs_alloc.h"
> >  #include "xfs_error.h"
> >  #include "xfs_rw.h"
> > @@ -114,22 +115,39 @@ static inline bool xfs_ioend_is_append(s
> >   * not extend all the way to the valid file size then restrict this update 
> > to
> >   * the end of the write.
> >   */
> > -STATIC void
> > +STATIC int
> >  xfs_setfilesize(
> >     struct xfs_ioend        *ioend)
> >  {
> >     struct xfs_inode        *ip = XFS_I(ioend->io_inode);
> > +   struct xfs_mount        *mp = ip->i_mount;
> > +   struct xfs_trans        *tp;
> >     xfs_fsize_t             isize;
> > +   int                     error = 0;
> >  
> >     xfs_ilock(ip, XFS_ILOCK_EXCL);
> >     isize = xfs_new_eof(ip, ioend->io_offset + ioend->io_size);
> > -   if (isize) {
> > -           trace_xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
> > -           ip->i_d.di_size = isize;
> > -           xfs_mark_inode_dirty(ip);
> > +   xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +
> > +   if (!isize)
> > +           return 0;
> > +
> > +   trace_xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
> > +
> > +   tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
> > +   error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0);
> > +   if (error) {
> > +           xfs_trans_cancel(tp, 0);
> > +           return error;
> >     }
> >  
> > -   xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +   xfs_ilock(ip, XFS_ILOCK_EXCL);
> > +
> > +   ip->i_d.di_size = isize;
> 
> Make this assignment above, before dropping the ilock, so that multiple
> updates to di_size cannot be reordered while allocating a transaction.

Which then makes it a non-transactional change. i.e. it can be
written to disk without having been logged in a transaction. That
defeats the purpose of this change....

I think what the code needs to do here is recheck whether the size
update needs to be done, and if not cancel the transaction and do
nothing.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>