On Wed, Nov 16, 2011 at 03:03:47PM -0600, Ben Myers wrote:
> Hey Christoph,
> On Tue, Nov 15, 2011 at 03:14:11PM -0500, Christoph Hellwig wrote:
> > If we convert and unwritten extent past the current i_size log the size
> > update
> > as part of the extent manipulation transactions instead of doing an unlogged
> > metadata update later.
> > Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> > ---
> > fs/xfs/xfs_aops.c | 11 ++++++-----
> > fs/xfs/xfs_iomap.c | 19 ++++++++++++++++++-
> > 2 files changed, 24 insertions(+), 6 deletions(-)
> > Index: linux-2.6/fs/xfs/xfs_iomap.c
> > ===================================================================
> > --- linux-2.6.orig/fs/xfs/xfs_iomap.c 2011-11-08 08:02:50.234386118
> > +0100
> > +++ linux-2.6/fs/xfs/xfs_iomap.c 2011-11-08 08:14:04.319888994 +0100
> > @@ -31,6 +31,7 @@
> > #include "xfs_ialloc_btree.h"
> > #include "xfs_dinode.h"
> > #include "xfs_inode.h"
> > +#include "xfs_inode_item.h"
> > #include "xfs_btree.h"
> > #include "xfs_bmap.h"
> > #include "xfs_rtalloc.h"
> > @@ -645,6 +646,7 @@ xfs_iomap_write_unwritten(
> > xfs_trans_t *tp;
> > xfs_bmbt_irec_t imap;
> > xfs_bmap_free_t free_list;
> > + xfs_fsize_t i_size;
> > uint resblks;
> > int committed;
> > int error;
> > @@ -705,7 +707,22 @@ xfs_iomap_write_unwritten(
> > if (error)
> > goto error_on_bmapi_transaction;
> > - error = xfs_bmap_finish(&(tp), &(free_list), &committed);
> > + /*
> > + * Log the updated inode size as we go. We have to be careful
> > + * to only log it up to the actual write offset if it is
> > + * halfway into a block.
> > + */
> > + i_size = XFS_FSB_TO_B(mp, offset_fsb + count_fsb);
> Do you intend to log the new inode size based upon the entire request?
> I discussed this a bit with Alex, and I think we agreed that it might be
> better to update the size based upon the length of the extent that was
It probably doesn't matter because we've already written data to the
entire range - we only really need to update the inode size once per
IO. All we are discussing here is how the failure case behaves -
whether the EOF reflects the entire IO but not the data that was
written, or the EOF reflects the converted range but not the data IO
However, I'm not sure this is even relevant - if we've got multiple
extents to write to and then convert in a single high level IO, they
would have been mapped into separate IOs, bios and ioends. That's
definitely the case for buffered IO, and AFAICT, the same for direct
IO (Christoph - can you confirm this?).
Hence I -think- we now only ever convert a single extent at a time
in this function and the loop is redundant, and so the code as
written behaves as per expected - the file size is updated once per
unwritten extent conversion that occurs beyong EOF.
Chrisptoph, if this "only one extent per ioend" condition is really
true, then is there any need for this loop at all? i.e. do we ever
do a partial extent conversion for a completed IO and therefore have
to do a second transaction to convert the rest of the extent?