On Mon, 2011-07-04 at 15:27 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> When inodes are marked stale in a transaction, they are treated
> specially when the inode log item is being inserted into the AIL.
> It tries to avoid moving the log item forward in the AIL due to a
> race condition with writing the underlying buffer back to disk.
> This was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes in
> the AIL").
> To avoid moving the item forward, we return a LSN smaller than the
> commit_lsn of the completing transaction, thereby trying to trick
> the commit code into not moving the inode forward at all. I'm not
> sure this ever worked as intended - it assumes the inode is already
> in the AIL, but I don't think the returned LSN would have been small
> enough to prevent moving the inode. It appears that the reason it
> worked is that the lower LSN of the inodes meant they were inserted
> into the AIL and flushed before the inode buffer (which was moved to
> the commit_lsn of the transaction).
> The big problem is that with delayed logging, the returning of the
> different LSN means insertion takes the slow, non-bulk path. Worse
> yet is that insertion is to a position -before- the commit_lsn so it
> is doing an AIL traversal on every insertion, and has to walk over
> all the items that have already been inserted into the AIL. It's
> To compound the matter further, with delayed logging inodes are
> likely to go from clean to stale in a single checkpoint, which means
> they aren't even in the AIL at all when we come across them at AIL
> insertion time. Hence these were all getting inserted into the AIL
> when they simply do not need to be as inodes marked XFS_ISTALE are
> never written back.
> Transactional/recovery integrity is maintained in this case by the
> other items in the unlink transaction that were modified (e.g. the
> AGI btree blocks) and committed in the same checkpoint.
> So to fix this, simply unpin the stale inodes directly in
> xfs_inode_item_committed() and return -1 to indicate that the AIL
> insertion code does not need to do any further processing of these
> inodes.
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
I suggest one comment update, which I can do for
you or it can be done at another time.
But this looks good. I'll send it to Linus
Reviewed-by: Alex Elder <aelder@xxxxxxx>
> fs/xfs/xfs_inode_item.c | 14 ++++++++------
> fs/xfs/xfs_trans.c | 2 +-
> 2 files changed, 9 insertions(+), 7 deletions(-)
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 09983a3..b1e88d5 100644
. . .
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 7c7bc2b..3744337 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -1474,7 +1474,7 @@ xfs_trans_committed_bulk(
> lip->li_flags |= XFS_LI_ABORTED;
> item_lsn = IOP_COMMITTED(lip, commit_lsn);
> - /* item_lsn of -1 means the item was freed */
> + /* item_lsn of -1 means the item needs no further processing */
Probably should update the corresponding comment in
xfs_trans_item_committed() too. I have done this in
my local copy.
> if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
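
For anyone following along, the change in meaning of the -1 return can be sketched as a small standalone model. This is not the kernel code: the struct, the LSN_NO_FURTHER_PROCESSING macro and both function names are hypothetical stand-ins, and the sketch models only the bulk path, where every other item comes back with the commit LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for xfs_lsn_t; not the kernel type. */
typedef int64_t lsn_t;

/* Sentinel meaning "the item needs no further processing". */
#define LSN_NO_FURTHER_PROCESSING ((lsn_t)-1)

/* Hypothetical, heavily simplified log item. */
struct log_item {
	bool	stale;	/* models an inode marked XFS_ISTALE */
	bool	pinned;	/* pinned in memory until the commit completes */
	bool	in_ail;	/* tracked in the (modelled) AIL */
	lsn_t	lsn;	/* LSN the item was inserted at */
};

/*
 * Models the "committed" callback: a stale item is never written back,
 * so it is unpinned here and the caller is told to skip it.
 */
static lsn_t item_committed(struct log_item *lip, lsn_t commit_lsn)
{
	if (lip->stale) {
		lip->pinned = false;
		return LSN_NO_FURTHER_PROCESSING;
	}
	return commit_lsn;
}

/*
 * Models the bulk commit loop. Returns how many items were inserted
 * into the modelled AIL.
 */
static size_t commit_bulk(struct log_item **items, size_t nr, lsn_t commit_lsn)
{
	size_t inserted = 0;

	for (size_t i = 0; i < nr; i++) {
		struct log_item *lip = items[i];
		lsn_t item_lsn = item_committed(lip, commit_lsn);

		/* -1 means the item needs no further processing */
		if (item_lsn == LSN_NO_FURTHER_PROCESSING)
			continue;

		/* Only the bulk path is modelled: the LSNs must match. */
		assert(item_lsn == commit_lsn);
		lip->lsn = item_lsn;
		lip->in_ail = true;
		lip->pinned = false;
		inserted++;
	}
	return inserted;
}
```

In this model a stale item is unpinned by its own callback and never reaches the insertion step, so it costs no traversal at all.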