
Re: Warning from unlock_new_inode

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Warning from unlock_new_inode
From: Jan Kara <jack@xxxxxxx>
Date: Wed, 29 Feb 2012 11:24:44 +0100
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, Jan Kara <jack@xxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20120229014906.GX3592@dastard>
References: <20120222220137.GB3650@xxxxxxxxxxxxx> <20120228083444.GB22995@xxxxxxxxxxxxx> <20120229005351.GV3592@dastard> <20120229014906.GX3592@dastard>
User-agent: Mutt/1.5.20 (2009-06-14)
On Wed 29-02-12 12:49:06, Dave Chinner wrote:
> On Wed, Feb 29, 2012 at 11:53:51AM +1100, Dave Chinner wrote:
> > On Tue, Feb 28, 2012 at 03:34:44AM -0500, Christoph Hellwig wrote:
> > > On Wed, Feb 22, 2012 at 11:01:37PM +0100, Jan Kara wrote:
> > > >   Hello,
> > > > 
> > > >   while running fsstress on XFS partition with 3.3-rc4 kernel + my freeze
> > > > fixes (they do not touch anything relevant AFAICT) I've got the following
> > > > warning:
> > > 
> > > That's stressing including freezes or without?  Do you have a better
> > > description of the workload?
> > > 
> > > Either way it's an odd one, I can't see any obvious way how this would
> > > happen.
> > 
> > FWIW, I'm trying to track down exactly the same warning on a RHEL6.2
> > kernel being triggered by NFS filehandle lookup. The problem is
> > being reproduced reliably by a well known NFS benchmark, but
> > this gives a bit more information on where a race condition in
> > the inode lookup may exist.
> > 
> > That is, the only common element here in these two lookup paths is
> > that they are the only two calls to xfs_iget() with
> > XFS_IGET_UNTRUSTED set in the flags. I doubt this is a coincidence.
> 
> And it isn't.
> 
> Jan, can you try the (untested) patch below?
  Sure, I can include it in my testing. However, I've seen the warning only
once in a week of testing, so the reliability of my confirmation is rather low.

                                                                Honza

> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 
> xfs: fix inode lookup race
> 
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> When we get concurrent lookups of the same inode that is not in the
> per-AG inode cache, there is a race condition that triggers warnings
> in unlock_new_inode() indicating that we are initialising an inode
> that isn't in the correct state for a new inode.
> 
> When we do an inode lookup via a file handle or a bulkstat, we don't
> serialise lookups at a higher level through the dentry cache (i.e.
> pathless lookup), and so we can get concurrent lookups of the same
> inode.
> 
> The race condition is between the insertion of the inode into the
> cache in the case of a cache miss and a concurrent lookup:
> 
> Thread 1                      Thread 2
> xfs_iget()
>   xfs_iget_cache_miss()
>     xfs_iread()
>     lock radix tree
>     radix_tree_insert()
>                               rcu_read_lock
>                               radix_tree_lookup
>                               lock inode flags
>                               XFS_INEW not set
>                               igrab()
>                               unlock inode flags
>                               rcu_read_unlock
>                               use uninitialised inode
>                               .....
>     lock inode flags
>     set XFS_INEW
>     unlock inode flags
>     unlock radix tree
>   xfs_setup_inode()
>     inode flags = I_NEW
>     unlock_new_inode()
>       WARNING as inode flags != I_NEW
> 
> This can lead to inode corruption, inode list corruption, etc, and
> is generally a bad thing to occur.
> 
> Fix this by setting XFS_INEW before inserting the inode into the
> radix tree. This will ensure any concurrent lookup will find the new
> inode with XFS_INEW set and that forces the lookup to wait until the
> XFS_INEW flag is removed before allowing the lookup to succeed.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_iget.c |   17 +++++++++++------
>  1 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
> index 05bed2b..2467ab7 100644
> --- a/fs/xfs/xfs_iget.c
> +++ b/fs/xfs/xfs_iget.c
> @@ -350,9 +350,19 @@ xfs_iget_cache_miss(
>                       BUG();
>       }
>  
> -     spin_lock(&pag->pag_ici_lock);
> +     /* These values _must_ be set before inserting the inode into the radix
> +      * tree as the moment it is inserted a concurrent lookup (allowed by the
> +      * RCU locking mechanism) can find it and that lookup must see that this
> +      * is an inode currently under construction (i.e. that XFS_INEW is set).
> +      * The ip->i_flags_lock that protects the XFS_INEW flag forms the
> +      * memory barrier that ensures this detection works correctly at lookup
> +      * time.
> +      */
> +     xfs_iflags_set(ip, XFS_INEW);
> +     ip->i_udquot = ip->i_gdquot = NULL;
>  
>       /* insert the new inode */
> +     spin_lock(&pag->pag_ici_lock);
>       error = radix_tree_insert(&pag->pag_ici_root, agino, ip);
>       if (unlikely(error)) {
>               WARN_ON(error != -EEXIST);
> @@ -360,11 +370,6 @@ xfs_iget_cache_miss(
>               error = EAGAIN;
>               goto out_preload_end;
>       }
> -
> -     /* These values _must_ be set before releasing the radix tree lock! */
> -     ip->i_udquot = ip->i_gdquot = NULL;
> -     xfs_iflags_set(ip, XFS_INEW);
> -
>       spin_unlock(&pag->pag_ici_lock);
>       radix_tree_preload_end();
>  
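
For readers following the thread, the ordering that the quoted patch depends
on can be sketched as a single-threaded user-space model. This is only an
illustration under stated assumptions, not kernel code: the toy_* names, the
one-slot cache standing in for the per-AG radix tree, and the TOY_RETRY
result are all hypothetical, and the racing lookup is simulated by calling
toy_lookup() at the point where the window would be open.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INEW 0x1u            /* stand-in for XFS_INEW (assumption) */

enum toy_result { TOY_MISS, TOY_HIT, TOY_RETRY };

struct toy_inode {
	unsigned int flags;
};

/* Hypothetical one-slot cache standing in for the per-AG radix tree. */
static struct toy_inode *toy_cache_slot;

/* What a racing lookup would conclude if it ran at this instant. */
static enum toy_result toy_lookup(void)
{
	struct toy_inode *ip = toy_cache_slot;

	if (ip == NULL)
		return TOY_MISS;
	if (ip->flags & TOY_INEW)
		return TOY_RETRY;   /* inode is still under construction */
	return TOY_HIT;             /* lookup would go on to use the inode */
}

/* Pre-patch ordering: insert first, set the flag afterwards. */
static enum toy_result toy_insert_old_order(struct toy_inode *ip)
{
	enum toy_result seen;

	assert(ip != NULL);
	toy_cache_slot = ip;        /* inode becomes visible here */
	seen = toy_lookup();        /* simulated racing lookup in the window */
	ip->flags |= TOY_INEW;      /* too late for that lookup */
	return seen;
}

/* Patched ordering: set the flag first, then insert. */
static enum toy_result toy_insert_new_order(struct toy_inode *ip)
{
	enum toy_result seen;

	assert(ip != NULL);
	ip->flags |= TOY_INEW;      /* mark as under construction */
	toy_cache_slot = ip;        /* only now does it become visible */
	seen = toy_lookup();        /* simulated racing lookup */
	return seen;
}
```

With a freshly zeroed toy_inode, toy_insert_old_order() reports TOY_HIT (the
simulated lookup would use a half-built inode), while toy_insert_new_order()
reports TOY_RETRY. The model leaves out the locking and memory-barrier
aspects that the patch comment attributes to ip->i_flags_lock.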
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
