Re: 3.2.9 and locking problem

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: 3.2.9 and locking problem
From: Arkadiusz Miśkiewicz <arekm@xxxxxxxx>
Date: Mon, 12 Mar 2012 14:43:58 +0100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20120312005325.GX5091@dastard>
References: <201203092028.47177.arekm@xxxxxxxx> <20120312005325.GX5091@dastard>
User-agent: KMail/1.13.7 (Linux/3.3.0-rc7; KDE/4.8.1; x86_64; ; )
On Monday 12 of March 2012, Dave Chinner wrote:
> On Fri, Mar 09, 2012 at 08:28:47PM +0100, Arkadiusz Miśkiewicz wrote:
> > Are there any known bugs in the area visible in the tracebacks below? I
> > have a system where one operation (upgrading a single rpm package) causes
> > the rpm process to hang in D-state; sysrq-w output below:
> > 
> > [  400.755253] SysRq : Show Blocked State
> > [  400.758507]   task                        PC stack   pid father
> > [  400.758507] rpm             D 0000000100005781     0  8732   8698 0x00000000
> > [  400.758507]  ffff88021657dc48 0000000000000086 ffff880200000000 ffff88025126f480
> > [  400.758507]  ffff880252276630 ffff88021657dfd8 ffff88021657dfd8 ffff88021657dfd8
> > [  400.758507]  ffff880252074af0 ffff880252276630 ffff88024cb0d005 ffff88021657dcb0
> > [  400.758507] Call Trace:
> > [  400.758507]  [<ffffffff8114b22a>] ? kmem_cache_free+0x2a/0x110
> > [  400.758507]  [<ffffffff8114d2ed>] ? kmem_cache_alloc+0x11d/0x140
> > [  400.758507]  [<ffffffffa00df3c7>] ? kmem_zone_alloc+0x67/0xe0 [xfs]
> > [  400.758507]  [<ffffffff8148b78a>] schedule+0x3a/0x50
> > [  400.758507]  [<ffffffff8148d25d>] rwsem_down_failed_common+0xbd/0x150
> > [  400.758507]  [<ffffffff8148d303>] rwsem_down_write_failed+0x13/0x20
> > [  400.758507]  [<ffffffff812652a3>] call_rwsem_down_write_failed+0x13/0x20
> > [  400.758507]  [<ffffffff8148c8ed>] ? down_write+0x2d/0x40
> > [  400.758507]  [<ffffffffa00cf97c>] xfs_ilock+0xcc/0x120 [xfs]
> > [  400.758507]  [<ffffffffa00d4ace>] xfs_setattr_nonsize+0x1ce/0x5b0 [xfs]
> > [  400.758507]  [<ffffffff81265502>] ? __strncpy_from_user+0x22/0x60
> > [  400.758507]  [<ffffffffa00d52ab>] xfs_vn_setattr+0x1b/0x40 [xfs]
> > [  400.758507]  [<ffffffff8117c1a2>] notify_change+0x1a2/0x340
> > [  400.758507]  [<ffffffff8115ed80>] chown_common+0xd0/0xf0
> > [  400.758507]  [<ffffffff8115fe4c>] sys_chown+0xac/0x1a0
> > [  400.758507]  [<ffffffff81495112>] system_call_fastpath+0x16/0x1b
> 
> I can't see why we'd get a task stuck here - it's waiting on the
> XFS_ILOCK_EXCL. The only reason for this is if we leaked an unlock
> somewhere. It appears you can reproduce this fairly quickly, 

The Linux-VServer patch [1] seems to be messing with locking. It would be nice 
if you could take a quick look at it to see whether it can be considered the 
guilty party.
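
For illustration, the kind of leak you describe would be an error path that 
returns with the inode lock still held. A minimal sketch (hypothetical code, 
not actual XFS or vserver source; the caller and do_operation() are made up):

	int
	xfs_do_something(
		struct xfs_inode	*ip)
	{
		int			error;

		xfs_ilock(ip, XFS_ILOCK_EXCL);
		error = do_operation(ip);	/* hypothetical operation */
		if (error)
			return error;		/* BUG: returns with
						 * XFS_ILOCK_EXCL held; the
						 * next down_write() on the
						 * inode rwsem then blocks
						 * forever in D-state, as in
						 * the trace above */
		xfs_iunlock(ip, XFS_ILOCK_EXCL);
		return 0;
	}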

On the other hand, I wasn't able to reproduce this on 3.0.22, even though the 
vserver patch for 3.0.22 [2] does the same thing as the vserver patch for 
3.2.9.

> so
> running an event trace via trace-cmd for all the xfs_ilock trace
> points and posting the report output might tell us what inode is
> blocked and where we leaked (if that is the cause).

I will try to get more information, but it will take some time (most likely 
weeks) before I can take this machine down for debugging.
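
When I do, I plan to run something like the following while reproducing the 
hang (assuming the lock tracepoints are named as in fs/xfs/xfs_trace.h; the 
report file name is arbitrary):

	# record all ilock/iunlock events; reproduce the rpm upgrade in
	# another terminal, then stop the recording with Ctrl-C
	trace-cmd record -e 'xfs:xfs_ilock*' -e 'xfs:xfs_iunlock'
	# turn the binary trace.dat into a readable report
	trace-cmd report > xfs-ilock-report.txt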

> Cheers,
> Dave.

1. http://vserver.13thfloor.at/Experimental/patch-3.2.9-vs2.3.2.7.diff
2. http://vserver.13thfloor.at/Experimental/patch-3.0.22-vs2.3.2.3.diff
-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
