On Tue, Jul 22, 2008 at 04:32:27PM +1000, Lachlan McIlroy wrote:
> This changes xfs_inode_item_push() to use XFS_IFLUSH_ASYNC_NOBLOCK when
> flushing an inode so the flush won't block on the inode cluster buffer lock.
> Also change the prototype of the IOP_PUSH operation so that xfsaild_push()
> can bump its stuck count.
> This change was prompted by a deadlock that would only occur on a debug
> XFS where a thread creating an inode had the buffer locked and was trying
> to allocate space for the inode tracing facility. That recursed back into
> the filesystem to flush data which created a transaction and needed log
> space which wasn't available.

A quick question: shouldn't the allocation use KM_NOFS if it is
being called in a place that could recurse into the filesystem?
Anywhere the inode tracing is called with an inode lock held
outside a transaction will also be susceptible to this deadlock.
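To make the recursion concrete, here is a small userspace model, not XFS code. Everything in it is hypothetical: SIM_KM_SLEEP and SIM_KM_NOFS only loosely mirror the meaning of KM_SLEEP and KM_NOFS, cluster_buf_locked stands in for the inode cluster buffer lock, and sim_kmem_alloc() reports "would hang" instead of actually hanging. It assumes memory is tight, so any allocation allowed to enter filesystem reclaim will try to flush data and needs log space that can't be freed while the buffer is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flags, loosely modelled on XFS's KM_SLEEP and KM_NOFS. */
#define SIM_KM_SLEEP 0x1u
#define SIM_KM_NOFS  0x2u

/* Stand-in for the inode cluster buffer lock held by the creating thread. */
static bool cluster_buf_locked = false;

/* Assume memory is tight, so every allocation without NOFS tries to
 * reclaim by flushing filesystem data first. */
static bool memory_tight = true;

/* Model of filesystem reclaim: flushing data starts a transaction that
 * needs log space, and (in the reported case) that space cannot be freed
 * while the cluster buffer stays locked. Returns true if it would hang. */
static bool sim_fs_reclaim_would_hang(void)
{
    return cluster_buf_locked;
}

/* Hypothetical allocator. Instead of actually hanging, it returns NULL and
 * sets *would_hang when the call would have recursed into the filesystem
 * and deadlocked. */
static void *sim_kmem_alloc(size_t size, unsigned int flags, bool *would_hang)
{
    assert(would_hang != NULL);
    *would_hang = false;

    /* Without NOFS the allocator may enter filesystem reclaim: the recursion. */
    if (memory_tight && !(flags & SIM_KM_NOFS) && sim_fs_reclaim_would_hang()) {
        *would_hang = true;
        return NULL;
    }
    return malloc(size);
}
```

With cluster_buf_locked set, sim_kmem_alloc(64, SIM_KM_SLEEP, &h) reports a hang, while adding SIM_KM_NOFS lets the allocation proceed. The model only illustrates why a NOFS allocation cannot recurse into this deadlock; it says nothing about how the real allocator behaves under memory pressure.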
Also, there is the possibility that aborting writeback from
the AIL in this manner could cause stalls or deadlocks. If this
item is at the tail of the log and doesn't get written back
straight away, and the thread that triggered the push then goes
to sleep waiting for the tail to move, every subsequent
transaction will also go to sleep without trying to push the
log, and only the watchdog timeout on aild will get things
moving again.
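A toy model of that stall, under heavily simplified assumptions: the log tail is the LSN of the oldest item still in the AIL, and a nonblocking push skips (and counts as stuck) any item whose cluster buffer is locked. struct sim_ail_item, sim_ail_push() and sim_ail_tail() are hypothetical names; the real AIL and xfsaild_push() are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified AIL entry; items are kept in LSN order. */
struct sim_ail_item {
    unsigned long lsn;
    bool buf_locked;   /* inode cluster buffer held by another thread */
    bool written;      /* flushed and removed from the AIL */
};

/* One push pass over the AIL. With nonblock set, an item whose buffer is
 * locked is skipped and counted as stuck. Without it, the push waits for
 * the lock, modelled here as the holder eventually releasing it (the
 * production case). Returns the number of stuck items. */
static int sim_ail_push(struct sim_ail_item *items, size_t n, bool nonblock)
{
    int stuck = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (items[i].written)
            continue;
        if (items[i].buf_locked) {
            if (nonblock) {
                stuck++;
                continue;
            }
            items[i].buf_locked = false;   /* waited for the lock */
        }
        items[i].written = true;
    }
    return stuck;
}

/* The tail is the oldest item not yet written back; if the AIL is empty
 * the tail catches up with the head. */
static unsigned long sim_ail_tail(const struct sim_ail_item *items, size_t n,
                                  unsigned long head)
{
    for (size_t i = 0; i < n; i++)
        if (!items[i].written)
            return items[i].lsn;
    return head;
}
```

If the oldest item's buffer is locked, a nonblocking pass can write back every newer item and still leave the tail where it was, so anyone waiting for the tail to move keeps sleeping until something pushes again.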
So to fix a deadlock in debug code, I don't think we want to
change the flush semantics of the AIL push on inodes for
production code. Prevent the debug tracing code from recursing,