On Mon, Jul 02, 2012 at 09:33:24AM -0400, Brian Foster wrote:
> On 07/01/2012 08:07 PM, Dave Chinner wrote:
> > On Thu, Jun 28, 2012 at 06:52:56AM -0400, Brian Foster wrote:
> >> xfsaild idle mode logic currently leads to a couple of hangs:
> >> 1.) If xfsaild is rescheduled in during an incremental scan
> >> (i.e., tout != 0) and the target has been updated since
> >> the previous run, we can hit the new target and go into
> >> idle mode with a still populated ail.
> >> 2.) A wake up is only issued when the target is pushed forward.
> >> The wake up can race with xfsaild if it is currently in the
> >> process of entering idle mode, causing future wake up
> >> events to be lost.
> >> These hangs have been reproduced and verified as fixed by
> >> running xfstests 273 in a loop on a slightly modified upstream
> >> kernel. The kernel is modified to re-enable idle mode as
> >> previously implemented (when count == 0) and with a revert of
> >> commit 670ce93f, which includes performance improvements that
> >> make this harder to reproduce.
> >> The solution, the algorithm for which has been outlined by
> >> Dave Chinner, is to modify xfsaild to enter idle mode only when
> >> the ail is empty and the push target has not been moved forward
> >> since the last push.
> >> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
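To restate the algorithm above as a small userspace model (the names,
types and fields here are illustrative only, not the actual xfsaild
code):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model of the idle-mode decision described in the
 * patch: idle only when the AIL is empty AND the push target has
 * not moved forward since the last push. Field names are
 * approximations, not the real struct xfs_ail.
 */
struct aild_state {
	bool		ail_empty;	/* AIL list is empty */
	uint64_t	target;		/* current push target (LSN) */
	uint64_t	last_target;	/* target seen on previous push */
};

static bool aild_should_idle(const struct aild_state *s)
{
	return s->ail_empty && s->target <= s->last_target;
}
```

In this model, hang (1) corresponds to idling while ail_empty is
still false, and hang (2) to only waking the thread when target
advances past last_target.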
> > Looks OK to me, and hasn't caused any problems here.
> > Final question - did you confirm with powertop that the xfsaild is
> > no longer causing wakeups a minute or two after you stop writing to
> > the filesystem? (I haven't yet)
> I hadn't tested with powertop, but I had some tracepoints hacked in
> around the idle/wake cases to verify the thread was actually scheduling
> out.
If you've added tracepoints that were useful for
debugging/verification, then send that as a patch as well. If users
have trouble then simply asking them for event traces is very easy
to do and gives us much better insight into what is happening....
You can't have enough tracepoints when things are going wrong ;)
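If it helps, something along these lines in fs/xfs/xfs_trace.h would be
the usual shape - a sketch only; the event name and fields below are my
guesses, not existing XFS tracepoints:

```c
/* sketch - event name and fields are assumptions */
TRACE_EVENT(xfs_ail_idle,
	TP_PROTO(struct xfs_ail *ailp, xfs_lsn_t target),
	TP_ARGS(ailp, target),
	TP_STRUCT__entry(
		__field(dev_t, dev)
		__field(xfs_lsn_t, target)
	),
	TP_fast_assign(
		__entry->dev = ailp->xa_mount->m_super->s_dev;
		__entry->target = target;
	),
	TP_printk("dev %d:%d target lsn 0x%llx",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  __entry->target)
);
```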
> FWIW, I just gave powertop a quick test and it appears to work as
> expected.
> With current upstream on my rhel6.3 VM, I see the following after
> running a 'touch /mnt/file;sync' and letting the fs idle for a bit:
> 0.5% ( 19.9) xfsaild/vdb1 : xfsaild (process_timeout)
> and this drops off completely with the patch applied. Thanks for the tip.
Great, then it is working exactly as expected.