On Fri, Jun 22, 2012 at 12:41:47PM -0400, Christoph Hellwig wrote:
> On Fri, Jun 22, 2012 at 09:24:14AM +1000, Dave Chinner wrote:
> > It may have been - I didn't catch the initial cause of the problem
> > in my log because it hard-hung the VM and it wasn't in the
> > scrollback buffer on the console. All I saw was a corruption error,
> > a shutdown and the stack blowing up.
> > Still, I think there is a real problem here - any persistent device
> > error on IO submission can cause this problem to occur....
> Yes, I was just trying to ask what actually happened as your original
> explanation didn't seem to be possible.
> I think the patch below should be enough as a minimal fix to avoid the
> stack overflow for 3.5. We'll need a much bigger overhaul of the buffer
> error handling after that, though.
> Index: xfs/fs/xfs/xfs_buf.c
> --- xfs.orig/fs/xfs/xfs_buf.c 2012-06-22 14:20:46.696568355 +0200
> +++ xfs/fs/xfs/xfs_buf.c 2012-06-22 14:21:37.733234717 +0200
> @@ -1255,7 +1255,7 @@ xfs_buf_iorequest(
> atomic_set(&bp->b_io_remaining, 1);
> - _xfs_buf_ioend(bp, 0);
> + _xfs_buf_ioend(bp, 1);
Hmmmm. How often do we get real io completion occurring before we
call _xfs_buf_ioend() here? I can't see that it is common, so this
is probably fine, but perhaps a few numbers might help here? If it
is as rare as we think it is, then yeah, that would work....
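
To illustrate why the second argument matters, here is a minimal
userspace sketch. It is not kernel code: struct buf, submit(), io_end(),
io_done(), run_deferred(), simulate() and MAX_RETRIES are all made up for
illustration. It assumes the failure mode is a completion handler that
resubmits the buffer on a persistent error, so running completion inline
from the submission path nests one submit inside another, while deferring
it (a plain flag stands in for the workqueue here) keeps the nesting flat.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RETRIES 8   /* hypothetical cap so the model terminates */

/* Hypothetical stand-in for a buffer; not the real struct xfs_buf. */
struct buf {
    int retries;    /* resubmissions performed so far */
    int depth;      /* current nesting of submit() calls */
    int max_depth;  /* deepest nesting observed */
    int pending;    /* completion queued for deferred processing */
};

static void submit(struct buf *bp, int schedule);

/* Completion handler: models a persistent error by resubmitting. */
static void io_done(struct buf *bp, int schedule)
{
    if (bp->retries < MAX_RETRIES) {
        bp->retries++;
        submit(bp, schedule);
    }
}

/* schedule == 0: run completion inline; schedule == 1: defer it. */
static void io_end(struct buf *bp, int schedule)
{
    if (schedule)
        bp->pending = 1;
    else
        io_done(bp, schedule);
}

/* Submission that fails immediately and so completes from this path. */
static void submit(struct buf *bp, int schedule)
{
    assert(bp != NULL);
    bp->depth++;
    if (bp->depth > bp->max_depth)
        bp->max_depth = bp->depth;
    io_end(bp, schedule);
    bp->depth--;
}

/* Stand-in for the worker: drains deferred completions from the top. */
static void run_deferred(struct buf *bp)
{
    while (bp->pending) {
        bp->pending = 0;
        io_done(bp, 1);
    }
}

/* Returns the deepest submit() nesting seen for the given mode. */
int simulate(int schedule)
{
    struct buf b = {0};

    submit(&b, schedule);
    run_deferred(&b);
    return b.max_depth;
}
```

Under these assumptions simulate(0) nests one level deeper per retry,
while simulate(1) never goes deeper than a single submit() call. The cost
of deferring is the extra hop through the worker, which is why the
question of how often this path sees real completions matters.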