On Sun, Jul 18, 2010 at 1:28 AM, Ilia Mirkin <imirkin@xxxxxxxxxxxx> wrote:
> On Sun, Jul 18, 2010 at 12:57 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>> On Sun, Jul 18, 2010 at 11:20:33AM +1000, Dave Chinner wrote:
>>> So, back to the situation with the WARN_ON(). You're running
>>> applications that are doing something that:
>>> a) is not supported;
>>> b) compromises data integrity guarantees;
>>> c) is not reliably reported; and
>>> d) might be causing hangs
>>> Right now I'm not particularly inclined to dig into this further;
>>> it's obvious the applications are doing something that is not
>>> supported (by XFS or the generic page cache code), so this is the
>>> first thing you really need to care about getting fixed if you value
>>> your backups...
>> While it's slightly crazy it's also a pretty easy way for users to shoot
>> themselves in the foot. Unlike the generic filesystems with their
>> simplistic i_mutex locking we have a way to assure this works properly
>> in XFS with the shared/exclusive iolock, so I'm willing to look into
>> this further.
>> Ilia, would you be willing to test patches for this?
> If by "this" you mean the WARN_ON's, no problem. It should be easy to
> repro in a non-critical setup, although I haven't tried. If you mean
> the hang, it will not be so easy to reproduce, as it has only happened
> once so far.
> I would also be happy to share the details of our setup, if you'd like
> to be able to play with it directly yourself. With our setup, multiple
> WARN_ON's happen every time we run a backup (last time I checked, it
> was ~50, but I'm sure it varies).
> As a last thought on the matter, I'm sure that this is all brought on
> by our use of innodb_flush_method=O_DIRECT which tells mysql to use
> direct io. Leaving that at its default, which just opens the files
> normally, will probably make all these issues go away. I do not have a
> sufficient understanding of the details of how mysql uses direct io,
> and how it interacts with xtrabackup (which claims to work fine with
> direct io, but who knows) to be able to declare that it's safe, so I'm
> happy to accept Dave's advice to Not Do That.
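For anyone who wants to check whether a given server is configured the
same way, here is a minimal sketch that looks for the
innodb_flush_method=O_DIRECT setting mentioned above. It assumes the
option lives in the [mysqld] section of a my.cnf-style file and is
spelled with underscores. The helper name uses_o_direct and the sample
config text are made up for illustration, and Python's configparser is
only an approximation of how mysqld itself parses its option files.

```python
import configparser


def uses_o_direct(cnf_text: str) -> bool:
    """Return True if [mysqld] sets innodb_flush_method to O_DIRECT.

    Hypothetical helper: only handles the underscore spelling of the
    option and ignores include directives.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags such as skip-name-resolve
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat '%' characters literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return False
    value = parser.get("mysqld", "innodb_flush_method", fallback=None)
    return value is not None and value.strip().upper() == "O_DIRECT"


# Illustrative config text, not taken from the setup discussed in this thread.
SAMPLE = """
[mysqld]
skip-name-resolve
innodb_flush_method = O_DIRECT
"""

print(uses_o_direct(SAMPLE))
```

If this reports True, removing or commenting out that line returns
InnoDB to its default flush method, which is the "Not Do That" option
discussed above.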
In case you guys are interested, I've also opened a bug at
https://bugs.launchpad.net/percona-xtrabackup/+bug/606981. I never
quite bothered to _really_ understand all of the details of O_DIRECT
vs mmap vs read, so if I've misrepresented reality in the bug, feel
free to correct it there, or let me know and I'll try to straighten