On Tue, Nov 29, 2011 at 02:17:26PM -0500, Paul Anderson wrote:
> Hi all,
> 18.104.22.168 (x64 intel, in today's case a 40TiByte SAN volume) appears to
> have a bug whereby not all active metadata will be flushed even on a
> quiescent machine (one that has nonetheless in the past been under
> very high load).
> We have tried several variations of clean shutdowns, combined with for
> example the "echo 3 >/proc/sys/vm/drop_caches" trick to no avail - we
> still get lost files (well, 0 length files).
> We have several big servers scheduled to go down shortly, and I was
> wondering if there are other ideas besides just copying all recent data
> to another server.
I'd really love to debug this. We had a few reports of this issue
before, but I've never been able to pinpoint it. Do you remember
anything specific to the workload touching these files?
To be safe, I'd rsync the data off the first one going down. Can you try
to do an explicit fsync for every file, like
	find | xargs /usr/sbin/xfs_io -c 'fsync'
and see if that helps? Answering that question would help us greatly
to pin down the issue.
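
If file names on the volume contain spaces, or if directories should be
skipped, the one-liner above could be wrapped roughly as follows. This is
only a sketch: fsync_tree and the XFS_IO variable are illustrative names,
not part of xfsprogs, and running one xfs_io per file is an assumption made
to keep each invocation to a single file argument. The -r option to xargs
assumes GNU xargs.

```shell
#!/bin/sh
# Sketch: fsync every regular file below a directory, one xfs_io call per file.
# XFS_IO and fsync_tree are hypothetical names chosen for this example.
XFS_IO="${XFS_IO:-/usr/sbin/xfs_io}"

fsync_tree() {
    # -type f skips directories and special files; -print0 with -0 keeps
    # file names containing spaces or newlines intact; -r (GNU xargs) runs
    # nothing if no files are found; -n 1 passes one file per invocation.
    find "${1:-.}" -type f -print0 | xargs -0 -r -n 1 "$XFS_IO" -c fsync
}
```

Called as "fsync_tree /path/to/volume" (path is a placeholder), it walks the
tree once before the shutdown. On a very large volume the per-file process
start-up will be slow, so it may be worth limiting it to recently written
directories.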