> I've mostly ruled out the problem being per-page since, as the test in
> 198 shows, the corruption is the size of a given write; in the case of
> doublewrite, 16k.
> What appears to be happening is that the file is grown out of order
> relative to the writes, and every once in a while a 16k write is
> simply forgotten about.
> In other words: write 16k at offset 16k, then write 16k at offset 0.
> So the file grows to 32k starting at 16k, then goes back to fill in
> the 0-16k block, and sometimes the 0-16k block is simply never
> written to disk.
> I'm going to keep looking at this i_sem-not-locked theory a bit longer
> before I give up on yet another theory.
> If you have any observations or theories, let me know; they might be helpful.
It appears my corruption pattern is different (but not unrelated).
I finally confirmed that when a file is corrupted, it is corrupted
with data from another write. For example, temperature gets
overwritten with height, or velocity is overwritten with humidity.
Unless there is something really wrong in caching, the corruption
must be coming from another client process. In one case, process
3 showed corruption in file 8, and the corruption in file 8 matches
data from file 14 (files are written in order). The processes
are all started at the same time, but they get out of sync
because of IO and other load issues.
It seems like buffers are overwritten, or, if the IO is async,
that a buffer is being reused before it is actually no longer in
use by the outstanding write.