This patch series builds on top of the current patch queue I posted
yesterday. This series replaces the struct xfs_dabuf with an xfs_buf
that can serve the same purpose.
Directory buffers may be made up of multiple extents, but are
currently formed by creating individual buffers and then copying the
data out of them into a linear memory region in a dabuf structure.
All dabuf operations then require walking all the underlying buffers
to change their state, and once a dabuf is
modified the contents need to be copied back to the underlying
buffers before they are logged.
All of these operations can be done on a normal xfs_buf, but the
normal xfs_buf does not support multiple disk block ranges or doing
multiple disjoint I/Os to read or write a buffer. Supporting
multiple disk block ranges is not difficult - we simply need to
attach an iovec-like array to the buffer rather than just using a
single block number and length.
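
As a rough illustration of the iovec-like array (a sketch only, not
the code in the patches; struct buf_map, struct disc_buf and both
helpers are hypothetical names made up for this example), a buffer
could carry an array of (block number, length) pairs and translate
logical offsets through it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one contiguous run of disk blocks backing part of a buffer. */
struct buf_map {
	int64_t	bm_bn;		/* starting disk block number */
	int	bm_len;		/* length of the run, in blocks */
};

/* Hypothetical buffer that is backed by one or more disjoint runs. */
struct disc_buf {
	const struct buf_map	*maps;	/* iovec-like array of runs */
	int			nmaps;	/* number of entries in maps */
};

/* Total length of the buffer in blocks, summed over all runs. */
static int disc_buf_length(const struct disc_buf *bp)
{
	int total = 0;

	assert(bp->nmaps > 0);
	for (int i = 0; i < bp->nmaps; i++)
		total += bp->maps[i].bm_len;
	return total;
}

/*
 * Translate a logical block offset within the buffer into a disk block
 * number. Returns -1 if the offset lies past the end of the buffer.
 */
static int64_t disc_buf_daddr(const struct disc_buf *bp, int offset)
{
	assert(offset >= 0);
	for (int i = 0; i < bp->nmaps; i++) {
		if (offset < bp->maps[i].bm_len)
			return bp->maps[i].bm_bn + offset;
		offset -= bp->maps[i].bm_len;
	}
	return -1;
}
```

A contiguous buffer is then just the nmaps == 1 case, so callers that
only ever see a single extent do not need to change.
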
Splitting the buffer up into multiple IOs for read and write is not
difficult, either. We already track the number of IOs remaining
before a buffer's IO is complete, so this count can be used to wait
for all the dispatched IOs to complete (for both read and write).
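
The counting scheme can be sketched roughly as follows. This is a
single-threaded illustration under my own assumptions, not the actual
buffer code: struct io_buf and its helpers are hypothetical, and the
extra count held by the submitter is one common way to stop the
buffer completing before every segment has been dispatched.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical buffer state: outstanding IO count and a completion flag. */
struct io_buf {
	atomic_int	io_remaining;	/* IOs still in flight, plus submitter */
	bool		done;		/* set when the whole buffer IO is complete */
};

/* Called once per completed segment IO; the last one completes the buffer. */
static void io_buf_end_io(struct io_buf *bp)
{
	/* atomic_fetch_sub() returns the value held before the decrement */
	if (atomic_fetch_sub(&bp->io_remaining, 1) == 1)
		bp->done = true;
}

/* Account for one IO per segment, then drop the submitter's count. */
static void io_buf_submit(struct io_buf *bp, int nsegs)
{
	assert(nsegs > 0);
	bp->done = false;

	/* The submitter holds one count so completion cannot fire early. */
	atomic_store(&bp->io_remaining, 1);
	for (int i = 0; i < nsegs; i++) {
		atomic_fetch_add(&bp->io_remaining, 1);
		/* a real implementation would dispatch the segment IO here */
	}
	io_buf_end_io(bp);
}
```

The same counter works for reads and writes, because the waiter only
cares that the count has dropped to zero, not what kind of IO was
issued.
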
The only interesting twist to this is logging the changes. We can
treat the discontiguous buffer as a single buffer for most purposes
except for formatting the changes into the log. When formatting, we
need to split the changes into a format item per underlying region
so that recovery does not need to know about compound buffers and
can recover each segment of a directory block individually as it does
now. The fact that recovery will replay all or none of the
transaction ensures this process is still atomic from a change
recovery point of view.
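
To make the split concrete, here is a minimal sketch of carving one
dirty byte range of the logical buffer into a format item per
underlying region. The types and the function name are hypothetical
and segment lengths are in bytes for simplicity; the real log format
structures are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one underlying disk region of a discontiguous buffer. */
struct seg {
	long	bn;	/* starting disk block of the region */
	int	len;	/* region length in bytes */
};

/* Hypothetical: what gets logged for one region. */
struct fmt_item {
	long	bn;	/* region the item applies to */
	int	first;	/* first dirty byte, relative to region start */
	int	last;	/* last dirty byte, relative to region start */
};

/*
 * Split the dirty range [first, last] (byte offsets into the logical
 * buffer, inclusive) into one item per region it touches. out must
 * have room for nsegs items. Returns the number of items emitted.
 */
static int split_dirty_range(const struct seg *segs, int nsegs,
			     int first, int last, struct fmt_item *out)
{
	int n = 0;
	int seg_start = 0;	/* logical offset where this region begins */

	assert(first >= 0 && first <= last);
	for (int i = 0; i < nsegs; i++) {
		int seg_end = seg_start + segs[i].len - 1;

		/* emit an item only if the dirty range overlaps this region */
		if (first <= seg_end && last >= seg_start) {
			int lo = first > seg_start ? first : seg_start;
			int hi = last < seg_end ? last : seg_end;

			out[n].bn = segs[i].bn;
			out[n].first = lo - seg_start;
			out[n].last = hi - seg_start;
			n++;
		}
		seg_start += segs[i].len;
	}
	return n;
}
```

Each emitted item describes a single contiguous region, which is all
that recovery needs to see.
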
Further, even though log recovery doesn't use discontiguous buffers,
there will be no confusion between a short buffer written by
recovery and a discontiguous buffer read by the directory code after
mount because the lengths of the buffers will be different. Hence we
need no changes to mount or log recovery processing as we already
ensure that all log recovery changes have been written to disk before
we finish the mount process.
The reason for making this change is that we can now use a buffer
cache callback to do all the metadata CRC calculations and
verifications across both contiguous and discontiguous directory
blocks. It greatly simplifies the implementation of this code and
makes it consistent with all other metadata buffers. It should also
provide a performance improvement because it avoids double copying
and reduces the number of cached buffers.
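
The shape of such a callback might look like the sketch below. It is
purely illustrative: struct vbuf, the callback type and the helpers
are hypothetical, and a toy byte sum stands in for a real CRC. The
point is only that the verifier sees one linear buffer whether or not
the blocks underneath are contiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vbuf;

/* Hypothetical verifier callback type, run against the whole buffer. */
typedef bool (*verify_fn)(const struct vbuf *bp);

/* Hypothetical buffer: a linear view of the data plus its verifier. */
struct vbuf {
	const uint8_t	*data;		/* linear view, contiguous or not on disk */
	size_t		len;		/* length of the view in bytes */
	verify_fn	verify;		/* optional verification callback */
};

/* Toy checksum (NOT a CRC): the sum of the bytes, modulo 256. */
static uint8_t toy_sum(const uint8_t *p, size_t len)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += p[i];
	return sum;
}

/* Example verifier: byte 0 holds the toy checksum of the rest. */
static bool toy_verify(const struct vbuf *bp)
{
	return bp->len > 0 &&
	       bp->data[0] == toy_sum(bp->data + 1, bp->len - 1);
}

/* Run the verifier, if any, once the read of the buffer has completed. */
static bool vbuf_read_done(const struct vbuf *bp)
{
	return bp->verify ? bp->verify(bp) : true;
}
```
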
I've tested this on 4k/4k (FSB/DB sizes), 512b/64k, and 4k/64k
combinations with xfstests and some dbench, fsmark and compilebench
stress loads. More testing is welcome....