On Wed, Apr 20, 2011 at 11:21:31AM -0400, Christoph Hellwig wrote:
> How do you want to union the existence of an extent with a state
> on disk, with a pending modification to it that is still in-memory
> and not flushed out to disk yet? This is looking into an uncertain
> future, as the extent map might change in various other ways before
> the transaction to convert the unwritten extents goes to disk.
So for example, suppose you have a single unwritten extent on disk,
but there are 3 regions within that extent's range that have unwritten
pages. You would return 3 or 4 fiemap_extent structures, reflecting the
state if the unwritten pages were pushed out to disk at the time of
the fiemap ioctl --- but without actually doing the expensive sync
operation. The one case where you can't do that is in the case of
delayed allocation blocks, since you won't know where on disk they
would be going, necessarily --- but hey, conveniently we have a
DELALLOC bit already defined....
> And if we do this it would need to be a new option to FIEMAP, as
> it changes the semantics from the existing one that returns the
> actual state on disk (plus the magic delalloc bit).
Well, we seem to have inconsistent semantics right now, because we
never defined the semantics clearly enough from the beginning. So no
matter which option we choose, including "the on-disk extent state
only, and nuke the delalloc bit", we will be changing semantics. I'm
not sure we can get around that.
> And even if you find semantics that take pending unwritten extent
> conversions into account and still make sense, how do you plan to
> implement them? For buffered writes into unwritten extents it could
> be done by walking the pagecache and buffers after adding a new
> flag for an already converted unwritten extent to the buffer head
> state. But there's no easy way to do that for direct I/O.
If the file is being actively modified (for example with direct I/O),
there will inevitably be race conditions. If only some of the pending
conversions have been taken into account, that seems like a
reasonable result. If a file is actively being modified by many DIO
writes, even using FIEMAP_FLAG_SYNC isn't going to help you get a
coherent view of the file, so this seems to be a previously unsolved
problem.
> > In the case of #1 and #2, we really need to implement support for
> > SEEK_HOLE/SEEK_DATA for userspace programs like cp who want to know
> > this information.
> We need to do that anyway, as fiemap is a horrible interface for
> tools that just want to skip holes.
I agree that implementing SEEK_HOLE/SEEK_DATA is a good thing
regardless of which option we end up choosing.
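
As a rough illustration of why that interface suits tools like cp,
here is a minimal sketch of walking a file's data runs with lseek().
It assumes a kernel and libc that provide SEEK_DATA and SEEK_HOLE;
next_data_run() is a made-up helper name, and the fallback #defines
use the Linux values in case the headers do not expose them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* glibc exposes SEEK_DATA/SEEK_HOLE under this */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA          /* fallback: values used by Linux */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: find the next run of data at or after pos.
 * Returns 1 and fills [*start, *end) if a run exists, 0 if there is
 * no more data before end of file, and -1 on any other error.
 */
int next_data_run(int fd, off_t pos, off_t *start, off_t *end)
{
    off_t data, hole;

    data = lseek(fd, pos, SEEK_DATA);
    if (data < 0)
        return errno == ENXIO ? 0 : -1;  /* ENXIO: nothing past pos */

    /* There is always an implicit hole at end of file. */
    hole = lseek(fd, data, SEEK_HOLE);
    if (hole < 0)
        return -1;

    *start = data;
    *end = hole;
    return 1;
}
```

A copy loop would call next_data_run() repeatedly, starting each call
at the previous *end, and copy only [*start, *end). On a filesystem
without real support, the whole file is reported as one data run, so
the caller still gets a correct (if unhelpful) answer.
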