On Tue, Jun 05, 2007 at 10:00:12AM +0200, Iustin Pop wrote:
> On Mon, Jun 04, 2007 at 07:21:15PM +1000, David Chinner wrote:
> > > allocated on an available AG and when you remove the originals, the
> > > to-be-shrunk AGs become free. Yes, utterly non-optimal, but it was the
> > > simplest way to do it based on what I knew at the time.
> > Not quite that simple, unfortunately. You can't leave the
> > AGs locked in the same way we do for a grow because we need
> > to be able to use the AGs to move stuff about and that
> > requires locking them. Hence we need a separate mechanism
> > to prevent allocation in a given AG outside of locking them.
> > Hence we need:
> > - a transaction to mark AGs "no-allocate"
> > - a transaction to mark AGs "allocatable"
> > - a flag in each AGF/AGI to say the AG is available for
> >   allocations (persistent over crashes)
> > - a flag in the per-ag structure to indicate allocation
> >   status of the AG.
> > - everywhere we select an AG for allocation, we need to
> >   check this flag and skip the AG if it's not available.
> > FWIW, the transactions can probably just be an extension of
> > xfs_alloc_log_agf() and xfs_alloc_log_agi()....
> A question: do you think that the cost of having this in the code
> (especially the last part, check that flag in every allocation function)
> is acceptable? I mean, let's say one would write the patch to implement
> all this. Does it have a chance to be accepted? Or will people say it's
> only bloat? ...
Lots of people ask for shrink capability on XFS, so if it's implemented
and reviewed and passes QA tests, then I see no reason why it wouldn't
be accepted.
> > Yeah, 1) and 4) are separable parts of the problem and can be done
> > in any order. 2) can be implemented relatively easily as stated
> > above.
> > 3) is the hard one - we need to find the owner of each block
> > (metadata and data) remaining in the AGs to be removed. This may be
> > a directory btree block, an inode extent btree block, a data block,
> > an extended attr block, etc. Moving the data blocks is easy to
> > do (swap extents), but moving the metadata blocks is a major PITA
> > as it will need to be done transactionally and that will require
> > a bunch of new (complex) code to be written, I think. It will be
> > of equivalent complexity to defragmenting metadata....
> > If we ignore the metadata block problem then finding and moving the
> > data blocks should not be a problem - swap extents can be used for
> > that as well - but it will be extremely time consuming and won't
> > scale to large filesystem sizes....
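As a rough illustration of the split described above, the sketch below classifies each remaining block in a to-be-removed AG by owner type and counts how many could be handled by swap extents versus how many would need the new transactional metadata-move code. All demo_* names are hypothetical. How the owner of a block is actually discovered is the hard part and is deliberately not shown; the owner array is simply assumed as input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical owner types for a block left in an AG being removed. */
enum demo_owner {
    DEMO_OWNER_FREE,        /* nothing to move */
    DEMO_OWNER_DATA,        /* file data block */
    DEMO_OWNER_DIR_BTREE,   /* directory btree block */
    DEMO_OWNER_BMAP_BTREE,  /* inode extent btree block */
    DEMO_OWNER_ATTR         /* extended attribute block */
};

/* What a shrinker would have to do with a block of a given owner type. */
enum demo_action {
    DEMO_SKIP,
    DEMO_SWAP_EXTENTS,          /* the easy case: move via swap extents */
    DEMO_NEEDS_METADATA_MOVE    /* the hard case: new transactional code */
};

enum demo_action demo_plan_block(enum demo_owner owner)
{
    switch (owner) {
    case DEMO_OWNER_FREE:
        return DEMO_SKIP;
    case DEMO_OWNER_DATA:
        return DEMO_SWAP_EXTENTS;
    default:
        /* Every metadata owner type falls into the hard bucket. */
        return DEMO_NEEDS_METADATA_MOVE;
    }
}

struct demo_plan {
    size_t skip;
    size_t swap;
    size_t metadata;
};

/* Tally the work for one AG, given the owner of each of its n blocks. */
struct demo_plan demo_plan_ag(const enum demo_owner *owners, size_t n)
{
    struct demo_plan plan = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        switch (demo_plan_block(owners[i])) {
        case DEMO_SKIP:
            plan.skip++;
            break;
        case DEMO_SWAP_EXTENTS:
            plan.swap++;
            break;
        case DEMO_NEEDS_METADATA_MOVE:
            plan.metadata++;
            break;
        }
    }
    return plan;
}
```

An AG whose plan has metadata == 0 is one that could be emptied with swap extents alone; any AG with a nonzero metadata count needs the metadata-moving code first.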
> So given these caveats, is there a chance that a) this will be actually
> useful and b) will this be accepted?
Look at it this way - if we get to the point where 3 is a problem, then
we've got most of a useful shrinker. That's way ahead of what we
have now and in a lot of cases it will just work.
The corner cases are the hard bit, but we can work on them incrementally
once the rest is done, and in doing so we'll also be introducing the
means by which to defragment metadata. IOWs, we kill two birds with
one stone at that point in time.
Likewise for the shrink case that needs to move the log - we've got
hooks for userspace tools to move the log, just no implementation.
Implementing log moving for shrink will also enable us to do online
log resize and internal/external log switching. Once again, two birds
with one stone.
Hence I don't see these issues as showstoppers at all - getting to
the point of a full shrink implementation will give us other features
that we need to have anyway....
SGI Australian Software Group