On Fri, Jun 08, 2007 at 06:03:18PM +0200, Iustin Pop wrote:
> On Sat, Jun 09, 2007 at 01:12:23AM +1000, David Chinner wrote:
> > > I took a look at both items since this discussion started. And honestly,
> > > I think 1) is harder than 4), so you're welcome to work on it :) The
> > > points that make it harder are that, per David's suggestion, there needs
> > > to be:
> > > - define two new transaction types
> > one new transaction type:
> > XFS_TRANS_AGF_FLAGS
> > and an extension to xfs_alloc_log_agf() is about all that is
> > needed there.
> > See the patch here:
> > http://oss.sgi.com/archives/xfs/2007-04/msg00103.html
> Ah, I see now. I was wondering how one can enable the new bits (CVS
> xfs_db shows the btreeblks but the 'version' cmd doesn't allow changing
> them); it seems that manual xfs_db work + xfs_repair enables them.
The xfs_db work needs to be wrapped up in xfs_admin. That's relatively
simple to do, but the repair stage is needed to count the btree blocks
and update the counter in each AGF. That could probably also be wrapped
up in an xfs_db script so conversion wouldn't require you to run
> > For an example of a very similar transaction to what is needed,
> > look at xfs_log_sbcount(), and for a very similar addition to
> > the AGF, look at xfs_btreeblks.
> Just a question: why do you think this per-AG bit needs to be persistent?
Shrinking is not the only reason why you might want to prevent
allocation within an AG. While we might be able to get away with a
totally in memory flag for a shrink, I really don't want to have
multiple mechanisms for doing roughly the same thing.
e.g. Think of fault tolerance - you detect a free space btree
corruption, so you prevent allocation and freeing in that AG (by
setting the relevant bits) until you can come along and repair it.
If you want to do online repair of this sort of corruption, then you
need to be able to stop the trees from being used between the time
that the corruption is detected and the time it is repaired. That may
be longer than the filesystem is currently mounted...
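
As a rough illustration of the fault-tolerance case above, here is a minimal
sketch of per-AG flag bits that gate allocation and freeing. Every name in it
(AGF_FLAG_NOALLOC, AGF_FLAG_NOFREE, struct ag_state and the helpers) is
hypothetical and made up for this example; none of it is existing XFS code,
and the journalling that would make the bits persistent is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-AG flag bits; not real XFS definitions. */
#define AGF_FLAG_NOALLOC (1u << 0)  /* no new allocations from this AG */
#define AGF_FLAG_NOFREE  (1u << 1)  /* no extents freed back to this AG */

/* Hypothetical stand-in for the per-AG state that carries the flags. */
struct ag_state {
    uint32_t flags;
};

/* True if the allocator may take space from this AG. */
static bool ag_can_alloc(const struct ag_state *ag)
{
    return !(ag->flags & AGF_FLAG_NOALLOC);
}

/* True if extents may be freed into this AG's free space btrees. */
static bool ag_can_free(const struct ag_state *ag)
{
    return !(ag->flags & AGF_FLAG_NOFREE);
}

/* On detecting free space btree corruption: stop using the trees. */
static void ag_quarantine(struct ag_state *ag)
{
    ag->flags |= AGF_FLAG_NOALLOC | AGF_FLAG_NOFREE;
}

/* After a successful repair: make the AG usable again. */
static void ag_release(struct ag_state *ag)
{
    ag->flags &= ~(AGF_FLAG_NOALLOC | AGF_FLAG_NOFREE);
}
```

A shrink would presumably only need the no-alloc bit, while the corruption
case sets both; that is the argument for one shared mechanism.
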
> just curious. When I first thought about this, I was thinking more like
> this should be an in-core flag only, like the freeze flag is for the
> filesystem. The idea being that you don't need to recover this state
> after a crash
But a freeze is different - it's not modifying the filesystem,
just bringing it down into a consistent state. A shrink is a
modification operation, and so if it crashes halfway through,
we need to ensure that recovery doesn't do silly things. Hence
it is best to have all the state associated with the shrink
journalled and recoverable. i.e. persistent.
> - there is no actual state, just restart the shrink
> operation if you want. And no actual filesystem state (e.g. space
> allocation or such) is happening when you toggle the AGs not
> allocatable. This would allow a much simpler implementation of the
> 'no-alloc' part.
True, but it would be much more limited in its potential use.
> > > - update the ondisk-format (!), if we want persistence of these flags;
> > > luckily, there are two spare fields in the AGF structure.
> > Better to expand, I think. The AGF is a sector in length - we can
> > expand the structure as we need to this size without fear, esp. as
> > the part of the sector outside the structure is guaranteed to be
> > zero. i.e. we can add a flags field to the end of the AGF
> > structure - old filesystems simply read as "no flags set" and
> > old kernels never look at those bits....
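
To make the compatibility argument concrete, here is a small sketch of why
appending a field is safe when the rest of the sector is zero. The structures
are heavily trimmed, hypothetical stand-ins rather than the real xfs_agf_t,
the 512-byte sector size is an assumption, and on-disk endianness conversion
is ignored.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512  /* assumed sector size for this sketch */

/* Hypothetical, trimmed stand-in for the existing on-disk AGF. */
struct old_agf {
    uint32_t magicnum;
    uint32_t versionnum;
    uint32_t seqno;
    uint32_t length;
};

/* The same layout with a flags word appended at the end. */
struct new_agf {
    uint32_t magicnum;
    uint32_t versionnum;
    uint32_t seqno;
    uint32_t length;
    uint32_t flags;      /* hypothetical new field */
};

/* Write an old-format AGF; the remainder of the sector stays zero. */
static void write_old_agf(unsigned char sector[SECTOR_SIZE], uint32_t seqno)
{
    struct old_agf agf = { 0x58414746u, 1, seqno, 1024 }; /* sample values */

    memset(sector, 0, SECTOR_SIZE);
    memcpy(sector, &agf, sizeof(agf));
}

/* Read the sector through the new, longer structure. */
static uint32_t read_agf_flags(const unsigned char sector[SECTOR_SIZE])
{
    struct new_agf agf;

    memcpy(&agf, sector, sizeof(agf));
    return agf.flags;
}
```

Reading a sector written by write_old_agf() through read_agf_flags() yields
zero, i.e. "no flags set", because the bytes past the old structure were
never written with anything else.
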
> Yes, makes sense. Just to make sure: the xfs_agf_t, xfs_agi_t and
> xfs_sb_t structures as defined in xfs_sb.h and xfs_ag.h are what
> actually is on-disk, right? Adding to them, defining the new bits i.e.
> XFS_AGF_FLAGS and bumping up XFS_AGF_ALL_BITS should take care of the
> on-disk part?
Don't forget to modify xfs_alloc_log_agf() as well ;)
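
The reason the logging function needs touching is that it has to know where
each loggable field lives. Below is a minimal sketch of that idea: a field
bitmask is mapped to the first and last byte to log via an offset table. It
is only loosely modelled on the approach, and the trimmed structure, the
AGF_* bit names and agf_log_range() are all hypothetical, not the real
definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed AGF with the new flags word at the end. */
struct agf {
    uint32_t magicnum;
    uint32_t seqno;
    uint32_t freeblks;
    uint32_t flags;
};

/* One bit per loggable field, in structure order (hypothetical names). */
#define AGF_MAGICNUM (1u << 0)
#define AGF_SEQNO    (1u << 1)
#define AGF_FREEBLKS (1u << 2)
#define AGF_FLAGS    (1u << 3)   /* the newly added bit */
#define AGF_NUM_BITS 4           /* must be bumped with each new field */
#define AGF_ALL_BITS ((1u << AGF_NUM_BITS) - 1)

/* Offset of each field, plus the structure size as a terminator. */
static const size_t agf_offsets[AGF_NUM_BITS + 1] = {
    offsetof(struct agf, magicnum),
    offsetof(struct agf, seqno),
    offsetof(struct agf, freeblks),
    offsetof(struct agf, flags),
    sizeof(struct agf)
};

/*
 * Compute the inclusive byte range covering every field set in 'fields'.
 * Returns 0 on success, -1 if 'fields' is empty or has unknown bits.
 */
static int agf_log_range(uint32_t fields, size_t *first, size_t *last)
{
    int lo = -1, hi = -1;

    if (fields == 0 || (fields & ~AGF_ALL_BITS))
        return -1;
    for (int i = 0; i < AGF_NUM_BITS; i++) {
        if (fields & (1u << i)) {
            if (lo < 0)
                lo = i;
            hi = i;
        }
    }
    *first = agf_offsets[lo];
    *last = agf_offsets[hi + 1] - 1;
    return 0;
}
```

If the new bit were defined but the offset table not extended, the range for
the flags field could not be computed, which is the kind of omission the
reminder above is about.
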
SGI Australian Software Group