On Wed, Jan 05, 2011 at 11:07:35PM +0100, Michael Monnerie wrote:
> On Wednesday, 5 January 2011 Lukas Czerner wrote:
> > If we
> > notice that we are running out of space in advance (how much in
> > advance?), we can start trimming smaller chunks, until we reach
> > a reasonable pool of reclaimed space, or until we trim
> > the whole device.
> Would it be possible to log all blocks that have been in use since the
> last FITRIM run? That way, we would only need to clean
> those. If you have a 2TB volume, probably only 25% of it has been
> rewritten (=500GB) since the last run, and of that maybe 80% is still
> in use at the time we run FITRIM, so only 100GB would need the cleanup.
> Maybe each AG could store a bitmap of written blocks that is reset by
> a FITRIM run. That could be an asynchronously written bitmap and shouldn't
> disturb performance too much. Maybe it would even suffice to store one bit
> per sunit*swidth blocks, to keep that table small. A mount option could
> be used to enable that feature, so only those who use thin
> provisioning or SSDs or similar devices enable it if they wish.
Not easily. It would need a second set of free space btrees for
tracking freed but untrimmed extents. The idea of the background
trim is that it doesn't need all that complexity because all the
status information on where the trim process is up to can be kept
This is basically the same mode of functioning as the periodic
background xfs_fsr defragmentation mode - run it for an hour every
couple of nights, and it will slowly work its way through the entire
filesystem over a period of weeks. No state or additional on-disk
structures are needed for xfs_fsr to do its work....
The background trim is intended to enable even the slowest of
devices to be trimmed over time, while introducing as little runtime
overhead and complexity as possible. Hence adding complexity and
runtime overhead to optimise background trimming tends to defeat the
primary design goal....
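To illustrate the stateless background approach described above, here is a
minimal Python sketch. It assumes the only state is a single byte offset
held by the caller; `trim_ranges`, `run_once` and the `trim` callback are
hypothetical names, not part of any existing tool. On Linux a real callback
would issue the FITRIM ioctl with a `struct fstrim_range` (start, len,
minlen); here it only records the ranges it was asked to trim.

```python
from typing import Callable, Iterator, List, Tuple


def trim_ranges(cursor: int, fs_size: int, chunk: int,
                budget: int) -> Iterator[Tuple[int, int]]:
    """Yield up to `budget` (start, length) byte ranges, wrapping at fs_size."""
    for _ in range(budget):
        # The last chunk before the end of the filesystem may be short.
        length = min(chunk, fs_size - cursor)
        yield cursor, length
        cursor = (cursor + length) % fs_size


def run_once(cursor: int, fs_size: int, chunk: int, budget: int,
             trim: Callable[[int, int], None]) -> int:
    """Trim `budget` chunks starting at `cursor`; return the next cursor."""
    for start, length in trim_ranges(cursor, fs_size, chunk, budget):
        trim(start, length)  # hypothetical: would wrap the FITRIM ioctl
        cursor = (start + length) % fs_size
    return cursor


if __name__ == "__main__":
    issued: List[Tuple[int, int]] = []
    cursor = 0
    # Two short "nightly" runs over a toy 1000-byte filesystem.
    for night in range(2):
        cursor = run_once(cursor, fs_size=1000, chunk=300, budget=2,
                          trim=lambda s, l: issued.append((s, l)))
    print(issued, cursor)
```

Each run picks up where the previous one stopped, so a small per-run budget
still covers the whole filesystem eventually, and losing the cursor only
means some ranges get trimmed again.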
> Especially for 100TB-sized devices that seems like something that should
> be considered: if you run FITRIM once a week there, maybe only <10TB
> has been rewritten, if at all, and such a table would speed up a FITRIM
> run a lot.
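For scale, the arithmetic in the quoted proposal can be checked with a short
Python sketch. The percentages are the ones given in the quote; the 1 MiB
tracking granule is purely an assumption standing in for "sunit*swidth
blocks", which depends on the array geometry, and both function names are
hypothetical.

```python
TB = 10**12
GB = 10**9
MiB = 2**20


def needs_trim(volume_bytes: int, rewritten_pct: int,
               still_in_use_pct: int) -> int:
    """Bytes rewritten since the last run that are free again by now."""
    rewritten = volume_bytes * rewritten_pct // 100
    return rewritten * (100 - still_in_use_pct) // 100


def bitmap_bytes(volume_bytes: int, granule_bytes: int) -> int:
    """Size of a bitmap holding one bit per granule (rounded up)."""
    bits = -(-volume_bytes // granule_bytes)  # ceiling division
    return -(-bits // 8)


if __name__ == "__main__":
    # 2TB volume, 25% rewritten, 80% of that still in use.
    print(needs_trim(2 * TB, 25, 80) // GB, "GB to trim")
    # Bitmap size for the 2TB and 100TB cases, assuming a 1 MiB granule.
    for size in (2 * TB, 100 * TB):
        print(size // TB, "TB volume:", bitmap_bytes(size, MiB), "bitmap bytes")
```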
If we want optimised, only-trim-what-we-free behaviour, we need to
hook into the transaction subsystem and issue TRIM commands at the
time extents are actually freed. That is much more complex to
implement but much easier to optimise because it doesn't require
persistent state on disk. However, most devices are simply not ready
to handle the flood of TRIM commands this generates, with
performance degrading by ~10-20% for the best of devices and
_10-100x_ for the worst...
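As a toy illustration of why trimming at free time produces so many
commands, here is a minimal Python sketch. `Transaction`, `free_extent` and
`commit` are hypothetical names, not XFS kernel interfaces, and the sketch
assumes discards are issued only once the freeing transaction commits. It
models the command count only, not device behaviour.

```python
from typing import Callable, List, Tuple

Extent = Tuple[int, int]  # (start, length) in blocks


class Transaction:
    """Hypothetical stand-in for a filesystem transaction."""

    def __init__(self, discard: Callable[[int, int], None]) -> None:
        self._discard = discard
        self._freed: List[Extent] = []

    def free_extent(self, start: int, length: int) -> None:
        # Only remember the extent in memory; nothing is persisted.
        self._freed.append((start, length))

    def commit(self) -> None:
        # Assumption: an extent may be discarded once its free is committed.
        for start, length in self._freed:
            self._discard(start, length)
        self._freed.clear()


if __name__ == "__main__":
    commands: List[Extent] = []
    tx = Transaction(lambda s, l: commands.append((s, l)))
    # Free 1000 small, non-adjacent extents in one transaction.
    for i in range(1000):
        tx.free_extent(i * 16, 8)
    tx.commit()
    print(len(commands), "discard commands issued")
```

Every freed extent becomes its own command here, however small it is, which
is the flood described above; a background pass over large ranges issues far
fewer, larger requests.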