
Re: [RFC] add FIEMAP ioctl to efficiently map file allocation

To: Andreas Dilger <adilger@xxxxxxxxxxxxx>, linux-ext4@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
Subject: Re: [RFC] add FIEMAP ioctl to efficiently map file allocation
From: Timothy Shimmin <tes@xxxxxxx>
Date: Mon, 16 Apr 2007 18:01:17 +1000
Cc: hch@xxxxxxxxxxxxx
In-reply-to: <20070412110550.GM5967@schatzie.adilger.int>
References: <20070412110550.GM5967@schatzie.adilger.int>
Sender: xfs-bounce@xxxxxxxxxxx

Hi Andreas,

--On 12 April 2007 5:05:50 AM -0600 Andreas Dilger <adilger@xxxxxxxxxxxxx> wrote:

> I'm interested in getting input for implementing an ioctl to efficiently
> map file extents & holes (FIEMAP) instead of looping over FIBMAP a billion
> times.

> I had come up with a plan independently and was also steered toward the
> XFS_IOC_GETBMAP* ioctls, which are in fact very similar to my original plan,
> though I think the XFS structs used there are a bit bloated.

They certainly seem to be (the same struct serves as both the header and each extent entry).
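To put a rough number on the bloat, the sketch below compares how many extent records fit in a 4 KiB buffer with each layout. The getbmap layout is assumed from XFS's xfs_fs.h (three 64-bit fields counted in 512-byte blocks, then two 32-bit counts, with array element 0 acting as the header) and is copied under the hypothetical name getbmap_layout so the example needs no XFS headers. The fibmap structs follow the proposal, using <stdint.h> types and a C99 flexible array member in place of __u64 and [0]. The helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed local copy of XFS's struct getbmap layout (see xfs_fs.h).
 * Element 0 of the array is the header; the rest are extents. */
struct getbmap_layout {
	int64_t bmv_offset;  /* file offset of segment, 512-byte blocks */
	int64_t bmv_block;   /* starting disk block, 512-byte blocks */
	int64_t bmv_length;  /* length of segment, 512-byte blocks */
	int32_t bmv_count;   /* array entries, including the header */
	int32_t bmv_entries; /* entries filled in (output) */
};

/* The proposed layout: a separate header plus 16-byte extent records. */
struct fibmap_extent {
	uint64_t fe_start; /* starting offset in bytes */
	uint64_t fe_len;   /* length in bytes */
};

struct fibmap {
	struct fibmap_extent fm_start;     /* offset, length of desired mapping */
	uint32_t fm_extent_count;          /* number of extents in array */
	uint32_t fm_flags;
	uint64_t unused;
	struct fibmap_extent fm_extents[]; /* C99 flexible array member */
};

/* Extent records that fit in buf_size bytes with XFS_IOC_GETBMAP. */
static inline size_t getbmap_capacity(size_t buf_size)
{
	assert(buf_size >= 2 * sizeof(struct getbmap_layout));
	/* one record is spent on the header */
	return buf_size / sizeof(struct getbmap_layout) - 1;
}

/* Extent records that fit in buf_size bytes with the proposed FIEMAP. */
static inline size_t fiemap_capacity(size_t buf_size)
{
	assert(buf_size >= sizeof(struct fibmap));
	return (buf_size - sizeof(struct fibmap)) / sizeof(struct fibmap_extent);
}
```

Under these assumptions a 4 KiB buffer holds 254 fibmap extents but only 127 getbmap extents: each getbmap record is 32 bytes rather than 16, and the first one is used up by the header.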

> struct fibmap_extent {
>         __u64 fe_start;                 /* starting offset in bytes */
>         __u64 fe_len;                   /* length in bytes */
> };
>
> struct fibmap {
>         struct fibmap_extent fm_start;  /* offset, length of desired mapping */
>         __u32 fm_extent_count;          /* number of extents in array */
>         __u32 fm_flags;                 /* flags (similar to XFS_IOC_GETBMAP) */
>         __u64 unused;
>         struct fibmap_extent fm_extents[0];
> };
>
> #define FIEMAP_LEN_MASK         0xff000000000000
> #define FIEMAP_LEN_HOLE         0x01000000000000
> #define FIEMAP_LEN_UNWRITTEN    0x02000000000000
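A minimal sketch of how the proposed flag bits would be packed into and stripped from fe_len, using the RFC's macro values with a ULL suffix added. The length is whatever lies outside FIEMAP_LEN_MASK, so it is recovered with `fe_len & ~FIEMAP_LEN_MASK`. Because the mask covers bits 48-55, this sketch assumes lengths stay below 2^48 bytes (256 TiB). The helper names (fe_len_encode, fe_len_bytes, fe_is_hole, fe_is_unwritten) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as proposed in the RFC; they occupy bits 48-55 of fe_len. */
#define FIEMAP_LEN_MASK      0xff000000000000ULL
#define FIEMAP_LEN_HOLE      0x01000000000000ULL
#define FIEMAP_LEN_UNWRITTEN 0x02000000000000ULL

/* Pack a byte length and flag bits into one fe_len value. */
static inline uint64_t fe_len_encode(uint64_t len, uint64_t flags)
{
	assert(len < (1ULL << 48));              /* keep the length below the flag bits */
	assert((flags & ~FIEMAP_LEN_MASK) == 0); /* only bits inside the mask are flags */
	return len | flags;
}

/* Strip the flag bits to get the length in bytes. */
static inline uint64_t fe_len_bytes(uint64_t fe_len)
{
	return fe_len & ~FIEMAP_LEN_MASK;
}

/* Normalise to 0/1 so the result survives conversion to int. */
static inline int fe_is_hole(uint64_t fe_len)
{
	return (fe_len & FIEMAP_LEN_HOLE) != 0;
}

static inline int fe_is_unwritten(uint64_t fe_len)
{
	return (fe_len & FIEMAP_LEN_UNWRITTEN) != 0;
}
```

The `!= 0` comparisons matter: the flag bits sit above bit 31, so assigning the raw masked value to an int would silently drop them. For example, fe_len_encode(4096, FIEMAP_LEN_HOLE) describes a 4 KiB hole; fe_len_bytes() of it returns 4096 and fe_is_hole() returns 1.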

> All offsets are in bytes to allow cases where filesystems are not going
> to use block-aligned/sized allocations (e.g. tail packing).  The fm_extents
> array returned contains the packed list of allocation extents for the file,
> including entries for holes (which have fe_start == 0 and the
> FIEMAP_LEN_HOLE flag set).
>
> The ->fm_extents[] array includes all of the holes in addition to
> allocated extents because this avoids the need to return both the logical
> and physical address for every extent and does not make processing any
> harder.

Well, that's what stood out for me. I was wondering where the "fe_block" field (the "physical address") had gone. So is your "fe_start; /* starting offset */" actually the disk location (not a logical file offset), _except_ in the header (fibmap), where it is the desired logical offset? Okay, looking at your example use below, that's what it looks like. And when you refer to fm_start below, you mean fm_start.fe_start? Sorry, I realise this is just an approximation, but this part confused me.

So you get rid of all the logical file offsets in the extents because we report holes explicitly (and we know everything is contiguous if you include the holes).
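Since the extents tile the requested range with no gaps once holes are included, the logical offset of each extent is just the running sum of the preceding lengths, starting from fm_start.fe_start. Below is a sketch of a logical-to-physical lookup under that reading (fe_start as a physical byte address, holes flagged with FIEMAP_LEN_HOLE); the function name logical_to_physical and its return convention are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIEMAP_LEN_MASK 0xff000000000000ULL /* values from the RFC */
#define FIEMAP_LEN_HOLE 0x01000000000000ULL

struct fibmap_extent {
	uint64_t fe_start; /* physical start in bytes (0 for a hole) */
	uint64_t fe_len;   /* length in bytes, flag bits in FIEMAP_LEN_MASK */
};

/*
 * Translate logical file offset 'pos' into a physical byte address, given
 * n extents that tile the file contiguously from logical offset map_start
 * (the fm_start.fe_start passed to the ioctl).
 * Returns 1 and sets *phys for allocated data, 0 if pos is in a hole,
 * and -1 if pos lies outside the returned extents.
 */
static inline int logical_to_physical(uint64_t map_start,
				      const struct fibmap_extent *ext,
				      size_t n, uint64_t pos, uint64_t *phys)
{
	uint64_t logical = map_start; /* logical start of ext[i] */
	size_t i;

	assert(ext != NULL || n == 0);
	for (i = 0; i < n; i++) {
		uint64_t len = ext[i].fe_len & ~FIEMAP_LEN_MASK;

		if (pos >= logical && pos - logical < len) {
			if (ext[i].fe_len & FIEMAP_LEN_HOLE)
				return 0;
			*phys = ext[i].fe_start + (pos - logical);
			return 1;
		}
		logical += len; /* the next extent starts where this one ends */
	}
	return -1;
}
```

With extents {8 KiB at physical 1 MiB, a 4 KiB hole, 4 KiB at physical 2 MiB} and map_start 0, offset 100 maps to physical 1048676, offset 9000 falls in the hole, and offset 12288 maps to 2097152. One trade-off of dropping the per-extent logical offsets is visible here: finding a single offset means walking the list rather than looking it up directly.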


> Caller works something like:
>
>         char buf[4096];
>         struct fibmap *fm = (struct fibmap *)buf;
>         int count = (sizeof(buf) - sizeof(*fm)) / sizeof(fm->fm_extents[0]);
>         fm->fm_start.fe_start = 0;   /* start of file */
>         fm->fm_start.fe_len = -1;    /* end of file */
>         fm->fm_extent_count = count; /* max extents in fm_extents[] array */
>         fm->fm_flags = 0;            /* maybe "no DMAPI", etc like XFS */
>
>         fd = open(path, O_RDONLY);
>
>         /* The last call will return fewer extents than the maximum */
>         while (fm->fm_extent_count == count) {
>                 rc = ioctl(fd, FIEMAP, fm);
>                 if (rc)
>                         break;
>
>                 /* kernel filled in the fm_extents[] array, set
>                  * fm_extent_count to the actual number of extents
>                  * returned, and left fm_start alone (unlike
>                  * XFS_IOC_GETBMAP). */
>
>                 for (i = 0; i < fm->fm_extent_count; i++) {
>                         __u64 len = fm->fm_extents[i].fe_len & ~FIEMAP_LEN_MASK;
>                         __u64 fm_next = fm->fm_start + len;
>                         int hole = (fm->fm_extents[i].fe_len &
>                                     FIEMAP_LEN_HOLE) != 0;
>                         int unwr = (fm->fm_extents[i].fe_len &
>                                     FIEMAP_LEN_UNWRITTEN) != 0;
>
>                         printf("logical: %llu..%llu phys: %llu..%llu: %llu bytes %s%s\n",
>                                 fm->fm_start, fm_next - 1,
>                                 hole ? 0 : fm->fm_extents[i].fe_start,
>                                 hole ? 0 : fm->fm_extents[i].fe_start + len - 1,
>                                 len, hole ? "(hole) " : "",
>                                 unwr ? "(unwritten) " : "");
>
>                         /* get ready for printing next extent, or next ioctl */
>                         fm->fm_start = fm_next;
>                 }
>         }

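The caller loop above can be exercised in userspace without a kernel by substituting a mock for the ioctl. In the sketch below, fake_fiemap() and its fixed extent table are hypothetical stand-ins for ioctl(fd, FIEMAP, fm), which has no ioctl number in this proposal; the mock only handles requests that start on an extent boundary, which is all the loop needs. It follows the semantics described in the RFC: the "kernel" fills fm_extents[], sets fm_extent_count to the number returned, and leaves fm_start alone, so the caller advances fm_start.fe_start itself. map_whole_file() is likewise a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define FIEMAP_LEN_MASK 0xff000000000000ULL
#define FIEMAP_LEN_HOLE 0x01000000000000ULL

struct fibmap_extent {
	uint64_t fe_start; /* physical start in bytes (0 for a hole) */
	uint64_t fe_len;   /* length in bytes, flags in FIEMAP_LEN_MASK */
};

struct fibmap {
	struct fibmap_extent fm_start; /* logical offset, length requested */
	uint32_t fm_extent_count;      /* in: array size, out: extents returned */
	uint32_t fm_flags;
	uint64_t unused;
	struct fibmap_extent fm_extents[];
};

/* Hypothetical file layout served by the mock below. */
static const struct fibmap_extent fake_file[] = {
	{ 1048576, 8192 },
	{ 0, 4096 | FIEMAP_LEN_HOLE },
	{ 2097152, 4096 },
	{ 3145728, 16384 },
	{ 0, 8192 | FIEMAP_LEN_HOLE },
};
#define FAKE_N (sizeof(fake_file) / sizeof(fake_file[0]))

/* Hypothetical stand-in for ioctl(fd, FIEMAP, fm): copies out up to
 * fm_extent_count extents starting at fm_start.fe_start (assumed to be an
 * extent boundary), then sets fm_extent_count to the number copied.
 * fm_start itself is left untouched. */
static int fake_fiemap(struct fibmap *fm)
{
	uint64_t logical = 0;
	uint32_t out = 0;
	size_t i;

	for (i = 0; i < FAKE_N && out < fm->fm_extent_count; i++) {
		if (logical >= fm->fm_start.fe_start)
			fm->fm_extents[out++] = fake_file[i];
		logical += fake_file[i].fe_len & ~FIEMAP_LEN_MASK;
	}
	fm->fm_extent_count = out;
	return 0;
}

/* Map the whole file 'count' extents per call, as in the loop above.
 * Returns the number of allocated (non-hole) bytes; *calls receives the
 * number of mock ioctl calls made. */
static uint64_t map_whole_file(uint32_t count, unsigned *calls)
{
	struct fibmap *fm;
	uint64_t data_bytes = 0;

	assert(count > 0);
	fm = malloc(sizeof(*fm) + count * sizeof(fm->fm_extents[0]));
	if (fm == NULL)
		return 0;

	fm->fm_start.fe_start = 0;        /* start of file */
	fm->fm_start.fe_len = UINT64_MAX; /* to end of file */
	fm->fm_extent_count = count;
	fm->fm_flags = 0;
	fm->unused = 0;
	*calls = 0;

	/* a short batch means the end of the mapping was reached */
	while (fm->fm_extent_count == count) {
		uint32_t i;

		if (fake_fiemap(fm) != 0)
			break;
		(*calls)++;
		for (i = 0; i < fm->fm_extent_count; i++) {
			uint64_t len = fm->fm_extents[i].fe_len & ~FIEMAP_LEN_MASK;

			if (!(fm->fm_extents[i].fe_len & FIEMAP_LEN_HOLE))
				data_bytes += len;
			fm->fm_start.fe_start += len; /* next call resumes here */
		}
	}
	free(fm);
	return data_bytes;
}
```

With this five-extent table, map_whole_file(2, &calls) takes three calls and map_whole_file(5, &calls) takes two: when the extent count is an exact multiple of the batch size, the loop only stops after an extra call returns zero extents. Every batch size reports the same 28672 bytes of allocated data.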