> Andi Kleen wrote:
> > On Thu, Aug 31, 2000 at 03:44:34PM -0600, Davida, Joe wrote:
> > > Is there any information on the 2 TBytes file system
> > > limitation in Linux which is also inherited by the
> > > linux port of xfs? What imposes this limitation?
> > > Are the current efforts aimed at removing this
> > > limitation? I hear ReiserFS has a 16 Terabyte
> > > file system size.
> > The 2TB limitation is caused by the interface between block drivers
> > and file systems. On 32bit machines a 32bit value is used to pass the
> > sector number, with the sector being in 512byte units. Limiting yourself
> > to 31bits is probably safer to protect against signedness bugs in the
> > driver.
> > That gives you a 2^31 * 2^9 = 2^40 byte (1 TB) block device size limit for
> > all file systems and raw devices; with the full 32 bits it is 2^41 bytes (2 TB).
> > On 64bit systems the limit is higher, assuming there are no 32bit
> > limitations in the driver.
> Also note the 16TB limit is true for all linux file systems at the moment.
> The block interface on linux indexes on the block size of the file system; in
> almost all cases that is the PAGE_SIZE, i.e. 4k.
> 2^32 * 4096 bytes = 2^44 bytes = 16TB
I would have to disagree with Russell here - the lower layers of the I/O
path map all addresses to 512 byte units. The line in ll_rw_block that does
the mapping is:

	bh->b_rsector = bh->b_blocknr * (bh->b_size >> 9);

and b_rsector is defined as:

	unsigned long b_rsector;	/* Real buffer location on disk */
The kiobuf based I/O path also moves the disk address through a 32 bit
variable which is in terms of 512 byte sectors.
So breaking the 2 Tbyte limit (or possibly 1 Tbyte, if signed variables are
used somewhere) on device addressability on 32 bit platforms will take some
work.
[ So while I was writing this, Russell pointed out that LVM is mapping
down to individual devices from the one logical device, so in theory
it could concatenate smaller devices into a larger single logical
device. There would still be some work involved here, but probably
a smaller amount. ]
> XFS has additional problems with inode numbers overflowing the
> standard 32bit container once the file system itself goes over 2TB.
> Fixing this requires either changing linux to support 64bit inode numbers
> (not terribly difficult, but time consuming in that the linux community
> has to agree on what needs to be done), or changing the way XFS
> deals with inode numbers. The latter requires some fundamental changes
> in XFS; up until this point we have made very few fundamental changes
> to XFS internals, which has worked out very well since we know the code
> base is solid.
This is also not strictly true: the inode number visible outside of XFS
is a 32 bit number, but it is possible to rework the way XFS and iget
work together so that the XFS inode number does not have to pass through
this 32 bit field. There are two caveats:
1. NFS will require work - it explicitly uses the 32 bit inode
number in file handles.
	2. Dealing with getting 64 bits of inode information up to user
	   space: interfaces like stat64 are not quite doing the right
	   thing, since they only pass 32 bits of inode number.
Andi Kleen said:
>> > Fixing this either requires changing linux to support 64bit inode
>> > numbers, not terribly difficult but time consuming in that the linux
>> > community has to all agree as to what needs to be done, or change
>> > the way XFS deals with inode numbers. This does require some
>> > fundamental changes in XFS... up until this point we have made very
>> > few fundamental changes to XFS internals. Which has worked out very
>> > well since we know the code base is solid.
>>
>> Linux needs to do it generally for the NFSv3 client anyways, so it may
>> make more sense to change Linux.
>> Linux needs to do it generally for the NFSv3 client anyways, so it may
>> make more sense to change Linux.
I still think making the contents of the NFS file handle opaque to NFS
itself is a better way to go. It has a length field, and the file systems
can do the best they can with the space available.