On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
> > Hi,
> > We have been able to repeatably produce xfs internal errors
> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
> > to locally copy a 248Gig file off a USB drive formatted as NTFS to the
> > xfs drive. The copy gets about 96% of the way through and we get the
> > following messages:
> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
> > Caller 0xffffffff8837446f
> Interesting. That's a corrupted inode extent btree - I haven't seen
> one of them for a long while. Were there any errors (like IO errors)
> reported before this?
> However, the first step is to determine if the error is on disk or an
> in-memory error. Can you post output of:
> - xfs_info <mntpt>
> - xfs_repair -n after a shutdown
> Can you upgrade xfsprogs (i.e. xfs_repair) to the latest version
> (3.1.2) before you do this as well?
We have upgraded xfsprogs to 3.1.2 and are in the process of
collecting the required information.
> > We have reproduced the condition 3 times and each time we have been
> > able to remount the drive (to replay the transaction log) and then
> > perform an xfs_repair.
> > We are just using cp to copy the file.
> > Some further details about the system:
> > Software:
> > - Fresh install of CentOS 5.5 64bit all patches up to date
> > - Kernel 2.6.18-194.3.1.el5.centos.plus
> I've got no idea exactly what version of XFS that has in it, so I
> can't say off the top of my head whether this is a fixed bug or not.
> Dave Chinner
During other testing we have also been able to reproduce the issue by
copying a self-generated 248Gig file from another system disk to the
XFS disk. The file was generated using dd with /dev/zero as input.
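
For anyone who wants to try the same kind of test, the procedure can be
sketched roughly as below. This is only an illustration, not the exact
commands we ran: the function name, the paths and the size argument are
hypothetical placeholders, and bs=1M assumes GNU dd.

```shell
#!/bin/sh
# Sketch: generate a zero-filled file with dd, copy it with cp, and
# verify the copy. All names and paths here are hypothetical.
make_and_copy() {
    src=$1      # where the generated file is written (e.g. another system disk)
    dst=$2      # destination path on the XFS filesystem
    size_mb=$3  # file size in MiB; 253952 MiB corresponds to 248 GiB

    # Write size_mb blocks of 1 MiB of zeros to the source file.
    dd if=/dev/zero of="$src" bs=1M count="$size_mb" 2>/dev/null || return 1

    # Plain cp, as in our reproduction.
    cp "$src" "$dst" || return 1

    # Compare source and copy byte for byte; non-zero exit on mismatch.
    cmp "$src" "$dst"
}
```

A call such as `make_and_copy /data/zero.bin /mnt/xfs/zero.bin 253952` would
then correspond to the 248Gig case, with both paths adjusted to the local
setup.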
All the existing data (~6TB) was successfully copied onto the storage
without hitting the error. The thing to note is that all the existing
files are much smaller than the one that we are trying to copy in
(248Gig). Since we started seeing the shutdowns, we have also copied
many smaller files (each < 30Gig in size) onto the storage area
without issue.