On Fri, Jul 22, 2011 at 2:04 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> [Amit - please don't top post. If you are going to quote the
> previous email, please reply after the quoted text. ]
> [Alex - I haven't seen either of the previous two emails in this
> thread from Amit - has SGI made it onto a spam-blocking RBL again? ]
> On Fri, Jul 22, 2011 at 12:29:21PM +0530, Amit Sahrawat wrote:
>> On Fri, Jul 22, 2011 at 10:53 AM, Amit Sahrawat
>> <amit.sahrawat83@xxxxxxxxx> wrote:
>> > On Fri, Jul 22, 2011 at 10:22 AM, Amit Sahrawat
>> > <amit.sahrawat83@xxxxxxxxx> wrote:
>> >> Dear All,
>> >> Target : ARM
>> >> Recently I encountered a corruption on XFS for RC-3. While the
>> >> DIRECT-IO for a file was in operation (Write operation) there was a
>> >> power reset - Only one file at a time is being written to the disk
>> >> using DIO.. After reboot on mounting I just tried to remove the file
>> >> and encountered the below mentioned corruption. The hard disk is not
>> >> able to mount after this, only after clearing logs (xfs_repair –L) –
> Lots of weird characters in your email...
>> >> disk is able to mount
>> >> XFS mounting filesystem sda1
>> >> XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1535 of file
>> >> fs/xfs/xfs_alloc.c. Caller 0xc0152c04
>> >> Backtrace:
>> >> [<c0023000>] (dump_backtrace+0x0/0x110) from [<c02dd680>]
>> >> (dump_stack+0x18/0x1c)
>> >> r6:00000000 r5:c0152c04 r4:00000075 r3:e3ec1c88
>> >> [<c02dd668>] (dump_stack+0x0/0x1c) from [<c0176bd0>]
>> >> (xfs_error_report+0x4c/0x5c)
>> >> [<c0176b84>] (xfs_error_report+0x0/0x5c) from [<c01510d4>]
>> >> (xfs_free_ag_extent+0x400/0x600)
>> >> [<c0150cd4>] (xfs_free_ag_extent+0x0/0x600) from [<c0152c04>]
>> >> (xfs_free_extent+0x8c/0xa4)
>> >> [<c0152b78>] (xfs_free_extent+0x0/0xa4) from [<c015ffa8>]
>> >> (xfs_bmap_finish+0x108/0x194)
>> >> r7:e3ec1e10 r6:00000000 r5:e3737870 r4:e373e000
>> >> [<c015fea0>] (xfs_bmap_finish+0x0/0x194) from [<c017e840>]
>> >> (xfs_itruncate_finish+0x1dc/0x30c)
>> >> [<c017e664>] (xfs_itruncate_finish+0x0/0x30c) from [<c0197dc8>]
>> >> (xfs_inactive+0x20c/0x40c)
>> >> [<c0197bbc>] (xfs_inactive+0x0/0x40c) from [<c01a3da0>]
>> >> (xfs_fs_clear_inode+0x50/0x60)
>> >> r9:e3ec0000 r8:c001f128 r7:00000000 r6:e4671a80 r5:c0312454
>> >> r4:e4667300
>> >> [<c01a3d50>] (xfs_fs_clear_inode+0x0/0x60) from [<c00bdd84>]
>> >> (clear_inode+0x8c/0xe8)
>> >> r4:e4667420 r3:c01a3d50
>> >> [<c00bdcf8>] (clear_inode+0x0/0xe8) from [<c00be584>]
>> >> (generic_delete_inode+0xdc/0x178)
>> >> r4:e4667420 r3:ffffffff
>> >> [<c00be4a8>] (generic_delete_inode+0x0/0x178) from [<c00be640>]
>> >> (generic_drop_inode+0x20/0x68)
>> >> r5:00000000 r4:e4667420
>> >> [<c00be620>] (generic_drop_inode+0x0/0x68) from [<c00bd368>]
>> >> (iput+0x6c/0x7c)
>> >> r4:e4667420 r3:c00be620
>> >> [<c00bd2fc>] (iput+0x0/0x7c) from [<c00b4c40>] (do_unlinkat+0xfc/0x154)
>> >> r4:e4667420 r3:00000000
>> >> [<c00b4b44>] (do_unlinkat+0x0/0x154) from [<c00b4cb0>]
>> >> (sys_unlink+0x18/0x1c)
>> >> r7:0000000a r6:00000000 r5:00000000 r4:be90299b
>> >> [<c00b4c98>] (sys_unlink+0x0/0x1c) from [<c001ef80>]
>> >> (ret_fast_syscall+0x0/0x30)
>> >> xfs_force_shutdown(sda1,0x8) called from line 4047 of file
>> >> fs/xfs/xfs_bmap.c. Return address = 0xc015ffec
>> >> Filesystem "sda1": Corruption of in-memory data detected. Shutting
>> >> down filesystem: sda1
>> >> Please umount the filesystem, and rectify the problem(s)
> I've asked this before: please trim/paste your stack traces so they
> don't line wrap and are human readable.
I compose my mails in the Word and then Copy+paste - that might be the
reason which resulted in wrong formatting. Will take care of this next
> [<c0023000>] (dump_backtrace+0x0/0x110)
> [<c02dd668>] (dump_stack+0x0/0x1c)
> [<c0176b84>] (xfs_error_report+0x0/0x5c)
> [<c0150cd4>] (xfs_free_ag_extent+0x0/0x600)
> [<c0152b78>] (xfs_free_extent+0x0/0xa4)
> [<c015fea0>] (xfs_bmap_finish+0x0/0x194)
> [<c017e664>] (xfs_itruncate_finish+0x0/0x30c)
> [<c0197bbc>] (xfs_inactive+0x0/0x40c)
> [<c01a3d50>] (xfs_fs_clear_inode+0x0/0x60)
> [<c00bdcf8>] (clear_inode+0x0/0xe8)
> [<c00be4a8>] (generic_delete_inode+0x0/0x178)
> [<c00be620>] (generic_drop_inode+0x0/0x68)
> [<c00bd2fc>] (iput+0x0/0x7c)
> [<c00b4b44>] (do_unlinkat+0x0/0x154)
> [<c00b4c98>] (sys_unlink+0x0/0x1c)
> So, you powered off an active machine while writing to it, and after
> it started back up it hit a free space between corruption. And then
> you couldn't mount it because log replay was trying to replay the
> last committed transaction to the log. That transaction shows inode
> 132 being unlinked, added to the AGI unliked list, and then being
> inactivated. There is an EFI committed for 1 extent. There is no EFD
> committed, so the shutdown occurred during that operation. Log
> replay then hits the corruption repeatedly by trying to replay the
> EFI to complete the extent free operation.
Yes, it happened exactly as you mentioned above. But the problem
happened much earlier.
While writing the 'reset' happened. At the next reboot - the
filesystem mounted even though there
was corruption - the file and journal did not match. And it did not
show any error either.
The moment the issue happened and then if checked through xfs_logprint
- the blocks free in the allocation group does not match up properly.
I checked by taking the print from xfs_free_ag_extent - it showed the
wrong blocks contigous to that file.
Original file which was to be deleted is of '255' blocks and started
from physical block number - 12. While the free space tree has free
blocks larger than what it should be. it returns the length starting
from block number and the length which is last committed to the log.
As I mentioned in the last mail - the details about the observation.
> So, the log and the repair output are useless for determining what
> caused the problem - you need the log from the mount *before* the
> first shutdown occurred, and to have run repair *before* you
> tried to unlink anything.
It is only after 'first shutdown' I came to know about the corruption.
Since, there was no error shown at the mount time, it did not seemed
reasonable enough to run "repair" on the disk.
But at the same time I checked with the case(reset while direct-IO),
when there was no issue - I tried to check the logs the same way as
- reset - log_print - check using xfs_repair and then mount - the
behaviour was as per the logs - The number of commit transactions and
the files created was ok.
Logs are attached for xfs_logprint output.
mount after taking the log prints.
#mount /dev/sdb1 /media/b
XFS mounting filesystem sda1
Starting XFS recovery on filesystem: sda1 (logdev: internal)
Ending XFS recovery on filesystem: sda1 (logdev: internal)
#> ls -li /media/b
131 -rwxr-xr-x 1 root 0 2887184 Jan 1 00:00 test_code
132 -rw-r--r-- 1 root 0 1044480 Jan 1 00:00
133 -rw-r--r-- 1 root 0 819200 Jan 1 00:00
I doubt if the journalling works correctly with the direct IO although
I need to check for proofs to say this. But, the logs and the
behaviour in the two scenarios shows the same indication.
IOWs, if you are doing power fail testing,
> you need to test the validity of your filesystems before you do
> anything else on them. e.g. Once powered back up, copy the log
> before mounting the filesystem, then mount it to replay the log,
> then unmount it and run xfs_repair -n to check it. That way you'll
> catch any problem caused by the power loss and have some hope of
> determining what caused it because you preserved the original
> However, seeing as this was due to power failure, I have to ask the
> obvious question: does you hardware correctly support barriers/cache
> flush/FUA operations and are they turned on?
I need to check and make sure of the above things you mentioned.
> Dave Chinner
Description: Binary data