
Re: Still seeing hangs in xlog_grant_log_space

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Still seeing hangs in xlog_grant_log_space
From: Juerg Haefliger <juergh@xxxxxxxxx>
Date: Sat, 5 May 2012 09:44:35 +0200
Cc: xfs@xxxxxxxxxxx
In-reply-to: <CADLDEKtUHAGcOPT1jtcvyJVk+zsoL5_thYFtHJYs+w=6EGuVSA@xxxxxxxxxxxxxx>
References: <20120423235840.GQ9541@dastard> <CADLDEKsfckBw2oVYFfaaTbpe8Ri+rYJr2e5SB7-pM0BU9nRUeA@xxxxxxxxxxxxxx> <20120424120731.GT9541@dastard> <CADLDEKs01GnxgYh2UTt1waVDUXHbB_RcBcUTBr5REFg5aD5jHA@xxxxxxxxxxxxxx> <20120425223845.GX9541@dastard> <CADLDEKvYkpUnMrqdMyqCmsYrZcUtiJ6ZRhrRu_ERTjn=r7M3Pg@xxxxxxxxxxxxxx> <20120426224412.GA9541@dastard> <CADLDEKs6oMDA-6OhmcFxyRoBVpduKtSput=53TQGn9NCAOXC1Q@xxxxxxxxxxxxxx> <20120426230738.GB9541@dastard> <CADLDEKuKLeYiqhQW0E9g_bS0VXoxPGPOck3N004Pxg4_Opbzow@xxxxxxxxxxxxxx> <20120427110922.GF9541@dastard> <CADLDEKtUHAGcOPT1jtcvyJVk+zsoL5_thYFtHJYs+w=6EGuVSA@xxxxxxxxxxxxxx>
>>> >> >> > I'm assuming it is the event trace
>>> >> >> > that is causing it to blow out? If so, just the 30-60s either side 
>>> >> >> > of
>>> >> >> > the hang first showing up is probably necessary, and that should cut
>>> >> >> > the size down greatly....
>>> >> >>
>>> >> >> Can I shorten the existing trace.dat?
>>
>> Looks like you can - the "trace-cmd split" option.
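>>
>> Something like this should carve out just a time window around the hang
>> (untested - the start/end timestamps are placeholders, check the
>> trace-cmd-split man page on your version for the exact syntax):
>>
>> $ trace-cmd split -i trace.dat -o trace-hang.dat <start-time> <end-time>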
>>
>>> >> >
>>> >> > No idea, but that's likely the problem - I don't want the binary
>>> >> > trace.dat file. I want the text output of the report command
>>> >> > generated from the binary trace.dat file...
>>> >>
>>> >> Well yes. I did RTFM :-) trace.dat is 15GB.
>>> >
>>> > OK, that's a lot larger than I expected for a hung filesystem....
>>> >
>>> >> >> I stopped the trace
>>> >> >> automatically 10 secs after the xlog_... trace showed up in syslog
>>> >> >> so effectively some 130+ secs after the hang occurred.
>>> >
>>> > Can you look at the last timestamp in the report file, and trim off
>>> > anything from the start that is older than, say, 180s before that?
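>>> >
>>> > A rough sketch of one way to do that, assuming the timestamp is the
>>> > third whitespace-separated field on each report line (adjust the
>>> > field number if your output differs):
>>> >
>>> > $ cutoff=<last-timestamp-minus-180s>
>>> > $ awk -v c=$cutoff '{ t=$3; sub(/:$/, "", t); if (t+0 >= c) print }' \
>>> >       report.txt > report_180s.txt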
>>>
>>> Cut the trace down to 180 secs which brought the filesize down to
>>> 93MB: 
>>> https://region-a.geo-1.objects.hpcloudsvc.com:443/v1.0/AUTH_9630ead2-6194-40df-afd3-7395448d4536/xfs-hang/report-2012-04-24-180secs.tgz
>>
>> I see the problem - the trace.dat file is hosted on an XFS
>> filesystem, so all the writes to the trace.dat file are causing
>> events to be logged, which causes writes to the trace.dat file....
>>
>> Taking out all the trace-cmd events:
>>
>> $ grep -v trace-cmd trace_report_180secs.txt > t.t
>> $ ls -l trace_report_180secs.txt t.t
>> -rw-r--r-- 1 dave dave 2136941443 Apr 27 18:52 trace_report_180secs.txt
>> -rw-r--r-- 1 dave dave    3280629 Apr 27 20:12 t.t
>>
>> That brings the event trace for that 180s down from 2.1GB to 3.2MB,
>> which is much more like what I'd expect from a hung filesystem....
>>
>> Ok, so it looks like there's lots of noise from other XFS
>> filesystems too, and from the info.log, the xfs-hang filesystem is on
>> device 252:2 (/dev/vg00/tmp):
>>
>> $ grep "dev 252:2" t.t
>> $
>>
>> And there are no events from that filesystem in the log at all. Ok,
>> so what you need to do is find out if there are *any* events from
>> that device in the larger log file you have.....
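>>
>> A quick count should tell you, e.g. (assuming the full report is in
>> trace_report.txt):
>>
>> $ grep -c "dev 252:2" trace_report.txt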
>>
>> If not, then it is time for advanced trace-cmd mojo. We can tell it
>> to filter events only from the PID of the test script and all its
>> children using:
>>
>> # trace-cmd record -e xfs\* -P <parent-pid> -c
>>
>> But better would be to use the device number of the relevant
>> filesystem to filter the events. The device is 252:2, which means in
>> kernel terms it is:
>>
>>        dev = (major << 20) | minor
>>            = 0xfc00002
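>>
>> You can double-check that arithmetic from a shell:
>>
>> $ printf '0x%x\n' $(( (252 << 20) | 2 ))
>> 0xfc00002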
>>
>> So you should be able to get just the xfs-hang events via:
>>
>> # trace-cmd record -e xfs\* -f 'dev == 0xfc00002'
>>
>> and as long as you don't host log files on /xfs-hang, it'll only
>> record the workload running on the xfs-hang filesystem.
>
> Third try: 
> https://region-a.geo-1.objects.hpcloudsvc.com:443/v1.0/AUTH_9630ead2-6194-40df-afd3-7395448d4536/xfs-hang/report-2012-04-27-180secs.tgz
> Filtered by device, trace events go to a different filesystem.

Did anybody have a chance to look at the data?

Thanks
...Juerg


>
>> BTW, how often do you see this sort of thing:
>>
>> [  220.571551] ------------[ cut here ]------------
>> [  220.571562] WARNING: at fs/inode.c:280 drop_nlink+0x49/0x50()
>> [  220.571564] Hardware name: SE2170s
>> [  220.571565] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler 
>> ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat 
>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT 
>> xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables 
>> bridge 8021q garp stp coretemp ghash_clmulni_intel aesni_intel cryptd usbhid 
>> i7core_edac lp edac_core hid aes_x86_64 parport serio_raw microcode xfs igb 
>> hpsa dca
>> [  220.571594] Pid: 4637, comm: copy-files Not tainted 3.4.0-rc4 #2
>> [  220.571595] Call Trace:
>> [  220.571603]  [<ffffffff810508cf>] warn_slowpath_common+0x7f/0xc0
>> [  220.571605]  [<ffffffff8105092a>] warn_slowpath_null+0x1a/0x20
>> [  220.571607]  [<ffffffff81193319>] drop_nlink+0x49/0x50
>> [  220.571628]  [<ffffffffa00701ef>] xfs_droplink+0x2f/0x60 [xfs]
>> [  220.571640]  [<ffffffffa0072d58>] xfs_remove+0x2e8/0x3c0 [xfs]
>> [  220.571645]  [<ffffffff8163aeee>] ? _raw_spin_lock+0xe/0x20
>> [  220.571656]  [<ffffffffa0068248>] xfs_vn_unlink+0x48/0x90 [xfs]
>> [  220.571659]  [<ffffffff8118684f>] vfs_unlink+0x9f/0x100
>> [  220.571662]  [<ffffffff811893ef>] do_unlinkat+0x1af/0x1e0
>> [  220.571668]  [<ffffffff810a8eab>] ? sys_futex+0x7b/0x180
>> [  220.571670]  [<ffffffff8118a9a6>] sys_unlink+0x16/0x20
>> [  220.571675]  [<ffffffff816431a9>] system_call_fastpath+0x16/0x1b
>>
>> You might want to run xfs_repair over your filesystems to find out
>> how many inodes have bad link counts....
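>>
>> (If you do, unmount the filesystem first and start with a dry run -
>> something like:
>>
>> # umount /xfs-hang
>> # xfs_repair -n /dev/vg00/tmp
>>
>> -n only checks and reports, it doesn't write to the device.)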
>
> First time I saw it was when I started using 3.4-rc4. I repaired the
> fs before I reran the test that produced the above data.
>
> ...Juerg
>
>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@xxxxxxxxxxxxx
