xfs

Re: Storage server, hung tasks and tracebacks

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Storage server, hung tasks and tracebacks
From: Brian Candler <B.Candler@xxxxxxxxx>
Date: Mon, 21 May 2012 10:58:30 +0100
Cc: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
On Mon, May 21, 2012 at 09:59:03AM +1000, Dave Chinner wrote:
> You need to provide the output of sysrq-W at this point ('echo w >
> /proc/sysrq-trigger') so we can see where these are hung. the entire
> dmesg would also be useful....

Thank you for this advice, Dave.

Attached is the full dmesg output after another hang. The sysrq output is
near the end, at timestamp 250695.
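For anyone reproducing this, the capture suggested above can be sketched roughly as follows. This is only an illustration, not the exact commands used for the attached log: the function name and the SYSRQ_TRIGGER and DMESG_CMD overrides are hypothetical, added so the steps can be dry-run without root.

```shell
#!/bin/sh
# Sketch: dump blocked-task stacks via sysrq-w, then save the kernel log.
# SYSRQ_TRIGGER and DMESG_CMD are hypothetical overrides for dry runs;
# with the defaults, this needs root on a real system.
capture_blocked_tasks() {
    out=$1
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    # 'w' asks the kernel to log every task in uninterruptible (D) state.
    echo w > "$trigger" || return 1
    # Give the kernel a moment to finish writing the traces.
    sleep 1
    # Save the whole kernel ring buffer, including the new traces.
    ${DMESG_CMD:-dmesg} > "$out"
}
```

Typical use would be something like `capture_blocked_tasks /tmp/dmesg.txt && gzip /tmp/dmesg.txt` (the path is just an example). If the ring buffer is small, older messages may already have been overwritten by the time the dump is taken.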

For this test, I built a fresh XFS filesystem (you can see this at timestamp
246909) - I forgot to mount with "inode64" this time, but it doesn't seem to
have made a difference.  I also did "swapoff" before starting the test, to
ensure that swapping to sda was not part of the problem.

A quick status summary from the hung system:

    root@storage3:~# free
                 total       used       free     shared    buffers     cached
    Mem:       8156224    8052824     103400          0       4808    7399112
    -/+ buffers/cache:     648904    7507320
    Swap:            0          0          0
    root@storage3:~# uptime
     10:46:47 up 2 days, 21:43,  1 user,  load average: 10.00, 9.81, 8.16
    root@storage3:~# ps auxwww | grep -v ' S'
    root        34  2.9  0.0      0     0 ?        D    May18 122:14 [kswapd0]
    root      1387  0.0  0.0  15976   504 ?        Ds   May18   0:18 /usr/sbin/irqbalance
    root      5242  0.0  0.0      0     0 ?        D    09:39   0:02 [xfsaild/md127]
    tomi      6249  4.2  0.0 378860  3844 pts/1    D+   09:40   2:48 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    tomi      6251  4.1  0.0 378860  3836 pts/2    D+   09:40   2:44 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    tomi      6253  4.1  0.0 378860  3848 pts/3    D+   09:40   2:46 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    tomi      6255  4.0  0.0 378860  3840 pts/4    D+   09:40   2:40 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    root      7795  0.1  0.0      0     0 ?        D    10:27   0:02 [kworker/0:3]
    root      8517  0.0  0.0  16876  1272 pts/0    R+   10:46   0:00 ps auxwww
    root     24420  0.0  0.0      0     0 ?        D    00:50   0:00 [kworker/3:0]
    root@storage3:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid0 sds[17] sdx[22] sdj[8] sdt[18] sdk[9] sdc[1] sdb[0] sdh[6] sdu[19] sdi[7] sdn[12] sdo[13] sdv[20] sdm[11] sdq[15] sdp[14] sdl[10] sdw[21] sdg[5] sdr[16] sde[3] sdy[23] sdd[2] sdf[4]
          70326362112 blocks super 1.2 1024k chunks
          
    unused devices: <none>
    root@storage3:~# mount
    /dev/sda1 on / type ext4 (rw,errors=remount-ro)
    proc on /proc type proc (rw,noexec,nosuid,nodev)
    sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
    none on /sys/fs/fuse/connections type fusectl (rw)
    none on /sys/kernel/debug type debugfs (rw)
    none on /sys/kernel/security type securityfs (rw)
    udev on /dev type devtmpfs (rw,mode=0755)
    devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
    tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
    none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
    none on /run/shm type tmpfs (rw,nosuid,nodev)
    rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
    /dev/md127 on /disk/scratch type xfs (rw)
    root@storage3:~# df
    Filesystem       1K-blocks      Used   Available Use% Mounted on
    /dev/sda1        967415188  16754388   902241696   2% /
    udev               4069104         4     4069100   1% /dev
    tmpfs              1631248       380     1630868   1% /run
    none                  5120         0        5120   0% /run/lock
    none               4078112         0     4078112   0% /run/shm
    /dev/md127     70324275200 258902416 70065372784   1% /disk/scratch
    root@storage3:~# 

(Aside: when the test started, the load average was just above 4, matching
the four bonnie++ processes.)
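The ps listing above was filtered with grep -v ' S', which drops any line containing " S" anywhere, including the header. Since Linux counts tasks in uninterruptible sleep (state D) towards the load average, a stricter filter on the STAT column may be easier to read. A minimal sketch, assuming `ps aux`-style input where STAT is the 8th field; the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: keep the header plus tasks whose STAT column begins with "D".
# Reads `ps aux`-style text on stdin, where STAT is the 8th field.
filter_dstate() {
    awk 'NR == 1 || $8 ~ /^D/'
}

# Count D-state tasks (header line excluded).
count_dstate() {
    awk 'NR > 1 && $8 ~ /^D/ { n++ } END { print n + 0 }'
}
```

For example, `ps auxwww | filter_dstate` lists the blocked tasks, and `ps auxwww | count_dstate` gives a number to compare against the load average.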

"iostat 5" shows zero activity to the MD RAID.

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.00    0.00    0.00    0.05    0.00   99.95

    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    sda               0.20         1.60         0.00          8          0
    sdf               0.00         0.00         0.00          0          0
    sde               0.00         0.00         0.00          0          0
    sdd               0.00         0.00         0.00          0          0
    sdc               0.00         0.00         0.00          0          0
    sdg               0.00         0.00         0.00          0          0
    sdh               0.00         0.00         0.00          0          0
    sdp               0.00         0.00         0.00          0          0
    sdj               0.00         0.00         0.00          0          0
    sdq               0.00         0.00         0.00          0          0
    sdk               0.00         0.00         0.00          0          0
    sdb               0.00         0.00         0.00          0          0
    sdl               0.00         0.00         0.00          0          0
    sdo               0.00         0.00         0.00          0          0
    sdm               0.00         0.00         0.00          0          0
    sdn               0.00         0.00         0.00          0          0
    sdi               0.00         0.00         0.00          0          0
    sdr               0.00         0.00         0.00          0          0
    sdu               0.00         0.00         0.00          0          0
    sdv               0.00         0.00         0.00          0          0
    sdw               0.00         0.00         0.00          0          0
    sdy               0.00         0.00         0.00          0          0
    sdx               0.00         0.00         0.00          0          0
    sds               0.00         0.00         0.00          0          0
    sdt               0.00         0.00         0.00          0          0
    md127             0.00         0.00         0.00          0          0
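With this many member disks it is easy to miss one that is still busy. A rough sketch for picking out only the devices with non-zero tps from iostat output like the above; the function name is hypothetical, and it assumes the device table begins with a "Device:" (or "Device") header line and ends at a blank line:

```shell
#!/bin/sh
# Sketch: print "device tps" for every device row with non-zero tps.
# Reads iostat text on stdin; the device table starts after a line whose
# first field is "Device:" (or "Device") and ends at the next blank line.
active_devices() {
    awk '$1 ~ /^Device:?$/     { in_dev = 1; next }
         NF == 0               { in_dev = 0 }
         in_dev && $2 + 0 > 0  { print $1, $2 }'
}
```

Something like `iostat 5 2 | active_devices` could be used; note that the first iostat report covers the time since boot, so the same device may be printed once per report.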

Anything you can determine from this info would be much appreciated!

Regards,

Brian.

Attachment: storage3-dmesg.txt.gz
Description: application/gunzip
