testcase 011 trips and ASSERT in x86_64 too

To: XFS Mailing List <xfs@xxxxxxxxxxx>
Subject: testcase 011 trips and ASSERT in x86_64 too
From: Chandra Seetharaman <sekharan@xxxxxxxxxx>
Date: Fri, 11 Mar 2011 19:06:06 -0800
Organization: IBM
Reply-to: sekharan@xxxxxxxxxx
Hello,

A while back I reported that test case 011 trips an ASSERT on the POWER
architecture, but not on x86_64.

I started comparing the code and quickly realized that the problem is
_not_ arch specific, but could make the test case 011 fail, with reduced
log on x86_64. But, I could make the POWER not fail by simply increasing
the file system size to 100G (from 20G).

After some debugging I found that I get into this racy situation when
the free space drops below the threshold and we flush the log buffer to
the disk, i.e., in function xlog_grant_push_ail(), if we return at

       if (free_blocks >= free_threshold)
                return;
we do not get into the race that trips the ASSERT.
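
For readers without the source at hand, here is a minimal userspace sketch of the early-return check quoted above. It assumes, from recollection of the 2.6.38-era fs/xfs/xfs_log.c, that the threshold is the largest of the caller's need, one quarter of the log, and 256 basic blocks. The helper names (bytes_to_bb_trunc, bytes_to_bb_roundup, push_skipped) are hypothetical and the constants should be checked against the actual tree.

```c
#include <stdint.h>

/* Assumption: XFS log accounting uses 512-byte "basic blocks". */
#define BB_SHIFT 9

/* Truncate a byte count to whole basic blocks. */
static uint64_t bytes_to_bb_trunc(uint64_t bytes)
{
    return bytes >> BB_SHIFT;
}

/* Round a byte count up to whole basic blocks. */
static uint64_t bytes_to_bb_roundup(uint64_t bytes)
{
    return (bytes + ((uint64_t)1 << BB_SHIFT) - 1) >> BB_SHIFT;
}

static uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/*
 * Model of the check in xlog_grant_push_ail() (hypothetical helper).
 * log_bb is the log size in basic blocks.
 * Returns 1 when the early "return" would be taken (no push),
 * 0 when the function would go on to push the AIL.
 */
static int push_skipped(uint64_t free_bytes, uint64_t need_bytes,
                        uint64_t log_bb)
{
    uint64_t free_blocks = bytes_to_bb_trunc(free_bytes);
    uint64_t free_threshold = bytes_to_bb_roundup(need_bytes);

    /* Assumed floor: a quarter of the log, and never below 256 blocks. */
    free_threshold = max_u64(free_threshold, log_bb >> 2);
    free_threshold = max_u64(free_threshold, 256);

    return free_blocks >= free_threshold;
}
```

Under these assumptions, a log of 20480 basic blocks (10 MiB) gives a threshold of 5120 blocks, so the push path is only entered once less than a quarter of the log is free. A larger log has to absorb more work before that point is reached, which would be consistent with the 100G observation if the larger file system also got a larger log (the mkfs settings are not shown in this message).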

Then I started comparing the behavior of the two architectures, and I
found that on POWER I see more threads at a time (max of
4 threads) in the function xlog_grant_log_space(), whereas in x86_64 I
see max of only two and mostly it is only one.
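
One way to take this kind of measurement is a pair of counters around the region of interest. The following is a userspace C11 sketch of the idea, not the instrumentation actually used for the numbers above. The names region_enter, region_exit and region_peak are hypothetical, and kernel code would use the kernel's own atomic_t helpers rather than <stdatomic.h>.

```c
#include <stdatomic.h>

static atomic_int in_flight;   /* callers currently inside the region */
static atomic_int peak;        /* highest value in_flight has reached */

/* Call on entry to the instrumented function. */
static void region_enter(void)
{
    int now = atomic_fetch_add(&in_flight, 1) + 1;
    int seen = atomic_load(&peak);

    /*
     * Raise peak to "now" unless another caller already recorded a
     * higher value. A failed compare-exchange refreshes "seen".
     */
    while (now > seen &&
           !atomic_compare_exchange_weak(&peak, &seen, now))
        ;
}

/* Call on every exit path of the instrumented function. */
static void region_exit(void)
{
    atomic_fetch_sub(&in_flight, 1);
}

/* Largest number of simultaneous callers observed so far. */
static int region_peak(void)
{
    return atomic_load(&peak);
}
```

Printing region_peak() at the end of a run gives the "max of N threads" figure; sampling in_flight periodically would show how often more than one caller is inside at once.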

I also noted that in POWER test case 011 takes about 8 seconds whereas
in x86_64, it takes about 165 seconds.

So, I ventured into the core of test case 011, dirstress, and found that
simply creating thousands of files under a directory takes much longer
on x86_64 than on POWER (1 min 15 s vs. 2 s).

Note: Attached is the source file (a stripped version of dirstress.c)
for the program b.
------------------POWER----------------------------------
[root@test135 chandra]# uname -a
Linux test135.beaverton.ibm.com 2.6.38-rc7 #1 SMP Fri Mar 4 09:36:14 PST 2011 ppc64 ppc64 ppc64 GNU/Linux
[root@test135 chandra]# grep -e xfs -e home /proc/mounts
none /selinux selinuxfs rw,relatime 0 0
/dev/mapper/vg_test135-lv_home /home ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
/dev/sda8 /mnt/xfsMntPt xfs rw,seclabel,relatime,attr2,noquota 0 0
[root@test135 chandra]# ###### Run test on XFS filesystem
[root@test135 chandra]# time ./b /mnt/xfsMntPt/dir 10000 1
i 0

real    0m2.055s
user    0m0.011s
sys     0m0.732s
[root@test135 chandra]# ###### Run test of ext4 filesystem
[root@test135 chandra]# time ./b /home/dir 10000 1
i 0

real    0m0.355s
user    0m0.009s
sys     0m0.304s
--------------------x86_64----------------------------------------
[root@test27 chandra]# uname -a
Linux test27 2.6.38-rc7 #4 SMP Wed Mar 9 08:37:32 PST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@test27 chandra]# grep -e xfs -e home /proc/mounts
none /selinux selinuxfs rw,relatime 0 0
/dev/sdc3 /home ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
/dev/sdb1 /mnt/xfsMntPt xfs rw,seclabel,relatime,attr2,noquota 0 0
[root@test27 chandra]# ###### Run test on XFS filesystem
[root@test27 chandra]# time ./b /mnt/xfsMntPt/dir 10000 1
i 0

real    1m15.700s
user    0m0.030s
sys     0m1.679s
[root@test27 chandra]# ###### Run test of ext4 filesystem
[root@test27 chandra]# time ./b /home/dir 10000 1
i 0

real    0m0.317s
user    0m0.010s
sys     0m0.306s
-------------------------------------------------------------------

After a fair amount of debugging I found that I can make it trip the
ASSERT on x86_64 as well if I start enough threads accessing the
file system. Basically, "./b /mnt/xfsMntPt/dir 100 100" trips the
ASSERT.
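
The attachment is not reproduced in this archive. As a rough illustration of the kind of load involved, here is a hypothetical sketch in which several worker processes each create a batch of empty files in one directory. It is not the attached b.c. The function names, the file naming scheme, the use of fork(), and the reading of the two numeric arguments as "files per worker" and "number of workers" are all assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create nfiles empty files named <dir>/w<id>.f<n>; 0 on success. */
static int create_files(const char *dir, int id, int nfiles)
{
    char path[4096];

    for (int n = 0; n < nfiles; n++) {
        snprintf(path, sizeof(path), "%s/w%d.f%d", dir, id, n);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0)
            return -1;
        close(fd);
    }
    return 0;
}

/*
 * Fork nprocs workers that all create files in dir at the same time.
 * Returns 0 if every worker succeeded, -1 otherwise.
 */
static int stress_dir(const char *dir, int nfiles, int nprocs)
{
    int failed = 0;
    int status;

    if (mkdir(dir, 0755) < 0 && errno != EEXIST)
        return -1;

    for (int id = 0; id < nprocs; id++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;         /* stop spawning, still reap below */
            break;
        }
        if (pid == 0)
            _exit(create_files(dir, id, nfiles) ? 1 : 0);
    }

    /* Reap every child; any abnormal or non-zero exit is a failure. */
    for (;;) {
        pid_t done = wait(&status);
        if (done < 0) {
            if (errno == EINTR)
                continue;
            break;              /* ECHILD: no children left */
        }
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    return failed ? -1 : 0;
}
```

A main() that reads a directory and two counts from argv and passes them to stress_dir() would give a command line of the same shape as the one quoted above. With many workers and few files each, the create operations overlap heavily, which is the condition described here as triggering the ASSERT.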

I have two questions:

1. Does anybody have an explanation for why x86_64 is so slow compared
with POWER?

2. Any suggestions on how to debug and fix the race condition?

Thanks & Regards,

chandra

Attachment: b.c
Description: Text Data
