From david@fromorbit.com Sun May 1 02:58:19 2011
Date: Sun, 1 May 2011 18:01:49 +1000
From: Dave Chinner
To: Christian Kujau
Cc: Markus Trippelsdorf, LKML, xfs@oss.sgi.com, minchan.kim@gmail.com
Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks
Message-ID: <20110501080149.GD13542@dastard>
References: <20110427022655.GE12436@dastard> <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de>

On Fri, Apr 29, 2011 at 05:17:53PM -0700, Christian Kujau wrote:
> On Fri, 29 Apr 2011 at 22:17, Markus Trippelsdorf wrote:
> > It could be the hrtimer bug again. Would you try to reproduce the issue
> > with this patch applied?
> > http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4
>
> With that patch applied, the OOM killer still kicks in, this time the OOM
> messages were written to the syslog again:
>
>   http://nerdbynature.de/bits/2.6.39-rc4/oom/
>   (The -9 files are the current ones)
>
> Also, this time xfs did not show up in the backtrace:
>
> ssh invoked oom-killer: gfp_mask=0x44d0, order=2, oom_adj=0, oom_score_adj=0
> Call Trace:
> [c22bfae0] [c0009d30] show_stack+0x70/0x1bc (unreliable)
> [c22bfb20] [c009cd3c] T.545+0x74/0x1d0
> [c22bfb70] [c009cf6c] T.543+0xd4/0x2a0
> [c22bfbb0] [c009d3b4] out_of_memory+0x27c/0x360
> [c22bfc00] [c00a199c] __alloc_pages_nodemask+0x6f8/0x708
> [c22bfca0] [c00a19c8] __get_free_pages+0x1c/0x44
> [c22bfcb0] [c00d283c] __kmalloc_track_caller+0x1c0/0x1dc
> [c22bfcd0] [c036ff1c] __alloc_skb+0x74/0x140
> [c22bfd00] [c0369b08] sock_alloc_send_pskb+0x23c/0x37c
> [c22bfd70] [c03e8974] unix_stream_sendmsg+0x354/0x478
> [c22bfde0] [c0364118] sock_aio_write+0x170/0x180
> [c22bfe50] [c00d580c] do_sync_write+0xb8/0x144
> [c22bfef0] [c00d68d0] vfs_write+0x1b8/0x1c0
> [c22bff10] [c00d6a10] sys_write+0x58/0xc8
> [c22bff40] [c00127d4] ret_from_syscall+0x0/0x38
> --- Exception: c01 at 0x2044cc14

Doesn't need to have XFS in the stack trace - the inode cache is
consuming all of low memory.

Indeed, I wonder if that is the problem - this is a highmem
configuration where there is 450MB of highmem free, and very little
lowmem free which is considered "all unreclaimable". The lowmem
zone:

Apr 29 15:59:10 alice kernel: [ 3834.754358] DMA free:64704kB min:3532kB low:4412kB high:5296kB active_anon:0kB inactive_anon:0kB active_file:132kB inactive_file:168kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:780288kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:639680kB slab_unreclaimable:41652kB kernel_stack:1128kB pagetables:1788kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:516 all_unreclaimable? yes

I really don't know why the xfs inode cache is not being trimmed. I
really, really need to know if the XFS inode cache shrinker is
getting blocked or not running - do you have those sysrq-w traces
when near OOM I asked for a while back?

It may be that the zone reclaim is simply fubar because slab cache
reclaim is proportional to the number of pages scanned on the LRU.
With most of the cached pages in the highmem zone, the lowmem zone
scan only scanned 516 pages. I can't see it freeing many inodes
(there's >600,000 of them in memory) based on such a low page scan
number.

Maybe you should tweak /proc/sys/vm/vfs_cache_pressure to make it
reclaim vfs structures more rapidly. It might help, but I'm starting
to think that this problem is actually a VM zone reclaim balance
problem, not an XFS problem as such....

Cheers,

Dave.
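
To put the lowmem zone numbers quoted above in perspective, here is a small illustrative sketch in Python. It is not part of any kernel tooling: the helper name parse_zone is made up for this example, and only a subset of the fields from the log line is copied in. It computes how much of the zone is reclaimable slab compared with the file pages that are actually on the LRU:

```python
import re

# Subset of the lowmem zone line from the OOM report above (values in kB,
# except pages_scanned, which is a page count).
ZONE_LINE = (
    "DMA free:64704kB min:3532kB low:4412kB high:5296kB "
    "active_file:132kB inactive_file:168kB present:780288kB "
    "slab_reclaimable:639680kB slab_unreclaimable:41652kB "
    "pages_scanned:516"
)


def parse_zone(line):
    """Hypothetical helper: map each 'name:123[kB]' token to an int."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


zone = parse_zone(ZONE_LINE)

# Share of the zone that is held by reclaimable slab (mostly inode cache here).
slab_share = zone["slab_reclaimable"] / zone["present"]

# File pages on the LRU, which is what drives the page scan count.
lru_file_kb = zone["active_file"] + zone["inactive_file"]

print(f"reclaimable slab: {slab_share:.0%} of the zone")
print(f"file pages on the LRU: {lru_file_kb} kB")
print(f"pages scanned: {zone['pages_scanned']}")
```

With almost nothing on the LRU in this zone, a shrinker that is driven in proportion to LRU scanning gets very little pressure, which is the imbalance described above.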
-- 
Dave Chinner
david@fromorbit.com

From david@fromorbit.com Sun May 1 03:46:16 2011
Date: Sun, 1 May 2011 18:49:19 +1000
From: Dave Chinner
To: Peter Grandi
Cc: Linux fs XFS
Subject: Re: xfs performance problem
Message-ID: <20110501084919.GE13542@dastard>
References: <4DB72084.8020205@inf.ethz.ch> <4DB74331.3030804@hardwarefreak.com> <4DB75C6D.1080901@inf.ethz.ch> <19898.53907.842827.480883@tree.ty.sabi.co.UK>
In-Reply-To: <19898.53907.842827.480883@tree.ty.sabi.co.UK>

On Fri, Apr 29, 2011 at 04:00:35PM +0100, Peter Grandi wrote:
> [ ... ]
>
> > On my raid-1 ext3, extracting a kernel archive:
> [ ... ]
> > real 0m21.769s
> [ ... ]
> > real 2m20.522s
>
> > This is of course with delaylog enabled. I don't think a
> > difference of a factor 7 is normal, given that writing to a
> > raid-0 (xfs numbers) is supposed to be faster than writing to
> > raid-1 (ext3 numbers)
>
> Indeed, and as some other commenters have tried to explain, in
> most cases the wrong number is the one for 'ext3' on RAID1 (way
> too small). Even the number for XFS and RAID0 'delaylog' is a
> wrong number (somewhat small) in many cases.
>
> There are 38000 files in 440MB in 'linux-2.6.38.tar', ~40% of
> them are smaller than 4KiB and ~60% smaller than 8KiB. Also you
> didn't flush caches, and you don't say whether the filesystems
> are empty or full or at the same position on the disk.
>
> Can 'ext3' really commit 1900 small files per second (including
> directory updates) to a filesystem on a RAID1 that probably can
> do around 100 IOPS? That would be amazing news.

Of course it can. Why? Because the allocator is optimised to pack
small files written at the same time together on disk, and the
elevator will merge them into one large IO when they are finally
written to disk.
With a typical 512k max IO size, that's 128 <=4k files
packed into each IO. In a perfect world, we're talking about
~13000 4k files a second being written to disk @ 100 IOPS. In the
real world, writing an order of magnitude less files per second is
quite obtainable.

Even XFS enables that same optimisation by truncating away
speculative allocation when the file is closed so that when
writeback comes along delayed allocation packs the data blocks
belonging to different files tightly within the AG.

Such optimisations are not new - they've been used in some form for
as long as spinning media has been around....

> Despite decades of seeing it happen, I keep being astonished by
> how many people (some with decades of "experience") just don't
> understand IOPS and metadata and commits and caching and who

Oh, the irony.... :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

From david@fromorbit.com Sun May 1 03:49:15 2011
Date: Sun, 1 May 2011 18:52:46 +1000
From: Dave Chinner
To: Martin Steigerwald
Cc: xfs@oss.sgi.com
Subject: Re: xfs performance problem
Message-ID: <20110501085246.GF13542@dastard>
References: <4DB72084.8020205@inf.ethz.ch> <20110427023534.GF12436@dastard> <201104291827.35801.Martin@lichtvoll.de>
In-Reply-To: <201104291827.35801.Martin@lichtvoll.de>

On Fri, Apr 29, 2011 at 06:27:34PM +0200, Martin Steigerwald wrote:
> On Wednesday, 27 April 2011, Dave Chinner wrote:
> > On Tue, Apr 26, 2011 at 09:44:04PM +0200, Benjamin Schindler wrote:
> > > Hi
> > >
> > > Since upgrading to newer kernels I have serious problems with xfs
> > > performance on my root fs.
> > > It runs on a software raid 0 with 2 disks. On the same two disks,
> > > there are two more partitions running a software raid-1 with ext3.
> > > On the ext3 system, I have no issue, so I assume the drives are
> > > fine.
> > > But on the xfs filesystem, extracting a linux kernel archive takes 5
> > > minutes or more, running ldconfig similarly long. The harddrives are
> > > sata-2.
> > > I'm running gentoo linux with kernel 2.6.38-gentoo-r1. I'm attaching
> > > the kernel config but I guess more info is needed - just let me know
> > > what is needed.
> >
> > More than likely your problem is that barriers have been enabled for
> > MD/DM devices on the new kernel, and they aren't on the old kernel.
> > XFS uses barriers by default, ext3 does not. Hence XFS performance
> > will change while ext3 will not. Check dmesg output when mounting
> > the filesystems on the different kernels.
>
> But didn't 2.6.38 replace barriers by explicit flushes the filesystem has to
> wait for - mitigating most of the performance problems with barriers?

IIRC, it depends on whether the hardware supports FUA or not. If it
doesn't then device cache flushes are used to emulate FUA and so
performance can still suck. Christoph will no doubt correct me if I
got that wrong ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

From eflorac@intellique.com Sun May 1 04:08:25 2011
Date: Sun, 1 May 2011 11:11:52 +0200
From: Emmanuel Florac
To: Stan Hoeppner
Cc: xfs@oss.sgi.com
Subject: Re: RAID6 r-m-w, op-journaled fs, SSDs
Message-ID: <20110501111152.5913b9c5@galadriel2.home>
In-Reply-To: <4DBC68DA.1090708@hardwarefreak.com>
References: <19900.10868.583555.849181@tree.ty.sabi.co.UK> <20110430180213.6dcfc41c@galadriel2.home> <4DBC68DA.1090708@hardwarefreak.com>
Organization: Intellique

On Sat, 30 Apr 2011 14:54:02 -0500 you wrote:

> Just having write back cache isn't magic by itself. The cache
> management algorithm and configuration thereof are often as
> important, if not more, than the total cache size on the RAID HBA or
> SAN controller.
>
> Poor cache management, I'd guess, is one reason why you see Areca
> RAID cards with 1-4GB cache DRAM whereas competing cards w/ similar
> price/performance/features from LSI, Adaptec, and others sport 512MB.

Yes, probably. To give some meat to the argument:

Using XFS mounted nobarrier on an 8-drive RAID-6 array with WB cache:
30000 journal (file creation/deletion) operations/s

Using XFS with barriers on the same RAID:
7000 journal operations/s

Using XFS nobarrier with WT cache:
700 journal op/s.
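
As a quick way to read the three figures above, the following Python sketch (illustrative only; the dictionary labels are shorthand chosen for this example, not output from any benchmark tool) expresses each configuration as a slowdown relative to the nobarrier, write-back case:

```python
# Journal operation rates quoted above, in operations per second.
rates = {
    "nobarrier, WB cache": 30000,
    "barriers, WB cache": 7000,
    "nobarrier, WT cache": 700,
}

baseline = rates["nobarrier, WB cache"]

# Slowdown factor of each configuration relative to the fastest one.
slowdown = {name: baseline / ops for name, ops in rates.items()}

for name, factor in slowdown.items():
    print(f"{name:20s} {rates[name]:6d} op/s  {factor:5.1f}x slower than baseline")
```

On these numbers, enabling barriers costs roughly a factor of four, while losing the write-back cache costs more than a factor of forty.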
-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

From eflorac@intellique.com Sun May 1 04:10:59 2011
Date: Sun, 1 May 2011 11:14:26 +0200
From: Emmanuel Florac
To: Michael Monnerie
Cc: xfs@oss.sgi.com, Stan Hoeppner
Subject: Re: RAID6 r-m-w, op-journaled fs, SSDs
Message-ID: <20110501111426.2ea5ac37@galadriel2.home>
In-Reply-To: <201104302350.32287@zmi.at>
References: <19900.10868.583555.849181@tree.ty.sabi.co.UK> <20110430180213.6dcfc41c@galadriel2.home> <4DBC68DA.1090708@hardwarefreak.com> <201104302350.32287@zmi.at>
Organization: Intellique

On Sat, 30 Apr 2011 23:50:31 +0200 you wrote:

> Just for documentation if someone sees slow I/O on Areca. More
> spindles rock. That server had 8x 10krpm WD Raptor 150G drives by the
> time.

As a side note, VMs typically create lots of random small IOs, and
perform quite poorly on RAID-6 arrays.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

From david@fromorbit.com Sun May 1 04:24:28 2011
Date: Sun, 1 May 2011 19:27:58 +1000
From: Dave Chinner
To: Peter Grandi
Cc: Linux fs XFS, Linux fs JFS
Subject: Re: op-journaled fs, journal size and storage speeds
Message-ID: <20110501092758.GG13542@dastard>
References: <19900.8703.214676.218477@tree.ty.sabi.co.UK>
In-Reply-To: <19900.8703.214676.218477@tree.ty.sabi.co.UK>

On Sat, Apr 30, 2011 at 03:51:43PM +0100, Peter Grandi wrote:
> Been thinking about journals and RAID6s and SSDs.
>
> In particular for file system designs like JFS and XFS that do
> operation journaling (while ext[34] do block journaling).

XFS is not an operation journalling filesystem.
Most of the metadata is dirty-region logged via buffers, just like
ext3/4.

Perhaps you need to read some documentation like this:

http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Operation_Based_Logging

> The issue is: journal size?
>
> It seems to me that adopting as guideline a percent of the
> filesystem is very wrong, and so I have been using a rule of
> thumb like one second of expected transfer rate, so "in flight"
> updates are never much behind.

How do you know what "one second" of "in flight" operations is going
to be?

I had to deal with this in XFS when implementing the delayed logging
code. It uses a number of operations or a percentage of log space to
determine when to checkpoint the modifications, and that is
typically load dependent as to when it triggers.

And then you've got the problem of concurrency - one second of a
single threaded workload is much different to one second of the same
workload spread across 20 CPU cores. You need to have limits that
work well in both cases, and structures that scale to that level of
concurrency.

In reality, there's not much point in trying to calculate what one
second's worth of metadata is going to be - more often than not
you'll hit some other limitation in the journal subsystem, run out
of memory or have to put limits in place anyway to avoid latency
problems. Easiest and most reliable method seems to be to size your
journal appropriately in the first place and have your algorithms key
off that....

> But even at a single disk *sequential* transfer rate of say
> 80MB/s average, a journal that contains operation records could
> conceivably hold dozens if not hundreds of thousands of pending
> metadata updates, probably targeted at very widely scattered
> locations on disk, and playing a journal fully could take a long
> time.

17 minutes is my current record by crashing a VM during a chmod -R
operation over a 100 million inode filesystem.
That was on a ~2GB log (maximum supported size).

http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Reducing_Recovery_Time

> So the idea would be that the relevant transfer rate would be
> the *random* one, and since that is around 4MB/s per single
> disk, journal sizes would end up pretty small. But many people
> allocate very large (at least compared to that) journals.
>
> This seems to me a fairly bad idea, because then the journal
> becomes a massive hot spot on the disk and draws the disk arm
> like black hole. I suspect that operations should not stay on

That's why you can configure an external log....

> the journal for a long time. However if the journal is too small
> processes that do metadata updates start to hang on it.

Well, yes. The journal needs to be large enough to hold all the
transaction reservations for the active transactions. XFS, in the
worst case for a default filesystem config, needs about 100MB of log
space per 300 concurrent transactions. Increasing transaction
concurrency was the main reason we increased the log size...

> So some questions for which I have guesses but not good answers:
>
> * What should journal size be proportional to?

Your workload.

> * What is the downside of a too small journal?

Performance sucks.

> * What is the downside of a too large journal other than space?

Recovery times too long, lots of outstanding metadata pinned in
memory (hello OOM-killer!), and other resource management related
scalability issues.

> Again I expect answers to be very different for ext[34] but I am
> asking for operation-journaling file system designs like JFS and
> XFS.

> BTW, another consideration is that for filesystems that are
> fairly journal-intensive, putting the journal on a low traffic
> storage device can have large benefits.

Yeah, nobody ever thought of an external log before....
:)

> But if they can be pretty small, I wonder whether putting the
> journals of several filesystems on the same storage device then
> becomes a sensible option as the locality will be quite narrow
> (e.g. a single physical cylinder) or it could be worthwhile like
> the database people do to journal to battery-backed RAM.

Got a supplier for the custom hardware you'd need? Just use a PCIe
SSD....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

From david@fromorbit.com Sun May 1 04:33:27 2011
Date: Sun, 1 May 2011 19:36:59 +1000
From: Dave Chinner
To: Peter Grandi
Cc: Linux RAID, Linux fs XFS, Linux fs JFS
Subject: Re: RAID6 r-m-w, op-journaled fs, SSDs
Message-ID: <20110501093659.GH13542@dastard>
References: <19900.10868.583555.849181@tree.ty.sabi.co.UK>
In-Reply-To: <19900.10868.583555.849181@tree.ty.sabi.co.UK>

On Sat, Apr 30, 2011 at 04:27:48PM +0100, Peter Grandi wrote:
> Regardless, op-journaled file system designs like JFS and XFS
> write small records (way below a stripe set size, and usually
> way below a chunk size) to the journal when they queue
> operations,

XFS will write log-stripe-unit sized records to disk. If the log
buffers are not full, it pads them. Supported log-sunit sizes are up
to 256k.

> even if sometimes depending on design and options
> may "batch" the journal updates (potentially breaking safety
> semantics). Also they do small writes when they dequeue the
> operations from the journal to the actual metadata records
> involved.
>
> How bad can this be when the journal is say internal for a
> filesystem that is held on wide-stride RAID6 set? I suspect very
> very bad, with apocalyptic read-modify-write storms, eating IOPS.

Not bad at all, because the journal writes are sequential, and XFS
can have multiple log IOs in progress at once (up to 8 x 256k =
2MB).
So in general while metadata operations are in progress, XFS
will fill full stripes with log IO and you won't get problems with
RMW.

> Where are studies or even just impressions of anecdotes on how
> bad this is?

Just buy decent RAID hardware with a BBWC and journal IO does not
hurt at all.

> Are there instrumentation tools in JFS or XFS that may allow me
> to watch/inspect what is happening with the journal? For Linux
> MD to see what are the rates of stripe r-m-w cases?

XFS has plenty of event tracing, including all the transaction
reservation and commit accounting in it. And if you know what you
are looking for, you can see all the log IO and transaction
completion processing in the event traces, too.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

From pg_mh@sabi.co.UK Sun May 1 08:30:29 2011
Message-ID: <19901.24858.153461.377763@tree.ty.sabi.co.UK>
Date: Sun, 1 May 2011 14:33:14 +0100
To: Linux fs XFS, Linux fs JFS
Subject: Re: xfs performance problem
In-Reply-To: <4DB75C6D.1080901@inf.ethz.ch>
References: <4DB72084.8020205@inf.ethz.ch> <4DB74331.3030804@hardwarefreak.com> <4DB75C6D.1080901@inf.ethz.ch>
From: pg_xf2@xf2.for.sabi.co.UK (Peter Grandi)
X-Disclaimer: This message contains only personal opinions

[ ... ]

> I thought I would do a real measurement to have some numbers.
> On my raid-1 ext3, extracting a kernel archive: > benjamin@metis ~/software $ time tar xfj > /usr/portage/distfiles/linux-2.6.38.tar.bz2 > real 0m21.769s > user 0m13.905s > sys 0m1.751s That's a "real measurement" of *something*, and it does give "some numbers", but to me the numbers are not that interesting as it is far from clear what they are about. So I happen to have an otherwise totally unused fastish contemporary 500GB disk and laptop for a measurement of something that might be better defined, a bit simplemindedly, but taking care about a few details (see also appended setup details), so that the numbers be about as good as possible (YMMV). First with 'ext3': % mount -t ext3 -o relatime /dev/sdb /mnt/sdb % df -BM /mnt/sdb Filesystem 1M-blocks Used Available Use% Mounted on /dev/sdb 469455M 687M 444922M 1% /mnt/sdb % df -i /mnt/sdb Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sdb 30531584 38100 30493484 1% /mnt/sdb % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb' star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k). real 12m49.610s user 0m0.990s sys 0m8.610s That's like 570KB/s and 50 files/s, in more or less optimal conditions. Not so good for 'ext3', which indeed is well known for appalling small file/metadata write performance, but the order-of-magnitude of the results is the plausible one. XFS with 'delaylog' does worse, but then it has a difference tradeoff envelope: % mount -t xfs -o relatime,delaylog /dev/sdb /mnt/sdb % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb' star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k). 
real 24m4.282s user 0m1.260s sys 0m14.030s I also tried with JFS and it is faster at 1MB/s and 90 files/s which is pretty good (and I suspect that JFS may be cheating slightly on the semantics, but I know about its on-disk structure and twice as fast as 'ext3' is plausible): % mount -t jfs -o relatime /dev/sdb /mnt/sdb % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb' star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k). real 6m56.508s user 0m1.000s sys 0m7.130s Consolation notes :-) ===================== Naturally the real (and arguably rather more meaningful than others) measurements above will be baffling those described here: [ ... ] many people (some with decades of "experience") just don't understand IOPS and metadata and commits and caching and who think "performance" is whatever number they can get with their clever "benchmarks". So as a consolation prize to them let's rerun with entirely different semantics but still taking a bit of care: % mount -t ext3 -o relatime /dev/sdb /mnt/sdb % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar -no-fsync; cd /; umount /mnt/sdb' star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k). real 0m27.414s user 0m0.270s sys 0m2.430s Oh gosh, it looks like much better "performance"! 'ext3' really rises and shines with contiguous large IOs! :-) And similarly for XFS: % mount -t xfs -o relatime,delaylog /dev/sdb /mnt/sdb % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar -no-fsync; cd /; umount /mnt/sdb' star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k). real 0m33.849s user 0m0.310s sys 0m2.960s % mount -o relatime /dev/sdb /mnt/sdb And JFS is quite similar too: % mount -t jfs -o relatime /dev/sdb /mnt/sdb % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar -no-fsync; cd /; umount /mnt/sdb' star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k). 
  real 0m35.191s
  user 0m0.380s
  sys 0m2.920s

Journaling notes
================

So there. I apologize to the readers who "understand IOPS and
metadata and commits and caching" (and who may have read the
man-page for 'star') who will be bored with the beginner-level
nature of the points made above.

But I am actually a bit surprised and disappointed with the "real"
numbers above because I would have expected something more like
2-3 minutes duration or 2-4 files/s per IOPS, but I guess such are
the horrors of seeking crazily between journal and metadata and
data space, so let's try without a journal with 'ext2':

  % mount -t ext2 -o relatime /dev/sdb /mnt/sdb
  % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
  star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

  real 8m12.196s
  user 0m1.120s
  sys 0m6.030s

Sure it is better, that's 50% faster than 'ext3'. Let's also try
as a special case 'ext4' (yes, 'ext4' with its many improvements)
without a journal:

  % mkfs.ext4 -O ^has_journal /dev/sdb
  mke2fs 1.41.11 (14-Mar-2010)
  /dev/sdb is entire device, not just one partition!
  Proceed anyway? (y,n) y
  [ ... ]
  % mount -t ext4 -o relatime /dev/sdb /mnt/sdb
  % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
  star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

  real 0m31.119s
  user 0m0.870s
  sys 0m6.190s

Well, I don't believe that. That looks like a feature or bug in
'ext4' where without a journal it won't honor commits.

The same appears to be the case for JFS, but then the manual
explicitly says that 'nointegrity' is aptly named, and so it is
believable that switching off journaling is not its only effect:

  % mount -t jfs -o relatime,nointegrity /dev/sdb /mnt/sdb
  % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
  star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).
real 0m35.820s user 0m0.610s sys 0m5.740s Setup details ============= ULTS10 64b, 2.6.35 kernel, 4GiB RAM, I3-M370 CPU. Quiet except for measurements. Every 'tar' extraction is preceded by a re-'mkfs'. Note the details below (e.g. the archive is uncompressed and stored in in-memory 'tmpfs', the disk is a fairly fast 500GB drive on eSATA). ---------------------------------------------------------------- % dd bs=1M if=/tmp/linux-2.6.38.tar of=/dev/null 420+1 records in 420+1 records out 440483840 bytes (440 MB) copied, 0.159935 s, 2.8 GB/s ---------------------------------------------------------------- % hdparm -t /dev/sdb /dev/sdb: Timing buffered disk reads: 388 MB in 3.01 seconds = 128.98 MB/sec ---------------------------------------------------------------- % lsscsi | grep sdb [4:0:0:0] disk ATA ST3500418AS CC44 /dev/sdb ---------------------------------------------------------------- % mkfs.ext3 /dev/sdb mke2fs 1.41.11 (14-Mar-2010) /dev/sdb is entire device, not just one partition! Proceed anyway? (y,n) y Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=0 blocks, Stripe width=0 blocks 30531584 inodes, 122096646 blocks 6104832 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 3727 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000 Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done This filesystem will be automatically checked every 32 mounts or 180 days, whichever comes first. Use tune2fs -c or -i to override. 
---------------------------------------------------------------- % mkfs.xfs -f /dev/sdb meta-data=/dev/sdb isize=256 agcount=4, agsize=30524162 blks = sectsz=512 attr=2 data = bsize=4096 blocks=122096646, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=59617, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 ---------------------------------------------------------------- From pg_mh@sabi.co.UK Sun May 1 09:43:34 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41EhYmL180159 for ; Sun, 1 May 2011 09:43:34 -0500 X-ASG-Debug-ID: 1304261225-685d03da0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from hermes1.dur.ac.uk (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 8BBCC15D0F49 for ; Sun, 1 May 2011 07:47:05 -0700 (PDT) Received: from hermes1.dur.ac.uk (hermes1.dur.ac.uk [129.234.248.1]) by cuda.sgi.com with ESMTP id ZkGRA1D35GYgOzDa for ; Sun, 01 May 2011 07:47:05 -0700 (PDT) Received: from smtphost2.dur.ac.uk (smtphost2.dur.ac.uk [129.234.252.2]) by hermes1.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41Ekoin015482 for ; Sun, 1 May 2011 15:46:55 +0100 Received: from ty.sabi.co.UK (o1.phyip3.dur.ac.uk [129.234.186.1]) by smtphost2.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41EkZB3011973 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO) for ; Sun, 1 May 2011 15:46:35 +0100 Received: from from [127.0.0.1] (helo=tree.ty.sabi.co.UK) by ty.sabi.co.UK with esmtp(Exim 4.71 #1) id 1QGXn2-0004hA-Gw for ; Sun, 01 May 2011 15:38:28 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Message-ID: 
<19901.28769.553575.864887@tree.ty.sabi.co.UK> Date: Sun, 1 May 2011 15:38:25 +0100 X-Face: SMJE]JPYVBO-9UR%/8d'mG.F!@.,l@c[f'[%S8'BZIcbQc3/">GrXDwb#;fTRGNmHr^JFb SAptvwWc,0+z+~p~"Gdr4H$(|N(yF(wwCM2bW0~U?HPEE^fkPGx^u[*[yV.gyB!hDOli}EF[\cW*S H&spRGFL}{`bj1TaD^l/"[ msn( /TH#THs{Hpj>)]f> X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem In-Reply-To: <20110501084919.GE13542@dastard> References: <4DB72084.8020205@inf.ethz.ch> <4DB74331.3030804@hardwarefreak.com> <4DB75C6D.1080901@inf.ethz.ch> <19898.53907.842827.480883@tree.ty.sabi.co.UK> <20110501084919.GE13542@dastard> X-Mailer: VM 8.0.13 under 23.1.1 (x86_64-pc-linux-gnu) From: pg_xf2@xf2.for.sabi.co.UK (Peter Grandi) X-Disclaimer: This message contains only personal opinions X-DurhamAcUk-MailScanner: Found to be clean, Found to be clean X-DurhamAcUk-MailScanner-ID: p41Ekoin015482 X-Barracuda-Connect: hermes1.dur.ac.uk[129.234.248.1] X-Barracuda-Start-Time: 1304261227 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.62 X-Barracuda-Spam-Status: No, SCORE=-1.62 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_SC0_SA085b, ISO2022JP_CHARSET X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62470 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 ISO2022JP_CHARSET ISO-2022-JP message 0.40 BSF_SC0_SA085b Custom Rule SA085b X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean > [ ... ] [ ... Extracting a kernel 'tar' with GNU tar on 'ext3': ] >>> real 0m21.769s [ ... Extracting a kernel 'tar' with GNU tar on XFS: ] >>> real 2m20.522s >> [ ... ] in most cases the wrong number is the one for 'ext3' >> on RAID1 (way too small). Even the number for XFS and RAID0 >> 'delaylog' is a wrong number (somewhat small) in many cases. 
>> There are 38000 files in 440MB in 'linux-2.6.38.tar', ~40% of >> them are smaller than 4KiB and ~60% smaller than 8KiB. Also you >> didn't flush caches, and you don't say whether the filesystems >> are empty or full or at the same position on the disk. >> >> Can 'ext3' really commit 1900 small files per second (including >> directory updates) to a filesystem on a RAID1 that probably can >> do around 100 IOPS? That would be amazing news. In the real world 'ext3' as reported in my previous message can "really commit" around 50 "small files per second (including directory updates)" in near-optimal conditions to a storage device that can proboably do around 100IOPS; copying here the actual numbers: % mount -t ext3 -o relatime /dev/sdb /mnt/sdb % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb' star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k). real 12m49.610s user 0m0.990s sys 0m8.610s .... % df -BM /mnt/sdb Filesystem 1M-blocks Used Available Use% Mounted on /dev/sdb 469455M 687M 444922M 1% /mnt/sdb % df -i /mnt/sdb Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sdb 30531584 38100 30493484 1% /mnt/sdb As a side note, even 12m49.610s is probably a bit optimistic because of the 1s timestamp resolution of 'ext3': http://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg272253.html > Of course it can. And a pony! Or rather 'O_PONIES' :-). > Why? Because the allocator is optimised to pack small files > written at the same time together on disk, and the elevator > will merge them into one large IO when they are finally > written to disk. With a typical 512k max IO size, that's 128 > <=4k files packed into each IO, This is an argument based on a cunning or distracted or ignorant shift of the goalposts: because this is an argument about purely *writing* the *data* in those small files, while the bigger issue is *committing* the *metadata*, all of it "(including directory updates)". 
This argument is also based on the assumption that it is
permissible to commit 128 small files when the last one gets
closed, not when each gets committed.

In this discussion it is rather comical to make an argument based
on the speed of IO using what is in effect EatMyData as described
here:

  http://talk.maemo.org/showthread.php?t=67901

but here it is:

> In a perfect world, we're talking about ~13000 4k files a
> second being written to disk @ 100 IOPS. In the real world,
> writing an order of magnitude less files per second is quite
> obtainable.

But in the real world the "quite obtainable" number with 'ext3'
for "really commit [ ... ] small files per second (including
directory updates)" on storage that "probably can do around 100
IOPS" is around *50* (fifty), not 1,300, never mind 13,000.

Sure, if one wants to look instead at whatever number they can get
with their clever "benchmarks" one can get:

  % mount -t ext3 -o relatime /dev/sdb /mnt/sdb
  % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar -no-fsync; cd /; umount /mnt/sdb'
  star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

  real 0m27.414s
  user 0m0.270s
  sys 0m2.430s

That's a fantastic result, somewhat over 1,300 small files per
second (14 commits per nominal IOPS), but "fantastic" (as in
fantasy) is the keyword, because it is for completely different
and broken semantics, a point that should not be lost on anybody
who can "understand IOPS and metadata and commits and caching".

It is not as if the difference isn't widely known:

  http://cdrecord.berlios.de/private/man/star/star.1.html

    Star is a very fast tar(1) like tape archiver with improved
    functionality.

    On operating systems with slow file I/O (such as Linux), it
    may help to use -no-fsync in addition, but then star is
    unable to detect all error conditions; so use with care.
That GNU 'tar' does not commit files when extracting is pretty old news, and therefore as I wrote in a previous message on a similar detail: There is something completely different: a tradeoff between levels of safety (whether you want committed transactions or not and how finely grained) and time to completion. But when one sees comical "performance" comparisons without even cache flushing, explaining the difference between a performance problem and different safety/speed tradeoffs seems a bit wasted. Again, the fundamental problem is how many committed IOPS the storage system can do given a metadata (and thus journal) intensive load (the answer is "not many" per spinning medium). Plus of course: >> Despite decades of seeing it happen, I keep being astonished by >> how many people (some with decades of "experience") just don't >> understand IOPS and metadata and commits and caching and who > Oh, the irony.... :) Indeed :-). From pg_mh@sabi.co.UK Sun May 1 10:05:58 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41F5wQS180939 for ; Sun, 1 May 2011 10:05:58 -0500 X-ASG-Debug-ID: 1304262572-32d802520000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from hermes2.dur.ac.uk (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 6DDB5424694 for ; Sun, 1 May 2011 08:09:32 -0700 (PDT) Received: from hermes2.dur.ac.uk (hermes2.dur.ac.uk [129.234.248.2]) by cuda.sgi.com with ESMTP id HWlIbNG5lzFEuUj1 for ; Sun, 01 May 2011 08:09:32 -0700 (PDT) Received: from smtphost1.dur.ac.uk (smtphost1.dur.ac.uk [129.234.252.1]) by hermes2.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41F9BSl032682 for ; Sun, 1 May 2011 16:09:15 +0100 Received: from ty.sabi.co.UK (o1.phyip3.dur.ac.uk 
[129.234.186.1]) by smtphost1.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41F8ufd031719 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO) for ; Sun, 1 May 2011 16:08:56 +0100 Received: from from [127.0.0.1] (helo=tree.ty.sabi.co.UK) by ty.sabi.co.UK with esmtp(Exim 4.71 #1) id 1QGYGP-0004k4-Su for ; Sun, 01 May 2011 16:08:50 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <19901.30588.512362.651640@tree.ty.sabi.co.UK> Date: Sun, 1 May 2011 16:08:44 +0100 X-Face: SMJE]JPYVBO-9UR%/8d'mG.F!@.,l@c[f'[%S8'BZIcbQc3/">GrXDwb#;fTRGNmHr^JFb SAptvwWc,0+z+~p~"Gdr4H$(|N(yF(wwCM2bW0~U?HPEE^fkPGx^u[*[yV.gyB!hDOli}EF[\cW*S H&spRGFL}{`bj1TaD^l/"[ msn( /TH#THs{Hpj>)]f> X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem In-Reply-To: <19901.28769.553575.864887@tree.ty.sabi.co.UK> References: <4DB72084.8020205@inf.ethz.ch> <4DB74331.3030804@hardwarefreak.com> <4DB75C6D.1080901@inf.ethz.ch> <19898.53907.842827.480883@tree.ty.sabi.co.UK> <20110501084919.GE13542@dastard> <19901.28769.553575.864887@tree.ty.sabi.co.UK> X-Mailer: VM 8.0.13 under 23.1.1 (x86_64-pc-linux-gnu) From: pg_xf2@xf2.for.sabi.co.UK (Peter Grandi) X-Disclaimer: This message contains only personal opinions X-DurhamAcUk-MailScanner: Found to be clean, Found to be clean X-DurhamAcUk-MailScanner-ID: p41F9BSl032682 X-Barracuda-Connect: hermes2.dur.ac.uk[129.234.248.2] X-Barracuda-Start-Time: 1304262573 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0002 1.0000 -2.0195 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62471 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 
on oss.sgi.com X-Virus-Status: Clean [ ... ] >> [ ... Extracting a kernel 'tar' with GNU tar on 'ext3': ] >>> real 0m21.769s >> [ ... Extracting a kernel 'tar' with GNU tar on XFS: ] >>> real 2m20.522s >>> [ ... ] in most cases the wrong number is the one for 'ext3' >>> on RAID1 (way too small). Even the number for XFS and RAID0 >>> 'delaylog' is a wrong number (somewhat small) in many cases. [ ... ] > % mount -t ext3 -o relatime /dev/sdb /mnt/sdb > % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb' > star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k). > real 12m49.610s > user 0m0.990s > sys 0m8.610s [ ... ] > In this discussion it is rather comical to make an argument > based on the speed of IO using what is in effect EatMyData as > described here: > http://talk.maemo.org/showthread.php?t=67901 Just for confirmation here is the fantastic "performance" of 'ext3' with EatMyData: % mount -t ext3 -o relatime /dev/sdb /mnt/sdb % time sh -c 'cd /mnt/sdb; eatmydata star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb' /bin/star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k). real 0m28.917s user 0m0.310s sys 0m2.410s Surprise surprise :-) the duration is much the same as GNU tar or 'star' with '-no-fsync'. Well, 'ext3' can do a rate a bit over 1,300 on a 100IOPS sort of drive, but only in the EatMyData (plus 'umount') world not the real world. That is where the 38,100 files are run in effect as a single large commit. 
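
For anybody who wants to check the rates quoted in this thread, here is a minimal sketch of the arithmetic. It assumes the 38,100 inodes reported by 'df -i' stand for the number of files, and it uses the byte count printed by 'star'; the function names are made up for illustration only.

```python
# Rates derived from the wall-clock times reported in this thread.
# Assumption: 38,100 (IUsed from 'df -i') is taken as the file count.
FILES = 38100
BYTES = 440483840  # total reported by 'star'


def seconds(minutes, secs):
    """Convert a 'real XmY.YYYs' timing into seconds."""
    return minutes * 60 + secs


def rates(elapsed):
    """Return (files per second, KB per second) for one extraction."""
    return FILES / elapsed, BYTES / elapsed / 1000.0


runs = {
    "ext3, star (with fsync)": seconds(12, 49.610),
    "ext3, eatmydata star": seconds(0, 28.917),
}

for name, elapsed in runs.items():
    files_per_s, kb_per_s = rates(elapsed)
    print(f"{name}: {files_per_s:.0f} files/s, {kb_per_s:.0f} KB/s")
```

The first run comes out at roughly 50 files/s and 570 KB/s, the second at a bit over 1,300 files/s, which are the figures used above.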
From pg_mh@sabi.co.UK Sun May 1 10:28:44 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41FSiZ5181704 for ; Sun, 1 May 2011 10:28:44 -0500 X-ASG-Debug-ID: 1304263939-2608039c0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from hermes1.dur.ac.uk (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 15B421E18FD2 for ; Sun, 1 May 2011 08:32:19 -0700 (PDT) Received: from hermes1.dur.ac.uk (hermes1.dur.ac.uk [129.234.248.1]) by cuda.sgi.com with ESMTP id du2xyqD4r3AoZsbG for ; Sun, 01 May 2011 08:32:19 -0700 (PDT) Received: from smtphost4.dur.ac.uk (smtphost4.dur.ac.uk [129.234.252.4]) by hermes1.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41FVvKf029006; Sun, 1 May 2011 16:32:01 +0100 Received: from ty.sabi.co.UK (o1.phyip3.dur.ac.uk [129.234.186.1]) by smtphost4.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41FViQv029149 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO); Sun, 1 May 2011 16:31:44 +0100 Received: from from [127.0.0.1] (helo=tree.ty.sabi.co.UK) by ty.sabi.co.UK with esmtp(Exim 4.71 #1) id 1QGYcS-0004lc-MU; Sun, 01 May 2011 16:31:36 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <19901.31958.368144.832086@tree.ty.sabi.co.UK> Date: Sun, 1 May 2011 16:31:34 +0100 Precedence: air-mail To: Linux RAID , Linux fs JFS , Linux fs XFS X-ASG-Orig-Subj: Re: RAID6 r-m-w, op-journaled fs, SSDs Subject: Re: RAID6 r-m-w, op-journaled fs, SSDs In-Reply-To: <20110501082717.5116e575@notabene.brown> References: <19900.10868.583555.849181@tree.ty.sabi.co.UK> <20110501082717.5116e575@notabene.brown> X-Mailer: VM 8.0.13 under 23.1.1 (x86_64-pc-linux-gnu) From: pg_xf2@xf2.for.sabi.co.UK (Peter Grandi) 
X-Disclaimer: This message contains only personal opinions
X-DurhamAcUk-MailScanner: Found to be clean, Found to be clean
X-DurhamAcUk-MailScanner-ID: p41FVvKf029006
X-Barracuda-Connect: hermes1.dur.ac.uk[129.234.248.1]
X-Barracuda-Start-Time: 1304263940
X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com
X-Barracuda-Spam-Score: -2.02
X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=
X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62473
	Rule breakdown below
	 pts rule name              description
	---- ---------------------- --------------------------------------------------
X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com
X-Virus-Status: Clean

[ ... ]

>> * Can Linux MD do "abbreviated" read-modify-write RAID6
>> updates like for RAID5? [ ... ]

> No. (patches welcome).

Ahhhm, but let me dig a bit deeper, even if it may be implied in
the answer: would it be *possible*?

That is, is the double parity scheme used in MD such that it is
possible to "subtract" the old content of a page and "add" the
new content of that page to both parity pages?

[ ... ]

> The ideal config for a journalled filesystem is for put the
> journal on a separate smaller lower-latency device. e.g. a
> small RAID1 pair.

> In a previous work place I had good results with:
> RAID1 pair of small disks with root, swap, journal
> Large RAID5/6 array with bulk of filesystem.

Sounds reasonable, except that I am allergic to RAID5 (except in
two cases) and RAID6 (in general) :-), but it would work equally
well, I guess, with RAID10 and its delightful MD implementation.

[ ... ]

Thanks for the information!
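
On the "subtract old, add new" question, the arithmetic itself can be sketched as follows. This is only an illustration, assuming the usual P/Q construction over GF(2^8) with polynomial 0x11d and generator 2 (P is plain XOR, Q weights data block i by 2^i); it says nothing about what the MD code actually implements, and all function names are invented for the example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reduction polynomial 0x11d (assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow2(i):
    """Return 2**i in GF(2^8): the Q coefficient of data block i."""
    coeff = 1
    for _ in range(i):
        coeff = gf_mul(coeff, 2)
    return coeff


def syndromes(blocks):
    """Compute P and Q from scratch over all data blocks of a stripe."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow2(i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def rmw_update(p, q, index, old, new):
    """Update P and Q from the old and new content of one block only."""
    coeff = gf_pow2(index)
    new_p = bytes(pb ^ o ^ n for pb, o, n in zip(p, old, new))
    new_q = bytes(qb ^ gf_mul(coeff, o ^ n) for qb, o, n in zip(q, old, new))
    return new_p, new_q


# Usage: changing block 1 via the delta gives the same P/Q as a full recompute.
blocks = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
p, q = syndromes(blocks)
new_block = bytes([0xFF, 0x00, 0x80, 0x55])
assert rmw_update(p, q, 1, blocks[1], new_block) == syndromes(
    [blocks[0], new_block, blocks[2]]
)
```

Because both XOR and multiplication by a constant are linear in this field, the delta (old XOR new) can be folded into both parity blocks, so under these assumptions only the changed data block and the two parity blocks need to be read.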
From michael.monnerie@is.it-management.at Sun May 1 10:29:25 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41FTOrq181751 for ; Sun, 1 May 2011 10:29:25 -0500 X-ASG-Debug-ID: 1304263978-5677005a0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mailsrv14.zmi.at (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 37D8442466E for ; Sun, 1 May 2011 08:32:58 -0700 (PDT) Received: from mailsrv14.zmi.at (mailsrv1.zmi.at [212.69.164.54]) by cuda.sgi.com with ESMTP id 9HLnSSWKdvJnH5hs for ; Sun, 01 May 2011 08:32:58 -0700 (PDT) Received: from mailsrv.i.zmi.at (h081217106033.dyn.cm.kabsi.at [81.217.106.33]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailsrv2.i.zmi.at", Issuer "power4u.zmi.at" (not verified)) by mailsrv14.zmi.at (Postfix) with ESMTPSA id 54760522 for ; Sun, 1 May 2011 17:32:57 +0200 (CEST) Received: from saturn.localnet (saturn.i.zmi.at [10.72.27.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mailsrv.i.zmi.at (Postfix) with ESMTPSA id 8A319401C3A for ; Sun, 1 May 2011 17:32:56 +0200 (CEST) From: Michael Monnerie Organization: it-management http://it-management.at To: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem Date: Sun, 1 May 2011 17:32:51 +0200 User-Agent: KMail/1.13.6 (Linux/2.6.37.1-1.2-desktop; KDE/4.6.0; x86_64; ; ) References: <4DB72084.8020205@inf.ethz.ch> <20110501084919.GE13542@dastard> <19901.28769.553575.864887@tree.ty.sabi.co.UK> In-Reply-To: <19901.28769.553575.864887@tree.ty.sabi.co.UK> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart2760581.x8BJke0XJx"; 
protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <201105011732.56226@zmi.at> X-Barracuda-Connect: mailsrv1.zmi.at[212.69.164.54] X-Barracuda-Start-Time: 1304263979 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62473 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean --nextPart2760581.x8BJke0XJx Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable On Sonntag, 1. Mai 2011 Peter Grandi wrote: > But when one sees comical "performance" comparisons without > even cache flushing, explaining the difference between a > performance problem and different safety/speed tradeoffs seems > a bit wasted. Before people run aroung peeing each other on the leg, I'd like to bring=20 this back from "benchmarking" to "user experience". The OP didn't=20 benchmark, he just noticed that on ext3 unpacking the kernel source was=20 much faster than on XFS, on his machine. Step back from "benchmarking", and just read the words, forget about=20 benchmarks. With ext3, the user can start "make menuconfig" much earlier=20 than with xfs. In this specific case, the user is not interested if it's=20 safer, or already on disk, or running in the background. The user want's=20 to do his work, period. And that is - for this specific case on his=20 hardware (and probably on every hardware?) - much quicker with ext3 than=20 with xfs. 
I'd be interested why it is like that, and if there is anything to do=20 about it in xfs to become faster, or as-fast-as ext3, for this specific=20 case? =2D-=20 mit freundlichen Gr=FCssen, Michael Monnerie, Ing. BSc it-management Internet Services: Prot=E9ger http://proteger.at [gesprochen: Prot-e-schee] Tel: +43 660 / 415 6531 // ****** Radiointerview zum Thema Spam ****** // http://www.it-podcast.at/archiv.html#podcast-100716 //=20 // Haus zu verkaufen: http://zmi.at/langegg/ --nextPart2760581.x8BJke0XJx Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) iEYEABECAAYFAk29fSgACgkQzhSR9xwSCbRJxQCgmMvPxVYaTFYZKZAgihu9X1Ot CAIAn31dlmYVDPcThwIzgn/Otb7MZaF2 =PTRe -----END PGP SIGNATURE----- --nextPart2760581.x8BJke0XJx-- From pg_mh@sabi.co.UK Sun May 1 11:29:34 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41GTY5D183806 for ; Sun, 1 May 2011 11:29:34 -0500 X-ASG-Debug-ID: 1304267589-5c7902330000-ps1ADW X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from hermes1.dur.ac.uk (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 25C55424B3D for ; Sun, 1 May 2011 09:33:09 -0700 (PDT) Received: from hermes1.dur.ac.uk (hermes1.dur.ac.uk [129.234.248.1]) by cuda.sgi.com with ESMTP id fUGo2QTVKKWGGR0F for ; Sun, 01 May 2011 09:33:09 -0700 (PDT) Received: from smtphost2.dur.ac.uk (smtphost2.dur.ac.uk [129.234.252.2]) by hermes1.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41GWrFY014021 for ; Sun, 1 May 2011 17:32:57 +0100 Received: from ty.sabi.co.UK (o1.phyip3.dur.ac.uk [129.234.186.1]) by smtphost2.dur.ac.uk (8.13.8/8.13.7) with ESMTP id 
p41GWZis018868 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO) for ; Sun, 1 May 2011 17:32:35 +0100 Received: from from [127.0.0.1] (helo=tree.ty.sabi.co.UK) by ty.sabi.co.UK with esmtp(Exim 4.71 #1) id 1QGZZN-0004vI-8F for ; Sun, 01 May 2011 17:32:29 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <19901.35612.784066.862671@tree.ty.sabi.co.UK> Date: Sun, 1 May 2011 17:32:28 +0100 X-Face: SMJE]JPYVBO-9UR%/8d'mG.F!@.,l@c[f'[%S8'BZIcbQc3/">GrXDwb#;fTRGNmHr^JFb SAptvwWc,0+z+~p~"Gdr4H$(|N(yF(wwCM2bW0~U?HPEE^fkPGx^u[*[yV.gyB!hDOli}EF[\cW*S H&spRGFL}{`bj1TaD^l/"[ msn( /TH#THs{Hpj>)]f> X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem In-Reply-To: <4DB75C6D.1080901@inf.ethz.ch> References: <4DB72084.8020205@inf.ethz.ch> <4DB74331.3030804@hardwarefreak.com> <4DB75C6D.1080901@inf.ethz.ch> X-Mailer: VM 8.0.13 under 23.1.1 (x86_64-pc-linux-gnu) From: pg_xf2@xf2.for.sabi.co.UK (Peter Grandi) X-Disclaimer: This message contains only personal opinions X-DurhamAcUk-MailScanner: Found to be clean, Found to be clean X-DurhamAcUk-MailScanner-ID: p41GWrFY014021 X-Barracuda-Connect: hermes1.dur.ac.uk[129.234.248.1] X-Barracuda-Start-Time: 1304267590 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0001 1.0000 -2.0205 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62477 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean > [ ... ] > This is of course with delaylog enabled. 
I don't think a > difference of a factor 7 is normal, given that writing to a > raid-0 (xfs numbers) is supposed to be faster than writing to > raid-1 (ext3 numbers) [ ... ] To summarize some previous detailed discussion the actual "performance" difference is either a factor of around 2 (12m for 'ext3', 24m for XFS) in the regular case or a factor of around 1.2 (27s for 'ext3', 33s for XFS) in the EatMyData (plus 'umount') case (the one giving you over 1,300 transactions second). Numbers like a bit over 2m for XFS with 'delaylog' and a bit over 4m without are for intermediate cases between the regular and EatMyData case, depending on how infrequently data and metadata are committed by XFS. If you want best "performance" with XFS, 'exec eatmydata "$SHELL"' might be the top solution ;-). From BATV+56ad5e14cb2cfdede557+2807+infradead.org+hch@bombadil.srs.infradead.org Sun May 1 11:52:18 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41GqH8b184612 for ; Sun, 1 May 2011 11:52:18 -0500 X-ASG-Debug-ID: 1304268953-567f03890000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 51D234247FD for ; Sun, 1 May 2011 09:55:53 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id FEUNIBUbqccnB2jE for ; Sun, 01 May 2011 09:55:53 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QGZvu-0002y4-AI; Sun, 01 May 2011 16:55:46 +0000 Date: Sun, 1 May 2011 12:55:46 -0400 From: Christoph Hellwig To: Dave Chinner Cc: Martin Steigerwald 
, xfs@oss.sgi.com X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem Message-ID: <20110501165546.GB5391@infradead.org> References: <4DB72084.8020205@inf.ethz.ch> <20110427023534.GF12436@dastard> <201104291827.35801.Martin@lichtvoll.de> <20110501085246.GF13542@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110501085246.GF13542@dastard> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304268953 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Sun, May 01, 2011 at 06:52:46PM +1000, Dave Chinner wrote: > > > more than likely your problem is that barriers have been enabled for > > > MD/DM devices on the new kernel, and they aren't on the old kernel. > > > XFS uses barriers by default, ext3 does not. Hence XFS performance > > > will change while ext3 will not. Check dmesg output when mounting > > > the filesystems on the different kernels. > > > > But didn't 2.6.38 replace barriers by explicit flushes the filesystem has to > > wait for - mitigating most of the performance problems with barriers? > > IIRC, it depends on whether the hardware supports FUA or not. If it > doesn't then device cache flushes are used to emulate FUA and so > performance can still suck. Christoph will no doubt correct me if I > got that wrong ;) Mitigating most of the barrier performance issues is a bit of a strong word. 
Yes, it removes useless ordering requirements, but fundamentally
you still have to flush the disk cache to the physical medium, which
is always going to be slower than just filling up a DRAM cache like
ext3's default behaviour in mainline does (interestingly both SLES
and RHEL have patched it to provide safe behaviour by default).

Both the old barrier and new flush code will use the FUA bit if
available, and those optimize out the post-flush for a log write.
Note that currently libata by default always disables FUA support,
even if the disk supports it, so you'll need a SAS/FC/iSCSI/etc
device to actually see FUA requests, which is quite sad as it
should provide a nice speedup especially for SATA where the cache
flush command is not queueable and thus requires us to still
drain any outstanding I/O at least for a short duration.

From bschindler@inf.ethz.ch Sun May 1 11:53:19 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41GrIsN184657 for ; Sun, 1 May 2011 11:53:19 -0500 X-ASG-Debug-ID: 1304269013-04c101720000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from edge10.ethz.ch (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 537D015D1316 for ; Sun, 1 May 2011 09:56:53 -0700 (PDT) Received: from edge10.ethz.ch (edge10.ethz.ch [82.130.75.186]) by cuda.sgi.com with ESMTP id 2SxXuyYkASFDK7Si for ; Sun, 01 May 2011 09:56:53 -0700 (PDT) Received: from CAS12.d.ethz.ch (172.31.38.212) by edge10.ethz.ch (82.130.75.186) with Microsoft SMTP Server (TLS) id 14.1.289.1; Sun, 1 May 2011 18:56:50 +0200 Received: from [10.0.0.2] (84.227.109.243) by CAS12.d.ethz.ch (172.31.38.212) with Microsoft SMTP Server (TLS) id 14.1.289.1; Sun, 1 May 2011 18:56:51 +0200 Message-ID:
<4DBD90A3.6000101@inf.ethz.ch> Date: Sun, 1 May 2011 18:56:03 +0200 From: Benjamin Schindler User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110307 Lightning/1.0b3pre Thunderbird/3.1.9 MIME-Version: 1.0 To: Martin Steigerwald CC: , Dave Chinner X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem References: <201104291828.46420.Martin@lichtvoll.de> In-Reply-To: <201104291828.46420.Martin@lichtvoll.de> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [84.227.109.243] X-Barracuda-Connect: edge10.ethz.ch[82.130.75.186] X-Barracuda-Start-Time: 1304269014 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62478 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Hi On 29.04.2011 18:28, Martin Steigerwald wrote: > sorry, forgot to cc. > > Am Mittwoch, 27. April 2011 schrieb Dave Chinner: >> On Tue, Apr 26, 2011 at 09:44:04PM +0200, Benjamin Schindler wrote: >>> Hi >>> >>> Since upgrading to newer kernels I have serious problems with xfs >>> performance on my root fs. >>> It runs on a software raid 0 with 2 disks. On the same two disks, >>> there are two more partitions running a software raid-1 with ext3. >>> On the ext3 system, I have no issue, so I assume the drives are >>> fine. >>> But on the xfs filesystem, extracting a linux kernel archive takes 5 >>> minutes or more, running ldconfig similarily long. The harddrives are >>> sata-2. >>> I'm running gentoo linux with kernel 2.6.38-gentoo-r1. 
I'm attaching
>>> the kernel config but I guess more info is needed - just let me know
>>> what is needed.
>>
>> more than likely your problem is that barriers have been enabled for
>> MD/DM devices on the new kernel, and they aren't on the old kernel.
>> XFS uses barriers by default, ext3 does not. Hence XFS performance
>> will change while ext3 will not. Check dmesg output when mounting
>> the filesystems on the different kernels.
>
> But didn't 2.6.38 replace barriers by explicit flushes the filesystem has to
> wait for - mitigating most of the performance problems with barriers?
>

Well, maybe that doesn't work then?

As always, I'm willing to do testing and provide info if required.

Cheers
Benjamin

p.s. please keep the cc

From pg_mh@sabi.co.UK Sun May 1 13:10:14 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41IADvx187092 for ; Sun, 1 May 2011 13:10:14 -0500 X-ASG-Debug-ID: 1304273627-565000fc0000-ps1ADW X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from hermes2.dur.ac.uk (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 49A4A15D1362 for ; Sun, 1 May 2011 11:13:48 -0700 (PDT) Received: from hermes2.dur.ac.uk (hermes2.dur.ac.uk [129.234.248.2]) by cuda.sgi.com with ESMTP id k0aAv51Bi6Po3Shf for ; Sun, 01 May 2011 11:13:48 -0700 (PDT) Received: from smtphost2.dur.ac.uk (smtphost2.dur.ac.uk [129.234.252.2]) by hermes2.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41IDVeJ022836 for ; Sun, 1 May 2011 19:13:35 +0100 Received: from ty.sabi.co.UK (o1.phyip3.dur.ac.uk [129.234.186.1]) by smtphost2.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41IDG2h025047 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO) for ; Sun, 1 May 2011 19:13:16 +0100 Received: from from
[127.0.0.1] (helo=tree.ty.sabi.co.UK) by ty.sabi.co.UK with esmtp(Exim 4.71 #1) id 1QGa4h-00051b-49 for ; Sun, 01 May 2011 18:04:51 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <19901.37553.911967.287089@tree.ty.sabi.co.UK> Date: Sun, 1 May 2011 18:04:49 +0100 X-Face: SMJE]JPYVBO-9UR%/8d'mG.F!@.,l@c[f'[%S8'BZIcbQc3/">GrXDwb#;fTRGNmHr^JFb SAptvwWc,0+z+~p~"Gdr4H$(|N(yF(wwCM2bW0~U?HPEE^fkPGx^u[*[yV.gyB!hDOli}EF[\cW*S H&spRGFL}{`bj1TaD^l/"[ msn( /TH#THs{Hpj>)]f> X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem In-Reply-To: <201105011732.56226@zmi.at> References: <4DB72084.8020205@inf.ethz.ch> <20110501084919.GE13542@dastard> <19901.28769.553575.864887@tree.ty.sabi.co.UK> <201105011732.56226@zmi.at> X-Mailer: VM 8.0.13 under 23.1.1 (x86_64-pc-linux-gnu) From: pg_xf2@xf2.for.sabi.co.UK (Peter Grandi) X-Disclaimer: This message contains only personal opinions X-DurhamAcUk-MailScanner: Found to be clean, Found to be clean X-DurhamAcUk-MailScanner-ID: p41IDVeJ022836 X-Barracuda-Connect: hermes2.dur.ac.uk[129.234.248.2] X-Barracuda-Start-Time: 1304273629 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62484 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean [ ... ] > [ ... ] With ext3, the user can start "make menuconfig" much > earlier than with xfs. In this specific case, the user is > not interested if it's safer, or already on disk, or ============================= > running in the background. [ ... 
] Usually I prefer to assume that the user is merely not aware of the tradeoff between different levels of safety and time, and thus they need to be discussed and expectations driven to realistic levels, even if admittedly often it is time wasted, also because so many people seduce users by selling 'O_PONIES'. Thus sometimes I don't like to assume that the user is a "couldn't care less" type of moron, as that seems condescending to me: > The user want's to do his work, period. I have heard something like that several times from salesmen... Or perhaps this recent Dilbert strip is appropriate: http://www.dilbert.com/strips/comic/2011-04-29/ From pg_mh@sabi.co.UK Sun May 1 13:10:42 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_41 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41IAfav187116 for ; Sun, 1 May 2011 13:10:42 -0500 X-ASG-Debug-ID: 1304273654-375b01430000-KTYTBk X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from hermes2.dur.ac.uk (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B17EE424CCF for ; Sun, 1 May 2011 11:14:14 -0700 (PDT) Received: from hermes2.dur.ac.uk (hermes2.dur.ac.uk [129.234.248.2]) by cuda.sgi.com with ESMTP id 6JIGHFZo6Euim2SE for ; Sun, 01 May 2011 11:14:14 -0700 (PDT) Received: from smtphost1.dur.ac.uk (smtphost1.dur.ac.uk [129.234.252.1]) by hermes2.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41IDVBP022835; Sun, 1 May 2011 19:13:35 +0100 Received: from ty.sabi.co.UK (o1.phyip3.dur.ac.uk [129.234.186.1]) by smtphost1.dur.ac.uk (8.13.8/8.13.7) with ESMTP id p41IDE0X010568 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO); Sun, 1 May 2011 19:13:15 +0100 Received: from from [127.0.0.1] (helo=tree.ty.sabi.co.UK) by ty.sabi.co.UK with esmtp(Exim 4.71 #1) id 
1QGb8k-0005Ah-8y; Sun, 01 May 2011 19:13:06 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Message-ID: <19901.41647.606112.243194@tree.ty.sabi.co.UK> Date: Sun, 1 May 2011 19:13:03 +0100 X-Face: SMJE]JPYVBO-9UR%/8d'mG.F!@.,l@c[f'[%S8'BZIcbQc3/">GrXDwb#;fTRGNmHr^JFb SAptvwWc,0+z+~p~"Gdr4H$(|N(yF(wwCM2bW0~U?HPEE^fkPGx^u[*[yV.gyB!hDOli}EF[\cW*S H&spRGFL}{`bj1TaD^l/"[ msn( /TH#THs{Hpj>)]f> Cc: Linux fs XFS , Linux fs JFS X-ASG-Orig-Subj: Re: op-journaled fs, journal size and storage speeds Subject: Re: op-journaled fs, journal size and storage speeds In-Reply-To: <20110501092758.GG13542@dastard> References: <19900.8703.214676.218477@tree.ty.sabi.co.UK> <20110501092758.GG13542@dastard> X-Mailer: VM 8.0.13 under 23.1.1 (x86_64-pc-linux-gnu) From: pg_mh@sabi.co.UK (Peter Grandi) X-Disclaimer: This message contains only personal opinions X-DurhamAcUk-MailScanner: Found to be clean, Found to be clean X-DurhamAcUk-MailScanner-ID: p41IDVBP022835 X-Barracuda-Connect: hermes2.dur.ac.uk[129.234.248.2] X-Barracuda-Start-Time: 1304273655 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_SC5_SA210e, ISO2022JP_CHARSET X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62483 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 ISO2022JP_CHARSET ISO-2022-JP message 0.00 BSF_SC5_SA210e Custom Rule SA210e X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean >> Been thinking about journals and RAID6s and SSDs. In particular >> for file system designs like JFS and XFS that do operation >> journaling (while ext[34] do block journaling). 
> XFS is not an operation journalling filesystem. Most of the
> metadata is dirty-region logged via buffers, just like ext3/4.

Looking at the sources, XFS does operation journaling, in the form
of physical ("dirty region") operation logging, instead of logical
operation logging like JFS. Both are very different from block
journaling. In more detail, to me there is a stark contrast between
'jbd.h':

  http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=include/linux/jbd.h;h=e06965081ba5548f74db935543af84334f58259e;hb=HEAD

where I find only a few journal transaction types (blocks) and
'xfs_trans.h' where I find many journal transaction types (ops):

  http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=fs/xfs/xfs_trans.h;h=c2042b736b81131a780703d8a5907c848793eebb;hb=HEAD

Given that in the latter I see transaction types like
'XFS_TRANS_RENAME' or 'XFS_TRANS_MKDIR' it is hard to imagine how
one can argue that XFS journals something other than ops, even
if in a buffered way of sorts.

Ironically, comparing with 'jfs_logmgr.h':

  http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=fs/jfs/jfs_logmgr.h;h=9236bc49ae7ff1aed9cad81a2b22c2c54e433ba0;hb=HEAD

I see lower level transaction types there (but they are logged as
ops rather than "dirty-region"s).

[ ... ]

>> It seems to me that adopting as guideline a percent of the
>> filesystem is very wrong, and so I have been using a rule of
>> thumb like one second of expected transfer rate, so "in flight"
>> updates are never much behind.

> How do you know what "one second" of "in flight" operations is
> going to be?

Well, that's what I discuss later, it is a "rule of thumb" based
on *some* rationale, but I have been questioning it.

[ ... interesting summary of some of the many issues related to
journal sizing ... ]

> Easiest and most reliable method seems to be to size your
> journal appropriately in the first place and have your
> algorithms key off that....

Sure, but *I* am asking that question :-).

[ ... ]

> 17 minutes is my current record by crashing a VM during a
> chmod -R operation over a 100 million inode filesystem. That
> was on a ~2GB log (maximum supported size).

Uhhhm I happen to strongly relate to that (on a much smaller
scale :->).

[ ... ]

>> This seems to me a fairly bad idea, because then the journal
>> becomes a massive hot spot on the disk and draws the disk arm
>> like a black hole. I suspect that operations should not stay on

> That's why you can configure an external log....

...and lose barriers :-). But indeed.

>> the journal for a long time. However if the journal is too
>> small processes that do metadata updates start to hang on it.

> Well, yes. The journal needs to be large enough to hold all
> the transaction reservations for the active transactions. XFS,
> in the worst case for a default filesystem config, needs about
> 100MB of log space per 300 concurrent transactions. [ ... ]

So something like 300KB per transaction? That seems a pretty
extreme worst case. How is that possible? A metadata transaction
with a "dirty region" of 300KB sounds enormously expensive. It may
be about extent maps for a very fragmented file, I guess.

It is also not clear here what concurrent means, because the log is
sequential. I'll guess that it means "in flight".

[ ... ]

>> * What should journal size be proportional to?

> Your workload.

Sure, as a very top level goal. But that's not an answer, it is
handwaving. As you argue earlier, it could be proportional in some
cases to IO threads; or it could be number of arms, filesystem
size, size of each volume, sequential transfer rate, random
transfer rate, large IO transfer rate, small IO transfer rate, ...

Some tighter guideline might be better than just guessing.

>> * What is the downside of a too small journal?

> Performance sucks.

But why? With no journal at all performance is better; with a
one-transaction journal it becomes slower because everything is
written twice, but that happens for any size of journal, as it is
unavoidable.

When the journal fills up the effect is the same as that of a
one-transaction journal. That's the same for every type of buffer.

So the effect of a journal larger than one transaction must be felt
only when the journal is not full, that is when there are pauses in
the flow of transactions; and then it does not matter a lot just
how large the journal is.

So the journal should be large enough to accommodate the highest
possible rate of metadata updates for the longest time that rate is
sustained before there is a pause in the metadata updates.

This of course depends on workload, but some rule of thumb based
on experience might help.

And here my guess is that shorter journals are better than longer
ones, because also:

>> * What is the downside of a too large journal other than space?

> Recovery times too long, lots of outstanding metadata pinned
> in memory (hello OOM-killer!), and other resource management
> related scalability issues.

I would have expected also more seeks, as reading logged but not
yet finalized metadata has to go back to the journal, but I guess
that's a small effect.

>> BTW, another consideration is that for filesystems that are
>> fairly journal-intensive, putting the journal on a low traffic
>> storage device can have large benefits.

> Yeah, nobody ever thought of an external log before.... :)

I was just stating the obvious here, in order to contrast it with:

>> But if they can be pretty small, I wonder whether putting the
>> journals of several filesystems on the same storage device then
>> becomes a sensible option as the locality will be quite narrow
>> (e.g. a single physical cylinder) or it could be worthwhile like
>> the database people do to journal to battery-backed RAM.

For example as described in this old paper:

  http://www.evenenterprises.com/SSDoracl.pdf

> Got a supplier for the custom hardware you'd need?

There are still a few, for example at different ends of the scale:

  http://www.ramsan.com/solutions/oracle/
  http://www.microdirect.co.uk/home/product/39434/ACARD-RAM-Disk-SSD-ANS-9010B-6X-DDR-II-Slots

> Just use a PCIe SSD....

Yes, that's what many people are doing, but mostly for data,
rather than specifically journals.

As mentioned at the start I have indeed been thinking of SSDs. But
they seem to me fundamentally terrible for journals, because of
the large erase block sizes and the enormous latency of erase
operations (lots of read-erase-write cycles for small commits).
They seem more oriented to large mostly read-only data sets than
very small mostly write ones.

The saving grace is the capacitor-backed RAM in SSDs (used to work
around erase block size issues as you probably know) which to a
significant extent may act as the battery-backed RAM I was
mentioning; and similarly, as another post says, the battery-backed
RAM in RAID host adapters would do much the same function. But
neither does so as cleanly as a dedicated unit that is not a cache.

But as another contributor said a fast/small disk RAID1 might be
quite decent in many situations.
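
[Editorial sketch, not part of the original message.] The sizing
arithmetic argued over above can be written down in a few lines. This is
only a back-of-the-envelope illustration: the 100MB-per-300-transactions
worst case and the "one second of expected transfer rate" rule of thumb
come from the thread, while the function names and the 100 MiB/s device
rate are made up for the example and say nothing about how XFS itself
computes log reservations.

```python
# Back-of-the-envelope journal sizing, using figures quoted in the thread.
# Function names and the example transfer rate are hypothetical.

MIB = 1024 * 1024

# Worst case quoted for a default XFS config: ~100MB of log per 300
# concurrent (in-flight) transactions.
WORST_CASE_LOG_BYTES = 100 * MIB
WORST_CASE_TRANSACTIONS = 300


def per_transaction_reservation(log_bytes=WORST_CASE_LOG_BYTES,
                                transactions=WORST_CASE_TRANSACTIONS):
    """Average worst-case log reservation per transaction, in bytes."""
    return log_bytes / transactions


def journal_size_one_second_rule(transfer_rate_bytes_per_s, seconds=1):
    """Rule of thumb from the thread: journal holds N seconds of transfer."""
    return transfer_rate_bytes_per_s * seconds


def max_in_flight_transactions(journal_bytes,
                               log_bytes=WORST_CASE_LOG_BYTES,
                               transactions=WORST_CASE_TRANSACTIONS):
    """How many worst-case reservations fit in a journal (integer math)."""
    return journal_bytes * transactions // log_bytes


if __name__ == "__main__":
    # Roughly 341 KiB, i.e. the "something like 300KB" guessed above.
    print(f"{per_transaction_reservation() / 1024:.0f} KiB per transaction")

    # Assumed device rate of 100 MiB/s, purely for illustration.
    journal = journal_size_one_second_rule(100 * MIB)
    print(f"{journal // MIB} MiB journal, "
          f"{max_in_flight_transactions(journal)} worst-case transactions")
```

With the assumed 100 MiB/s device, the one-second rule happens to land on
the same 100 MiB / 300 transaction point quoted as the worst case; a
slower device would give a proportionally smaller journal and fewer
worst-case reservations in flight.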
From markus@trippelsdorf.de Sun May 1 13:26:30 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_DKIM_INVALID autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41IQUwV187585 for ; Sun, 1 May 2011 13:26:30 -0500 X-ASG-Debug-ID: 1304274601-375e01e60000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.ud10.udmedia.de (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 22581424948 for ; Sun, 1 May 2011 11:30:02 -0700 (PDT) Received: from mail.ud10.udmedia.de (ud10.udmedia.de [194.117.254.50]) by cuda.sgi.com with ESMTP id latbYEaHzhi2X5gp for ; Sun, 01 May 2011 11:30:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=mail.ud10.udmedia.de; h= date:from:to:cc:subject:message-id:references:mime-version: content-type:content-transfer-encoding:in-reply-to; q=dns/txt; s= beta; bh=Qc1NHFFYoXPxwojZFi172LWYZ6L42cP8apPCQtZzzsQ=; b=Yy8+Pp4 F1hQ7B30mAYVmJeg2mkuhhsf7UiEXKsgnkCIuUSnwlC6s7272AspgU/+SqScmt80 0a1PZMTEFleoanFmfP1PD6H8qHdzMutZ7lu4xh/5gc4lx0EXsqh06jtRpuj+tQ1v sjObzi0ztJ3tJZOE2aNZvOPu4kQpz6MgxvIE= Received: (qmail 11229 invoked from network); 1 May 2011 20:24:43 +0200 Received: from unknown (HELO x4.trippels.de) (ud10?360p3@91.66.182.48) by mail.ud10.udmedia.de with ESMTPSA (DHE-RSA-AES256-SHA encrypted, authenticated); 1 May 2011 20:24:43 +0200 Date: Sun, 1 May 2011 20:24:42 +0200 From: Markus Trippelsdorf To: Christoph Hellwig Cc: Dave Chinner , xfs@oss.sgi.com X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem Message-ID: <20110501182442.GA1635@x4.trippels.de> References: <4DB72084.8020205@inf.ethz.ch> <20110427023534.GF12436@dastard> <201104291827.35801.Martin@lichtvoll.de> <20110501085246.GF13542@dastard> <20110501165546.GB5391@infradead.org> MIME-Version: 1.0 
Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20110501165546.GB5391@infradead.org> X-Barracuda-Connect: ud10.udmedia.de[194.117.254.50] X-Barracuda-Start-Time: 1304274603 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62485 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 2011.05.01 at 12:55 -0400, Christoph Hellwig wrote: > On Sun, May 01, 2011 at 06:52:46PM +1000, Dave Chinner wrote: > > > > more than likely your problem is that barriers have been enabled for > > > > MD/DM devices on the new kernel, and they aren't on the old kernel. > > > > XFS uses barriers by default, ext3 does not. Hence XFS performance > > > > will change while ext3 will not. Check dmesg output when mounting > > > > the filesystems on the different kernels. > > > > > > But didn't 2.6.38 replace barriers by explicit flushes the filesystem has to > > > wait for - mitigating most of the performance problems with barriers? > > > > IIRC, it depends on whether the hardware supports FUA or not. If it > > doesn't then device cache flushes are used to emulate FUA and so > > performance can still suck. Christoph will no doubt correct me if I > > got that wrong ;) > > Mitigating most of the barrier performance issues is a bit of a strong > word. 
Yes, it remove useless ordering requirements, but fundamentally > you still have to flush the disk cache to the physical medium, which > is always going to be slower than just filling up a DRAM cache like > ext3's default behaviour in mainline does (interestingly both SLES > and RHEL have patched it to provide safe behaviour by default). > > Both the old barrier and new flush code will use the FUA bit if > available, and those optimize the post-flush for a log write out. > Note that currently libata by default always disables FUA support, > even if the disk supports it, so you'll need a SAS/FC/iSCSI/etc > device to actually see FUA requests, which is quite sad as it > should provide a nice speedup epecially for SATA where the cache > flush command is not queueable and thus requires us to still > drain any outstanding I/O at least for a short duration. I've recently asked on the IDE list why FUA is disabled by default in libata and this is what Tejun Heo had to say (calling it a misfeature): http://article.gmane.org/gmane.linux.ide/48954 Quote: »The way flushes are used by filesystems is that FUA is usually only used right after another FLUSH. ie. Using FUA replaces FLUSH + commit block write + FLUSH sequence to FLUSH + FUA commit block write. Due to the preceding FLUSH, the cache is already empty, so the only difference between WRITE + FLUSH and FUA WRITE becomes the extra command issue overhead which is usually almost unnoticeable compared to the actual IO. Another thing is that with the recent updates to block FLUSH handling, using FUA might even be less efficient. The new implementation aggressively merges those commit writes and flushes. 
IOW, depending on timing, multiple consecutive commit writes can be merged as, FLUSH + commit writes + FLUSH or FLUSH + some commit writes + FLUSH + other commit writes + FLUSH and so on, These merges will happen with fsync heavy workloads where FLUSH performance actually matters and, in these scenarios, FUA writes is less effective because it puts extra ordering restrictions on each FUA write. ie. With surrounding FLUSHes, the drive is free to reorder commit writes to maximize performance, with FUA, the disk has to jump around all over the place to execute each command in the exact issue order. I personally think FUA is a misfeature. It's a microoptimization with shallow benefits even when used properly while putting much heavier restriction on actual IO order, which usually is the slow part. That said, if someone can show FUA actually brings noticeable performance benefits, sure, let's do it, but till then, I think it would be best to leave it up in the attic.« -- Markus From sgi-linux-xfs@lo.gmane.org Sun May 1 14:01:32 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p41J1VtF188753 for ; Sun, 1 May 2011 14:01:31 -0500 X-ASG-Debug-ID: 1304276706-375e03220000-w1Z2WR X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from lo.gmane.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B7DC3424C0C for ; Sun, 1 May 2011 12:05:06 -0700 (PDT) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by cuda.sgi.com with ESMTP id OlPZW5QrMFk7LpLn for ; Sun, 01 May 2011 12:05:06 -0700 (PDT) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1QGbx3-0006CG-B9 for linux-xfs@oss.sgi.com; Sun, 01 May 2011 21:05:05 +0200 Received: from 
121.79-160-103.customer.lyse.net ([79.160.103.121]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 01 May 2011 21:05:05 +0200 Received: from david.brown by 121.79-160-103.customer.lyse.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 01 May 2011 21:05:05 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: linux-xfs@oss.sgi.com From: David Brown X-ASG-Orig-Subj: Re: RAID6 r-m-w, op-journaled fs, SSDs Subject: Re: RAID6 r-m-w, op-journaled fs, SSDs Date: Sun, 01 May 2011 20:32:22 +0200 Lines: 59 Message-ID: References: <19900.10868.583555.849181@tree.ty.sabi.co.UK> <20110501082717.5116e575@notabene.brown> <19901.31958.368144.832086@tree.ty.sabi.co.UK> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: 121.79-160-103.customer.lyse.net User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc14 Thunderbird/3.1.10 In-Reply-To: <19901.31958.368144.832086@tree.ty.sabi.co.UK> Cc: linux-raid@vger.kernel.org X-Barracuda-Connect: lo.gmane.org[80.91.229.12] X-Barracuda-Start-Time: 1304276706 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.34 X-Barracuda-Spam-Status: No, SCORE=-1.34 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=OBSCURED_EMAIL, OBSCURED_EMAIL_2 X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62487 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.00 OBSCURED_EMAIL BODY: Message seems to contain rot13ed address 0.68 OBSCURED_EMAIL_2 BODY: Message seems to contain rot13ed address X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 01/05/11 17:31, Peter Grandi wrote: > [ ... 
]
>
>>> * Can Linux MD do "abbreviated" read-modify-write RAID6
>>>   updates like for RAID5? [ ... ]
>
>> No. (patches welcome).
>
> Ahhhm, but let me dig a bit deeper, even if it may be implied in
> the answer: would it be *possible*?
>
> That is, is the double parity scheme used in MS such that it is
> possible to "subtract" the old content of a page and "add" the
> new content of that page to both parity pages?
>

If I've understood the maths correctly, then yes it would be possible.
But it would involve more calculations, and it is difficult to see where
the best balance lies between cpu demands and IO demands.

In general, calculating the Q parity block for raid6 is
processor-intensive - there's a fair amount of optimisation done in the
normal calculations to keep it reasonable.

Basically, the first parity P is a simple calculation:

	P = D_0 + D_1 + .. + D_(n-1)

But Q is more difficult:

	Q = D_0 + g.D_1 + g².D_2 + ... + g^(n-1).D_(n-1)

where "plus" is xor, "times" is a weird function calculated over a
GF(2^8) field, and g is a generator for that field.

If you want to replace D_i, then you can calculate:

	P(new) = P(old) + D_i(old) + D_i(new)
	Q(new) = Q(old) + g^i.(D_i(old) + D_i(new))

This means multiplying by g^i for whichever block i is being replaced.

The generator and multiply operation are picked to make it relatively
fast and easy to multiply by g, especially if you've got a processor
that has vector operations (as most powerful cpus do). This means that
the original Q calculation is fairly efficient. But to do general
multiplications by g^i is more effort, and will typically involve
cache-killing lookup tables or multiple steps.

It is probably reasonable to say that when md raid first implemented
raid6, it made little sense to do these abbreviated parity calculations.
But as processors have got faster (and wider, with more cores) while
disk throughput has made slower progress, it's maybe a different balance.
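The update formulas above can be checked with a small sketch. This is only an illustration in Python, not the md implementation: it assumes the field polynomial 0x11d and the generator g = 2, and the helper names (gf_mul2, gf_mul, gf_pow_g, full_parity, rmw_parity) are made up for this example. It uses a slow, table-free multiply so the arithmetic stays visible.

```python
def gf_mul2(a):
    """Multiply by the generator g = 2 in GF(2^8) (assumed polynomial 0x11d)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11d
    return a & 0xFF


def gf_mul(a, b):
    """General GF(2^8) multiply by shift-and-xor (slow, but needs no tables)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = gf_mul2(a)
        b >>= 1
    return result


def gf_pow_g(i):
    """Return g^i by multiplying by g repeatedly."""
    x = 1
    for _ in range(i):
        x = gf_mul2(x)
    return x


def full_parity(blocks):
    """Compute P and Q from scratch over all data blocks (equal-length bytes)."""
    n = len(blocks[0])
    p = bytearray(n)
    q = bytearray(n)
    for i, blk in enumerate(blocks):
        coeff = gf_pow_g(i)
        for k, byte in enumerate(blk):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def rmw_parity(p_old, q_old, i, d_old, d_new):
    """Abbreviated update: only block i, the old P and the old Q are needed."""
    coeff = gf_pow_g(i)
    p_new = bytes(p ^ a ^ b for p, a, b in zip(p_old, d_old, d_new))
    q_new = bytes(q ^ gf_mul(coeff, a ^ b)
                  for q, a, b in zip(q_old, d_old, d_new))
    return p_new, q_new
```

Calling rmw_parity() for one replaced block should give the same P and Q as recomputing full_parity() over the modified stripe, because the multiply distributes over xor. The per-byte gf_mul(coeff, ...) is the "general multiplication by g^i" that costs more than the repeated multiply-by-g used in a full Q calculation.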
So it's probably both possible and practical to do these calculations. All it needs is someone to spend the time writing the code - and lots of people willing to test it. From david@fromorbit.com Sun May 1 20:20:02 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_41 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p421K2nu201968 for ; Sun, 1 May 2011 20:20:02 -0500 X-ASG-Debug-ID: 1304299414-4cff03870000-w1Z2WR X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id C1C521EC68D4 for ; Sun, 1 May 2011 18:23:35 -0700 (PDT) Received: from ipmail06.adl2.internode.on.net (ipmail06.adl2.internode.on.net [150.101.137.129]) by cuda.sgi.com with ESMTP id R8CVPQaUXPkljfpq for ; Sun, 01 May 2011 18:23:35 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhMEAJQGvk15LHHJgWdsb2JhbACmGxUBARYmJYhxuDAOgneCewSdLQ Received: from ppp121-44-113-201.lns20.syd6.internode.on.net (HELO dastard) ([121.44.113.201]) by ipmail06.adl2.internode.on.net with ESMTP; 02 May 2011 10:53:18 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QGhqs-0001Jp-3d; Mon, 02 May 2011 11:23:06 +1000 Date: Mon, 2 May 2011 11:23:06 +1000 From: Dave Chinner To: Peter Grandi Cc: Linux fs XFS , Linux fs JFS X-ASG-Orig-Subj: Re: op-journaled fs, journal size and storage speeds Subject: Re: op-journaled fs, journal size and storage speeds Message-ID: <20110502012306.GJ13542@dastard> References: <19900.8703.214676.218477@tree.ty.sabi.co.UK> <20110501092758.GG13542@dastard> <19901.41647.606112.243194@tree.ty.sabi.co.UK> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<19901.41647.606112.243194@tree.ty.sabi.co.UK> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl2.internode.on.net[150.101.137.129] X-Barracuda-Start-Time: 1304299416 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0105 1.0000 -1.9526 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.95 X-Barracuda-Spam-Status: No, SCORE=-1.95 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_SC5_SA210e X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62513 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.00 BSF_SC5_SA210e Custom Rule SA210e X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Sun, May 01, 2011 at 07:13:03PM +0100, Peter Grandi wrote: > > >> Been thinking about journals and RAID6s and SSDs. In particular > >> for file system designs like JFS and XFS that do operation > >> journaling (while ext[34] do block journaling). > > > XFS is not an operation journalling filesystem. Most of the > > metadata is dirty-region logged via buffers, just like ext3/4. > > Looking at the sources, XFS does operations journaling, in the > form of physical ("dirty region") operation logging, Operation logging contains no physical changes - it just indicates the change to be made typically via an intent/done transaction pair. It says what is going to be done, then what has been done, but not the details of the changes made. XFs _always_ logs the details of the changes made, and.... > instead of > logical operation logging like JFS. Both are very different from > block journaling. When you are dirtying entire blocks, then the way the blocks are logged is really no different to ext3/4's block logging... 
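As a toy illustration of the distinction drawn above (intent/done records that only name an operation, versus records that carry the changed bytes), here is a minimal Python sketch. None of these record types or function names come from XFS, JFS or jbd; they are hypothetical and only model the idea.

```python
from dataclasses import dataclass


@dataclass
class IntentRecord:
    """Logical ("operation") record: says what will be done, not the bytes."""
    txn_id: int
    op: str
    args: tuple


@dataclass
class DoneRecord:
    """Marks the operation named by an earlier intent as completed."""
    txn_id: int


@dataclass
class RegionRecord:
    """Physical ("dirty region") record: carries the changed bytes themselves."""
    buffer_id: int
    offset: int
    data: bytes


def replay_regions(buffers, log):
    """Recover from a physical log by copying each logged region into place."""
    for rec in log:
        buf = buffers[rec.buffer_id]
        buf[rec.offset:rec.offset + len(rec.data)] = rec.data


def unfinished_intents(log):
    """Recover from a logical log: find intents with no matching done record."""
    done = {r.txn_id for r in log if isinstance(r, DoneRecord)}
    return [r for r in log
            if isinstance(r, IntentRecord) and r.txn_id not in done]
```

In this model, replaying a RegionRecord needs no knowledge of which operation produced it, while an unfinished IntentRecord has to be redone or undone by code that understands the operation.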
> More in details, to me there is a stark contrast between 'jbd.h':
>
>   http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=include/linux/jbd.h;h=e06965081ba5548f74db935543af84334f58259e;hb=HEAD
>
> where I find only a few journal transaction types (blocks) and
> 'xfs_trans.h' where I find many journal transaction types (ops):
>
>   http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=fs/xfs/xfs_trans.h;h=c2042b736b81131a780703d8a5907c848793eebb;hb=HEAD

Yeah, so that number goes into the transaction header on disk mainly
for debugging purposes - you can identify what operation triggered
the transaction in the log just by looking at the log. However, that
is _completely ignored_ for delayed logging - you'll only ever see
"checkpoint" transactions with delayed logging as it throws away all
the transaction specific metadata in memory...

> Given that in the latter I see transaction types like
> 'XFS_TRANS_RENAME' or 'XFS_TRANS_MKDIR' it is hard to imagine how
> one can argue that the XFS journals something other than ops, even
> if in a buffered way of sorts.

Why don't you look at the transaction reservations that define what
one of those "transaction ops" contains. e.g.
MKDIR uses the inode create reservation:

/*
 * For create we can modify:
 *    the parent directory inode: inode size
 *    the new inode: inode size
 *    the inode btree entry: block size
 *    the superblock for the nlink flag: sector size
 *    the directory btree: (max depth + v2) * dir block size
 *    the directory inode's bmap btree: (max depth + v2) * block size
 * Or in the first xact we allocate some inodes giving:
 *    the agi and agf of the ag getting the new inodes: 2 * sectorsize
 *    the superblock for the nlink flag: sector size
 *    the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
 *    the inode btree: max depth * blocksize
 *    the allocation btrees: 2 trees * (max depth - 1) * block size
 */
STATIC uint
xfs_calc_create_reservation(
	struct xfs_mount	*mp)
{
	return XFS_DQUOT_LOGRES(mp) +
		MAX((mp->m_sb.sb_inodesize +
		     mp->m_sb.sb_inodesize +
		     mp->m_sb.sb_sectsize +
		     XFS_FSB_TO_B(mp, 1) +
		     XFS_DIROP_LOG_RES(mp) +
		     128 * (3 + XFS_DIROP_LOG_COUNT(mp))),
		    (3 * mp->m_sb.sb_sectsize +
		     XFS_FSB_TO_B(mp, XFS_IALLOC_BLOCKS(mp)) +
		     XFS_FSB_TO_B(mp, mp->m_in_maxlevels) +
		     XFS_ALLOCFREE_LOG_RES(mp, 1) +
		     128 * (2 + XFS_IALLOC_BLOCKS(mp) + mp->m_in_maxlevels +
			    XFS_ALLOCFREE_LOG_COUNT(mp, 1))));
}

> > How do you know what "one second" of "in flight" operations is
> > going to be?
>
> Well, that's what I discuss later, it is a "rule of thumb" based
> on *some* rationale, but I have been questioning it.
>
> [ ... interesting summary of some of the many issue related to
>   journal sizing ... ]
>
> > Easiest and most reliable method seems to be to size your
> > journal appropriately in the first place and have your
> > algorithms key off that....
>
> Sure, but *I* am asking that question :-).

And my response is that there is no one correct answer, and that
physical limits are usually the issue...

> >> This seems to me a fairly bad idea, because then the journal
> >> becomes a massive hot spot on the disk and draws the disk arm
> >> like black hole.
I suspect that operations should not stay on > > > That's why you can configure an external log.... > > ...and lose barriers :-). But indeed. As always, if performance and data safety is your concern, spend a few hundred dollars more and buy a decent HW RAID card with a BBWC.... > >> the journal for a long time. However if the journal is too > >> small processes that do metadata updates start to hang on it. > > > Well, yes. The journal needs to be large enough to hold all > > the transaction reservations for the active transactions. XFS, > > in the worse case for a default filesystem config, needs about > > 100MB of log space per 300 concurrent transactions. [ ... ] > > So something like 300KB per transaction? Yup. And the size is dependent on filesystem block size, filesystem and AG size (max btree depths). So for a 64k block size filesystem, that 300kb transaction reservation blows out to about 3MB.... > That seems a pretty > extreme worst case. How is that possible? A metadata transaction > with a "dirty region" of 300KB sound enormously expensive. It may > be about extent maps for a very fragmented file I guess. It's actually very small. Have you ever looked at how much metadata a directory contains? Rule of thumb is that a directory consumes about 100MB of metadata for every million entries for average length filenames. having a create transaction consume 300KB at maximum for a worst case modification of a directory with a million, 10M or 100M entries makes that 300k look pretty small... > clear here what concurrent means because the log is sequential. > I'll guess that it means "in flight". > > [ ... ] > > >> * What should journal size be proportional to? > > > Your workload. > > Sure, as a very top level goal. But that's not an answer, it is > handwaving. 
> As you argue earlier, it could be proportional in some
> cases to IO threads; or it could be number of arms, filesystem
> size, size of each volume, sequential transfer rate, random
> transfer rate, large IO transfer rate, small IO transfer rate, ...

Nice definition of "workload dependent".

> Some tighter guideline might be better than just guessing.
>
> >> * What is the downside of a too small journal?
>
> > Performance sucks.
>
> But why? Without a journal completely performance is better;
> assuming a one-transaction journal this becomes slower because
> of writing everything twice, but that happens for any size of
> journal, as it is unavoidable.

Why does having a writeback cache improve performance? Larger
journals enable longer caching of dirty metadata before writeback
must occur.

> When the journal fills up the effect is the same as that of a 1
> transaction journal.

That's the same for every type of buffer. And then you've got the
problem of having to wait for those 10 objects to complete IO before
you can do another transaction, while if you have a large log, you
can push on it before you run out of space to try to ensure it never
stalls.

And when you have 100,000 metadata objects to write back, you can
optimise the IO a whole lot better than when you only have 10
objects.

> So the effect of a journal larger than 1 transaction must be
> felt only when the journal is not full,

Sure, and we've spent years optimising the metadata flushing to
ensure we empty the log as fast as possible under sustained
workloads. You need enough space in the journal to decouple
transactions from the flow of metadata writeback - how much is very
workload dependent.

> that is there are pauses
> in the flow of transactions; and then it does not matter a lot
> just how large the journal is.
>
> So the journal should be large enough to accommodate the highest
> possible rate of metadata updates for the longest time this
> happens until there is a pause in the metadata updates.
We need to be able to sustain hundreds of thousands of transactions per second, every second, 24x7. There are no "pauses" we can take advantage of to "catch up" - metadata writeback must take place simultaneously with new transactions, and the journal must be large enough to decouple these effectively. > This of course depends on workload, but some rule of thumb based > on experience might help. Sure - we encode that experience in the mkfs and kernel default behaviour. > And here my guess is that shorter journals are better than > longer ones, because also: > > >> * What is the downside of a too large journal other than space? > > > Recovery times too long, lots of outstanding metadata pinned > > in memory (hello OOM-killer!), and other resource management > > related scalability issues. > > I would have expected also more seeks, as reading logged but not > yet finalized metadata has to go back to the journal, but I guess > that's a small effect. Say what? Nobody reads from the journal except during recovery. Anything that is in the journal is dirty in memory, so any reads come from the memory objects, not the journal.... > > Got a supplier for the custom hardware you'd need? > > There are still a few, for example at different ends of the scale: > > http://www.ramsan.com/solutions/oracle/ > http://www.microdirect.co.uk/home/product/39434/ACARD-RAM-Disk-SSD-ANS-9010B-6X-DDR-II-Slots Neither of them are what I'd consider "battery backed RAM" - to the filesystem they are simply fast block devices behind a SATA/SAS/FC interface. Effectively no different to a SAS/SATA/FC- or PCIe-based flash SSD. > But as another contributor said a fast/small disk RAID1 might be > quite decent in many situations. Not fast enough for an XFS log - I can push >500MB/s through the XFS journal on a device (12 disk (7200rpm) RAID-0) that will do 700MB/s for sequential data IO. Cheers, Dave. 
-- Dave Chinner david@fromorbit.com From david@fromorbit.com Sun May 1 21:47:27 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p422lRqe205163 for ; Sun, 1 May 2011 21:47:27 -0500 X-ASG-Debug-ID: 1304304658-0e8302a20000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 27D2516471AF for ; Sun, 1 May 2011 19:50:59 -0700 (PDT) Received: from ipmail06.adl2.internode.on.net (ipmail06.adl2.internode.on.net [150.101.137.129]) by cuda.sgi.com with ESMTP id D0h3UhrD19vCVo9x for ; Sun, 01 May 2011 19:50:59 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AsIEAK0bvk15LHHJgWdsb2JhbACmHRUBARYmJYhxHLgSDoMKgmgEnS0 Received: from ppp121-44-113-201.lns20.syd6.internode.on.net (HELO dastard) ([121.44.113.201]) by ipmail06.adl2.internode.on.net with ESMTP; 02 May 2011 12:20:44 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QGjDe-0001Qy-Pt; Mon, 02 May 2011 12:50:42 +1000 Date: Mon, 2 May 2011 12:50:42 +1000 From: Dave Chinner To: Peter Grandi Cc: Linux fs XFS X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem Message-ID: <20110502025042.GK13542@dastard> References: <4DB72084.8020205@inf.ethz.ch> <4DB74331.3030804@hardwarefreak.com> <4DB75C6D.1080901@inf.ethz.ch> <19898.53907.842827.480883@tree.ty.sabi.co.UK> <20110501084919.GE13542@dastard> <19901.28769.553575.864887@tree.ty.sabi.co.UK> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <19901.28769.553575.864887@tree.ty.sabi.co.UK> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: 
ipmail06.adl2.internode.on.net[150.101.137.129] X-Barracuda-Start-Time: 1304304661 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62518 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Sun, May 01, 2011 at 03:38:25PM +0100, Peter Grandi wrote: > > [ ... ] > > [ ... Extracting a kernel 'tar' with GNU tar on 'ext3': ] > >>> real 0m21.769s > [ ... Extracting a kernel 'tar' with GNU tar on XFS: ] > >>> real 2m20.522s > > >> [ ... ] in most cases the wrong number is the one for 'ext3' > >> on RAID1 (way too small). Even the number for XFS and RAID0 > >> 'delaylog' is a wrong number (somewhat small) in many cases. > > >> There are 38000 files in 440MB in 'linux-2.6.38.tar', ~40% of > >> them are smaller than 4KiB and ~60% smaller than 8KiB. Also you > >> didn't flush caches, and you don't say whether the filesystems > >> are empty or full or at the same position on the disk. > >> > >> Can 'ext3' really commit 1900 small files per second (including > >> directory updates) to a filesystem on a RAID1 that probably can > >> do around 100 IOPS? That would be amazing news. 
>
> In the real world 'ext3' as reported in my previous message can
> "really commit" around 50 "small files per second (including
> directory updates)" in near-optimal conditions to a storage
> device that can probably do around 100IOPS; copying here the
> actual numbers:
>
>   % mount -t ext3 -o relatime /dev/sdb /mnt/sdb
>   % time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
>   star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

Oh, you fsync every file. The problem the user reported did not
involve fsync at all, so your straw man isn't really relevant to the
reported problem. You're redefining the problem to suit your
argument.

> > Why? Because the allocator is optimised to pack small files
> > written at the same time together on disk, and the elevator
> > will merge them into one large IO when they are finally
> > written to disk. With a typical 512k max IO size, that's 128
> > <=4k files packed into each IO,
>
> This is an argument based on a cunning or distracted or ignorant
> shift of the goalposts: because this is an argument about purely
> *writing* the *data* in those small files, while the bigger
> issue is *committing* the *metadata*, all of it "(including
> directory updates)". Also, this argument is also based on the
> assumption that it is permissible to commit 128 small files when
> the last one gets closed, not when each gets committed.

I haven't confused anything - indeed I explained exactly why the
user got the results they did with ext3. You seem to be implying
that the only way for data safety to be given is:

	write file
	fsync file
	fsync parent dir
	write file
	fsync file
	fsync parent dir
	.....

Which is, quite frankly, a load of bollocks. The user doesn't care
if the untar is not complete because a crash occurred during it -
they are still going to have to redo it from scratch regardless of
whether file-by-file fsync is in use or not.
Indeed, doing this:

	write file
	write file
	write file
	write file
	write file
	.....
	sync

Gives the same overall guarantees as your preferred method, but
completes much, much faster. Taking 30s to write the files
asynchronously and then another second or two for the sync to
complete is far more appropriate for this workload than doing a
file-by-file fsync.

> In this discussion it is rather comical to make an argument
> based on the speed of IO using what is in effect EatMyData as
> described here:
>
>   http://talk.maemo.org/showthread.php?t=67901
>
> but here it is:

/me starts laughing uncontrollably. The source:

http://www.flamingspork.com/projects/libeatmydata/

for speeding up database testing where fsync is not needed to
determine the success of the test or not.

The fact is that the dpkg devs went completely nuts with fsync()
when ext4 came around because it had problems with losing files when
crashes occurred shortly after upgrades. It was excessive and
unnecessary and didn't take into account the transactional grouping
of updates.

This problem has since been fixed - there is now a sync issued at
the end of each package install so the data is on disk before the
"installation complete" entry is updated in the dpkg database. A
single sync rather than a sync-per-file is much, much faster, and
matches the intended "transaction grouping" of the dpkg operation.
With the recent addition of a "sync a single fs" syscall, it will
get faster again....

> That's a fantastic result, somewhat over 1,300 small files per
> second (14 commits per nominal IOPS), but "fantastic" (as in
> fantasy) is the keyword, because it is for completely different
> and broken semantics, a point that should not be lost on anybody
> who can "understand IOPS and metadata and commits and caching".

Where's the "broken semantics" here? The filesystem did exactly what
you asked, and performed in exactly the way we'd expect it to.
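To make the two write patterns contrasted earlier in this message concrete (fsync after every file versus write everything and sync once), here is a minimal Python sketch. The function names are made up for this example and it is not what tar, star or dpkg actually do; it assumes a Unix system where a directory can be opened read-only and passed to os.fsync().

```python
import os


def write_with_fsync_each(directory, files):
    """Write each file, fsync it, then fsync the parent directory."""
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        for name, data in files.items():
            with open(os.path.join(directory, name), "wb") as f:
                f.write(data)
                f.flush()              # push Python's buffer to the kernel
                os.fsync(f.fileno())   # force this file's data to storage
            os.fsync(dir_fd)           # force the new directory entry too
    finally:
        os.close(dir_fd)


def write_then_sync_once(directory, files):
    """Write all files without waiting, then issue one sync at the end."""
    for name, data in files.items():
        with open(os.path.join(directory, name), "wb") as f:
            f.write(data)
    os.sync()                          # one flush for the whole batch
```

The first function forces storage to complete two flushes per file before the next file is started. The second leaves the filesystem free to batch and merge the writes, and only the final os.sync() waits for them.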
Atomicity and stability guarantees are application dependent - they are not defined by the filesystem. Fundamentally, untarring a kernel tarball does not require the same data safety semantics of databases nor does it need to deal with safely overwriting files. Sometimes people care more about performance than they do about data safety, and untarring some huge tarball is usually one of those cases. If they care about data safety, that is what sync(1) is for after the untar... > It is not as if the difference isn't widely known: > > http://cdrecord.berlios.de/private/man/star/star.1.html > > Star is a very fast tar(1) like tape archiver with improved > functionality. > On operating systems with slow file I/O (such as Linux), it > may help to use -no-fsync in addition, but then star is > unable to detect all error conditions; so use with care. Ah, quoting Joerg Schilling FUD about Linux. That's a good way to get people to ignore you.... Cheers, Dave. -- Dave Chinner david@fromorbit.com From stan@hardwarefreak.com Sun May 1 23:31:58 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p424Vw6a208923 for ; Sun, 1 May 2011 23:31:58 -0500 X-ASG-Debug-ID: 1304310933-171201f90000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from greer.hardwarefreak.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 1DA6A1EC6C89 for ; Sun, 1 May 2011 21:35:33 -0700 (PDT) Received: from greer.hardwarefreak.com (mo-65-41-216-221.sta.embarqhsd.net [65.41.216.221]) by cuda.sgi.com with ESMTP id 1wPCy2XWKpBExEpI for ; Sun, 01 May 2011 21:35:33 -0700 (PDT) Received: from [192.168.100.53] (gffx.hardwarefreak.com [192.168.100.53]) by greer.hardwarefreak.com (Postfix) with ESMTP id 37CA66C0F2 
for ; Sun, 1 May 2011 23:35:33 -0500 (CDT) Message-ID: <4DBE3492.4060303@hardwarefreak.com> Date: Sun, 01 May 2011 23:35:30 -0500 From: Stan Hoeppner User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: op-journaled fs, journal size and storage speeds Subject: Re: op-journaled fs, journal size and storage speeds References: <19900.8703.214676.218477@tree.ty.sabi.co.UK> <20110501092758.GG13542@dastard> In-Reply-To: <20110501092758.GG13542@dastard> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mo-65-41-216-221.sta.embarqhsd.net[65.41.216.221] X-Barracuda-Start-Time: 1304310934 X-Barracuda-Bayes: INNOCENT GLOBAL 0.4814 1.0000 0.0000 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: 0.60 X-Barracuda-Spam-Status: No, SCORE=0.60 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_SC5_MJ1963, RDNS_DYNAMIC X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62525 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.10 RDNS_DYNAMIC Delivered to trusted network by host with dynamic-looking rDNS 0.50 BSF_SC5_MJ1963 Custom Rule MJ1963 X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/1/2011 4:27 AM, Dave Chinner wrote: > Got a supplier for the custom hardware you'd need? Just use a PCIe > SSD.... 50GB OCZ RevoDrive PCIe x4 SSD MLC NAND Dual SandForce 1200 controllers, internal RAID 0 design 70,000 write IOPS, 4KB aligned 350MB/s write sustained $200 USD at Newegg: http://www.newegg.com/Product/Product.aspx?Item=N82E16820227596 Current best value for a PCIe SSD suitable for dedicated log drive use, can fit ~22 maximum size (2GB) XFS logs. Note the MLC NAND. 
If all your filesystems will sustain constant high rate metadata writes, an SLC based product is more suitable, though price is 10-50x higher for PCIe SLC cards. If you want/need the 10x increase in flash cell life of SLC NAND, go with this Intel SLC SATAII SSD for ~2x the $$ of the Revo. Note it's write IOPS is 'only' 33k, size is 32GB, 18GB less. http://www.newegg.com/Product/Product.aspx?Item=N82E16820167013 -- Stan From lists@nerdbynature.de Sun May 1 23:56:04 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p424u3Ql210425 for ; Sun, 1 May 2011 23:56:04 -0500 X-ASG-Debug-ID: 1304312377-0b6f01000000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from trent.utfs.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 117EB1647674 for ; Sun, 1 May 2011 21:59:37 -0700 (PDT) Received: from trent.utfs.org (trent.utfs.org [194.246.123.103]) by cuda.sgi.com with ESMTP id wDMYbSVuHuIYYzFf for ; Sun, 01 May 2011 21:59:37 -0700 (PDT) Received: by trent.utfs.org (Postfix, from userid 8) id 5C2283DFAF; Mon, 2 May 2011 06:59:36 +0200 (CEST) Received: from trent.utfs.org (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 341333DCB8; Mon, 2 May 2011 06:59:35 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 1CD693DB79; Mon, 2 May 2011 06:59:35 +0200 (CEST) Date: Sun, 1 May 2011 21:59:35 -0700 (PDT) From: Christian Kujau To: Dave Chinner cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks In-Reply-To: <20110501080149.GD13542@dastard> Message-ID: References: 
<20110427022655.GE12436@dastard> <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> User-Agent: Alpine 2.01 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AV-Checked: ClamAV using ClamSMTP (127.0.0.1) X-Barracuda-Connect: trent.utfs.org[194.246.123.103] X-Barracuda-Start-Time: 1304312379 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62526 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Sun, 1 May 2011 at 18:01, Dave Chinner wrote: > I really don't know why the xfs inode cache is not being trimmed. I > really, really need to know if the XFS inode cache shrinker is > getting blocked or not running - do you have those sysrq-w traces > when near OOM I asked for a while back? I tried to generate those via /proc/sysrq-trigger (don't have a F13/Print Screen key), but the OOM killer kicks in prett fast - so fast thay my debug script, trying to generate sysrq-w every second was too late and the machine was already dead: http://nerdbynature.de/bits/2.6.39-rc4/oom/ * messages-10.txt.gz * slabinfo-10.txt.bz2 Timeline: - du(1) started at 12:25:16 (and immediately listed as "blocked" task) - the last sysrq-w succeeded at 12:38:05, listing kswapd0 - du invoked oom-killer at 12:38:06 I'll keep trying... > scan only scanned 516 pages. I can't see it freeing many inodes > (there's >600,000 of them in memory) based on such a low page scan > number. 
Not sure if this is related...this XFS filesytem I'm running du(1) on is ~1 TB in size, with 918K allocated inodes, if df(1) is correct: # df -hi /mnt/backup/ Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/wdc1 37M 918K 36M 3% /mnt/backup > Maybe you should tweak /proc/sys/vm/vfs_cache_pressure to make it > reclaim vfs structures more rapidly. It might help /proc/sys/vm/vfs_cache_pressure is currently set to '100'. You mean I should increase it? To..150? 200? 1000? Thanks, Christian. -- BOFH excuse #347: The rubber band broke From ajeet.yadav.77@gmail.com Mon May 2 00:35:57 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, T_DKIM_INVALID autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p425Zvgu214198 for ; Mon, 2 May 2011 00:35:57 -0500 X-ASG-Debug-ID: 1304314772-0b7203d20000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-vw0-f53.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 61E5D16474AF for ; Sun, 1 May 2011 22:39:32 -0700 (PDT) Received: from mail-vw0-f53.google.com (mail-vw0-f53.google.com [209.85.212.53]) by cuda.sgi.com with ESMTP id kTcyK8CsN7bzrL4m for ; Sun, 01 May 2011 22:39:32 -0700 (PDT) Received: by vws13 with SMTP id 13so4432755vws.26 for ; Sun, 01 May 2011 22:39:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=mQZJmKZGCGMijXcKZKfnx6vd08x5IdGxST6kPn6GVsw=; b=CzHdiclG5K46Wf+Hapzv2k5OrLCQPxT/kULwkTQ5KKQY3LnqobWt0vzfhSvY9lslJD EQqDX9749/Qp4RBjoylZQ5P6SSgoaDx8eDCgeVACTlKsJoAwGhgwdOKEBnrrwdAwPXYV jfHXduC2JwwYjYjsCrL6A9uQVc5x7idpzR/xo= DomainKey-Signature: a=rsa-sha1; 
c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=b1ZF0LVKjsllJNrPyZNpKmdhbyHYekKoIboEXv/afg4WTCxOsRa578sbXpAC9F+Oez kzfsec0t/Re6yVZ4I56GZU2zTz8CAERLugyDmNQcdi3ANNrVaqOumqb9x8SxBe4wIoDF k1pLVZPFlZ7nWWURorN+VxZRLrAz2hw7mHKd0= MIME-Version: 1.0 Received: by 10.52.72.229 with SMTP id g5mr2839273vdv.56.1304314772324; Sun, 01 May 2011 22:39:32 -0700 (PDT) Received: by 10.220.169.145 with HTTP; Sun, 1 May 2011 22:39:32 -0700 (PDT) In-Reply-To: References: <20110422065120.GB14189@infradead.org> Date: Mon, 2 May 2011 11:09:32 +0530 Message-ID: X-ASG-Orig-Subj: Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4 Subject: Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4 From: Ajeet Yadav To: Christoph Hellwig Cc: xfs@oss.sgi.com Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: mail-vw0-f53.google.com[209.85.212.53] X-Barracuda-Start-Time: 1304314773 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0209 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62530 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean It will be fine for me, if you have received the xfs_metadump file I sent in last mail. I am sure it will help you find problem in repair btree, please correct me if I left you anything from my side. 
On Tue, Apr 26, 2011 at 5:59 PM, Ajeet Yadav wrote:
> Sorry for delay, please find attached the xfs_metadump of xfs file system
>
> On Tue, Apr 26, 2011 at 12:47 PM, Ajeet Yadav wrote:
>> Sorry for delay, please find the metadump of file system.
>>
>> On Fri, Apr 22, 2011 at 12:21 PM, Christoph Hellwig wrote:
>>> The patch looks good to me.  But I'm a bit worried about the lack of
>>> test coverage.  As Eric said if you're able to get a metadump of
>>> a filesystem that shows this issue it would come in useful for
>>> regression testing.
>>>
>>>
>>
>

From ajeet.yadav.77@gmail.com Mon May 2 00:37:39 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, T_DKIM_INVALID autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p425bdxF214267 for ; Mon, 2 May 2011 00:37:39 -0500 X-ASG-Debug-ID: 1304314874-0b7403d00000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-vw0-f53.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 80332164739B for ; Sun, 1 May 2011 22:41:15 -0700 (PDT) Received: from mail-vw0-f53.google.com (mail-vw0-f53.google.com [209.85.212.53]) by cuda.sgi.com with ESMTP id 9AIlRBKo64NFIT9M for ; Sun, 01 May 2011 22:41:15 -0700 (PDT) Received: by vws13 with SMTP id 13so4433232vws.26 for ; Sun, 01 May 2011 22:41:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=YHb8XRfnAo/pqbY3dvzaVKb2aRCnwFdetmQmYZ3dd3M=; b=XE06kXUU/LbR3QJyYdNvuzshmAMTZ0Peh+PNPW60ZXxW89lUAO+dbhKb0ilv8AIKMx b4tVGuK/ew01v2PtMgGf94ly4TRHxOEywyZTcjvqpaBUslRXkY6ma44rc9aSf8+xnyJe ZeGTLiC3MtbMvPncdUQ8PawGHlcFR42fsxq9I=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=LU9UL52g/Z+DUAhhXJdZfYPN7sqQ7/i+JXTO2UzVI8QUkNgGzZ2OMfqOgEB7BsHYkj mzVOMWHoVvxvp9D3oActJf8BR1+6KV/aX8MZ2mbH2E6X9RZbXt3xp1MXoakDTHJdOhyb x3bla75jgL6+IR2+tbldC9JAnjAW9CQ/2CzEY= MIME-Version: 1.0 Received: by 10.220.189.70 with SMTP id dd6mr2225856vcb.116.1304314874266; Sun, 01 May 2011 22:41:14 -0700 (PDT) Received: by 10.220.169.145 with HTTP; Sun, 1 May 2011 22:41:14 -0700 (PDT) In-Reply-To: References: <20110427171107.GA29196@infradead.org> Date: Mon, 2 May 2011 11:11:14 +0530 Message-ID: X-ASG-Orig-Subj: Re: xfstests 013 - 2.6.35.11 - hang Subject: Re: xfstests 013 - 2.6.35.11 - hang From: Ajeet Yadav To: Christoph Hellwig Cc: xfs@oss.sgi.com Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: mail-vw0-f53.google.com[209.85.212.53] X-Barracuda-Start-Time: 1304314875 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62530 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Is there any thing I left out in xfs related to cache coherency. 
On Thu, Apr 28, 2011 at 12:59 PM, Ajeet Yadav wrote:
> MIPS32® 34K™ Core
> It does not provide  invalidate_kernel_vmap_range() /
> flush_kernel_vmap_range() to deal with cache coherency problem
> Therefore we provided dma_cache_inv() / dma_cache_wback_inv() , after
> that we did not had any coherency problem
> But as you saying it still has cache coherency problem, wrt xfstests 001 ?
>
> diff -Nurp -X linux-2.6.35.11/Documentation/dontdiff
> linux-2.6.35.11/fs/xfs/linux-2.6/xfs_buf.c
> linux-2.6.35.11-dirty/fs/xfs/linux-2.6/xfs_buf.c
> --- linux-2.6.35.11/fs/xfs/linux-2.6/xfs_buf.c  2011-02-07
> 04:04:07.000000000 +0900
> +++ linux-2.6.35.11-dirty/fs/xfs/linux-2.6/xfs_buf.c    2011-03-22
> 18:29:09.000000000 +0900
> @@ -1192,7 +1192,7 @@ xfs_buf_bio_end_io(
>         xfs_buf_ioerror(bp, -error);
>
>         if (!error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
> -               invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
> +               dma_cache_inv((unsigned long)bp->b_addr, xfs_buf_vmap_len(bp));
>
>         do {
>                 struct page     *page = bvec->bv_page;
> @@ -1304,7 +1304,7 @@ next_chunk:
>  submit_io:
>         if (likely(bio->bi_size)) {
>                 if (xfs_buf_is_vmapped(bp)) {
> -                       flush_kernel_vmap_range(bp->b_addr,
> +                       dma_cache_wback_inv((unsigned long)bp->b_addr,
>                                                 xfs_buf_vmap_len(bp));
>                 }
>                 submit_bio(rw, bio);
>
> On Wed, Apr 27, 2011 at 10:41 PM, Christoph Hellwig wrote:
>> Does 001 also fail for say ext2 and ext3?
>>
>>
>

From markus@trippelsdorf.de Mon May 2 01:11:56 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER, T_DKIM_INVALID autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p426BuSt216077 for ; Mon, 2 May 2011 01:11:56 -0500 X-ASG-Debug-ID: 1304316929-13d7017a0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.ud10.udmedia.de (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 0E0FC511FC0 for ; Sun, 1 May 2011 23:15:30 -0700 (PDT) Received: from mail.ud10.udmedia.de (ud10.udmedia.de [194.117.254.50]) by cuda.sgi.com with ESMTP id AaC7oIEpy54xqAtc for ; Sun, 01 May 2011 23:15:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=mail.ud10.udmedia.de; h= date:from:to:cc:subject:message-id:references:mime-version: content-type:content-transfer-encoding:in-reply-to; q=dns/txt; s= beta; bh=r05/Zo0DvQEji3CFaeXF/3RVHqEVKIlLm7Q/dx3RXRs=; b=WhivU54 mVEBADEXcfUC119zQf2PirWwL3isOWW0m69N6Px8oku5bp1LjZ/xPxY2aziBqsQB afthJdws9pxs0TIRaCoI7rOfhRu09gTNqX/q3LFIk2ffxWWj27BuDC7EM8w1QDrs hmBCVlPGXoPI8LgzXkMFBQj+1PdcAbOcIzZA= Received: (qmail 8848 invoked from network); 2 May 2011 08:15:28 +0200 Received: from unknown (HELO x4.trippels.de) (ud10?360p3@91.66.182.48) by mail.ud10.udmedia.de with ESMTPSA (DHE-RSA-AES256-SHA encrypted, authenticated); 2 May 2011 08:15:28 +0200 Date: Mon, 2 May 2011 08:15:28 +0200 From: Markus Trippelsdorf To: Bruno =?iso-8859-1?Q?Pr=E9mont?= Cc: Dave Chinner , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder
, Dave Chinner , linux-kernel@vger.kernel.org, James Bottomley X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110502061528.GA22538@x4.trippels.de> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110429151841.GA893@x4.trippels.de> <20110429213524.449e003b@neptune.home> <20110430161810.6ccd2c99@neptune.home> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20110430161810.6ccd2c99@neptune.home> X-Barracuda-Connect: ud10.udmedia.de[194.117.254.50] X-Barracuda-Start-Time: 1304316931 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62531 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 2011.04.30 at 16:18 +0200, Bruno Prémont wrote: > On Fri, 29 April 2011 Bruno Prémont wrote: > > On Fri, 29 April 2011 Markus Trippelsdorf wrote: > > > On 2011.04.29 at 11:19 +1000, Dave Chinner wrote: > > > > OK, so the common elements here appears to be root filesystems > > > > with small log sizes, which means they are tail pushing all the > > > > time metadata operations are in progress. 
Definitely seems like a > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > reproduce this and cook up a patch to fix it. > > > > > > Hmm, I'm wondering if this issue is somehow related to the hrtimer bug, > > > that Thomas Gleixner fixed yesterday: > > > http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4 > > > http://thread.gmane.org/gmane.linux.kernel.mm/61909/ > > > > > > It also looks similar to the issue that James Bottomley reported > > > earlier: http://thread.gmane.org/gmane.linux.kernel.mm/62185/ > > > > I'm going to see, I've applied Thomas' fix on the box seeing XFS freeze (without > > other changes to kernel). > > Going to run that kernel for the week-end and beyond if it survives to see what > > happens. > > Happened again (after a few hours of uptime), so it definitely is not > caused by hrtimer bug that Thomas Gleixner fixed. I've enabled lock debugging and this is what happened after a few hours uptime. (I can't tell if this is a false positive): ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.39-rc5-00130-g3fd9952 #10 ------------------------------------------------------- kio_file/7364 is trying to acquire lock: (&sb->s_type->i_mutex_key#5/2){+.+...}, at: [] generic_file_splice_write+0xce/0x180 but task is already holding lock: (xfs_iolock_active){++++++}, at: [] xfs_ilock+0x125/0x1f0 which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #2 (xfs_iolock_active){++++++}: [] lock_acquire+0x92/0x1f0 [] down_write_nested+0x2f/0x60 [] xfs_ilock+0x125/0x1f0 [] xfs_file_buffered_aio_write+0x66/0x290 [] xfs_file_aio_write+0x161/0x300 [] do_sync_write+0xd2/0x110 [] vfs_write+0xaf/0x160 [] sys_write+0x4a/0x90 [] system_call_fastpath+0x16/0x1b -> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}: [] lock_acquire+0x92/0x1f0 [] mutex_lock_nested+0x51/0x370 [] vfs_rename+0xed/0x420 [] sys_renameat+0x207/0x230 [] sys_rename+0x1b/0x20 [] system_call_fastpath+0x16/0x1b -> #0 (&sb->s_type->i_mutex_key#5/2){+.+...}: [] __lock_acquire+0x169f/0x1b90 [] lock_acquire+0x92/0x1f0 [] mutex_lock_nested+0x51/0x370 [] generic_file_splice_write+0xce/0x180 [] xfs_file_splice_write+0xf4/0x250 [] do_splice_from+0x7e/0xb0 [] direct_splice_actor+0x20/0x30 [] splice_direct_to_actor+0xbe/0x1c0 [] do_splice_direct+0x78/0x90 [] do_sendfile+0x182/0x1d0 [] sys_sendfile64+0x5a/0xb0 [] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by kio_file/7364: #0: (xfs_iolock_active){++++++}, at: [] xfs_ilock+0x125/0x1f0 stack backtrace: Pid: 7364, comm: kio_file Not tainted 2.6.39-rc5-00130-g3fd9952 #10 Call Trace: [] print_circular_bug+0xb8/0xc7 [] __lock_acquire+0x169f/0x1b90 [] ? __generic_file_splice_read+0x1cd/0x5c0 [] lock_acquire+0x92/0x1f0 [] ? generic_file_splice_write+0xce/0x180 [] ? sock_def_write_space+0x140/0x140 [] mutex_lock_nested+0x51/0x370 [] ? generic_file_splice_write+0xce/0x180 [] generic_file_splice_write+0xce/0x180 [] xfs_file_splice_write+0xf4/0x250 [] ? xfs_file_splice_read+0xef/0x220 [] do_splice_from+0x7e/0xb0 [] direct_splice_actor+0x20/0x30 [] splice_direct_to_actor+0xbe/0x1c0 [] ? 
do_splice_from+0xb0/0xb0 [] do_splice_direct+0x78/0x90 [] do_sendfile+0x182/0x1d0 [] sys_sendfile64+0x5a/0xb0 [] system_call_fastpath+0x16/0x1b -- Markus From lists@nerdbynature.de Mon May 2 04:22:44 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p429Mi0J224245 for ; Mon, 2 May 2011 04:22:44 -0500 X-ASG-Debug-ID: 1304328379-624202b10000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from trent.utfs.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 6013D1EC7317 for ; Mon, 2 May 2011 02:26:19 -0700 (PDT) Received: from trent.utfs.org (trent.utfs.org [194.246.123.103]) by cuda.sgi.com with ESMTP id eidgDQApxYsU5oKf for ; Mon, 02 May 2011 02:26:19 -0700 (PDT) Received: by trent.utfs.org (Postfix, from userid 8) id D05973DFAF; Mon, 2 May 2011 11:26:18 +0200 (CEST) Received: from trent.utfs.org (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 379C63DCB8; Mon, 2 May 2011 11:26:17 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 1EC0E3DB79; Mon, 2 May 2011 11:26:17 +0200 (CEST) Date: Mon, 2 May 2011 02:26:17 -0700 (PDT) From: Christian Kujau To: Dave Chinner cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks In-Reply-To: <20110501080149.GD13542@dastard> Message-ID: References: <20110427022655.GE12436@dastard> <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> User-Agent: Alpine 2.01 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII 
X-AV-Checked: ClamAV using ClamSMTP (127.0.0.1) X-Barracuda-Connect: trent.utfs.org[194.246.123.103] X-Barracuda-Start-Time: 1304328380 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62545 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Sun, 1 May 2011 at 18:01, Dave Chinner wrote:
> I really don't know why the xfs inode cache is not being trimmed. I
> really, really need to know if the XFS inode cache shrinker is
> getting blocked or not running - do you have those sysrq-w traces
> when near OOM I asked for a while back?

Here's another attempt at getting those:

  http://nerdbynature.de/bits/2.6.39-rc4/oom/

  * messages-11.txt.gz & slabinfo-11.txt.bz2
    - oom-killer at 00:05:04
    - last sysrq-w to succeed at 00:05:03

  * messages-12.txt.gz & slabinfo-12.txt.bz2, along with
    meminfo-post-oom-12.txt & sysrq-w_post-oom-12.jpg could be
    more interesting:
    - last sysrq-w to succeed at 01:27:08
    - oom-killer at 01:27:11

...but after the OOM killer had killed quite a few processes,
MemFree showed 511236 kB of free memory, yet ssh logins were still being
killed. Finally I got a root shell on the box, issued sysrq-w again and
even executed /bin/sync, which came back. But looking at the logs now,
nothing went to the disk (/var/log resides on /, which is an ext4 fs).

See sysrq-w_post-oom-12.jpg for a sysrq-w I took 2381s after boot
time, or 01:32 - syslog stopped at 01:27.

I shall try again with netconsole logging or something...

HTH & thanks for looking into this,
Christian.
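
For reference, the once-per-second sysrq-w polling described in this thread could be sketched roughly as below. This is not the debug script actually used on the affected machine; the function name `poll_sysrq_w` and its `count`, `interval` and `path` parameters are assumptions made for illustration. It relies only on the fact that writing "w" to /proc/sysrq-trigger asks the kernel to dump blocked tasks, and it needs root when pointed at the real file.

```python
import time

# Real trigger file on Linux; kept as a parameter so the loop can be
# exercised against an ordinary file without root privileges.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def poll_sysrq_w(count, interval=1.0, path=SYSRQ_TRIGGER):
    """Write 'w' to the sysrq trigger `count` times, `interval` seconds apart.

    Hypothetical helper: returns how many writes succeeded, so a caller
    can tell whether the machine was still responsive.
    """
    written = 0
    for attempt in range(count):
        try:
            with open(path, "w") as trigger:
                trigger.write("w")  # 'w' = show blocked (D state) tasks
            written += 1
        except OSError:
            # Near OOM even this write may fail; keep going regardless.
            pass
        if attempt + 1 < count:
            time.sleep(interval)
    return written
```

A shorter interval (for example `interval=0.2`) narrows the window between the last successful dump and the OOM kill, at the cost of a noisier log.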
-- BOFH excuse #176: vapors from evaporating sticky-note adhesives From BATV+c7f46437ce921d9b3da5+2808+infradead.org+hch@bombadil.srs.infradead.org Mon May 2 05:10:30 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42AATnI226722 for ; Mon, 2 May 2011 05:10:30 -0500 X-ASG-Debug-ID: 1304331245-46f1030f0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B5DF71647DC2 for ; Mon, 2 May 2011 03:14:05 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id PRKHQ0CNeNUYth2x for ; Mon, 02 May 2011 03:14:05 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda
Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QGq8f-0002XV-AV; Mon, 02 May 2011 10:14:01 +0000 Date: Mon, 2 May 2011 06:14:01 -0400 From: Christoph Hellwig To: Markus Trippelsdorf Cc: Dave Chinner , xfs@oss.sgi.com X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem Message-ID: <20110502101401.GA9155@infradead.org> References: <4DB72084.8020205@inf.ethz.ch> <20110427023534.GF12436@dastard> <201104291827.35801.Martin@lichtvoll.de> <20110501085246.GF13542@dastard> <20110501165546.GB5391@infradead.org> <20110501182442.GA1635@x4.trippels.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110501182442.GA1635@x4.trippels.de> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304331245 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Sun, May 01, 2011 at 08:24:42PM +0200, Markus Trippelsdorf wrote: > Another thing is that with the recent updates to block FLUSH handling, > using FUA might even be less efficient. The new implementation > aggressively merges those commit writes and flushes. IOW, depending > on timing, multiple consecutive commit writes can be merged as, > > FLUSH + commit writes + FLUSH > > or > > FLUSH + some commit writes + FLUSH + other commit writes + FLUSH > > and so on, Except that writing multiple log buffers right next to each other is rather unusual - you'd have to have a burst of metadata only operations to get there. What's more common is that a log write interrupts streams of actual data I/O, and the longer we drain the queue the more performance impact it has. 
Moreover I'm working on avoiding the pre-flush if it's not needed, e.g.
there were no appending writes, and there was no pushing of the log tail
required, in which case the log write will only be a write with FUA set,
with no FLUSH and thus no queue draining on SATA at all. Also when you
move away from SATA to higher latency links like FC or iSCSI (maybe even
over a WAN) avoiding protocol roundtrips buys you a lot of performance.

From BATV+c7f46437ce921d9b3da5+2808+infradead.org+hch@bombadil.srs.infradead.org Mon May 2 05:36:56 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42AatUj227603 for ; Mon, 2 May 2011 05:36:56 -0500 X-ASG-Debug-ID: 1304332832-4da800b30000-w1Z2WR X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 75ECE1EC75C4 for ; Mon, 2 May 2011 03:40:32 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id 1fGBRy06LQmMQHhG for ; Mon, 02 May 2011 03:40:32 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QGqYJ-0007h3-4j; Mon, 02 May 2011 10:40:31 +0000 Date: Mon, 2 May 2011 06:40:31 -0400 From: Christoph Hellwig To: Peter Grandi Cc: Dave Chinner , Linux fs XFS , Linux fs JFS X-ASG-Orig-Subj: Re: op-journaled fs, journal size and storage speeds Subject: Re: op-journaled fs, journal size and storage speeds Message-ID: <20110502104031.GA22953@infradead.org> References: <19900.8703.214676.218477@tree.ty.sabi.co.UK> <20110501092758.GG13542@dastard> <19901.41647.606112.243194@tree.ty.sabi.co.UK> MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <19901.41647.606112.243194@tree.ty.sabi.co.UK> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304332832 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Sun, May 01, 2011 at 07:13:03PM +0100, Peter Grandi wrote:
> > That's why you can configure an external log....
>
> ...and lose barriers :-). But indeed.

Using a writeback cache on the log device is rather pointless, as every
write needs write-through semantics using FUA or a post-flush anyway.

But I actually have a patch to allow for devices with a writeback cache
in external log configurations; it's just a bit complicated, as we
basically need to copy the pre-flush state machine into XFS to deal with
the pre-flush being for a different device than the actual write.

> >> But if they can be pretty small, I wonder whether putting the
> >> journals of several filesystems on the same storage device then
> >> becomes a sensible option as the locality will be quite narrow
> >> (e.g. a single physical cylinder) or it could be worthwhile like
> >> the database people do to journal to battery-backed RAM.
>
> For example as described in this old paper:

It only makes sense if the log activity bursts for the different
filesystems happen at different times, or none of the filesystems maxes
out the log IOP rate.

> But they seem to me fundamentally terrible for journals, because
> of the large erase block sizes and the enormous latency of erase
> operations (lots of read-erase-write cycles for small commits).
> They seem more oriented to large mostly read-only data sets than
> very small mostly write ones.

As mentioned earlier in this thread XFS allows to align and pad log writes. Just make sure to get a device with an erase block size <= 256 kilobytes, which usually means SLC. But even drives with a larger erase block size and sane firmware tend to be faster than plain old disks. But as Dave mentioned there's nothing that's going to beat a battery backed cache/memory for log IOP performance. > The saving grace is the capacitor-backed RAM in SSDs (used to work > around erase block size issues as you probably know) which to a > significant extent may act as the battery-backed RAM I was > mentioning; and similarly as another post says the battery-backed > RAM in RAID host adapters would do much the same function. Just make sure your device actually has it. Both the Intel X25 SSDs and many other consumer / prosumer SSDs actually don't have them and will lose data in case of a powerloss. From david@fromorbit.com Mon May 2 07:16:28 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42CGS4V231162 for ; Mon, 2 May 2011 07:16:28 -0500 X-ASG-Debug-ID: 1304338800-2fed00a40000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 618D91EC7831 for ; Mon, 2 May 2011 05:20:01 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id iw9O7oVcwqwfOX4b for ; Mon, 02 May 2011 05:20:01 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEABGhvk15LBza/2dsb2JhbACmG3jADQ6DEYJhBJ0t Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by 
ipmail06.adl6.internode.on.net with ESMTP; 02 May 2011 21:50:00 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QGs6Y-0000rN-BI; Mon, 02 May 2011 22:19:58 +1000 Date: Mon, 2 May 2011 22:19:58 +1000 From: Dave Chinner To: Christian Kujau Cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks Message-ID: <20110502121958.GA2978@dastard> References: <20110427022655.GE12436@dastard> <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1304338803 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62557 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Sun, May 01, 2011 at 09:59:35PM -0700, Christian Kujau wrote: > On Sun, 1 May 2011 at 18:01, Dave Chinner wrote: > > I really don't know why the xfs inode cache is not being trimmed. I > > really, really need to know if the XFS inode cache shrinker is > > getting blocked or not running - do you have those sysrq-w traces > > when near OOM I asked for a while back? 
>
> I tried to generate those via /proc/sysrq-trigger (I don't have an F13/Print
> Screen key), but the OOM killer kicks in pretty fast - so fast that my
> debug script, trying to generate sysrq-w every second, was too late and the
> machine was already dead:
>
> http://nerdbynature.de/bits/2.6.39-rc4/oom/
> * messages-10.txt.gz
> * slabinfo-10.txt.bz2
>
> Timeline:
> - du(1) started at 12:25:16 (and immediately listed
>   as "blocked" task)
> - the last sysrq-w succeeded at 12:38:05, listing kswapd0
> - du invoked oom-killer at 12:38:06
>
> I'll keep trying...
>
> > scan only scanned 516 pages. I can't see it freeing many inodes
> > (there's >600,000 of them in memory) based on such a low page scan
> > number.
>
> Not sure if this is related... this XFS filesystem I'm running du(1) on is
> ~1 TB in size, with 918K allocated inodes, if df(1) is correct:
>
> # df -hi /mnt/backup/
> Filesystem        Inodes IUsed IFree IUse% Mounted on
> /dev/mapper/wdc1     37M  918K   36M    3% /mnt/backup
>
> > Maybe you should tweak /proc/sys/vm/vfs_cache_pressure to make it
> > reclaim vfs structures more rapidly. It might help
>
> /proc/sys/vm/vfs_cache_pressure is currently set to '100'. You mean I
> should increase it? To... 150? 200? 1000?

Yes. Try 2 orders of magnitude as a start, i.e. change it to 10000...

Cheers,

Dave.
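The suggestion above amounts to multiplying the current value by 100.
A minimal sketch of that, assuming a Linux system where the tunable is
exposed at /proc/sys/vm/vfs_cache_pressure. The helper name and the
PRESSURE_FILE override are made up here for illustration and dry runs,
and actually writing the new value requires root:

```shell
# Hypothetical helper (not part of any tool): print the current
# vfs_cache_pressure value multiplied by a factor (default 100).
# PRESSURE_FILE can be pointed at a scratch file for a dry run.
scale_cache_pressure() {
    file=${PRESSURE_FILE:-/proc/sys/vm/vfs_cache_pressure}
    factor=${1:-100}
    current=$(cat "$file") || return 1
    echo $((current * factor))
}

# To apply the result as root (the setting does not survive a reboot):
#   new=$(scale_cache_pressure 100)
#   sysctl -w vm.vfs_cache_pressure="$new"
```

With the default of 100 reported in the quoted message, the helper
prints 10000, the value suggested as a starting point.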
-- Dave Chinner david@fromorbit.com From david@fromorbit.com Mon May 2 07:34:52 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42CYqnb231773 for ; Mon, 2 May 2011 07:34:52 -0500 X-ASG-Debug-ID: 1304339907-44e700710000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 9E37A1EC7B03 for ; Mon, 2 May 2011 05:38:27 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id 1ctPhYicZykCol8u for ; Mon, 02 May 2011 05:38:27 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAJCkvk15LBza/2dsb2JhbACmG3jABg6FcgSdLQ Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail06.adl6.internode.on.net with ESMTP; 02 May 2011 22:08:26 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QGsOO-0000tH-JV; Mon, 02 May 2011 22:38:24 +1000 Date: Mon, 2 May 2011 22:38:24 +1000 From: Dave Chinner To: Christian Kujau Cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks Message-ID: <20110502123824.GB2978@dastard> References: <20110427022655.GE12436@dastard> <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: 
ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1304339908 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62557 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, May 02, 2011 at 02:26:17AM -0700, Christian Kujau wrote: > On Sun, 1 May 2011 at 18:01, Dave Chinner wrote: > > I really don't know why the xfs inode cache is not being trimmed. I > > really, really need to know if the XFS inode cache shrinker is > > getting blocked or not running - do you have those sysrq-w traces > > when near OOM I asked for a while back? > > Here's another attempt at getting those: > > http://nerdbynature.de/bits/2.6.39-rc4/oom/ > * messages-11.txt.gz & slabinfo-11.txt.bz2 > - oom-killer at 00:05:04 > - last sysrq-w to succeed at 00:05:03 > > * messages-12.txt.gz & slabinfo-12.txt.bz2, along > with meminfo-post-oom-12.txt & sysrq-w_post-oom-12.jpg could > be more interesting: > - last sysrq-w to succeed at 01:27:08 > - oom-killer at 01:27:11 > > ...but after the OOM-killer was killing quite a few processes, MemFree > showed 511236 kB free memory, yet ssh logins were still being killed. > Finally I got a root shell on the box, issued sysrq-w again and even > executed /bin/sync, which came back. But looking at the logs now > nothing went to the disk (/var/log resides on / which is a ext4 fs). > See sysrq-w_post-oom-12.jpg for a sysrq-w I took 2381s after boot time, > or 01:32 - syslog stopped on 01:27. Same problem: MemFree: 511236 kB .... LowTotal: 759904 kB LowFree: 3804 kB i.e. 
that low memory is being exhausted by the slab cache, while there is lots of free high memory, and the low memory zone is marked as all unreclaimable.... The sysrq trace less than 1s before the first OOM shows this: [c00770ec] __lock_acquire+0x43c/0x1818 (unreliable) [c000a924] __switch_to+0x9c/0x128 [c0417580] schedule+0x274/0x8bc [c0418128] schedule_timeout+0x16c/0x214 [c04172a0] io_schedule_timeout+0xb0/0x11c [c00b153c] congestion_wait+0x8c/0xdc [c00aa43c] kswapd+0x6d0/0x884 [c005e3d0] kthread+0x84/0x88 [c0010908] kernel_thread+0x4c/0x68 Background memory reclaim appears to be blocked by IO congestion.... Cheers, Dave. -- Dave Chinner david@fromorbit.com From david@fromorbit.com Mon May 2 07:36:35 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42CaZ0e231836 for ; Mon, 2 May 2011 07:36:35 -0500 X-ASG-Debug-ID: 1304340009-60d000500000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 585A416481E1; Mon, 2 May 2011 05:40:10 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id m0gH7H6Al9RLmBQh; Mon, 02 May 2011 05:40:10 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAJCkvk15LBza/2dsb2JhbACmG3jABg6FcgSdLQ Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail06.adl6.internode.on.net with ESMTP; 02 May 2011 22:10:08 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QGsQ3-0000tZ-R6; Mon, 02 May 2011 22:40:07 +1000 Date: Mon, 2 May 2011 22:40:07 +1000 From: Dave Chinner To: Markus 
Trippelsdorf Cc: Bruno =?iso-8859-1?Q?Pr=E9mont?= , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner , linux-kernel@vger.kernel.org, James Bottomley X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110502124007.GC2978@dastard> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110429151841.GA893@x4.trippels.de> <20110429213524.449e003b@neptune.home> <20110430161810.6ccd2c99@neptune.home> <20110502061528.GA22538@x4.trippels.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20110502061528.GA22538@x4.trippels.de> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1304340011 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62558 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, May 02, 2011 at 08:15:28AM +0200, Markus Trippelsdorf wrote: > On 2011.04.30 at 16:18 +0200, Bruno Prémont wrote: > > On Fri, 29 April 2011 Bruno Prémont wrote: > > > On Fri, 29 April 2011 Markus Trippelsdorf wrote: > > > > On 2011.04.29 at 11:19 +1000, Dave Chinner wrote: > > > > > OK, so the common elements here appears to be root filesystems > > > > > with small 
log sizes, which means they are tail pushing all the > > > > > time metadata operations are in progress. Definitely seems like a > > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > > reproduce this and cook up a patch to fix it. > > > > > > > > Hmm, I'm wondering if this issue is somehow related to the hrtimer bug, > > > > that Thomas Gleixner fixed yesterday: > > > > http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4 > > > > http://thread.gmane.org/gmane.linux.kernel.mm/61909/ > > > > > > > > It also looks similar to the issue that James Bottomley reported > > > > earlier: http://thread.gmane.org/gmane.linux.kernel.mm/62185/ > > > > > > I'm going to see, I've applied Thomas' fix on the box seeing XFS freeze (without > > > other changes to kernel). > > > Going to run that kernel for the week-end and beyond if it survives to see what > > > happens. > > > > Happened again (after a few hours of uptime), so it definitely is not > > caused by hrtimer bug that Thomas Gleixner fixed. > > I've enabled lock debugging and this is what happened after a few hours > uptime. (I can't tell if this is a false positive): > > ======================================================= > [ INFO: possible circular locking dependency detected ] > 2.6.39-rc5-00130-g3fd9952 #10 > ------------------------------------------------------- > kio_file/7364 is trying to acquire lock: > (&sb->s_type->i_mutex_key#5/2){+.+...}, at: [] generic_file_splice_write+0xce/0x180 > > but task is already holding lock: > (xfs_iolock_active){++++++}, at: [] xfs_ilock+0x125/0x1f0 > > which lock already depends on the new lock. Known problem. Been broken for ages, yet I only first saw a lockdep report for this about a week ago on a 2.6.32 kernel.... Cheers, Dave. 
-- Dave Chinner david@fromorbit.com From powool@gmail.com Mon May 2 10:44:13 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_DKIM_INVALID autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42FiDHx238313 for ; Mon, 2 May 2011 10:44:13 -0500 X-ASG-Debug-ID: 1304351268-527602960000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-qy0-f174.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 6BCE3156AD48 for ; Mon, 2 May 2011 08:47:49 -0700 (PDT) Received: from mail-qy0-f174.google.com (mail-qy0-f174.google.com [209.85.216.174]) by cuda.sgi.com with ESMTP id rk58sMYznYFRtGUQ for ; Mon, 02 May 2011 08:47:49 -0700 (PDT) Received: by qyk7 with SMTP id 7so1548318qyk.5 for ; Mon, 02 May 2011 08:47:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:date:x-google-sender-auth :message-id:subject:from:to:content-type; bh=JkM/TWhaF1zr2k+APMhPzGcHFxnkJ4HBKRHaRtN5cIw=; b=YFNOnfmX2B7+89+APhTilsFZ6pwaeBddyhIC0mAsE7BfiNYuSb8Qgv/azdN+98HBPe RwFyiWVJRQARG0IssPcesEQAtx6HMplC/4qSdaYxAdE2JcqkfRcnr7FVL5BVDm03rrrV 41l6nZysC63Y7C/4NZEq1avsb4OhrIPqfwb2w= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:content-type; b=vAYDOilpF11NaW5A5gJTfBSAeSFxtavCetU2312RMavPl3ACoRmg/0EkqhCLMRVzCX WSzCX05ysHrjQUUtm4zn/CK1g4td71h6QTRV8GUfSXwB7VjYqxa3ouwK/tKy1gKh/FkS BaHV/FmWtb6QMerh7n6DNBReSbdQCXNDzDVCM= MIME-Version: 1.0 Received: by 10.224.183.17 with SMTP id ce17mr6363776qab.352.1304351268373; Mon, 02 May 2011 08:47:48 -0700 (PDT) Sender: powool@gmail.com Received: by 10.224.45.144 with HTTP; Mon, 2 May 2011 08:47:48 -0700 (PDT) Date: Mon, 2 May 2011 11:47:48 
-0400 X-Google-Sender-Auth: V4cUi6ym3Omb4CXFzTHBqUxjROA Message-ID: X-ASG-Orig-Subj: XFS/Linux Sanity check Subject: XFS/Linux Sanity check From: Paul Anderson To: xfs@oss.sgi.com Content-Type: text/plain; charset=ISO-8859-1 X-Barracuda-Connect: mail-qy0-f174.google.com[209.85.216.174] X-Barracuda-Start-Time: 1304351269 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62569 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Our genetic sequencing research group is growing our file storage from 1PB to 2PB. Our workload looks very much like large video processing might look - relatively low metadata, very, very high sequential I/O. The servers will either be doing very high I/O with local I/O bound jobs, or serving data via NFSv4 (or possibly custom data distribution means) to our compute grid for compute bound jobs. Our first PB of data is largely on Promise RAID arrays, all of which are set up with XFS. Generally, we're big fans of XFS for stability, high performance, and robustness in the face of crashes. We tried ZFS, ran into I/O throttling issues that at the time seem intractable (write picketing - essentially half the maximum write rate of hardware). We are deploying five Dell 810s, 192GiB RAM, 12 core, each with three LSI 9200-8E SAS controllers, and three SuperMicro 847 45 drive bay cabinets with enterprise grade 2TB drives. 
We're running Ubuntu 10.04 LTS, and have tried either the stock kernel
(2.6.32-30) or 2.6.35 from linux.org. We organize the storage as one
software (MD) RAID 0 composed of 7 software RAID (MD) 6s, each with 18
drives, giving 204 TiB usable (9 drives of the 135 are unused). XFS
is set up properly (as far as I know) with respect to stripe and chunk
sizes. Allocation groups are 1TiB in size, which seems sane for the
size of files we expect to work with.

In isolated testing, I see around 5GiBytes/second raw (135 parallel dd
reads), and with a benchmark test of 10 simultaneous 64GiByte dd
commands, I can see just shy of 2 GiBytes/second reading, and around
1.4GiBytes/second writing through XFS. The benchmark is crude, but
fairly representative of our expected use.

md apparently does not support barriers, so we are badly exposed in
that manner, I know. As a test, I disabled write cache on all drives,
performance dropped by 30% or so, but since md is apparently the
problem, barriers still didn't work.

Nonetheless, what we need, but don't have, is stability. With
2.6.32-30, we get reliable kernel panics after 2 days of sustained
rsync to the machine (around 150-250MiBytes/second for the entire time
- the source machines are slow), and with 2.6.35, we get a bad
resource contention problem fairly quickly - much less than 24 hours
(in this instance, we start getting XFS kernel thread timeouts similar
to what I've seen posted here recently, but it isn't clear whether it
is only XFS or also ext3 boot drives that are starved for I/O -
suspending or killing all I/O load doesn't solve the problem - only a
reboot does).

Ideally, I'd firstly be able to find informed opinions about how I can
improve this arrangement - we are mildly flexible on RAID controllers,
very flexible on versions of Linux, etc, and can try other OS's as a
last resort (but the leading contender here would be "something"
running ZFS, and though I love ZFS, it really didn't seem to work well
for our needs).
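For reference, the 204 TiB figure in the layout described earlier in
this message can be reproduced with a quick calculation. This is only
a sketch: it assumes each "2TB" drive holds exactly 2 x 10^12 bytes
and that each 18-drive RAID 6 set gives up two drives' worth of space
to parity, and it ignores md and XFS metadata overhead. The function
name is made up for illustration.

```shell
# Hypothetical helper: usable capacity in TiB of a RAID 0 stripe over
# RAID 6 sets. Arguments: number of sets, drives per set, drive size
# in decimal TB (assumed to be exactly 10^12 bytes each).
raid60_usable_tib() {
    sets=$1
    drives_per_set=$2
    drive_tb=$3
    awk -v s="$sets" -v d="$drives_per_set" -v t="$drive_tb" 'BEGIN {
        data_drives = s * (d - 2)          # RAID 6: 2 drives per set go to parity
        bytes = data_drives * t * 1e12     # decimal terabytes to bytes
        printf "%.1f\n", bytes / (2 ^ 40)  # bytes to TiB
    }'
}

raid60_usable_tib 7 18 2
```

With 7 sets of 18 drives at 2 TB each this prints 203.7, which rounds
to the quoted 204 TiB. The same layout uses 7 x 18 = 126 drives, which
accounts for the 9 unused drives out of 135.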
Secondly, I welcome suggestions about which version of the linux kernel you'd prefer to hear bug reports about, as well as what kinds of output is most useful (we're getting all chassis set up with serial console so we can do kgdb and also full kernel panic output results). Thanks in advance, Paul Anderson Center for Statistical Genetics University of Michigan USA From BATV+c7f46437ce921d9b3da5+2808+infradead.org+hch@bombadil.srs.infradead.org Mon May 2 10:59:03 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_FRT_LOLITA1 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42FwuPD238865 for ; Mon, 2 May 2011 10:59:03 -0500 X-ASG-Debug-ID: 1304352152-645701620000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id E85831E19B4C for ; Mon, 2 May 2011 09:02:32 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id sNdaL8cfolmy3iAJ for ; Mon, 02 May 2011 09:02:32 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QGvZw-0003lk-As for xfs@oss.sgi.com; Mon, 02 May 2011 16:02:32 +0000 Date: Mon, 2 May 2011 12:02:32 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH] xfstests: support post-udev device mapper nodes Subject: [PATCH] xfstests: support post-udev device mapper nodes Message-ID: <20110502160232.GA14457@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See 
http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304352152 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Because of udevs complaining device mapper now creates /dev/dm-N as the real device nodes, and just symlinks the /dev/mapper/ names to it. This would be easy if everything used the /dev/mapper clear names, but most system utilities translate them back to the /dev/mapper/ names and thus confuse various test cases. Add support to _is_block_dev to read symlinks, and add documentation on how to run xfstests on device mapper volumes. Signed-off-by: Christoph Hellwig Index: xfstests-dev/common.rc =================================================================== --- xfstests-dev.orig/common.rc 2011-05-02 12:45:25.000000000 +0000 +++ xfstests-dev/common.rc 2011-05-02 12:45:28.000000000 +0000 @@ -587,7 +587,14 @@ _is_block_dev() exit 1 fi - [ -b $1 ] && src/lstat64 $1 | $AWK_PROG '/Device type:/ { print $9 }' + _dev=$1 + if [ -L ${_dev} ]; then + _dev=`readlink -f ${_dev}` + fi + + if [ -b ${_dev} ]; then + src/lstat64 ${_dev} | $AWK_PROG '/Device type:/ { print $9 }' + fi } # Do a command, log it to $seq.full, optionally test return status @@ -700,10 +707,12 @@ _require_scratch() *) if [ -z "$SCRATCH_DEV" -o "`_is_block_dev $SCRATCH_DEV`" = "" ] then + echo "no a block device"; _notrun "this test requires a valid \$SCRATCH_DEV" fi if [ "`_is_block_dev $SCRATCH_DEV`" = "`_is_block_dev $TEST_DEV`" ] then + echo "foo" _notrun "this test requires a valid \$SCRATCH_DEV" fi if [ ! 
-d "$SCRATCH_MNT" ] Index: xfstests-dev/README.device-mapper =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ xfstests-dev/README.device-mapper 2011-05-02 15:51:24.000000000 +0000 @@ -0,0 +1,8 @@ + +To use xfstests on device mapper always use the /dev/mapper/ symlinks, +not the /dev/dm-* devices, or the symlinks created by LVM. + +For example: + +TEST_DEV=/dev/mapper/test +SCRATCH_DEV=/dev/mapper/scratch From bonbons@linux-vserver.org Mon May 2 11:14:53 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42GErej239551 for ; Mon, 2 May 2011 11:14:53 -0500 X-ASG-Debug-ID: 1304353106-2fd203460000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from legolas.restena.lu (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id C4447513C84 for ; Mon, 2 May 2011 09:18:26 -0700 (PDT) Received: from legolas.restena.lu (legolas.restena.lu [158.64.1.34]) by cuda.sgi.com with ESMTP id ZJ52tCm5HmXYHSX1 for ; Mon, 02 May 2011 09:18:26 -0700 (PDT) Received: from legolas.restena.lu (localhost [127.0.0.1]) by legolas.restena.lu (Postfix) with ESMTP id 452DA9DD26; Mon, 2 May 2011 18:18:25 +0200 (CEST) Received: from neptune.home (unknown [158.64.15.115]) by legolas.restena.lu (Postfix) with ESMTP id F0EE59DD23; Mon, 2 May 2011 18:18:24 +0200 (CEST) Date: Mon, 2 May 2011 18:18:11 +0200 From: Bruno =?UTF-8?B?UHLDqW1vbnQ=?= To: Doug Nazar Cc: linux-kernel , Dave Chinner , xfs@oss.sgi.com X-ASG-Orig-Subj: Re: xfs hang on 2.6.39-rc5 Subject: Re: xfs hang on 2.6.39-rc5 Message-ID: <20110502181811.61dd09eb@neptune.home> In-Reply-To: <4DBD7880.2040804@gmail.com> References: <4DBD7880.2040804@gmail.com> 
X-Mailer: Claws Mail 3.7.8 (GTK+ 2.22.1; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Scanned: ClamAV X-Barracuda-Connect: legolas.restena.lu[158.64.1.34] X-Barracuda-Start-Time: 1304353108 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62571 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Status: Clean On Sun, 01 May 2011 Doug Nazar wrote: > On two different hosts, both running 2.6.39-rc5-00123-g33b6c92, xfs > seems to be locking up. > > I had to revert the box from the second trace but this trace is mounted > with: This looks related to at least: http://thread.gmane.org/gmane.linux.kernel/1130312/focus=1131769 > UUID=dcdc849e-bad8-4971-935c-223819a6dcc4 / > xfs noatime 0 1 > > which ends up being: > > /dev/sdb3 on / type xfs (rw,noatime,delaylog,noquota) > > meta-data=/dev/sdb3 isize=256 agcount=16, agsize=601056 blks > = sectsz=512 attr=0 > data = bsize=4096 blocks=9616896, imaxpct=25 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal bsize=4096 blocks=4695, version=1 > = sectsz=512 sunit=0 blks, lazy-count=0 > realtime =none extsz=65536 blocks=0, rtextents=0 > > [38160.536046] INFO: task multilog:7586 blocked for more than 120 seconds. > [38160.536052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. 
> [38160.536059] multilog D 0012cfae 0 7586 7578 0x00000000 > [38160.536069] f5bcfd18 00000086 f9a4653a 0012cfae f9a33b34 00000024 > 6d783c2f 0000227f > [38160.536082] f5bcfcc8 f5bcfcd8 00000000 00000000 f5bcfd18 f5c91810 > f5c91810 00000024 > [38160.536092] 00d2e530 f1fd2e40 00001460 00dc5000 f5c1b780 0000695d > 00000000 f5bcfd18 > [38160.536103] Call Trace: > [38160.536188] [] ? kmem_alloc+0x51/0xbe [xfs] > [38160.536212] [] ? xlog_space_left+0x24/0xa9 [xfs] > [38160.536237] [] ? xlog_grant_push_ail+0xb8/0xdc [xfs] > [38160.536262] [] xlog_grant_log_space+0x173/0x42a [xfs] > [38160.536277] [] ? try_to_wake_up+0xd4/0xd4 > [38160.536302] [] xfs_log_reserve+0xab/0xfe [xfs] > [38160.536329] [] xfs_trans_reserve+0x74/0x1cb [xfs] > [38160.536357] [] xfs_rename+0x122/0x61a [xfs] > [38160.536367] [] ? link_path_walk+0x2de/0x77b > [38160.536375] [] ? generic_permission+0x1a/0x95 > [38160.536398] [] xfs_vn_rename+0x60/0x6a [xfs] > [38160.536407] [] vfs_rename+0x313/0x350 > [38160.536415] [] ? d_lookup+0x1e/0x3d > [38160.536422] [] sys_renameat+0x203/0x219 > [38160.536431] [] ? mntput+0x13/0x1f > [38160.536437] [] ? fput+0x118/0x1b0 > [38160.536443] [] sys_rename+0x28/0x2a > [38160.536451] [] sysenter_do_call+0x12/0x28 > > > > [94440.552055] INFO: task nfsd:8055 blocked for more than 120 seconds. > [94440.553587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [94440.556638] nfsd D df5df880 0 8055 2 0x00000000 > [94440.558258] d6fb3d78 00000046 d6941c90 df5df880 df2b1400 df5df8c0 > 5a0961bd 000055be > [94440.561634] d6fb3d28 03f35714 00014f61 00000000 d6fb3d78 d6fae370 > d6fae370 ffffff8c > [94440.565246] d6941cdc d6fb3d48 c10c4aa8 cb5aea80 cb918000 0004e83b > cb918201 e21adb34 > [94440.569039] Call Trace: > [94440.570871] [] ? d_obtain_alias+0x3e/0xe6 > [94440.572813] [] ? xlog_space_left+0x24/0xa9 [xfs] > [94440.574708] [] xlog_grant_log_space+0xec/0x42a [xfs] > [94440.576622] [] ? xlog_grant_push_ail+0xb8/0xdc [xfs] > [94440.578509] [] ? 
try_to_wake_up+0xd4/0xd4 > [94440.580451] [] xfs_log_reserve+0xab/0xfe [xfs] > [94440.582391] [] xfs_trans_reserve+0x74/0x1cb [xfs] > [94440.584353] [] xfs_remove+0xd9/0x333 [xfs] > [94440.586270] [] ? acl_permission_check+0x1b/0x8f > [94440.588209] [] ? generic_permission+0x1a/0x95 > [94440.590152] [] xfs_vn_unlink+0x30/0x6a [xfs] > [94440.592083] [] vfs_unlink+0x60/0xae > [94440.594006] [] nfsd_unlink+0x19e/0x21b [nfsd] > [94440.595943] [] ? nfsd4_encode_operation+0x56/0x161 [nfsd] > [94440.597911] [] nfsd4_remove+0x3e/0x114 [nfsd] > [94440.599878] [] nfsd4_proc_compound+0x334/0x3e9 [nfsd] > [94440.601853] [] ? nfsd4_decode_getattr+0x8/0xa [nfsd] > [94440.603832] [] ? nfs4svc_decode_compoundargs+0x268/0x342 > [nfsd] > [94440.605851] [] ? nfsd4_rename+0x1e0/0x1e0 [nfsd] > [94440.607826] [] nfsd_dispatch+0xbc/0x1e9 [nfsd] > [94440.609794] [] svc_process+0x401/0x740 [sunrpc] > [94440.611715] [] nfsd+0xae/0x130 [nfsd] > [94440.613620] [] ? nfsd_svc+0x197/0x197 [nfsd] > [94440.615495] [] ? nfsd_svc+0x197/0x197 [nfsd] > [94440.617277] [] kthread+0x67/0x69 > [94440.618996] [] ? 
kthreadd+0xa3/0xa3 > [94440.620641] [] kernel_thread_helper+0x6/0xd > > > Doug > From andi@firstfloor.org Mon May 2 12:06:54 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42H6sCB241351 for ; Mon, 2 May 2011 12:06:54 -0500 X-ASG-Debug-ID: 1304356230-499700fa0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mga11.intel.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id BA54716495A4 for ; Mon, 2 May 2011 10:10:30 -0700 (PDT) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by cuda.sgi.com with ESMTP id 4eAYCE0NYf5PXvjO for ; Mon, 02 May 2011 10:10:30 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP; 02 May 2011 10:10:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.64,303,1301900400"; d="scan'208";a="917005893" Received: from tassilo.jf.intel.com ([10.7.201.108]) by fmsmga001.fm.intel.com with ESMTP; 02 May 2011 10:10:06 -0700 Received: by tassilo.jf.intel.com (Postfix, from userid 501) id 6E7DE3E04A2; Mon, 2 May 2011 10:09:59 -0700 (PDT) From: Andi Kleen To: Paul Anderson Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: XFS/Linux Sanity check Subject: Re: XFS/Linux Sanity check References: Date: Mon, 02 May 2011 10:09:59 -0700 In-Reply-To: (Paul Anderson's message of "Mon, 2 May 2011 11:47:48 -0400") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Barracuda-Connect: mga11.intel.com[192.55.52.93] X-Barracuda-Start-Time: 1304356230 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com 
X-Virus-Status: Clean Paul Anderson writes: > > md apparently does not support barriers, so we are badly exposed in MD barriers for RAID-0 were added in 2.6.33, but very recent kernels have much improved barriers again. > Secondly, I welcome suggestions about which version of the linux > kernel you'd prefer to hear bug reports about, as well as what kinds Kernel developers usually want reports about the newest versions, ideally with enough information to debug the problem (that is backtraces etc.) When you have a hang but the console is still active you can also just dump the threads with sysrq-t Often when things hang it's the underlying IO subsystem (driver, IO device). -Andi -- ak@linux.intel.com -- Speaking for myself only From eflorac@intellique.com Mon May 2 12:09:46 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42H9jwc241416 for ; Mon, 2 May 2011 12:09:45 -0500 X-ASG-Debug-ID: 1304356398-42ba01480000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from smtp4-g21.free.fr (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id F276F1649661 for ; Mon, 2 May 2011 10:13:19 -0700 (PDT) Received: from smtp4-g21.free.fr (smtp4-g21.free.fr [212.27.42.4]) by cuda.sgi.com with ESMTP id 5Ya4rPUpH28lp3GL for ; Mon, 02 May 2011 10:13:19 -0700 (PDT) Received: from harpe.intellique.com (unknown [82.225.196.72]) by smtp4-g21.free.fr (Postfix) with ESMTP id C9D7D4C8370; Mon, 2 May 2011 19:13:14 +0200 (CEST) Date: Mon, 2 May 2011 19:13:23 +0200 From: Emmanuel Florac To: Paul Anderson Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: XFS/Linux Sanity check Subject: Re: XFS/Linux Sanity check Message-ID: <20110502191323.417ef644@harpe.intellique.com> In-Reply-To: References: 
Organization: Intellique X-Mailer: Claws Mail 3.7.9 (GTK+ 2.16.6; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: smtp4-g21.free.fr[212.27.42.4] X-Barracuda-Start-Time: 1304356401 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62576 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Mon, 2 May 2011 11:47:48 -0400, Paul Anderson wrote:

> We are deploying five Dell 810s, 192GiB RAM, 12 core, each with three
> LSI 9200-8E SAS controllers, and three SuperMicro 847 45 drive bay
> cabinets with enterprise grade 2TB drives.

I have very little experience with these RAID controllers. However, I
have a 9212 4i4e (same card generation and same chipset) in test, and
so far I must say it looks like _utter_ _crap_. The performance is
abysmal (it's been busy rebuilding a 20TB array for... 6 days!); the
server regularly freezes and crashes without any reason (it's a pure
dev system with virtually zero load and zero IO); and there were lots
of filesystem corruptions. I'm running a plain vanilla 64-bit
2.6.32.25 kernel that poses no problem whatsoever with any other
configuration.

> In isolated testing, I see around 5GiBytes/second raw (135 parallel dd
> reads), and with a benchmark test of 10 simultaneous 64GiByte dd
> commands, I can see just shy of 2 GiBytes/second reading, and around
> 1.4GiBytes/second writing through XFS. The benchmark is crude, but
> fairly representative of our expected use.
I don't understand why there's such a gap between the raw and XFS performance. Generally XFS delivers 90% or more of raw performance. > md apparently does not support barriers, so we are badly exposed in > that manner, I know. As a test, I disabled write cache on all drives, > performance dropped by 30% or so, but since md is apparently the > problem, barriers still didn't work. Frankly, I'd stay away from md at this array size. I'm pretty sure you're exploring uncharted territory here. > Ideally, I'd firstly be able to find informed opinions about how I can > improve this arrangement - we are mildly flexible on RAID controllers, > very flexible on versions of Linux, etc, and can try other OS's as a > last resort (but the leading contender here would be "something" > running ZFS, and though I love ZFS, it really didn't seem to work well > for our needs). I can't yet be sure because I plan more testing with this card, but I'd ditch the LSI controllers for LSI/3Ware or Adaptec (or possibly Areca), and stay away from md RAID and use hardware RAID. I'm a hardware RAID freak, but... 
hardware RAID allows proper write cache, for a start (because it has BBUs). -- ------------------------------------------------------------------------ Emmanuel Florac | Direction technique | Intellique | | +33 1 78 94 84 02 ------------------------------------------------------------------------ From sandeen@redhat.com Mon May 2 14:19:33 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_66 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42JJWjs245143 for ; Mon, 2 May 2011 14:19:32 -0500 X-ASG-Debug-ID: 1304364187-2c8d01c60000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx1.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 1E65415D162A for ; Mon, 2 May 2011 12:23:07 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by cuda.sgi.com with ESMTP id 3IaimuiRghy16ggf for ; Mon, 02 May 2011 12:23:07 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p42JN6gU011943 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 2 May 2011 15:23:06 -0400 Received: from liberator.sandeen.net (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p42JN4QH020895 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 2 May 2011 15:23:05 -0400 Message-ID: <4DBF0498.6070905@redhat.com> Date: Mon, 02 May 2011 14:23:04 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 
Thunderbird/3.1.10 MIME-Version: 1.0 To: Allison Henderson CC: linux-fsdevel , Ext4 Developers List , xfs-oss X-ASG-Orig-Subj: Re: [XFS Punch Hole 1/1] XFS Add Punch Hole Testing to FSX Subject: Re: [XFS Punch Hole 1/1] XFS Add Punch Hole Testing to FSX References: <4DBF02FF.608@linux.vnet.ibm.com> In-Reply-To: <4DBF02FF.608@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-Barracuda-Connect: mx1.redhat.com[209.132.183.28] X-Barracuda-Start-Time: 1304364189 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/2/11 2:16 PM, Allison Henderson wrote: > This patch adds punch hole tests to the fsx > stress test. The test is performed through > the fallocate call by randomly choosing to > use the punch hole flag when running the > fallocate test. Regions that have > been punched out should contain zeros, so > the expected file contents buffer is updated > to contain zeros when a hole is punched out. I'll cc: the xfs list since this would live in xfstests. Thanks, -Eric > Signed-off-by: Allison Henderson > --- > :100644 100644 32cd380... d424941... M ltp/Makefile > :100644 100644 fe072d3... 4f54ef6... 
M ltp/fsx.c > ltp/Makefile | 2 +- > ltp/fsx.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++--------- > 2 files changed, 62 insertions(+), 13 deletions(-) > > diff --git a/ltp/Makefile b/ltp/Makefile > index 32cd380..d424941 100644 > --- a/ltp/Makefile > +++ b/ltp/Makefile > @@ -27,7 +27,7 @@ LCFLAGS += -DAIO > LLDLIBS += -laio -lpthread > endif > > -ifeq ($(HAVE_FALLOCATE), true) > +ifeq ($(HAVE_FALLOCATE), yes) > LCFLAGS += -DFALLOCATE > endif > > diff --git a/ltp/fsx.c b/ltp/fsx.c > index fe072d3..4f54ef6 100644 > --- a/ltp/fsx.c > +++ b/ltp/fsx.c > @@ -207,7 +207,8 @@ logdump(void) > { > int i, count, down; > struct log_entry *lp; > - char *falloc_type[3] = {"PAST_EOF", "EXTENDING", "INTERIOR"}; > + char *falloc_type[4] = {"PAST_EOF", "EXTENDING", "INTERIOR", > + "PUNCH_HOLE"}; > > prt("LOG DUMP (%d total operations):\n", logcount); > if (logcount < LOGSIZE) { > @@ -791,7 +792,11 @@ dofallocate(unsigned offset, unsigned length) > { > unsigned end_offset; > int keep_size; > - > + int max_offset = 0; > + int max_len = 0; > + int punch_hole = 0; > + int mode = 0; > + char *op_name; > if (length == 0) { > if (!quiet && testcalls > simulatedopcount) > prt("skipping zero length fallocate\n"); > @@ -799,11 +804,31 @@ dofallocate(unsigned offset, unsigned length) > return; > } > > +#ifdef FALLOC_FL_PUNCH_HOLE > + punch_hole = random() % 2; > + /* Keep size must be set for punch hole */ > + if (punch_hole) { > + keep_size = 1; > + mode = FALLOC_FL_PUNCH_HOLE; > + } else > + keep_size = random() % 2; > +#else > keep_size = random() % 2; > +#endif > + > + if (keep_size) > + mode |= FALLOC_FL_KEEP_SIZE; > + > + if (punch_hole && file_size <= (loff_t)offset) { > + if (!quiet && testcalls > simulatedopcount) > + prt("skipping hole punch off the end of the file\n"); > + log4(OP_SKIPPED, OP_FALLOCATE, offset, length); > + return; > + } > > end_offset = keep_size ? 
0 : offset + length; > > - if (end_offset > biggest) { > + if ((end_offset > biggest) && !punch_hole) { > biggest = end_offset; > if (!quiet && testcalls > simulatedopcount) > prt("fallocating to largest ever: 0x%x\n", end_offset); > @@ -811,13 +836,15 @@ dofallocate(unsigned offset, unsigned length) > > /* > * last arg: > - * 1: allocate past EOF > - * 2: extending prealloc > - * 3: interior prealloc > + * 0: allocate past EOF > + * 1: extending prealloc > + * 2: interior prealloc > + * 3: punch hole > */ > - log4(OP_FALLOCATE, offset, length, (end_offset > file_size) ? (keep_size ? 1 : 2) : 3); > + log4(OP_FALLOCATE, offset, length, punch_hole ? 3 : > + (end_offset > file_size) ? (keep_size ? 0 : 1) : 2); > > - if (end_offset > file_size) { > + if (((loff_t)end_offset > file_size) && !punch_hole) { > memset(good_buf + file_size, '\0', end_offset - file_size); > file_size = end_offset; > } > @@ -827,13 +854,35 @@ dofallocate(unsigned offset, unsigned length) > > if ((progressinterval && testcalls % progressinterval == 0) || > (debug && (monitorstart == -1 || monitorend == -1 || > - end_offset <= monitorend))) > - prt("%lu falloc\tfrom 0x%x to 0x%x\n", testcalls, offset, length); > - if (fallocate(fd, keep_size ? FALLOC_FL_KEEP_SIZE : 0, (loff_t)offset, (loff_t)length) == -1) { > - prt("fallocate: %x to %x\n", offset, length); > + end_offset <= monitorend))) { > +#ifdef FALLOC_FL_PUNCH_HOLE > + op_name = (mode & FALLOC_FL_PUNCH_HOLE) ? > + "punch hole" : "falloc"; > +#else > + op_name = "falloc"; > +#endif > + prt("%lu %s\tfrom 0x%x to 0x%x, (0x%x bytes)\n", testcalls, > + op_name, offset, offset+length, length); > + } > + if (fallocate(fd, mode, (loff_t)offset, (loff_t)length) == -1) { > +#ifdef FALLOC_FL_PUNCH_HOLE > + op_name = (mode & FALLOC_FL_PUNCH_HOLE) ? 
> + "punch hole" : "fallocate"; > +#else > + op_name = "fallocate"; > +#endif > + > + prt("%s: %x to %x\n", op_name, offset, length); > prterr("dofallocate: fallocate"); > report_failure(161); > } > + > + if (punch_hole) { > + max_offset = offset < file_size ? offset : file_size; > + max_len = max_offset + length <= file_size ? length : > + file_size - max_offset; > + memset(good_buf + max_offset, '\0', max_len); > + } > } > #else > void From aelder@sgi.com Mon May 2 14:29:26 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_FRT_LOLITA1 autolearn=ham version=3.4.0-r929098 Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42JTQhw245427 for ; Mon, 2 May 2011 14:29:26 -0500 Received: from cas.corp.sgi.com (pv-excas1-dc21-nlb.corp.sgi.com [137.38.102.126]) by relay2.corp.sgi.com (Postfix) with ESMTP id BF50B304043; Mon, 2 May 2011 12:33:00 -0700 (PDT) Received: from [127.0.0.1] (128.162.232.50) by xmail.sgi.com (137.38.102.30) with Microsoft SMTP Server (TLS) id 14.1.289.1; Mon, 2 May 2011 14:33:00 -0500 Subject: Re: [PATCH] xfstests: support post-udev device mapper nodes From: Alex Elder Reply-To: To: Christoph Hellwig CC: In-Reply-To: <20110502160232.GA14457@infradead.org> References: <20110502160232.GA14457@infradead.org> Content-Type: text/plain; charset="UTF-8" Date: Mon, 2 May 2011 14:32:59 -0500 Message-ID: <1304364779.3077.41.camel@doink> MIME-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Originating-IP: [128.162.232.50] X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, 2011-05-02 at 12:02 -0400, Christoph Hellwig wrote: > Because of udevs complaining device mapper now creates /dev/dm-N as the real > device nodes, and just symlinks the /dev/mapper/ names to it. 
This would be > easy if everything used the /dev/mapper clear names, but most system utilities > translate them back to the /dev/mapper/ names and thus confuse various test > cases. Add support to _is_block_dev to read symlinks, and add documentation > on how to run xfstests on device mapper volumes. I'm not 100% sure I'm parsing the above right. What I read is that, although we want to use the "real" device (not the link), the utilities tend to report the /dev/mapper names. Therefore we want to use /dev/mapper names and internally translate them to their real devices. Based on that understanding I think what you're trying to do is fine, but there are a few problems with what you sent. Note that I'm not even looking at the specifics of /dev/mapper links at the moment. > Signed-off-by: Christoph Hellwig > > Index: xfstests-dev/common.rc > =================================================================== > --- xfstests-dev.orig/common.rc 2011-05-02 12:45:25.000000000 +0000 > +++ xfstests-dev/common.rc 2011-05-02 12:45:28.000000000 +0000 > @@ -587,7 +587,14 @@ _is_block_dev() > exit 1 > fi > > - [ -b $1 ] && src/lstat64 $1 | $AWK_PROG '/Device type:/ { print $9 }' > + _dev=$1 > + if [ -L ${_dev} ]; then > + _dev=`readlink -f ${_dev}` Although it typically shouldn't, if the "readlink -f" fails, it will make _dev have an empty value... > + fi > + > + if [ -b ${_dev} ]; then ...which will lead to some sort of shell "syntax error" message here, which is rather unhelpful. At a minimum, I think putting quotes around it here would avoid that (but you should test), i.e., if [ -b "${_dev}" ]; then > + src/lstat64 ${_dev} | $AWK_PROG '/Device type:/ { print $9 }' > + fi > } > > # Do a command, log it to $seq.full, optionally test return status > @@ -700,10 +707,12 @@ _require_scratch() > *) > if [ -z "$SCRATCH_DEV" -o "`_is_block_dev $SCRATCH_DEV`" = "" ] > then > + echo "no a block device"; This and the next one appear to be junk that should be removed. 
> _notrun "this test requires a valid \$SCRATCH_DEV" > fi > if [ "`_is_block_dev $SCRATCH_DEV`" = "`_is_block_dev $TEST_DEV`" ] > then > + echo "foo" > _notrun "this test requires a valid \$SCRATCH_DEV" > fi > if [ ! -d "$SCRATCH_MNT" ] > Index: xfstests-dev/README.device-mapper > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ xfstests-dev/README.device-mapper 2011-05-02 15:51:24.000000000 +0000 > @@ -0,0 +1,8 @@ > + > +To use xfstests on device mapper always use the /dev/mapper/ symlinks, > +not the /dev/dm-* devices, or the symlinks created by LVM. > + > +For example: > + > +TEST_DEV=/dev/mapper/test > +SCRATCH_DEV=/dev/mapper/scratch > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From sandeen@redhat.com Mon May 2 14:32:05 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_66 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42JW5YU245605 for ; Mon, 2 May 2011 14:32:05 -0500 X-ASG-Debug-ID: 1304364941-273802850000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx1.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id BC71811BDBEE for ; Mon, 2 May 2011 12:35:41 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by cuda.sgi.com with ESMTP id cNGsZdD6R5Faa0U1 for ; Mon, 02 May 2011 12:35:41 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p42JZe2G016602 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 2 May 2011 15:35:40 
-0400 Received: from liberator.sandeen.net (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p42JZc6q020523 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO); Mon, 2 May 2011 15:35:39 -0400 Message-ID: <4DBF0789.3090808@redhat.com> Date: Mon, 02 May 2011 14:35:37 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Allison Henderson CC: linux-fsdevel , Ext4 Developers List , xfs-oss X-ASG-Orig-Subj: Re: [XFS Punch Hole 1/1] XFS Add Punch Hole Testing to FSX Subject: Re: [XFS Punch Hole 1/1] XFS Add Punch Hole Testing to FSX References: <4DBF02FF.608@linux.vnet.ibm.com> <4DBF0498.6070905@redhat.com> In-Reply-To: <4DBF0498.6070905@redhat.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12 X-Barracuda-Connect: mx1.redhat.com[209.132.183.28] X-Barracuda-Start-Time: 1304364941 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/2/11 2:23 PM, Eric Sandeen wrote: > On 5/2/11 2:16 PM, Allison Henderson wrote: >> This patch adds punch hole tests to the fsx >> stress test. The test is performed through >> the fallocate call by randomly choosing to >> use the punch hole flag when running the >> fallocate test. Regions that have >> been punched out should contain zeros, so >> the expected file contents buffer is updated >> to contain zeros when a hole is punched out. > > I'll cc: the xfs list since this would live in xfstests. > > Thanks, > -Eric > >> Signed-off-by: Allison Henderson >> --- >> :100644 100644 32cd380... d424941... M ltp/Makefile >> :100644 100644 fe072d3... 4f54ef6... 
M ltp/fsx.c >> ltp/Makefile | 2 +- >> ltp/fsx.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++--------- >> 2 files changed, 62 insertions(+), 13 deletions(-) >> >> diff --git a/ltp/Makefile b/ltp/Makefile >> index 32cd380..d424941 100644 >> --- a/ltp/Makefile >> +++ b/ltp/Makefile >> @@ -27,7 +27,7 @@ LCFLAGS += -DAIO >> LLDLIBS += -laio -lpthread >> endif >> >> -ifeq ($(HAVE_FALLOCATE), true) >> +ifeq ($(HAVE_FALLOCATE), yes) argh I ended up with 2 fallocate tests in aclocal.m4, need to tidy that up, I'll do that in a separate patch. >> LCFLAGS += -DFALLOCATE >> endif >> >> diff --git a/ltp/fsx.c b/ltp/fsx.c >> index fe072d3..4f54ef6 100644 >> --- a/ltp/fsx.c >> +++ b/ltp/fsx.c >> @@ -207,7 +207,8 @@ logdump(void) >> { >> int i, count, down; >> struct log_entry *lp; >> - char *falloc_type[3] = {"PAST_EOF", "EXTENDING", "INTERIOR"}; >> + char *falloc_type[4] = {"PAST_EOF", "EXTENDING", "INTERIOR", >> + "PUNCH_HOLE"}; >> >> prt("LOG DUMP (%d total operations):\n", logcount); >> if (logcount < LOGSIZE) { >> @@ -791,7 +792,11 @@ dofallocate(unsigned offset, unsigned length) >> { >> unsigned end_offset; >> int keep_size; >> - >> + int max_offset = 0; >> + int max_len = 0; >> + int punch_hole = 0; >> + int mode = 0; >> + char *op_name; >> if (length == 0) { >> if (!quiet && testcalls > simulatedopcount) >> prt("skipping zero length fallocate\n"); >> @@ -799,11 +804,31 @@ dofallocate(unsigned offset, unsigned length) >> return; >> } >> >> +#ifdef FALLOC_FL_PUNCH_HOLE >> + punch_hole = random() % 2; >> + /* Keep size must be set for punch hole */ >> + if (punch_hole) { >> + keep_size = 1; >> + mode = FALLOC_FL_PUNCH_HOLE; >> + } else >> + keep_size = random() % 2; >> +#else >> keep_size = random() % 2; >> +#endif >> + >> + if (keep_size) >> + mode |= FALLOC_FL_KEEP_SIZE; >> + >> + if (punch_hole && file_size <= (loff_t)offset) { >> + if (!quiet && testcalls > simulatedopcount) >> + prt("skipping hole punch off the end of the file\n"); >> + log4(OP_SKIPPED, 
OP_FALLOCATE, offset, length); >> + return; >> + } >> >> end_offset = keep_size ? 0 : offset + length; >> >> - if (end_offset > biggest) { >> + if ((end_offset > biggest) && !punch_hole) { >> biggest = end_offset; >> if (!quiet && testcalls > simulatedopcount) >> prt("fallocating to largest ever: 0x%x\n", end_offset); >> @@ -811,13 +836,15 @@ dofallocate(unsigned offset, unsigned length) >> >> /* >> * last arg: >> - * 1: allocate past EOF >> - * 2: extending prealloc >> - * 3: interior prealloc >> + * 0: allocate past EOF >> + * 1: extending prealloc >> + * 2: interior prealloc >> + * 3: punch hole >> */ >> - log4(OP_FALLOCATE, offset, length, (end_offset > file_size) ? (keep_size ? 1 : 2) : 3); >> + log4(OP_FALLOCATE, offset, length, punch_hole ? 3 : >> + (end_offset > file_size) ? (keep_size ? 0 : 1) : 2); >> >> - if (end_offset > file_size) { >> + if (((loff_t)end_offset > file_size) && !punch_hole) { >> memset(good_buf + file_size, '\0', end_offset - file_size); >> file_size = end_offset; >> } >> @@ -827,13 +854,35 @@ dofallocate(unsigned offset, unsigned length) >> >> if ((progressinterval && testcalls % progressinterval == 0) || >> (debug && (monitorstart == -1 || monitorend == -1 || >> - end_offset <= monitorend))) >> - prt("%lu falloc\tfrom 0x%x to 0x%x\n", testcalls, offset, length); >> - if (fallocate(fd, keep_size ? FALLOC_FL_KEEP_SIZE : 0, (loff_t)offset, (loff_t)length) == -1) { >> - prt("fallocate: %x to %x\n", offset, length); >> + end_offset <= monitorend))) { >> +#ifdef FALLOC_FL_PUNCH_HOLE >> + op_name = (mode & FALLOC_FL_PUNCH_HOLE) ? >> + "punch hole" : "falloc"; >> +#else >> + op_name = "falloc"; >> +#endif >> + prt("%lu %s\tfrom 0x%x to 0x%x, (0x%x bytes)\n", testcalls, >> + op_name, offset, offset+length, length); >> + } >> + if (fallocate(fd, mode, (loff_t)offset, (loff_t)length) == -1) { >> +#ifdef FALLOC_FL_PUNCH_HOLE >> + op_name = (mode & FALLOC_FL_PUNCH_HOLE) ? 
>> + "punch hole" : "fallocate"; >> +#else >> + op_name = "fallocate"; >> +#endif >> + >> + prt("%s: %x to %x\n", op_name, offset, length); >> prterr("dofallocate: fallocate"); >> report_failure(161); >> } >> + >> + if (punch_hole) { >> + max_offset = offset < file_size ? offset : file_size; >> + max_len = max_offset + length <= file_size ? length : >> + file_size - max_offset; >> + memset(good_buf + max_offset, '\0', max_len); >> + } >> } >> #else >> void > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html From BATV+c7f46437ce921d9b3da5+2808+infradead.org+hch@bombadil.srs.infradead.org Mon May 2 14:32:06 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42JW5JW245613 for ; Mon, 2 May 2011 14:32:06 -0500 X-ASG-Debug-ID: 1304364942-217500390000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id CFC454256B8; Mon, 2 May 2011 12:35:42 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id YYP9U8J4l5OF3i4l; Mon, 02 May 2011 12:35:42 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QGyuE-0008A2-AW; Mon, 02 May 2011 19:35:42 +0000 Date: Mon, 2 May 2011 15:35:42 -0400 From: Christoph Hellwig To: Alex Elder Cc: Christoph Hellwig , xfs@oss.sgi.com X-ASG-Orig-Subj: Re: [PATCH] xfstests: support post-udev device mapper nodes Subject: Re: [PATCH] xfstests: support post-udev 
device mapper nodes Message-ID: <20110502193542.GA28530@infradead.org> References: <20110502160232.GA14457@infradead.org> <1304364779.3077.41.camel@doink> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1304364779.3077.41.camel@doink> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304364942 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, May 02, 2011 at 02:32:59PM -0500, Alex Elder wrote: > I'm not 100% sure I'm parsing the above right. What > I read is that, although we want to use the "real" > device (not the link), the utilities tend to report > the /dev/mapper names. Therefore we want to use > /dev/mapper names and internally translate them to > their real devices. We as in xfstests want to use whatever everyone else uses to make our life easier for parsing mount table output, df output, etc, and that's normally the /dev/mapper/ name. The only thing that can't cope with the symlink there is the _is_block_dev helper. > > + _dev=$1 > > + if [ -L ${_dev} ]; then > > + _dev=`readlink -f ${_dev}` > > Although it typically shouldn't, if the "readlink -f" fails, > it will make _dev have an empty value... > > > + fi > > + > > + if [ -b ${_dev} ]; then > > ...which will lead to some sort of shell "syntax error" > message here, which is rather unhelpful. > > At a minimum, I think putting quotes around it here > would avoid that (but you should test), i.e., > if [ -b "${_dev}" ]; then Ok. > This and the next one appear to be junk that should > be removed. Indeed. 
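For reference, a small sketch of the quoting point discussed above, assuming a POSIX shell such as bash or dash. The variable name `_dev` follows the patch; the `unquoted` and `quoted` result variables are made up for illustration. With an empty, unquoted variable the test collapses to `[ -b ]`, which POSIX defines as a one-argument string test, so rather than printing an error it can silently succeed; the quoted form asks about the empty path and fails:

```shell
#!/bin/sh
# Simulate a failed `readlink -f` that left the variable empty.
_dev=""

# Unquoted: the empty expansion disappears, leaving `[ -b ]`.
# A one-argument test only checks that "-b" is a non-empty string.
if [ -b ${_dev} ]; then unquoted=true; else unquoted=false; fi

# Quoted: the test becomes `[ -b "" ]`, i.e. "is the empty path a
# block device?", which is false.
if [ -b "${_dev}" ]; then quoted=true; else quoted=false; fi

echo "unquoted=$unquoted quoted=$quoted"
```

If that holds on the shell in use, quoting does more than tidy up a message: it keeps the helper from treating an empty name as a block device.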
From lists@nerdbynature.de Mon May 2 14:56:17 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42JuGPM246286 for ; Mon, 2 May 2011 14:56:17 -0500 X-ASG-Debug-ID: 1304366392-3cfa033f0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from trent.utfs.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 2FF3C1D62186 for ; Mon, 2 May 2011 12:59:52 -0700 (PDT) Received: from trent.utfs.org (trent.utfs.org [194.246.123.103]) by cuda.sgi.com with ESMTP id JrJ7wSan3HHeTk90 for ; Mon, 02 May 2011 12:59:52 -0700 (PDT) Received: by trent.utfs.org (Postfix, from userid 8) id 155DF3DFAF; Mon, 2 May 2011 21:59:52 +0200 (CEST) Received: from trent.utfs.org (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id AB37D3DCB8; Mon, 2 May 2011 21:59:50 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 8AE093DB79; Mon, 2 May 2011 21:59:50 +0200 (CEST) Date: Mon, 2 May 2011 12:59:50 -0700 (PDT) From: Christian Kujau To: Dave Chinner cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks In-Reply-To: <20110502121958.GA2978@dastard> Message-ID: References: <20110427022655.GE12436@dastard> <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> User-Agent: Alpine 2.01 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AV-Checked: ClamAV using ClamSMTP (127.0.0.1) X-Barracuda-Connect: trent.utfs.org[194.246.123.103] X-Barracuda-Start-Time: 
1304366393 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62587 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, 2 May 2011 at 22:19, Dave Chinner wrote: > Yes. Try 2 orders of magnitude as a start. i.e change it to 10000... I've run the -12 test with vfs_cache_pressure=200 and now the -13 test with vfs_cache_pressure=10000. The OOM killer still kicks in, but the machine seems to be more usable afterwards and does not get totally stuck: http://nerdbynature.de/bits/2.6.39-rc4/oom/ - messages-12.txt.gz & slabinfo-12.txt.bz2 * oom-debug.sh invoked oom-killer at 01:27:11 * sysrq-w works until 01:27:08, but got killed by oom - messages-13.txt.gz & slabinfo-13.txt.bz2 * find invoked oom-killer at 08:44:07 * sysrq-w works until 08:45:48 (listing jbd2/hda6-8), then my debug script got killed Thanks, Christian. -- BOFH excuse #224: Jan 9 16:41:27 huber su: 'su root' succeeded for .... 
on /dev/pts/1 From aelder@sgi.com Mon May 2 15:00:11 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42K0AnJ246532 for ; Mon, 2 May 2011 15:00:10 -0500 Received: from cas.corp.sgi.com (pv-excas1-dc21-nlb.corp.sgi.com [137.38.102.126]) by relay2.corp.sgi.com (Postfix) with ESMTP id B27CB304043; Mon, 2 May 2011 13:03:44 -0700 (PDT) Received: from [127.0.0.1] (128.162.232.50) by xmail.sgi.com (137.38.102.30) with Microsoft SMTP Server (TLS) id 14.1.289.1; Mon, 2 May 2011 15:03:44 -0500 Subject: Re: [PATCH] xfsprogs - resolve Debian readline build issue From: Alex Elder Reply-To: To: Nathan Scott CC: In-Reply-To: <360436721.31453.1304032530041.JavaMail.root@acxmail-au2.aconex.com> References: <360436721.31453.1304032530041.JavaMail.root@acxmail-au2.aconex.com> Content-Type: text/plain; charset="UTF-8" Date: Mon, 2 May 2011 15:03:43 -0500 Message-ID: <1304366623.3077.44.camel@doink> MIME-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Originating-IP: [128.162.232.50] X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Fri, 2011-04-29 at 09:15 +1000, Nathan Scott wrote: > Address the recently reported build issue with libreadline5/6, via > the gplv2 route. Since this appears to be a relatively recent pkg, > I made its use conditional so the deb build continues to work for > everyone not running a bleeding edge distro. Works For Me (tm). > > This addresses Debian bug 553875: libreadline5-dev removal pending As far as I'm concerned, this looks fine. I understand the issue, but I'm not really familiar with the Debian build/dependency system at work here, so I'm not really a very qualified reviewer. 
But if you're comfortable with it and get nobody else to review it, you are welcome to use my sign-off. Signed-off-by: Alex Elder > Signed-off-by: Nathan Scott From eflorac@intellique.com Mon May 2 15:06:40 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42K6ecH246744 for ; Mon, 2 May 2011 15:06:40 -0500 X-ASG-Debug-ID: 1304367014-3cf603e00000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from smtp3-g21.free.fr (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 110621D62466 for ; Mon, 2 May 2011 13:10:15 -0700 (PDT) Received: from smtp3-g21.free.fr (smtp3-g21.free.fr [212.27.42.3]) by cuda.sgi.com with ESMTP id P14oWgUZ8uOF3QRg for ; Mon, 02 May 2011 13:10:15 -0700 (PDT) Received: from galadriel2.home (unknown [82.235.234.79]) by smtp3-g21.free.fr (Postfix) with ESMTP id 0055AA6424; Mon, 2 May 2011 22:10:09 +0200 (CEST) Date: Mon, 2 May 2011 22:10:02 +0200 From: Emmanuel Florac To: Dave Chinner Cc: Peter Grandi , Linux fs XFS X-ASG-Orig-Subj: Re: xfs performance problem Subject: Re: xfs performance problem Message-ID: <20110502221002.081a897c@galadriel2.home> In-Reply-To: <20110502025042.GK13542@dastard> References: <4DB72084.8020205@inf.ethz.ch> <4DB74331.3030804@hardwarefreak.com> <4DB75C6D.1080901@inf.ethz.ch> <19898.53907.842827.480883@tree.ty.sabi.co.UK> <20110501084919.GE13542@dastard> <19901.28769.553575.864887@tree.ty.sabi.co.UK> <20110502025042.GK13542@dastard> Organization: Intellique X-Mailer: Claws Mail 3.7.8 (GTK+ 2.20.1; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: smtp3-g21.free.fr[212.27.42.3] X-Barracuda-Start-Time: 1304367017 X-Barracuda-Bayes: 
INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62587 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Mon, 2 May 2011 12:50:42 +1000 you wrote:

> Ah, quoting Joerg Schilling FUD about Linux. That's a good way
> to get people to ignore you....

We should start a new thread about how poor the linux SCSI stack is :)
Maybe that's the reason for these performance problems :)

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique | Intellique | | +33 1 78 94 84 02
------------------------------------------------------------------------

From adilger@dilger.ca Mon May 2 15:25:46 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,J_CHICKENPOX_57, J_CHICKENPOX_66 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42KPkFp247356 for ; Mon, 2 May 2011 15:25:46 -0500 X-ASG-Debug-ID: 1304368162-061400f40000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from idcmail-mo1so.shaw.ca (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D59CD1579C05 for ; Mon, 2 May 2011 13:29:22 -0700 (PDT) Received: from idcmail-mo1so.shaw.ca (idcmail-mo1so.shaw.ca [24.71.223.10]) by cuda.sgi.com with ESMTP id FQv9wwTgrniTlF5F for ; Mon, 02 May 2011 13:29:22 -0700 (PDT) Received: from pd2ml1so-ssvc.prod.shaw.ca ([10.0.141.139]) by
pd3mo1so-svcs.prod.shaw.ca with ESMTP; 02 May 2011 14:29:21 -0600 X-Cloudmark-SP-Filtered: true X-Cloudmark-SP-Result: v=1.1 cv=6EkEX6JM2LCztCEhkE317K9SpBSN4cB8nbuuHVfFIzI= c=1 sm=1 a=yDJgh7vdcs8A:10 a=BLceEmwcHowA:10 a=kj9zAlcOel0A:10 a=c23vf5CSMVc0QQz9B4a6RA==:17 a=VnNF1IyMAAAA:8 a=r-W6NY7A8ZiBQWKdvTIA:9 a=0cEpQjvoW8n29exrnLYA:7 a=CjuIK1q_8ugA:10 a=R9mTD0jQ6LcWmDu-:21 a=0ZydnTTY-O1nHylr:21 a=HpAAvcLHHh0Zw7uRqdWCyQ==:117 Received: from unknown (HELO cabot-100.adilger.int) ([68.147.195.121]) by pd2ml1so-dmz.prod.shaw.ca with ESMTP; 02 May 2011 14:29:21 -0600 X-ASG-Orig-Subj: Re: [XFS Punch Hole 1/1] XFS Add Punch Hole Testing to FSX Subject: Re: [XFS Punch Hole 1/1] XFS Add Punch Hole Testing to FSX Mime-Version: 1.0 (Apple Message framework v1082) Content-Type: text/plain; charset=us-ascii From: Andreas Dilger In-Reply-To: <4DBF0789.3090808@redhat.com> Date: Mon, 2 May 2011 14:29:20 -0600 Cc: Allison Henderson , linux-fsdevel , Ext4 Developers List , xfs-oss Content-Transfer-Encoding: quoted-printable Message-Id: <61E784AC-2E07-41DC-A65C-0C1B766A4A6F@dilger.ca> References: <4DBF02FF.608@linux.vnet.ibm.com> <4DBF0498.6070905@redhat.com> <4DBF0789.3090808@redhat.com> To: Eric Sandeen X-Mailer: Apple Mail (2.1082) X-Barracuda-Connect: idcmail-mo1so.shaw.ca[24.71.223.10] X-Barracuda-Start-Time: 1304368162 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62589 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/2/11 2:16 PM, Allison Henderson wrote: > This patch adds punch hole tests to the fsx > stress test. 
The test is performed through
> the fallocate call by randomly choosing to
> use the punch hole flag when running the
> fallocate test. Regions that have
> been punched out should contain zeros, so
> the expected file contents buffer is updated
> to contain zeros when a hole is punched out.
>
> Signed-off-by: Allison Henderson
> ---
> :100644 100644 32cd380... d424941... M ltp/Makefile
> :100644 100644 fe072d3... 4f54ef6... M ltp/fsx.c
> ltp/Makefile | 2 +-
> ltp/fsx.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++---------
> 2 files changed, 62 insertions(+), 13 deletions(-)
>
> diff --git a/ltp/Makefile b/ltp/Makefile
> index 32cd380..d424941 100644
> --- a/ltp/Makefile
> +++ b/ltp/Makefile
> @@ -27,7 +27,7 @@ LCFLAGS += -DAIO
> LLDLIBS += -laio -lpthread
> endif
>
> -ifeq ($(HAVE_FALLOCATE), true)
> +ifeq ($(HAVE_FALLOCATE), yes)
>
> LCFLAGS += -DFALLOCATE
> endif
>
> diff --git a/ltp/fsx.c b/ltp/fsx.c
> index fe072d3..4f54ef6 100644
> --- a/ltp/fsx.c
> +++ b/ltp/fsx.c
> @@ -207,7 +207,8 @@ logdump(void)
> {
> int i, count, down;
> struct log_entry *lp;
> - char *falloc_type[3] = {"PAST_EOF", "EXTENDING", "INTERIOR"};
> + char *falloc_type[4] = {"PAST_EOF", "EXTENDING", "INTERIOR",
> + "PUNCH_HOLE"};
>
> prt("LOG DUMP (%d total operations):\n", logcount);
> if (logcount < LOGSIZE) {
> @@ -791,7 +792,11 @@ dofallocate(unsigned offset, unsigned length)
> {
> unsigned end_offset;
> int keep_size;
> -
> + int max_offset = 0;
> + int max_len = 0;
> + int punch_hole = 0;
> + int mode = 0;
> + char *op_name;
> if (length == 0) {
> if (!quiet && testcalls > simulatedopcount)
> prt("skipping zero length fallocate\n");
> @@ -799,11 +804,31 @@ dofallocate(unsigned offset, unsigned length)
> return;
> }
>
> +#ifdef FALLOC_FL_PUNCH_HOLE
> + punch_hole = random() % 2;
> + /* Keep size must be set for punch hole */
> + if (punch_hole) {
> + keep_size = 1;
> + mode = FALLOC_FL_PUNCH_HOLE;
> + } else
> + keep_size = random() % 2;
> +#else
> keep_size = random() % 2;
> +#endif
> +
> + if (keep_size)
> + mode |= FALLOC_FL_KEEP_SIZE;
> +
> + if (punch_hole && file_size <= (loff_t)offset) {
> + if (!quiet && testcalls > simulatedopcount)
> + prt("skipping hole punch off the end of the file\n");
> + log4(OP_SKIPPED, OP_FALLOCATE, offset, length);
> + return;
> + }

Isn't a hole punch off the end of the file just a truncate? I think that is a valid test case, since punch(newsize, ~0ULL) should be identical to truncate(newsize). In that case, "keep_size" would decide whether the file size is left alone or becomes "newsize", so I don't think keep_size = 1 should always be set for FALLOC_FL_PUNCH_HOLE operations.

> end_offset = keep_size ? 0 : offset + length;
>
> - if (end_offset > biggest) {
> + if ((end_offset > biggest) && !punch_hole) {
> biggest = end_offset;
> if (!quiet && testcalls > simulatedopcount)
> prt("fallocating to largest ever: 0x%x\n", end_offset);
> @@ -811,13 +836,15 @@ dofallocate(unsigned offset, unsigned length)
>
> /*
> * last arg:
> - * 1: allocate past EOF
> - * 2: extending prealloc
> - * 3: interior prealloc
> + * 0: allocate past EOF
> + * 1: extending prealloc
> + * 2: interior prealloc
> + * 3: punch hole
> */
> - log4(OP_FALLOCATE, offset, length, (end_offset > file_size) ? (keep_size ? 1 : 2) : 3);
> + log4(OP_FALLOCATE, offset, length, punch_hole ? 3 :
> + (end_offset > file_size) ? (keep_size ? 0 : 1) : 2);
>
> - if (end_offset > file_size) {
> + if (((loff_t)end_offset > file_size) && !punch_hole) {
> memset(good_buf + file_size, '\0', end_offset - file_size);
> file_size = end_offset;
> }
> @@ -827,13 +854,35 @@ dofallocate(unsigned offset, unsigned length)
>
> if ((progressinterval && testcalls % progressinterval == 0) ||
> (debug && (monitorstart == -1 || monitorend == -1 ||
> - end_offset <= monitorend)))
> - prt("%lu falloc\tfrom 0x%x to 0x%x\n", testcalls, offset, length);
> - if (fallocate(fd, keep_size ? FALLOC_FL_KEEP_SIZE : 0, (loff_t)offset, (loff_t)length) == -1) {
> - prt("fallocate: %x to %x\n", offset, length);
> + end_offset <= monitorend))) {
> +#ifdef FALLOC_FL_PUNCH_HOLE
> + op_name = (mode & FALLOC_FL_PUNCH_HOLE) ?
> + "punch hole" : "falloc";
> +#else
> + op_name = "falloc";
> +#endif
> + prt("%lu %s\tfrom 0x%x to 0x%x, (0x%x bytes)\n", testcalls,
> + op_name, offset, offset+length, length);
> + }
> + if (fallocate(fd, mode, (loff_t)offset, (loff_t)length) == -1) {
> +#ifdef FALLOC_FL_PUNCH_HOLE
> + op_name = (mode & FALLOC_FL_PUNCH_HOLE) ?
> + "punch hole" : "fallocate";
> +#else
> + op_name = "fallocate";
> +#endif
> +
> + prt("%s: %x to %x\n", op_name, offset, length);
> prterr("dofallocate: fallocate");
> report_failure(161);
> }
> +
> + if (punch_hole) {
> + max_offset = offset < file_size ? offset : file_size;
> + max_len = max_offset + length <= file_size ?
length = : > + file_size - max_offset; > + memset(good_buf + max_offset, '\0', max_len); > + } > } > #else > void Cheers, Andreas From achender@linux.vnet.ibm.com Mon May 2 17:36:36 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,J_CHICKENPOX_57, J_CHICKENPOX_66 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p42MaYSh250972 for ; Mon, 2 May 2011 17:36:35 -0500 X-ASG-Debug-ID: 1304376009-25ac017a0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from e5.ny.us.ibm.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 2DE7611BDC8E for ; Mon, 2 May 2011 15:40:09 -0700 (PDT) Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.145]) by cuda.sgi.com with ESMTP id eOSFp4gYwdr5IUM0 for ; Mon, 02 May 2011 15:40:09 -0700 (PDT) Received: from d01relay06.pok.ibm.com (d01relay06.pok.ibm.com [9.56.227.116]) by e5.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id p42MDbip011739 for ; Mon, 2 May 2011 18:13:37 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay06.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p42Me8lJ1343562 for ; Mon, 2 May 2011 18:40:08 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p42Me7FF020513 for ; Mon, 2 May 2011 18:40:08 -0400 Received: from [9.65.15.245] (sig-9-65-15-245.mts.ibm.com [9.65.15.245]) by d01av04.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id p42Me6XE020408; Mon, 2 May 2011 18:40:06 -0400 Message-ID: <4DBF32BE.70009@linux.vnet.ibm.com> Date: Mon, 02 May 2011 15:39:58 -0700 From: Allison Henderson User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9 MIME-Version: 1.0 To: Andreas Dilger CC: Eric 
Sandeen , linux-fsdevel , Ext4 Developers List , xfs-oss X-ASG-Orig-Subj: Re: [XFS Punch Hole 1/1] XFS Add Punch Hole Testing to FSX Subject: Re: [XFS Punch Hole 1/1] XFS Add Punch Hole Testing to FSX References: <4DBF02FF.608@linux.vnet.ibm.com> <4DBF0498.6070905@redhat.com> <4DBF0789.3090808@redhat.com> <61E784AC-2E07-41DC-A65C-0C1B766A4A6F@dilger.ca> In-Reply-To: <61E784AC-2E07-41DC-A65C-0C1B766A4A6F@dilger.ca> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: e5.ny.us.ibm.com[32.97.182.145] X-Barracuda-Start-Time: 1304376011 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62598 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/2/2011 1:29 PM, Andreas Dilger wrote: > On 5/2/11 2:16 PM, Allison Henderson wrote: >> This patch adds punch hole tests to the fsx >> stress test. The test is performed through >> the fallocate call by randomly choosing to >> use the punch hole flag when running the >> fallocate test. Regions that have >> been punched out should contain zeros, so >> the expected file contents buffer is updated >> to contain zeros when a hole is punched out. >> >> Signed-off-by: Allison Henderson >> --- >> :100644 100644 32cd380... d424941... M ltp/Makefile >> :100644 100644 fe072d3... 4f54ef6... 
M ltp/fsx.c >> ltp/Makefile | 2 +- >> ltp/fsx.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++--------- >> 2 files changed, 62 insertions(+), 13 deletions(-) >> >> diff --git a/ltp/Makefile b/ltp/Makefile >> index 32cd380..d424941 100644 >> --- a/ltp/Makefile >> +++ b/ltp/Makefile >> @@ -27,7 +27,7 @@ LCFLAGS += -DAIO >> LLDLIBS += -laio -lpthread >> endif >> >> -ifeq ($(HAVE_FALLOCATE), true) >> +ifeq ($(HAVE_FALLOCATE), yes) >> > >> LCFLAGS += -DFALLOCATE >> endif >> >> diff --git a/ltp/fsx.c b/ltp/fsx.c >> index fe072d3..4f54ef6 100644 >> --- a/ltp/fsx.c >> +++ b/ltp/fsx.c >> @@ -207,7 +207,8 @@ logdump(void) >> { >> int i, count, down; >> struct log_entry *lp; >> - char *falloc_type[3] = {"PAST_EOF", "EXTENDING", "INTERIOR"}; >> + char *falloc_type[4] = {"PAST_EOF", "EXTENDING", "INTERIOR", >> + "PUNCH_HOLE"}; >> >> prt("LOG DUMP (%d total operations):\n", logcount); >> if (logcount< LOGSIZE) { >> @@ -791,7 +792,11 @@ dofallocate(unsigned offset, unsigned length) >> { >> unsigned end_offset; >> int keep_size; >> - >> + int max_offset = 0; >> + int max_len = 0; >> + int punch_hole = 0; >> + int mode = 0; >> + char *op_name; >> if (length == 0) { >> if (!quiet&& testcalls> simulatedopcount) >> prt("skipping zero length fallocate\n"); >> @@ -799,11 +804,31 @@ dofallocate(unsigned offset, unsigned length) >> return; >> } >> >> +#ifdef FALLOC_FL_PUNCH_HOLE >> + punch_hole = random() % 2; >> + /* Keep size must be set for punch hole */ >> + if (punch_hole) { >> + keep_size = 1; >> + mode = FALLOC_FL_PUNCH_HOLE; >> + } else >> + keep_size = random() % 2; >> +#else >> keep_size = random() % 2; >> +#endif >> + >> + if (keep_size) >> + mode |= FALLOC_FL_KEEP_SIZE; >> + >> + if (punch_hole&& file_size<= (loff_t)offset) { >> + if (!quiet&& testcalls> simulatedopcount) >> + prt("skipping hole punch off the end of the file\n"); >> + log4(OP_SKIPPED, OP_FALLOCATE, offset, length); >> + return; >> + } > > Isn't a hole punch off the end of the file is just a truncate? 
I think that is a valid test case, since punch(newsize, ~0ULL) should be identical to truncate(newsize). In that case, "keep_size" would affect whether the file size is left alone, or it is now "newsize", so I don't think it should always be set to run with keep_size = 1 for FALLOC_FL_PUNCH_HOLE operations.

Hi there,

Well, actually punch hole requires that the keep size flag be set
alongside the punch hole flag, so it's not quite the same as
truncate. The punch hole operation does not modify the length of the
file; it only changes whether or not the specified blocks are
allocated. So a hole that extends off the end of the file would cause
the last few blocks of the file to be released, but the length of the
file would not shrink.

The above code snippet, though, is checking to see if the entire hole
is off the edge of the file. For example, the file is 3 blocks in
length, but we are trying to punch out blocks 5 through 7. This
operation would have no effect, so we skip it here.

Allison Henderson

>
>> end_offset = keep_size ? 0 : offset + length;
>>
>> - if (end_offset> biggest) {
>> + if ((end_offset> biggest)&& !punch_hole) {
>> biggest = end_offset;
>> if (!quiet&& testcalls> simulatedopcount)
>> prt("fallocating to largest ever: 0x%x\n", end_offset);
>> @@ -811,13 +836,15 @@ dofallocate(unsigned offset, unsigned length)
>>
>> /*
>> * last arg:
>> - * 1: allocate past EOF
>> - * 2: extending prealloc
>> - * 3: interior prealloc
>> + * 0: allocate past EOF
>> + * 1: extending prealloc
>> + * 2: interior prealloc
>> + * 3: punch hole
>> */
>> - log4(OP_FALLOCATE, offset, length, (end_offset> file_size) ? (keep_size ? 1 : 2) : 3);
>> + log4(OP_FALLOCATE, offset, length, punch_hole ? 3 :
>> + (end_offset> file_size) ? (keep_size ?
0 : 1) : 2); >> >> - if (end_offset> file_size) { >> + if (((loff_t)end_offset> file_size)&& !punch_hole) { >> memset(good_buf + file_size, '\0', end_offset - file_size); >> file_size = end_offset; >> } >> @@ -827,13 +854,35 @@ dofallocate(unsigned offset, unsigned length) >> >> if ((progressinterval&& testcalls % progressinterval == 0) || >> (debug&& (monitorstart == -1 || monitorend == -1 || >> - end_offset<= monitorend))) >> - prt("%lu falloc\tfrom 0x%x to 0x%x\n", testcalls, offset, length); >> - if (fallocate(fd, keep_size ? FALLOC_FL_KEEP_SIZE : 0, (loff_t)offset, (loff_t)length) == -1) { >> - prt("fallocate: %x to %x\n", offset, length); >> + end_offset<= monitorend))) { >> +#ifdef FALLOC_FL_PUNCH_HOLE >> + op_name = (mode& FALLOC_FL_PUNCH_HOLE) ? >> + "punch hole" : "falloc"; >> +#else >> + op_name = "falloc"; >> +#endif >> + prt("%lu %s\tfrom 0x%x to 0x%x, (0x%x bytes)\n", testcalls, >> + op_name, offset, offset+length, length); >> + } >> + if (fallocate(fd, mode, (loff_t)offset, (loff_t)length) == -1) { >> +#ifdef FALLOC_FL_PUNCH_HOLE >> + op_name = (mode& FALLOC_FL_PUNCH_HOLE) ? >> + "punch hole" : "fallocate"; >> +#else >> + op_name = "fallocate"; >> +#endif >> + >> + prt("%s: %x to %x\n", op_name, offset, length); >> prterr("dofallocate: fallocate"); >> report_failure(161); >> } >> + >> + if (punch_hole) { >> + max_offset = offset< file_size ? offset : file_size; >> + max_len = max_offset + length<= file_size ? 
length : >> + file_size - max_offset; >> + memset(good_buf + max_offset, '\0', max_len); >> + } >> } >> #else >> void > > > Cheers, Andreas > > > > > From sandeen@sandeen.net Mon May 2 19:12:08 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p430C7Ek253487 for ; Mon, 2 May 2011 19:12:07 -0500 X-ASG-Debug-ID: 1304381742-2c7703620000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.sandeen.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 2836FA5511C for ; Mon, 2 May 2011 17:15:43 -0700 (PDT) Received: from mail.sandeen.net (sandeen.net [63.231.237.45]) by cuda.sgi.com with ESMTP id AjPUUlQdms19Ei5q for ; Mon, 02 May 2011 17:15:43 -0700 (PDT) Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sandeen.net (Postfix) with ESMTP id 8FF124964600; Mon, 2 May 2011 19:15:42 -0500 (CDT) Message-ID: <4DBF492E.3040400@sandeen.net> Date: Mon, 02 May 2011 19:15:42 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: xfs-oss CC: Allison Henderson X-ASG-Orig-Subj: [PATCH] xfstests: clean up fallocate configuration tests Subject: [PATCH] xfstests: clean up fallocate configuration tests Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: sandeen.net[63.231.237.45] X-Barracuda-Start-Time: 1304381744 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of 
TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62603 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean When I added fallocate support to fsx I inadvertently added a duplicate fallocate test. Consolidate them both into one test (the link test, not the compile test) and make all tests use "true" rather than "yes" to be more consistent with other tests. Signed-off-by: Eric Sandeen --- diff --git a/aclocal.m4 b/aclocal.m4 index 70ea0f3..168eb59 100644 --- a/aclocal.m4 +++ b/aclocal.m4 @@ -18,27 +18,14 @@ AC_DEFUN([AC_PACKAGE_WANT_LINUX_FIEMAP_H], AC_DEFUN([AC_PACKAGE_WANT_FALLOCATE], [ AC_MSG_CHECKING([for fallocate]) - AC_TRY_COMPILE([ -#include - ], [ - fallocate(0, 0, 0, 0); - ], have_fallocate=true - AC_MSG_RESULT(true), - AC_MSG_RESULT(false)) - AC_SUBST(have_fallocate) - ]) -AC_DEFUN([AC_PACKAGE_WANT_FALLOCATE], - [ AC_MSG_CHECKING([for fallocate]) AC_TRY_LINK([ #define _GNU_SOURCE #define _FILE_OFFSET_BITS 64 #include -#include - ], [ - fallocate(0, 0, 0, 0); - ], have_fallocate=yes - AC_MSG_RESULT(yes), - AC_MSG_RESULT(no)) +#include ], + [ fallocate(0, 0, 0, 0); ], + [ have_fallocate=true; AC_MSG_RESULT(yes) ], + [ have_fallocate=false; AC_MSG_RESULT(no) ]) AC_SUBST(have_fallocate) ]) m4_include([m4/multilib.m4]) diff --git a/src/Makefile b/src/Makefile index 1162ee0..91088bf 100644 --- a/src/Makefile +++ b/src/Makefile @@ -31,7 +31,7 @@ ifeq ($(HAVE_FIEMAP), true) LINUX_TARGETS += fiemap-tester endif -ifeq ($(HAVE_FALLOCATE),yes) +ifeq ($(HAVE_FALLOCATE), true) LCFLAGS += -DHAVE_FALLOCATE endif From david@fromorbit.com Mon May 2 19:47:49 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham 
version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p430lnir254334 for ; Mon, 2 May 2011 19:47:49 -0500 X-ASG-Debug-ID: 1304383880-03f100880000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 8414F1E19B93 for ; Mon, 2 May 2011 17:51:21 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id OZWjSvjKMbqEZlAP for ; Mon, 02 May 2011 17:51:21 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Av0EAONQv015LBza/2dsb2JhbACEUKE+eLQckF4OgRyDVYEBBJ0t Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail06.adl6.internode.on.net with ESMTP; 03 May 2011 10:21:18 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QH3pa-0002D5-KL; Tue, 03 May 2011 10:51:14 +1000 Date: Tue, 3 May 2011 10:51:14 +1000 From: Dave Chinner To: Christian Kujau Cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks Message-ID: <20110503005114.GE2978@dastard> References: <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1304383882 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: 
No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62607 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, May 02, 2011 at 12:59:50PM -0700, Christian Kujau wrote: > On Mon, 2 May 2011 at 22:19, Dave Chinner wrote: > > Yes. Try 2 orders of magnitude as a start. i.e change it to 10000... > > I've run the -12 test with vfs_cache_pressure=200 and now the -13 test > with vfs_cache_pressure=10000. The OOM killer still kicks in, but the > machine seems to be more usable afterwards and does not get totally stuck: > > http://nerdbynature.de/bits/2.6.39-rc4/oom/ > - messages-12.txt.gz & slabinfo-12.txt.bz2 > * oom-debug.sh invoked oom-killer at 01:27:11 > * sysrq-w works until 01:27:08, but got killed by oom > > - messages-13.txt.gz & slabinfo-13.txt.bz2 > * find invoked oom-killer at 08:44:07 > * sysrq-w works until 08:45:48 (listing jbd2/hda6-8), then > my debug script got killed So before the OOM killer kicks in, kswapd is stuck in congestion_wait(), and after a number of oom-kills over a 5s period it is still in congestion_wait(). 7s later it is still in congestion_wait() and the oom-killer starts up again, with kswapd still being in congestion_wait() when the oom-killer stops again 3s later. Ok, so kswapd being stuck in congestion wait means it can only be in balance_pgdat() and it thinks that it is getting into trouble. 
Looking at the OOM output: active_anon:7992 inactive_anon:8714 isolated_anon:0 active_file:5995 inactive_file:73780 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 free:35263 slab_reclaimable:182652 slab_unreclaimable:3224 mapped:6929 shmem:199 pagetables:396 bounce:0 DMA free:3436kB min:3532kB low:4412kB high:5296kB active_anon:0kB inactive_anon:0kB active_file:236kB inactive_file:248kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:780288kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:730608kB slab_unreclaimable:12896kB kernel_stack:1032kB pagetables:1584kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:680 all_unreclaimable? yes lowmem_reserve[]: 0 0 508 508 HighMem free:137616kB min:508kB low:1096kB high:1684kB active_anon:31968kB inactive_anon:34856kB active_file:23744kB inactive_file:294872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:520192kB mlocked:0kB dirty:0kB writeback:0kB mapped:27708kB shmem:796kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 There are no isolated pages, so that means we aren't in the congestion_wait() call related to having too many isolated pages. We see that the ZONE_DMA is all_unreclaimable and had 680 pages scanned. ZONE_HIGHMEM had _zero_ pages scanned, which means it must be over the high water marks for free memory and so no attempt is made to reclaim from this zone. That means lru_pages is set to zone_reclaimable_pages(ZONE_DMA), which at this point in time would be: active_anon:0kB inactive_anon:0kB active_file:236kB inactive_file:248kB about 484k or 121 pages. To get all_unreclaimable set, the shrink_slab() call must have returned zero to indicate it didn't free anything. 
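
The 121-page figure above, and the shrink_slab() scaling that the analysis works through next, can be checked with a small C sketch. It is only an illustration under stated assumptions: 4kB pages (which matches the kB and page counts in the OOM output), a seeks value of 2 (implied by the "/ 2" step in the calculation), and a division by lru_pages + 1, which is the reading that reproduces the 2,380,327 figure. The function names are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reclaimable LRU size in kB converted to pages. */
static uint64_t lru_pages_from_kb(uint64_t anon_kb, uint64_t file_kb,
				  uint64_t page_kb)
{
	assert(page_kb > 0);
	return (anon_kb + file_kb) / page_kb;
}

/*
 * Hypothetical helper: the scan delta as worked through in this thread,
 * before any capping. Assumes the divisor is lru_pages + 1.
 */
static uint64_t shrinker_delta(uint64_t nr_scanned, uint64_t seeks,
			       uint64_t max_pass, uint64_t lru_pages)
{
	uint64_t delta;

	assert(seeks > 0);
	delta = (4 * nr_scanned) / seeks;	/* 4 * 242 / 2 = 484 */
	delta *= max_pass;			/* 484 * 600,000 */
	return delta / (lru_pages + 1);		/* divided by 121 + 1 */
}

/*
 * Hypothetical helper: cap the accumulated count at twice the cache
 * size, assuming shrinker->nr started from zero.
 */
static uint64_t shrinker_nr(uint64_t delta, uint64_t max_pass)
{
	uint64_t cap = max_pass * 2;

	return delta > cap ? cap : delta;
}
```

With the numbers from this thread, lru_pages_from_kb(0, 236 + 248, 4) gives 121, shrinker_delta(242, 2, 600000, 121) gives 2380327, and shrinker_nr() caps that at 1200000.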
So the first pass through would have passed that to shrink_slab, and
assuming they are all mapped pages we'd end up with nr_scanned = 242.
For the xfs inode cache with 600,000 reclaimable inodes, this would
have resulted in:

max_pass = 600000
delta = 4 * 242 / 2 = 484
delta = 484 * 600,000 = 290,400,000
delta = 290,400,000 / (121 + 1) ~= 2,380,327
shrinker->nr += delta
if (shrinker->nr > max_pass * 2)
shrinker->nr = max_pass * 2; = 1,200,000

So, the shrinker->nr should be well above zero, even in the worst
case. The question is now: how on earth is it returning zero?

Two cases: if the shrinker returns -1, or because the cache is
growing:

nr_before = (*shrinker->shrink)(shrinker, 0, gfp_mask);
shrink_ret = (*shrinker->shrink)(shrinker, this_scan, gfp_mask);
if (shrink_ret == -1)
break;
if (shrink_ret < nr_before)
ret += nr_before - shrink_ret;

So, the first case will happen for XFS when:

if (!(gfp_mask & __GFP_FS))
return -1;

In most of the OOM-killer invocations, the stack trace is:

out_of_memory+0x27c/0x360
__alloc_pages_nodemask+0x6f8/0x708
new_slab+0x1fc/0x234
T.915+0x1f8/0x388
kmem_cache_alloc+0x11c/0x124
kmem_zone_alloc+0xa4/0x114
xfs_inode_alloc+0x40/0x13c
xfs_iget+0x2a8/0x620
xfs_lookup+0xf8/0x114
xfs_vn_lookup+0x5c/0xb0
d_alloc_and_lookup+0x54/0x90
do_lookup+0x248/0x2bc
path_lookupat+0xfc/0x8f4
do_path_lookup+0x34/0xac
user_path_at+0x64/0xb4
vfs_fstatat+0x58/0xbc
sys_fstatat64+0x24/0x50
ret_from_syscall+0x0/0x38

So we are not preventing reclaim via the gfp_mask. That leaves the
other case, where the number of reclaimable inodes is growing faster
than the shrinker is freeing them. I can't really see how that is
possible with a single CPU machine without preempt enabled and,
apparently, no dirty inodes. Inode reclaim should not block (shrinker
or background), so there's something else going on here.

Can you run an event trace of all the XFS events during a find for
me? Don't do it over the entire subset of the filesystem - only
100,000 inodes is sufficient (i.e.
kill the find once the xfs inode cache slab reaches 100k inodes).

While still running the event trace, can you then drop the caches
(echo 3 > /proc/sys/vm/drop_caches) and check that the xfs inode
cache is emptied? If it isn't emptied, drop caches again to see if
that empties it. If you could then post the event trace, I might be
able to see what is going wrong with the shrinker and/or reclaim.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

From david@fromorbit.com Mon May 2 22:15:25 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p433FOXY257361 for ; Mon, 2 May 2011 22:15:25 -0500 X-ASG-Debug-ID: 1304392739-373103450000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 68FE911C6459 for ; Mon, 2 May 2011 20:18:59 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id kWwIz1X7x2fNH8nI for ; Mon, 02 May 2011 20:18:59 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAHhwv015LBza/2dsb2JhbACmEHiIcrwTDoVyBJxuRQ Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail06.adl6.internode.on.net with ESMTP; 03 May 2011 12:48:58 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QH68W-0002Qe-Iq; Tue, 03 May 2011 13:18:56 +1000 Date: Tue, 3 May 2011 13:18:56 +1000 From: Dave Chinner To: Paul Anderson Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: XFS/Linux Sanity check Subject: Re: XFS/Linux Sanity check Message-ID: <20110503031856.GA9114@dastard> References: MIME-Version: 1.0 Content-Type: text/plain;
charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1304392740 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62616 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Mon, May 02, 2011 at 11:47:48AM -0400, Paul Anderson wrote:
> Our genetic sequencing research group is growing our file storage from
> 1PB to 2PB.
.....
> We are deploying five Dell 810s, 192GiB RAM, 12 core, each with three
> LSI 9200-8E SAS controllers, and three SuperMicro 847 45 drive bay
> cabinets with enterprise grade 2TB drives.

So roughly 250TB raw capacity per box.

> We're running Ubuntu 10.04 LTS, and have tried either the stock kernel
> (2.6.32-30) or 2.6.35 from linux.org.

(OT: why do people install a desktop OS on their servers?)

> We organize the storage as one
> software (MD) RAID 0 composed of 7 software RAID (MD) 6s, each with 18
> drives, giving 204 TiB usable (9 drives of the 135 are unused).

That's adventurous. I would seriously consider rethinking this -
hardware RAID-6 with controllers that have a significant amount of
BBWC is much more appropriate for this scale of storage. You get an
unclean shutdown (e.g. power loss) and MD is going to take _weeks_
to resync those RAID6 arrays. Background scrubbing is likely to
never cease, either....
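
As a rough check on the layout quoted above, the drive counts and usable capacity can be reproduced in a few lines of C. This is only a sketch under stated assumptions: each 18-drive RAID-6 group gives up two drives' worth of capacity to parity, and "2TB" means 2 * 10^12 bytes. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Drives left over once all RAID-6 groups have been built. */
static unsigned unused_drives(unsigned total, unsigned groups,
			      unsigned per_group)
{
	assert(total >= groups * per_group);
	return total - groups * per_group;
}

/* Drives holding data, assuming two parity drives per RAID-6 group. */
static unsigned data_drives(unsigned groups, unsigned per_group)
{
	assert(per_group > 2);
	return groups * (per_group - 2);
}

/* Usable capacity in TiB (2^40 bytes) for a given drive size in bytes. */
static double usable_tib(unsigned ndata, double drive_bytes)
{
	return ndata * drive_bytes / 1099511627776.0;
}
```

With 135 drives, 7 groups of 18 and 2 * 10^12 bytes per drive, this gives 9 unused drives, 112 data drives and roughly 203.7 TiB, which agrees with the quoted "204 TiB usable".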
Also, knowing how you spread out the disks in each RAID-6 group between controllers, trays, etc would help, as that has important performance and failure implications. e.g. I'm guessing that you are taking 6 drives from each enclosure for each 18-drive raid-6 group, which would split the RAID-6 group across all three SAS controllers and enclosures. That means if you lose a SAS controller or enclosure you lose all RAID-6 groups at once which is effectively catastrophic from a recovery point of view. It also means that one slow controller slows down everything so load balancing is difficult. Large stripes might look like a good idea, but when you get to this scale concatenation of high throughput LUNs provides better throughput because of less contention through the storage controllers and enclosures. > XFS > is set up properly (as far as I know) with respect to stripe and chunk > sizes. Any details? You might be wrong ;) > Allocation groups are 1TiB in size, which seems sane for the > size of files we expect to work with. Any filesystem over 16TB will use 1TB AGs. > In isolated testing, I see around 5GiBytes/second raw (135 parallel dd > reads), and with a benchmark test of 10 simultaneous 64GiByte dd > commands, I can see just shy of 2 GiBytes/second reading, and around > 1.4GiBytes/second writing through XFS. The benchmark is crude, but > fairly representative of our expected use. If you want insightful comments, then you'll need to provide intimate details of the tests you ran and the results (e.g. command lines, raw results, etc). > md apparently does not support barriers, so we are badly exposed in > that manner, I know. As a test, I disabled write cache on all drives, > performance dropped by 30% or so, but since md is apparently the > problem, barriers still didn't work. Doesn't matter if you have BBWC on your hardware RAID controllers. Seriously, if you want to sustain high throughput, you want a large amount of BBWC in front of your disks.... 
> Nonetheless, what we need, but don't have, is stability. > > With 2.6.32-30, we get reliable kernel panics after 2 days of > sustained rsync to the machine (around 150-250MiBytes/second for the > entire time - the source machines are slow), Stack traces from the crash? > and with 2.6.35, we get a > bad resource contention problem fairly quickly - much less than 24 > hours (in this instance, we start getting XFS kernel thread timeouts > similar to what I've seen posted here recently, but it isn't clear > whether it is only XFS or also ext3 boot drives that are starved for > I/O - suspending or killing all I/O load doesn't solve the problem - > only a reboot does). Details of the timeout messages? > Ideally, I'd firstly be able to find informed opinions about how I can > improve this arrangement - we are mildly flexible on RAID controllers, > very flexible on versions of Linux, etc, and can try other OS's as a > last resort (but the leading contender here would be "something" > running ZFS, and though I love ZFS, it really didn't seem to work well > for our needs). > > Secondly, I welcome suggestions about which version of the linux > kernel you'd prefer to hear bug reports about, as well as what kinds > of output is most useful (we're getting all chassis set up with serial > console so we can do kgdb and also full kernel panic output results). If you want to stay on mainline kernels with best-effort community support, I'd suggest 2.6.38 or more recent kernels are the only ones we're going to debug. If you want fixes, then running the current -rc kernels is probably a good idea. It's unlikely you'll get anyone backporting fixes for you to older kernels. Alternatively, you can switch to something like RHEL (or SLES) where XFS is fully supported (and in the RHEL case, pays my bills :). The advantage of this is that once the bug is fixed in mainline, it will get backported to the supported kernel you are running. Cheers, Dave. 
-- Dave Chinner david@fromorbit.com From lists@nerdbynature.de Mon May 2 23:01:17 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4341Hdb258218 for ; Mon, 2 May 2011 23:01:17 -0500 X-ASG-Debug-ID: 1304395492-503501960000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from trent.utfs.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 95E7C11C6221 for ; Mon, 2 May 2011 21:04:53 -0700 (PDT) Received: from trent.utfs.org (trent.utfs.org [194.246.123.103]) by cuda.sgi.com with ESMTP id oMgu2lNO4gvFFolF for ; Mon, 02 May 2011 21:04:53 -0700 (PDT) Received: by trent.utfs.org (Postfix, from userid 8) id 85A403DDC9; Tue, 3 May 2011 06:04:52 +0200 (CEST) Received: from trent.utfs.org (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id A65393DD45; Tue, 3 May 2011 06:04:40 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 7F74D3DBCA; Tue, 3 May 2011 06:04:40 +0200 (CEST) Date: Mon, 2 May 2011 21:04:40 -0700 (PDT) From: Christian Kujau To: Dave Chinner cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks In-Reply-To: <20110503005114.GE2978@dastard> Message-ID: References: <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> User-Agent: Alpine 2.01 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AV-Checked: ClamAV using ClamSMTP (127.0.0.1) X-Barracuda-Connect: 
trent.utfs.org[194.246.123.103] X-Barracuda-Start-Time: 1304395493 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62618 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, 3 May 2011 at 10:51, Dave Chinner wrote: > Can you run an event trace of all the XFS events during a find for > me? Don't do it over the entire subset of the filesystem - only You mean "event tracing", as in Documentation/trace/events.txt. For that I will have to enable CONFIG_FTRACE and CONFIG_FUNCTION_TRACER and probably others, right? Looking at http://lwn.net/Articles/341899, I see CONFIG_EVENT_TRACING and the way to enable event tracing for "all events in fs/xfs" would be: echo 1 > /sys/kernel/debug/tracing/events/xfs/enable > 100,000 inodes is sufficient (i.e. kill the find once the xfs inode > cache slab reaches 100k inodes). While still running the event trace, > can you then drop the caches (echo 3 > /proc/sys/vm/drop_caches) and > check that the xfs inode cache is emptied? If it isn't emptied, drop > caches again to see if that empties it. If you could then post the > event trace, I might be able to see what is going wrong with the > shrinker and/or reclaim. Will try to do all that. I wonder why nobody else is affected by this. Because nobody else runs powerpc or UP any more? I'm sure other people's filesystems are way bigger than mine, with many more inodes to cache... Thanks for your time, Christian. -- BOFH excuse #136: Daemons loose in system. 
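The procedure requested in the quoted text above (trace XFS events during a find, stop at roughly 100k cached inodes, drop caches, check whether the cache empties) could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the helper names are hypothetical, the slab cache is assumed to appear as "xfs_inode" in /proc/slabinfo, debugfs is assumed to be mounted at /sys/kernel/debug, the `find ... -ls` invocation is a guess at a workload that pulls inodes into the cache, and everything apart from xfs_inode_count needs root.

```shell
# Sketch only: all function names here are hypothetical helpers.

# Print the number of active objects in the xfs_inode slab (0 if absent).
# $1 = slabinfo file to read (defaults to /proc/slabinfo).
xfs_inode_count() {
    awk '$1 == "xfs_inode" { print $2; found = 1 }
         END { if (!found) print 0 }' "${1:-/proc/slabinfo}"
}

# Enable all XFS trace events, using the path quoted in the message above.
enable_xfs_events() {
    echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
}

# Run a find over $2 and kill it once the slab holds at least $1 inodes.
run_find_until() {
    limit=$1
    dir=$2
    find "$dir" -xdev -ls > /dev/null 2>&1 &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$(xfs_inode_count)" -ge "$limit" ]; then
            kill "$pid"
            break
        fi
        sleep 1
    done
}

# Drop caches $1 times (default 2), reporting the inode count after each pass.
drop_caches_and_check() {
    passes=${1:-2}
    i=1
    while [ "$i" -le "$passes" ]; do
        sync
        echo 3 > /proc/sys/vm/drop_caches
        echo "pass $i: $(xfs_inode_count) xfs_inode objects remain"
        i=$((i + 1))
    done
}
```

A run might then look like `enable_xfs_events; run_find_until 100000 /mnt/xfs; drop_caches_and_check 2`, followed by saving /sys/kernel/debug/tracing/trace to a file for posting; /mnt/xfs is a placeholder mount point.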
From david@fromorbit.com Tue May 3 01:33:27 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p436XQv7003338 for ; Tue, 3 May 2011 01:33:27 -0500 X-ASG-Debug-ID: 1304404621-646002d80000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail05.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id DDC66157AC69 for ; Mon, 2 May 2011 23:37:02 -0700 (PDT) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [150.101.137.143]) by cuda.sgi.com with ESMTP id mWJSq5a3kluJ4QJN for ; Mon, 02 May 2011 23:37:02 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAAShv015LBza/2dsb2JhbACmDniIcrtlDoVyBJ0z Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail05.adl6.internode.on.net with ESMTP; 03 May 2011 16:07:00 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QH9EA-0002gG-3Q; Tue, 03 May 2011 16:36:58 +1000 Date: Tue, 3 May 2011 16:36:58 +1000 From: Dave Chinner To: Christian Kujau Cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks Message-ID: <20110503063657.GB9114@dastard> References: <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail05.adl6.internode.on.net[150.101.137.143] X-Barracuda-Start-Time: 
1304404622 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62629 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, May 02, 2011 at 09:04:40PM -0700, Christian Kujau wrote: > On Tue, 3 May 2011 at 10:51, Dave Chinner wrote: > > Can you run an event trace of all the XFS events during a find for > > me? Don't do it over the entire subset of the filesystem - only > > You mean "event tracing", as in Documentation/trace/events.txt. For > that I will have to enable CONFIG_FTRACE and CONFIG_FUNCTION_TRACER and > probably others, right? > > Looking at http://lwn.net/Articles/341899, I see CONFIG_EVENT_TRACING > and the way to enable event tracing for "all events in fs/xfs" would be: > > echo 1 > /sys/kernel/debug/tracing/events/xfs/enable Download trace-cmd and use that - more efficient and easier to specify the events to record... > > 100,000 inodes is sufficient (i.e. kill the find once the xfs inode > > cache slab reaches 100k inodes). While still running the event trace, > > can you then drop the caches (echo 3 > /proc/sys/vm/drop_caches) and > > check that the xfs inode cache is emptied? If it isn't emptied, drop > > caches again to see if that empties it. If you could then post the > > event trace, I might be able to see what is going wrong with the > > shrinker and/or reclaim. > > Will try to do all that. > > I wonder why nobody else is affected by this. Because nobody else runs > powerpc or UP any more? I'm sure other people's filesystems are way bigger > than mine, with many more inodes to cache... 
XFS on uniprocessor, 32 bit, highmem system is pretty rare. Let alone on an old powerpc platform. It's so far out in left field that I'd expect the only test coverage we get is your laptop.... Cheers, Dave. -- Dave Chinner david@fromorbit.com From BATV+ebbc15e8e05355d690e1+2809+infradead.org+hch@bombadil.srs.infradead.org Tue May 3 03:40:25 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_FRT_LOLITA1 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p438eNvj007433 for ; Tue, 3 May 2011 03:40:25 -0500 X-ASG-Debug-ID: 1304412240-1cc300770000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 59BD4157E79B for ; Tue, 3 May 2011 01:44:00 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id I7tDjzusF1I04kYW for ; Tue, 03 May 2011 01:44:00 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHBD5-0005e7-Pa for xfs@oss.sgi.com; Tue, 03 May 2011 08:43:59 +0000 Date: Tue, 3 May 2011 04:43:59 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH v2] xfstests: support post-udev device mapper nodes Subject: [PATCH v2] xfstests: support post-udev device mapper nodes Message-ID: <20110503084359.GA21704@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304412240 
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Because of udevs complaining device mapper now creates /dev/dm-N as the real device nodes, and just symlinks the /dev/mapper/ names to it. This would be easy if everything used the /dev/mapper clear names, but most system utilities translate them back to the /dev/mapper/ names and thus confuse various test cases. Add support to _is_block_dev to read symlinks, and add documentation on how to run xfstests on device mapper volumes. Signed-off-by: Christoph Hellwig Index: xfstests-dev/common.rc =================================================================== --- xfstests-dev.orig/common.rc 2011-05-03 08:26:22.000000000 +0000 +++ xfstests-dev/common.rc 2011-05-03 08:28:01.000000000 +0000 @@ -587,7 +587,14 @@ _is_block_dev() exit 1 fi - [ -b $1 ] && src/lstat64 $1 | $AWK_PROG '/Device type:/ { print $9 }' + _dev=$1 + if [ -L "${_dev}" ]; then + _dev=`readlink -f ${_dev}` + fi + + if [ -b "${_dev}" ]; then + src/lstat64 ${_dev} | $AWK_PROG '/Device type:/ { print $9 }' + fi } # Do a command, log it to $seq.full, optionally test return status Index: xfstests-dev/README.device-mapper =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ xfstests-dev/README.device-mapper 2011-05-03 08:27:00.000000000 +0000 @@ -0,0 +1,8 @@ + +To use xfstests on device mapper always use the /dev/mapper/ symlinks, +not the /dev/dm-* devices, or the symlinks created by LVM. 
+ +For example: + +TEST_DEV=/dev/mapper/test +SCRATCH_DEV=/dev/mapper/scratch From michael.monnerie@is.it-management.at Tue May 3 03:54:52 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p438sq4c007710 for ; Tue, 3 May 2011 03:54:52 -0500 X-ASG-Debug-ID: 1304413105-50c300ad0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mailsrv14.zmi.at (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 5613611C291C for ; Tue, 3 May 2011 01:58:26 -0700 (PDT) Received: from mailsrv14.zmi.at (mailsrv1.zmi.at [212.69.164.54]) by cuda.sgi.com with ESMTP id tVw2hU2xsKHoSxvB for ; Tue, 03 May 2011 01:58:26 -0700 (PDT) Received: from mailsrv.i.zmi.at (h081217106033.dyn.cm.kabsi.at [81.217.106.33]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailsrv2.i.zmi.at", Issuer "power4u.zmi.at" (not verified)) by mailsrv14.zmi.at (Postfix) with ESMTPSA id 7D4D3400; Tue, 3 May 2011 10:58:24 +0200 (CEST) Received: from saturn.localnet (saturn.i.zmi.at [10.72.27.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mailsrv.i.zmi.at (Postfix) with ESMTPSA id CC0DE401C3A; Tue, 3 May 2011 10:58:23 +0200 (CEST) From: Michael Monnerie Organization: it-management http://it-management.at To: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: XFS/Linux Sanity check Subject: Re: XFS/Linux Sanity check Date: Tue, 3 May 2011 10:58:22 +0200 User-Agent: KMail/1.13.6 (Linux/2.6.37.1-1.2-desktop; KDE/4.6.0; x86_64; ; ) Cc: Dave Chinner References: <20110503031856.GA9114@dastard> In-Reply-To: <20110503031856.GA9114@dastard> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart3373778.hURPxmJAeC"; 
protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <201105031058.23294@zmi.at> X-Barracuda-Connect: mailsrv1.zmi.at[212.69.164.54] X-Barracuda-Start-Time: 1304413107 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62638 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean --nextPart3373778.hURPxmJAeC Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable On Tuesday, 3 May 2011 Dave Chinner wrote: > > We're running Ubuntu 10.04 LTS, and have tried either the stock > > kernel (2.6.32-30) or 2.6.35 from linux.org. > > (OT: why do people install a desktop OS on their servers?) I can only speak for us: We use openSUSE, and tried SLES, but support on openSUSE is better even though you pay for SLES, and openSUSE is (by nature) newer in every package. That gives you (nearly) current XFS improvements thanks to newer kernels, and therefore more performance (e.g. delaylog). -- Kind regards, Michael Monnerie, Ing. BSc it-management Internet Services: Protéger http://proteger.at [pronounced: Prot-e-schee] Tel: +43 660 / 415 6531 // ****** Radio interview on the topic of spam ****** // http://www.it-podcast.at/archiv.html#podcast-100716 // // House for sale: http://zmi.at/langegg/ --nextPart3373778.hURPxmJAeC Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) iEYEABECAAYFAk2/w68ACgkQzhSR9xwSCbR1CQCg6deu7s+swqH7OvxTvqKW5DZI yl8AniADU95X8WGW/IS8F0vyYAt+CrZd =+9jw -----END PGP SIGNATURE----- --nextPart3373778.hURPxmJAeC-- From nscott@aconex.com Tue May 3 05:04:48 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43A4mdw009881 for ; Tue, 3 May 2011 05:04:48 -0500 X-ASG-Debug-ID: 1304417303-5bc902430000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from postoffice2.aconex.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id F11C711C712E for ; Tue, 3 May 2011 03:08:24 -0700 (PDT) Received: from postoffice2.aconex.com (mail.aconex.com [203.166.49.3]) by cuda.sgi.com with ESMTP id eF91KrUA23M45VJr for ; Tue, 03 May 2011 03:08:24 -0700 (PDT) Received: from postoffice.aconex.com (postoffice.yarra.acx [192.168.35.100]) by postoffice2.aconex.com with ESMTP id qoAxxqaF4RpJiIft; Tue, 03 May 2011 20:08:22 +1000 (EST) Received: from gatekeeper.aconex.com (gatekeeper.yarra.acx [192.168.35.102]) by postoffice.aconex.com (Postfix) with ESMTP id 341D2A50114; Tue, 3 May 2011 20:08:22 +1000 (EST) Received: from localhost (localhost.localdomain [127.0.0.1]) by gatekeeper.aconex.com (Postfix) with ESMTP id CB915A00005; Tue, 3 May 2011 19:49:03 +1000 (EST) X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Scanned: amavisd-new at aconex.com Received: from gatekeeper.aconex.com ([127.0.0.1]) by localhost (gatekeeper.aconex.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vBVLfIMgfV4V; Tue, 3 May 2011 19:49:03 +1000 (EST) Received: from acxmail-au2.aconex.com (acxmail-au2.aconex.com [192.168.35.104]) by gatekeeper.aconex.com 
(Postfix) with ESMTP id 42DDAA00004; Tue, 3 May 2011 19:49:03 +1000 (EST) Received: from localhost (localhost.localdomain [127.0.0.1]) by acxmail-au2.aconex.com (Postfix) with ESMTP id 95C0D3B20002; Tue, 3 May 2011 20:08:21 +1000 (EST) X-Virus-Scanned: amavisd-new at aconex.com Received: from acxmail-au2.aconex.com ([127.0.0.1]) by localhost (acxmail-au2.aconex.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id onZfh2myRUNs; Tue, 3 May 2011 20:08:21 +1000 (EST) Received: from acxmail-au2.aconex.com (acxmail-au2.aconex.com [192.168.35.104]) by acxmail-au2.aconex.com (Postfix) with ESMTP id 709D33B20001; Tue, 3 May 2011 20:08:21 +1000 (EST) Date: Tue, 3 May 2011 20:08:21 +1000 (EST) From: Nathan Scott To: aelder@sgi.com Cc: xfs@oss.sgi.com Message-ID: <408615549.38509.1304417301357.JavaMail.root@acxmail-au2.aconex.com> In-Reply-To: <1304366623.3077.44.camel@doink> X-ASG-Orig-Subj: Re: [PATCH] xfsprogs - resolve Debian readline build issue Subject: Re: [PATCH] xfsprogs - resolve Debian readline build issue MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [220.237.107.90] X-Mailer: Zimbra 6.0.12_GA_2888 (ZimbraWebClient - SAF3 (Mac)/6.0.12_GA_2883) X-Virus-Scanned: by bsmtpd at aconex.com X-Barracuda-Connect: mail.aconex.com[203.166.49.3] X-Barracuda-Start-Time: 1304417304 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62644 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Status: Clean ----- Original Message ----- > On Fri, 2011-04-29 at 09:15 +1000, Nathan Scott wrote: > > Address the recently reported build issue with libreadline5/6, via > > 
the gplv2 route. Since this appears to be a relatively recent pkg, > > I made its use conditional so the deb build continues to work for > > everyone not running a bleeding edge distro. Works For Me (tm). > > > > This addresses Debian bug 553875: libreadline5-dev removal pending > > As far as I'm concerned, this looks fine. > > I understand the issue, but I'm not really > familiar with the Debian build/dependency > system at work here, so I'm not really a > very qualified reviewer. But if you're > comfortable with it and get nobody else to > review it, you are welcome to use my sign-off. Thanks! I'll give it a couple more days then commit it if noone else has further input. cheers. -- Nathan From powool@gmail.com Tue May 3 11:01:33 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_DKIM_INVALID, T_FILL_THIS_FORM autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43G1Wk0020062 for ; Tue, 3 May 2011 11:01:32 -0500 X-ASG-Debug-ID: 1304438708-7bac037c0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-qw0-f53.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 7538F4290B1 for ; Tue, 3 May 2011 09:05:08 -0700 (PDT) Received: from mail-qw0-f53.google.com (mail-qw0-f53.google.com [209.85.216.53]) by cuda.sgi.com with ESMTP id 8k0odv3gMPsIiJzg for ; Tue, 03 May 2011 09:05:08 -0700 (PDT) Received: by qwb7 with SMTP id 7so138627qwb.26 for ; Tue, 03 May 2011 09:05:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=ZzLpILro49Xh+JkCXazt3tw3FbXdhpAwH4UsNoKj8qk=; 
b=goGHQntg+BoCoQ7Z/kn+dM+4nO3L8Kvwsv1MB3pj0tyfe7z7Y7Vsmhnt6opuROy+GT yBPOeIgbBleewMhc+bO+y+3HTMVVEyFqtplyveJAaxMYOWPQcle6Kjitm062VjhcIF8q B0In8Wr/1j0/NkxiOeKJZC+bh+jiXzhbNHFbA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=geHZnckecq+Jv8ffssUSVZwhyjw82oscy+CRmaZbsLJsvcSL6upzH2t38iBHf8ODE7 WE0OMK4mfGP36aGZUTuWqqGsqVMpuYMEkN+DaCFjp/Fnub7TguZr3OIj9qtoXRoRUq0f CBnNp2i5EJb2fwxtN1N/EPQT+998R9L2PIKL4= MIME-Version: 1.0 Received: by 10.224.37.144 with SMTP id x16mr24619qad.102.1304438708580; Tue, 03 May 2011 09:05:08 -0700 (PDT) Sender: powool@gmail.com Received: by 10.224.45.144 with HTTP; Tue, 3 May 2011 09:05:08 -0700 (PDT) In-Reply-To: <20110503031856.GA9114@dastard> References: <20110503031856.GA9114@dastard> Date: Tue, 3 May 2011 12:05:08 -0400 X-Google-Sender-Auth: Ye-evMtZI83H8IB9rGb83Ez7xss Message-ID: X-ASG-Orig-Subj: Re: XFS/Linux Sanity check Subject: Re: XFS/Linux Sanity check From: Paul Anderson To: Dave Chinner Cc: xfs@oss.sgi.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: mail-qw0-f53.google.com[209.85.216.53] X-Barracuda-Start-Time: 1304438709 X-Barracuda-Bayes: INNOCENT GLOBAL 0.1898 1.0000 -0.8818 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -0.38 X-Barracuda-Spam-Status: No, SCORE=-0.38 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_RULE7568M, DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62667 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature 0.50 BSF_RULE7568M Custom Rule 7568M 
X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Dave, thanks for your feedback - comments below - possibly of interest to others. Several underlying assumptions strongly influence the choices I've made here. Sequential I/O is of paramount importance - all else is nearly insignificant (not entirely true, but a reasonable plan for the coming year or two). Highly I/O intensive work can/should be done locally to avoid networking (NFS and 10 GigE just add more delays - later, research could be done to saturate a 10GigE link in a variety of other ways, but is of secondary concern to me today). Compute intensive workloads will start looking more random because we'll send those out to the grid and large numbers of incoming requests make the I/O stream less predictable. Mind you, I envision eliminating NFS or any other network filesystem in favor of straight TCP/IP or even something like RoCE from Redhat. With proper buffering, even serving data like this can look sequential by and large. The team here favors large filesystems because from the user perspective it is simply easier than having to juggle space among distinct partitions. The easy administrative solution of splitting 204TiB into say 7 mounted volumes really imposes a big barrier to how work is organized, and further wastes storage. I believe that typical working file sizes will exceed 100GiB within a year or two - for example, one project is generating 250 sequencing sample files each of which is 250 GiB in size, which we need to pull, reprocess, and analyze. This is fallout from the fact that there is a very rapid drop in the cost of genome sequencing that is still underway. On Mon, May 2, 2011 at 11:18 PM, Dave Chinner wrote: > On Mon, May 02, 2011 at 11:47:48AM -0400, Paul Anderson wrote: >> Our genetic sequencing research group is growing our file storage from >> 1PB to 2PB. > ..... 
>> We are deploying five Dell 810s, 192GiB RAM, 12 core, each with three >> LSI 9200-8E SAS controllers, and three SuperMicro 847 45 drive bay >> cabinets with enterprise grade 2TB drives. > > So roughly 250TB raw capacity per box. > >> We're running Ubuntu 10.04 LTS, and have tried either the stock kernel >> (2.6.32-30) or 2.6.35 from linux.org. > > (OT: why do people install a desktop OS on their servers?) Our end users want many GUI based apps running on the compute head nodes, ergo we wind up installing most of the desktop anyway, so it is just easier to install it and add whatever server related packages we may need. I'm not fond of that situation myself. > >> We organize the storage as one >> software (MD) RAID 0 composed of 7 software RAID (MD) 6s, each with 18 >> drives, giving 204 TiB usable (9 drives of the 135 are unused). > > That's adventurous. I would seriously consider rethinking this - > hardware RAID-6 with controllers that have a significant amount of > BBWC is much more appropriate for this scale of storage. You get an > unclean shutdown (e.g. power loss) and MD is going to take _weeks_ > to resync those RAID6 arrays. Background scrubbing is likely to > never cease, either.... 18 hours from start - remember the sync is proceeding at over 4GiBytes/sec (14.5 hours if exactly 4 GiBytes/second). The big problem with my setup is lack of BBWC. They are running in JBOD mode, and I can disable per drive write cache and still maintain decent performance across the array. That said, there are few if any cases where we care about loss of in-flight data - we care a great deal about static data that is corrupted or lost due to metadata corruption, so this is still probably an open issue (ideas welcome). > Also, knowing how you spread out the disks in each RAID-6 group > between controllers, trays, etc would help, as that has important > performance and failure implications. You bet! > e.g. 
I'm guessing that you are taking 6 drives from each enclosure > for each 18-drive raid-6 group, which would split the RAID-6 group > across all three SAS controllers and enclosures. That means if you > lose a SAS controller or enclosure you lose all RAID-6 groups at > once which is effectively catastrophic from a recovery point of view. > It also means that one slow controller slows down everything so load > balancing is difficult. Each of the three enclosures has a pair of SAS expanders, and each LSI 9200-8e controller has two SAS cables, so I actually ordered the RAID-6 drive sets as subsets of three, each from successive distinct controller cards in a round robin fashion until you have a full set of 18 drives. A wrinkle is that the SAS expanders have differing numbers of drives - 24 front, 21 rear (the other 3 on the rear are taken by the power supplies). So finding a good match of RAID size versus available channels and splitting I/O across those channels is a bit challenging. > Large stripes might look like a good idea, but when you get to this > scale concatenation of high throughput LUNs provides better > throughput because of less contention through the storage > controllers and enclosures. I don't disagree, but what I need to do is run a scripted test varying stripe size, stripe units, chunk size (md parameter), etc - this gets cumbersome with the 135 drives, as trying to get good balances across the available resources is tedious and not automatic. Basically, I found a combo (described immediately below) that works pretty well, and started working on other problems than performance. I have sufficient hardware to test other combinations, but time to run them is an issue for me. 
(ie set them up precisely right, babysit them, wait for parity to build, then test - yes, I tested on various subsets of the full 126 drive array, but getting those configs right and then knowing you can extrapolate to the full size set is confusing and hurts my poor little head) > >> XFS >> is set up properly (as far as I know) with respect to stripe and chunk >> sizes. > > Any details? You might be wrong ;) Oh yes indeedy, I could be wrong! Each of the 126 in use drives show something like this: /dev/sdbc1: Magic : a92b4efc Version : 1.1 Feature Map : 0x0 Array UUID : f3c44896:ecdcadca:153ee6d1:1770781f Name : louie:5 (local to host louie) Creation Time : Fri Apr 8 15:01:16 2011 Raid Level : raid6 Raid Devices : 18 Avail Dev Size : 3907026856 (1863.02 GiB 2000.40 GB) Array Size : 62512429056 (29808.25 GiB 32006.36 GB) Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB) Data Offset : 264 sectors Super Offset : 0 sectors State : clean Device UUID : adbd8716:94ebf4a2:ea753ee0:418b7bd8 Update Time : Tue May 3 11:18:45 2011 Checksum : 44d36ef7 - correct Events : 187 Chunk Size : 64K There are 7 RAID-6 arrays, each of which look like this: /dev/md0: Magic : a92b4efc Version : 1.1 Feature Map : 0x0 Array UUID : cbb4b32e:afc7126a:922e501d:9404011e Name : louie:8 (local to host louie) Creation Time : Fri Apr 8 15:02:20 2011 Raid Level : raid0 Raid Devices : 7 Avail Dev Size : 62512429048 (29808.25 GiB 32006.36 GB) Used Dev Size : 0 Data Offset : 8 sectors Super Offset : 0 sectors State : active Device UUID : 94bfd084:138f8ca5:2938df2e:1ef0b76d Update Time : Fri Apr 8 15:02:20 2011 Checksum : d733d87a - correct Events : 0 Chunk Size : 1024K Array Slot : 0 (0, 1, 2, 3, 4, 5, 6) Array State : Uuuuuuu The seven RAID 6 devices are concatenated into a RAID 0: /dev/md8: Version : 01.01 Creation Time : Fri Apr 8 15:02:20 2011 Raid Level : raid0 Array Size : 218793494528 (208657.74 GiB 224044.54 GB) Raid Devices : 7 Total Devices : 7 Preferred Minor : 8 Persistence : Superblock is 
persistent

    Update Time : Fri Apr 8 15:02:20 2011
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 1024K

           Name : louie:8 (local to host louie)
           UUID : cbb4b32e:afc7126a:922e501d:9404011e
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       9        0        0      active sync   /dev/block/9:0
       1       9        1        1      active sync   /dev/block/9:1
       2       9        2        2      active sync   /dev/block/9:2
       3       9        3        3      active sync   /dev/block/9:3
       4       9        4        4      active sync   /dev/block/9:4
       5       9        5        5      active sync   /dev/block/9:5
       6       9        6        6      active sync   /dev/block/9:6

The xfs_info for the mounted volume is:

meta-data=/dev/md8               isize=256    agcount=204, agsize=268435440 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=54698373632, imaxpct=1
         =                       sunit=16     swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

The sunit/swidth parameters are chosen to exactly match the RAID-6
device, not the RAID-0.

Mount options are negligible, although I will be trying this:

UUID=0a675b55-d68a-41f2-8bb7-063e33123531 /exports xfs inode64,largeio,logbufs=8,noatime 0 2

All disk drives (almost a thousand here now) are Hitachi HUA72202 2TB
enterprise drives. We did a failed experiment awhile back with desktop
drives... never again.

>
>> Allocation groups are 1TiB in size, which seems sane for the
>> size of files we expect to work with.
>
> Any filesystem over 16TB will use 1TB AGs.
>
>> In isolated testing, I see around 5GiBytes/second raw (135 parallel dd
>> reads), and with a benchmark test of 10 simultaneous 64GiByte dd
>> commands, I can see just shy of 2 GiBytes/second reading, and around
>> 1.4GiBytes/second writing through XFS. The benchmark is crude, but
>> fairly representative of our expected use.
>
> If you want insightful comments, then you'll need to provide
> intimate details of the tests you ran and the results (e.g. command
> lines, raw results, etc).

To test RAW read rates, I do this:

for i in /dev/sd[b-z] /dev/sd[a-z][a-z] ; do
    dd if=$i of=/dev/null bs=1024k &
done

killall dd gets rid of them. I use "dstat 1" to check what the kernel
thinks is happening.

For the filesystem test (configured and mounted as I described above
with the mdadm commands and xfs_info), I do this:

for load in 0 1 2 3 4 5 6 7 8 9 ; do
    dd if=/dev/zero of=/exports/load_$load$step bs=1024k count=32768 &
done

Later to test read, I do:

for load in 0 1 2 3 4 5 6 7 8 9 ; do
    dd of=/dev/null if=/exports/load_$load bs=1024 &
done

In both cases, I watch I/O rates after the buffers overflow - with
192GB of RAM, this takes a few seconds.

For giggles, I've allowed the read commands to cache 20-100GB in RAM,
then rerun the read test to see what a cached read rate looks like -
interestingly, the aggregate dd reported I/O rate in that case is
around 5GiBytes/second, indicating that is approaching something of an
upper limit for this particular chassis.

I am fully aware that this is a simplified test. I'm also quite
familiar with the workload, and know this is a reasonable facsimile of
what we do. Better real world benchmarking for us now consists of end
user jobs - day long jobs on a single sequencing run using a bunch of
home grown software.

>
>> md apparently does not support barriers, so we are badly exposed in
>> that manner, I know. As a test, I disabled write cache on all drives,
>> performance dropped by 30% or so, but since md is apparently the
>> problem, barriers still didn't work.
>
> Doesn't matter if you have BBWC on your hardware RAID
> controllers. Seriously, if you want to sustain high throughput, you
> want a large amount of BBWC in front of your disks....
Here we talk performance expectations and goals - from my testing so far, I can reasonably say I'm happy with the performance of the software RAID with XFS running on top of that. What I need now are stability and robustness in the face of crashes. I'm still perfectly willing to buy good HW RAID cards, don't get me wrong, but their main benefit to me will be the battery backed cache, not the performance. Keep in mind that it is hard to balance a HW RAID card across multiple SAS expanders - you can certainly get a -16e card of some sort, but then it does ALL of the I/O to those 4 expanders ALL of the time. I'm not sure that is a win, either. Cheaper cards, one per expander might work, though (but with six 8x slots available, probably a HW RAID card with 8e would be the best - run two expanders per card as I do now). > >> Nonetheless, what we need, but don't have, is stability. >> >> With 2.6.32-30, we get reliable kernel panics after 2 days of >> sustained rsync to the machine (around 150-250MiBytes/second for the >> entire time - the source machines are slow), > > Stack traces from the crash? Mostly a non-responsive console and kgdb was not set up at the time - I am trying to get this set up now. Here's the one stack trace I wrote down from the console (again from a 2.6.32-30 kernel): RSP 0018:ffff880dcce39e48 E FLAGS 287 _spin_lock+0xe/0x20 futex_wake+0x7d/0x130 handle_nm_fault+0x1a8/0x3c0 do_futex+0x68/0x1b0 sys_futex+0x7b/0x170 do_page_fault+0x158/0x3b0 system_call_fastpath+0x16/0x1b All other info lost - other crashes result in a locked console that we've not been able to revive. The load on the system at the time of the crash was simply 3-4 rsync's copying data via 'ssh -c arcfour' over to the XFS filesystem (basically loading up the test server with user data for further testing). Sustained I/O rates were moderate - 200-400MiBytes/second. No swap, CPU load of significance or user jobs. 
Obviously, this is an old kernel and of less interest, but nonetheless
answers your question.

>
>> and with 2.6.35, we get a
>> bad resource contention problem fairly quickly - much less than 24
>> hours (in this instance, we start getting XFS kernel thread timeouts
>> similar to what I've seen posted here recently, but it isn't clear
>> whether it is only XFS or also ext3 boot drives that are starved for
>> I/O - suspending or killing all I/O load doesn't solve the problem -
>> only a reboot does).
>
> Details of the timeout messages?

Here are some typical ones from yesterday when I was trying to run the
sync command on a relatively lightly loaded 2.6.35 machine (sustained
100MiByte/second copies onto the server in question):

[178602.197456] INFO: task sync:2787 blocked for more than 120 seconds.
[178602.203933] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[178602.211863] sync D 0000000000000000 0 2787 2691 0x00000000
[178602.211867] ffff880d2dc51cd8 0000000000000086 ffff880d2dc51cc8 0000000000015880
[178602.211870] ffff880d2dc51fd8 0000000000015880 ffff880d2dc51fd8 ffff8817fb725d40
[178602.211872] 0000000000015880 0000000000015880 ffff880d2dc51fd8 0000000000015880
[178602.211875] Call Trace:
[178602.211887] [] ? select_task_rq_fair+0x561/0x8e0
[178602.211893] [] schedule_timeout+0x22d/0x310
[178602.211896] [] ? enqueue_task_fair+0x43/0x90
[178602.211898] [] ? enqueue_task+0x79/0x90
[178602.211900] [] wait_for_common+0xd6/0x180
[178602.211904] [] ? default_wake_function+0x0/0x20
[178602.211910] [] ? sync_one_sb+0x0/0x30
[178602.211912] [] wait_for_completion+0x1d/0x20
[178602.211915] [] sync_inodes_sb+0x89/0x180
[178602.211955] [] ? xfs_quiesce_data+0x71/0xc0 [xfs]
[178602.211958] [] ?
sync_one_sb+0x0/0x30
[178602.211960] [] __sync_filesystem+0x88/0xa0
[178602.211962] [] sync_one_sb+0x20/0x30
[178602.211966] [] iterate_supers+0x8b/0xd0
[178602.211968] [] sys_sync+0x45/0x70
[178602.211973] [] system_call_fastpath+0x16/0x1b

>
>> Ideally, I'd firstly be able to find informed opinions about how I can
>> improve this arrangement - we are mildly flexible on RAID controllers,
>> very flexible on versions of Linux, etc, and can try other OS's as a
>> last resort (but the leading contender here would be "something"
>> running ZFS, and though I love ZFS, it really didn't seem to work well
>> for our needs).
>>
>> Secondly, I welcome suggestions about which version of the linux
>> kernel you'd prefer to hear bug reports about, as well as what kinds
>> of output are most useful (we're getting all chassis set up with serial
>> console so we can do kgdb and also full kernel panic output results).
>
> If you want to stay on mainline kernels with best-effort community
> support, I'd suggest 2.6.38 or more recent kernels are the only ones
> we're going to debug. If you want fixes, then running the current -rc
> kernels is probably a good idea. It's unlikely you'll get anyone
> backporting fixes for you to older kernels.

I will be doing that today. We could backport if it were crucial to do
so, but I'm not aware of any local reasons why this would be so.

>
> Alternatively, you can switch to something like RHEL (or SLES) where
> XFS is fully supported (and in the RHEL case, pays my bills :). The
> advantage of this is that once the bug is fixed in mainline, it will
> get backported to the supported kernel you are running.

We're buying a RHEL support license today - hooray! My rationale for
doing that is that I'm not convinced I will be seeing just XFS issues
in the kernel - the stack trace I reported is more generic than XFS...

Paul

>
> Cheers,
>
> Dave.
> -- > Dave Chinner > david@fromorbit.com > From aelder@sgi.com Tue May 3 12:19:18 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43HJDCh022118 for ; Tue, 3 May 2011 12:19:13 -0500 Received: from cas.corp.sgi.com (pv-excas1-dc21-nlb.corp.sgi.com [137.38.102.126]) by relay2.corp.sgi.com (Postfix) with ESMTP id 98760304059; Tue, 3 May 2011 10:22:47 -0700 (PDT) Received: from [127.0.0.1] (128.162.232.50) by xmail.sgi.com (137.38.102.30) with Microsoft SMTP Server (TLS) id 14.1.289.1; Tue, 3 May 2011 12:22:47 -0500 Subject: Re: [PATCH v2] xfstests: support post-udev device mapper nodes From: Alex Elder Reply-To: To: Christoph Hellwig CC: In-Reply-To: <20110503084359.GA21704@infradead.org> References: <20110503084359.GA21704@infradead.org> Content-Type: text/plain; charset="UTF-8" Date: Tue, 3 May 2011 12:22:46 -0500 Message-ID: <1304443366.2853.24.camel@doink> MIME-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Originating-IP: [128.162.232.50] X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, 2011-05-03 at 04:43 -0400, Christoph Hellwig wrote: > Because of udevs complaining device mapper now creates /dev/dm-N as the real > device nodes, and just symlinks the /dev/mapper/ names to it. This would be > easy if everything used the /dev/mapper clear names, but most system utilities > translate them back to the /dev/mapper/ names and thus confuse various test > cases. Add support to _is_block_dev to read symlinks, and add documentation > on how to run xfstests on device mapper volumes. > > Signed-off-by: Christoph Hellwig Looks good to me. 
Reviewed-by: Alex Elder From BATV+ebbc15e8e05355d690e1+2809+infradead.org+hch@bombadil.srs.infradead.org Tue May 3 12:29:57 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43HTr53022452 for ; Tue, 3 May 2011 12:29:55 -0500 X-ASG-Debug-ID: 1304444007-1db8008e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 5539F11C93C7 for ; Tue, 3 May 2011 10:33:28 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id YwI3seKiYACXypBH for ; Tue, 03 May 2011 10:33:28 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHJTR-0003TM-Qv; Tue, 03 May 2011 17:33:25 +0000 Date: Tue, 3 May 2011 13:33:25 -0400 From: Christoph Hellwig To: Ajeet Yadav Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4 Subject: Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4 Message-ID: <20110503173325.GA13209@infradead.org> References: <20110422065120.GB14189@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304444008 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, 
May 02, 2011 at 11:09:32AM +0530, Ajeet Yadav wrote: > It will be fine for me, if you have received the xfs_metadump file I > sent in last mail. > I am sure it will help you find problem in repair btree, please > correct me if I left you anything from my side. Yes, got it. I'll apply your patch shortly. Thanks a lot! From aelder@sgi.com Tue May 3 12:35:11 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from relay.sgi.com (relay3.corp.sgi.com [198.149.34.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43HZAL9022728 for ; Tue, 3 May 2011 12:35:10 -0500 Received: from cas.corp.sgi.com (pv-excas1-dc21.corp.sgi.com [137.38.102.116]) by relay3.corp.sgi.com (Postfix) with ESMTP id 3AEA9AC002; Tue, 3 May 2011 10:38:45 -0700 (PDT) Received: from [127.0.0.1] (128.162.232.50) by xmail.sgi.com (137.38.102.30) with Microsoft SMTP Server (TLS) id 14.1.289.1; Tue, 3 May 2011 12:38:44 -0500 Subject: Re: [PATCH] xfstests: clean up fallocate configuration tests From: Alex Elder Reply-To: To: Eric Sandeen CC: xfs-oss , Allison Henderson In-Reply-To: <4DBF492E.3040400@sandeen.net> References: <4DBF492E.3040400@sandeen.net> Content-Type: text/plain; charset="UTF-8" Date: Tue, 3 May 2011 12:38:43 -0500 Message-ID: <1304444323.2853.28.camel@doink> MIME-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Originating-IP: [128.162.232.50] X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, 2011-05-02 at 19:15 -0500, Eric Sandeen wrote: > When I added fallocate support to fsx I inadvertently added > a duplicate fallocate test. > > Consolidate them both into one test (the link test, not the > compile test) and make all tests use "true" rather than "yes" > to be more consistent with other tests. Looks reasonable to me. 
I learned a little something about autoconf while looking at this. Glad you're an expert :) I do see that AC_TRY_COMPILE() is now considered obsolete so at some point maybe we should update to use the suggested alternatives (AC_COMPILE_IFELSE() in this example). Reviewed-by: Alex Elder > Signed-off-by: Eric Sandeen From BATV+ebbc15e8e05355d690e1+2809+infradead.org+hch@bombadil.srs.infradead.org Tue May 3 12:35:50 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43HZoUv022761 for ; Tue, 3 May 2011 12:35:50 -0500 X-ASG-Debug-ID: 1304444367-2499004a0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 080C91B834B4 for ; Tue, 3 May 2011 10:39:27 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id NJeW60RN7achzk3r for ; Tue, 03 May 2011 10:39:27 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHJZH-0004sv-CN; Tue, 03 May 2011 17:39:27 +0000 Date: Tue, 3 May 2011 13:39:27 -0400 From: Christoph Hellwig To: Ajeet Yadav Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: xfstests 013 - 2.6.35.11 - hang Subject: Re: xfstests 013 - 2.6.35.11 - hang Message-ID: <20110503173927.GB13209@infradead.org> References: <20110427171107.GA29196@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: 
bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304444368 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, May 02, 2011 at 11:11:14AM +0530, Ajeet Yadav wrote: > Is there any thing I left out in xfs related to cache coherency. I can't think of anything specific, but I have a hard time for other issues causing problems in these tests. Did you check if ext2 passes this test? From aelder@sgi.com Tue May 3 12:42:31 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: * X-Spam-Status: No, score=1.5 required=5.0 tests=BAYES_00,URIBL_BLACK, URIBL_DBL_SPAM autolearn=no version=3.4.0-r929098 Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43HgUb2022966 for ; Tue, 3 May 2011 12:42:31 -0500 Received: from cas.corp.sgi.com (pv-excas1-dc21-nlb.corp.sgi.com [137.38.102.126]) by relay2.corp.sgi.com (Postfix) with ESMTP id AAF95304032; Tue, 3 May 2011 10:46:08 -0700 (PDT) Received: from [127.0.0.1] (128.162.232.50) by xmail.sgi.com (137.38.102.30) with Microsoft SMTP Server (TLS) id 14.1.289.1; Tue, 3 May 2011 12:46:08 -0500 Subject: Re: [PATCH] xfstests: fix error discard test output in 251.out. 
From: Alex Elder Reply-To: To: Tao Ma CC: , Lukas Czerner , Christoph Hellwig In-Reply-To: <1303962551-14893-1-git-send-email-tm@tao.ma> References: <1303962551-14893-1-git-send-email-tm@tao.ma> Content-Type: text/plain; charset="UTF-8" Date: Tue, 3 May 2011 12:46:07 -0500 Message-ID: <1304444767.2853.29.camel@doink> MIME-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Originating-IP: [128.162.232.50] X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Thu, 2011-04-28 at 11:49 +0800, Tao Ma wrote: > From: Tao Ma > > I don't know why, but discard tests is 251 in xfs, > but 251.out has number of 248 in it, So it fails. > Change it to 251 now. > > Cc: Lukas Czerner > Cc: Christoph Hellwig > Cc: Alex Elder > Signed-off-by: Tao Ma Looks good. I'll commit this for you. Reviewed-by: Alex Elder From achender@linux.vnet.ibm.com Tue May 3 14:00:07 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43J077m025780 for ; Tue, 3 May 2011 14:00:07 -0500 X-ASG-Debug-ID: 1304449424-448101390000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from e33.co.us.ibm.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 628EA1E1AB1B for ; Tue, 3 May 2011 12:03:44 -0700 (PDT) Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.151]) by cuda.sgi.com with ESMTP id 68DzVyryQ7F7zj3R for ; Tue, 03 May 2011 12:03:44 -0700 (PDT) Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e33.co.us.ibm.com (8.14.4/8.13.1) with ESMTP id p43IudC9007426 for ; Tue, 3 May 2011 12:56:39 -0600 Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by 
d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v9.1) with ESMTP id p43J3XFj138868 for ; Tue, 3 May 2011 13:03:36 -0600 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p43J3DhM008646 for ; Tue, 3 May 2011 13:03:13 -0600 Received: from [9.65.15.245] (sig-9-65-15-245.mts.ibm.com [9.65.15.245]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id p43J3Ber008122; Tue, 3 May 2011 13:03:12 -0600 Message-ID: <4DC0516F.2040108@linux.vnet.ibm.com> Date: Tue, 03 May 2011 12:03:11 -0700 From: Allison Henderson User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9 MIME-Version: 1.0 To: Eric Sandeen CC: xfs-oss X-ASG-Orig-Subj: Re: [PATCH] xfstests: clean up fallocate configuration tests Subject: Re: [PATCH] xfstests: clean up fallocate configuration tests References: <4DBF492E.3040400@sandeen.net> In-Reply-To: <4DBF492E.3040400@sandeen.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Barracuda-Connect: e33.co.us.ibm.com[32.97.110.151] X-Barracuda-Start-Time: 1304449424 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62679 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/2/2011 5:15 PM, Eric Sandeen wrote: > When I added fallocate support to fsx I inadvertently added > a duplicate fallocate test. 
> > Consolidate them both into one test (the link test, not the > compile test) and make all tests use "true" rather than "yes" > to be more consistent with other tests. > > Signed-off-by: Eric Sandeen > --- > > diff --git a/aclocal.m4 b/aclocal.m4 > index 70ea0f3..168eb59 100644 > --- a/aclocal.m4 > +++ b/aclocal.m4 > @@ -18,27 +18,14 @@ AC_DEFUN([AC_PACKAGE_WANT_LINUX_FIEMAP_H], > > AC_DEFUN([AC_PACKAGE_WANT_FALLOCATE], > [ AC_MSG_CHECKING([for fallocate]) > - AC_TRY_COMPILE([ > -#include > - ], [ > - fallocate(0, 0, 0, 0); > - ], have_fallocate=true > - AC_MSG_RESULT(true), > - AC_MSG_RESULT(false)) > - AC_SUBST(have_fallocate) > - ]) > -AC_DEFUN([AC_PACKAGE_WANT_FALLOCATE], > - [ AC_MSG_CHECKING([for fallocate]) > AC_TRY_LINK([ > #define _GNU_SOURCE > #define _FILE_OFFSET_BITS 64 > #include > -#include > - ], [ > - fallocate(0, 0, 0, 0); > - ], have_fallocate=yes > - AC_MSG_RESULT(yes), > - AC_MSG_RESULT(no)) > +#include ], > + [ fallocate(0, 0, 0, 0); ], > + [ have_fallocate=true; AC_MSG_RESULT(yes) ], > + [ have_fallocate=false; AC_MSG_RESULT(no) ]) > AC_SUBST(have_fallocate) > ]) > m4_include([m4/multilib.m4]) > diff --git a/src/Makefile b/src/Makefile > index 1162ee0..91088bf 100644 > --- a/src/Makefile > +++ b/src/Makefile > @@ -31,7 +31,7 @@ ifeq ($(HAVE_FIEMAP), true) > LINUX_TARGETS += fiemap-tester > endif > > -ifeq ($(HAVE_FALLOCATE),yes) > +ifeq ($(HAVE_FALLOCATE), true) > LCFLAGS += -DHAVE_FALLOCATE > endif > > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs Thanks Eric, I tried it out and it looks like it works great. I will back out the changes to the Makefile in my fsx patch. 
Allison Henderson From aelder@sgi.com Tue May 3 15:11:28 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,J_CHICKENPOX_32, J_CHICKENPOX_33,J_CHICKENPOX_45,LOCAL_GNU_PATCH autolearn=ham version=3.4.0-r929098 Received: from relay.sgi.com (relay1.corp.sgi.com [137.38.102.111]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43KBS8x027829 for ; Tue, 3 May 2011 15:11:28 -0500 Received: from stout.americas.sgi.com (stout.americas.sgi.com [128.162.232.50]) by relay1.corp.sgi.com (Postfix) with ESMTP id 682258F8084; Tue, 3 May 2011 13:15:03 -0700 (PDT) Received: from stout.americas.sgi.com (localhost6.localdomain6 [127.0.0.1]) by stout.americas.sgi.com (8.14.4/8.14.2) with ESMTP id p43KF3LM012461; Tue, 3 May 2011 15:15:03 -0500 Received: (from aelder@localhost) by stout.americas.sgi.com (8.14.4/8.14.4/Submit) id p43KF2jq012460; Tue, 3 May 2011 15:15:02 -0500 From: Alex Elder To: xfs@oss.sgi.com Cc: Joe Perches , Alex Elder Subject: [PATCH] xfs: kill off xfs_printk() Date: Tue, 3 May 2011 15:14:44 -0500 Message-Id: X-Mailer: git-send-email 1.7.4.4 X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean From: Joe Perches xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid using xfs_printk() altogether. This is the only remaining use of xfs_printk(), so changing it this way means xfs_printk() can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated. Also add format checking to the non-debug inline function xfs_debug. Miscellaneous function prototype argument alignment. (Updated to delete the definition of xfs_printk(), which is no longer used or needed.) 
Signed-off-by: Alex Elder --- fs/xfs/linux-2.6/xfs_message.c | 20 +------------------- fs/xfs/linux-2.6/xfs_message.h | 7 +++---- 2 files changed, 4 insertions(+), 23 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_message.c b/fs/xfs/linux-2.6/xfs_message.c index 9f76cce..bd672de 100644 --- a/fs/xfs/linux-2.6/xfs_message.c +++ b/fs/xfs/linux-2.6/xfs_message.c @@ -41,23 +41,6 @@ __xfs_printk( printk("%sXFS: %pV\n", level, vaf); } -void xfs_printk( - const char *level, - const struct xfs_mount *mp, - const char *fmt, ...) -{ - struct va_format vaf; - va_list args; - - va_start(args, fmt); - - vaf.fmt = fmt; - vaf.va = &args; - - __xfs_printk(level, mp, &vaf); - va_end(args); -} - #define define_xfs_printk_level(func, kern_level) \ void func(const struct xfs_mount *mp, const char *fmt, ...) \ { \ @@ -95,8 +78,7 @@ xfs_alert_tag( int do_panic = 0; if (xfs_panic_mask && (xfs_panic_mask & panic_tag)) { - xfs_printk(KERN_ALERT, mp, - "XFS: Transforming an alert into a BUG."); + xfs_alert(mp, "Transforming an alert into a BUG."); do_panic = 1; } diff --git a/fs/xfs/linux-2.6/xfs_message.h b/fs/xfs/linux-2.6/xfs_message.h index f1b3fc1..7fb7ea0 100644 --- a/fs/xfs/linux-2.6/xfs_message.h +++ b/fs/xfs/linux-2.6/xfs_message.h @@ -3,9 +3,6 @@ struct xfs_mount; -extern void xfs_printk(const char *level, const struct xfs_mount *mp, - const char *fmt, ...) - __attribute__ ((format (printf, 3, 4))); extern void xfs_emerg(const struct xfs_mount *mp, const char *fmt, ...) __attribute__ ((format (printf, 2, 3))); extern void xfs_alert(const struct xfs_mount *mp, const char *fmt, ...) @@ -28,7 +25,9 @@ extern void xfs_info(const struct xfs_mount *mp, const char *fmt, ...) extern void xfs_debug(const struct xfs_mount *mp, const char *fmt, ...) __attribute__ ((format (printf, 2, 3))); #else -static inline void xfs_debug(const struct xfs_mount *mp, const char *fmt, ...) 
+static inline void +__attribute__ ((format (printf, 2, 3))) +xfs_debug(const struct xfs_mount *mp, const char *fmt, ...) { } #endif -- 1.7.4.4 From aelder@sgi.com Tue May 3 15:17:48 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43KHmCX027982 for ; Tue, 3 May 2011 15:17:48 -0500 Received: from cas.corp.sgi.com (pv-excas1-dc21-nlb.corp.sgi.com [137.38.102.126]) by relay2.corp.sgi.com (Postfix) with ESMTP id 7DA00304032; Tue, 3 May 2011 13:21:23 -0700 (PDT) Received: from [127.0.0.1] (128.162.232.50) by xmail.sgi.com (137.38.102.30) with Microsoft SMTP Server (TLS) id 14.1.289.1; Tue, 3 May 2011 15:21:23 -0500 Subject: Re: [PATCH] xfs: kill off xfs_printk() From: Alex Elder Reply-To: To: CC: Joe Perches In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Date: Tue, 3 May 2011 15:21:22 -0500 Message-ID: <1304454082.2853.46.camel@doink> MIME-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Originating-IP: [128.162.232.50] X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, 2011-05-03 at 15:14 -0500, Alex Elder wrote: ... > xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid > using xfs_printk() altogether. This is the only remaining use of > xfs_printk(), so changing it this way means xfs_printk() can simply > be eliminated.can simply be eliminated.can simply be eliminated.can > simply be eliminated.can simply be eliminated.can simply be > eliminated.can simply be eliminated.can simply be eliminated.can > simply be eliminated. . . . In case it wasn't clear from my text, xfs_printk() can simply be eliminated. (I don't know how all those duplicates got in there.) 
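
The format-checking change to the non-debug xfs_debug() stub in the patch can be sketched in userspace C roughly as follows. This is only an illustration, assuming GCC or Clang: demo_mount, demo_debug(), demo_report() and the DEMO_DEBUG macro are hypothetical stand-ins, not kernel names, and the int return value exists only so the sketch can be checked.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for struct xfs_mount; not the kernel type. */
struct demo_mount {
	const char *name;
};

#ifdef DEMO_DEBUG
/* Debug build: really print the message and return its length. */
static inline int
__attribute__ ((format (printf, 2, 3)))
demo_debug(const struct demo_mount *mp, const char *fmt, ...)
{
	va_list args;
	int len;

	fprintf(stderr, "demo (%s): ", mp->name);
	va_start(args, fmt);
	len = vfprintf(stderr, fmt, args);
	va_end(args);
	fputc('\n', stderr);
	return len;
}
#else
/*
 * Non-debug build: an empty stub. The attribute still asks the
 * compiler to check the arguments against the format string, so a
 * mismatch is reported even though nothing is ever printed.
 */
static inline int
__attribute__ ((format (printf, 2, 3)))
demo_debug(const struct demo_mount *mp, const char *fmt, ...)
{
	(void)mp;
	(void)fmt;
	return 0;
}
#endif

/* Example caller: compiles cleanly in both configurations. */
static inline int
demo_report(const struct demo_mount *mp, unsigned long blocks)
{
	return demo_debug(mp, "%lu blocks scanned", blocks);
}
```

With the attribute in place, a call such as demo_debug(mp, "%s", 42) should draw a -Wformat warning whether or not DEMO_DEBUG is defined; without it, the mismatch would only be noticed in debug builds.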
From joe@perches.com Tue May 3 15:23:49 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43KNms2028128 for ; Tue, 3 May 2011 15:23:49 -0500 X-ASG-Debug-ID: 1304454445-6499039f0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.perches.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 493B31E1AF31 for ; Tue, 3 May 2011 13:27:25 -0700 (PDT) Received: from mail.perches.com (mail.perches.com [173.55.12.10]) by cuda.sgi.com with ESMTP id g6RdT78rctw8uwhS for ; Tue, 03 May 2011 13:27:25 -0700 (PDT) Received: from [192.168.1.162] (unknown [192.168.1.162]) by mail.perches.com (Postfix) with ESMTP id 84EAF24368; Tue, 3 May 2011 13:27:23 -0700 (PDT) X-ASG-Orig-Subj: Re: [PATCH] xfs: kill off xfs_printk() Subject: Re: [PATCH] xfs: kill off xfs_printk() From: Joe Perches To: Alex Elder Cc: xfs@oss.sgi.com In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Date: Tue, 03 May 2011 13:27:24 -0700 Message-ID: <1304454444.1788.34.camel@Joe-Laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mail.perches.com[173.55.12.10] X-Barracuda-Start-Time: 1304454446 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0001 1.0000 -2.0203 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62685 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com 
X-Virus-Status: Clean On Tue, 2011-05-03 at 15:14 -0500, Alex Elder wrote: > From: Joe Perches > > xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid > using xfs_printk() altogether. This is the only remaining use of > xfs_printk(), so changing it this way means xfs_printk() can simply > be eliminated.can simply be eliminated.can simply be eliminated.can > simply be eliminated.can simply be eliminated.can simply be > eliminated.can simply be eliminated.can simply be eliminated.can > simply be eliminated. Recursion overflow? From lists@nerdbynature.de Tue May 3 15:49:59 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43Knx1F028850 for ; Tue, 3 May 2011 15:49:59 -0500 X-ASG-Debug-ID: 1304456014-476500ce0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from trent.utfs.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 560931E1B0FD for ; Tue, 3 May 2011 13:53:35 -0700 (PDT) Received: from trent.utfs.org (trent.utfs.org [194.246.123.103]) by cuda.sgi.com with ESMTP id m9KIiJEQt9tOQnya for ; Tue, 03 May 2011 13:53:35 -0700 (PDT) Received: by trent.utfs.org (Postfix, from userid 8) id 01DDA3DDD3; Tue, 3 May 2011 22:53:33 +0200 (CEST) Received: from trent.utfs.org (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 13D133DD45; Tue, 3 May 2011 22:53:32 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id F00D23DB7E; Tue, 3 May 2011 22:53:31 +0200 (CEST) Date: Tue, 3 May 2011 13:53:31 -0700 (PDT) From: Christian Kujau To: Dave Chinner cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 
2.6.39-rc4+: oom-killer busy killing tasks In-Reply-To: <20110503005114.GE2978@dastard> Message-ID: References: <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> User-Agent: Alpine 2.01 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AV-Checked: ClamAV using ClamSMTP (127.0.0.1) X-Barracuda-Connect: trent.utfs.org[194.246.123.103] X-Barracuda-Start-Time: 1304456015 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62687 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, 3 May 2011 at 10:51, Dave Chinner wrote: > Can you run an event trace of all the XFS events during a find for > me? Don't do it over the entire subset of the filesystem - only > 100,000 inodes is sufficient (i.e. kill the find once the xfs inode > cache slab reaches 100k inodes. While still running the event trace, > can you then drop the caches (echo 3 > /proc/sys/vm/drop_caches) and > check that the xfs inode cache is emptied? If it isn't emptied, drop > caches again to see if that empties it. If you coul dthen post the > event trace, I might be able to see what is going strange with the > shrinker and/or reclaim. OK, I've done something. Not sure if I got everything right: https://trent.utfs.org/p/bits/2.6.39-rc4/oom/trace/ (new URL, the other one ran out of webspace. 
Omit the s in https if you don't have the CAcert.org root cert imported) * I've started 'trace-cmd record -e xfs /usr/bin/find /mnt/backup' in one (screen-)window, which produced trace-14.dat.bz2 * I've started my oom-debug.sh script in another, which produced slabinfo-14.txt.bz2 * In another window, I was dropping the caches and looked at /proc/slabinfo again, see drop_caches-14.txt Somehow "trace-cmd report" segfaults here, but I hope "trace-14.report" contains enough details already. If not, I can do this again. Thanks, Christian. -- BOFH excuse #314: You need to upgrade your VESA local bus to a MasterCard local bus. From aelder@sgi.com Tue May 3 16:38:12 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.4.0-r929098 Received: from relay.sgi.com (relay3.corp.sgi.com [198.149.34.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p43LcCP4030024 for ; Tue, 3 May 2011 16:38:12 -0500 Received: from cas.corp.sgi.com (pv-excas1-dc21.corp.sgi.com [137.38.102.116]) by relay3.corp.sgi.com (Postfix) with ESMTP id 0FB1DAC002; Tue, 3 May 2011 14:41:47 -0700 (PDT) Received: from [127.0.0.1] (128.162.232.50) by xmail.sgi.com (137.38.102.30) with Microsoft SMTP Server (TLS) id 14.1.289.1; Tue, 3 May 2011 16:41:46 -0500 Subject: Re: [PATCH] [xfsprogs]: Don't translate command name. 
From: Alex Elder Reply-To: To: Arkadiusz =?UTF-8?Q?Mi=C5=9Bkiewicz?= CC: In-Reply-To: <1302858438-22215-1-git-send-email-arekm@maven.pl> References: <1302858438-22215-1-git-send-email-arekm@maven.pl> Content-Type: text/plain; charset="UTF-8" Date: Tue, 3 May 2011 16:41:46 -0500 Message-ID: <1304458906.9841.2.camel@doink> MIME-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 8bit X-Originating-IP: [128.162.232.50] X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Fri, 2011-04-15 at 11:07 +0200, Arkadiusz Miśkiewicz wrote: > Command names should never be translated. Currently there is > 'xfs_quota -x -c "project"...' in one locale (C) while > 'xfs_quota -x -c "projekt"...' in another (pl_PL). I haven't looked at this closely yet, but it looks pretty straightforward. I have tried to do a little research on localization though, so I'm a little better informed. I accept your statement that command names should never be translated, but is that a well-known convention? Thanks.
-Alex > Signed-off-by: Arkadiusz Miśkiewicz From david@fromorbit.com Tue May 3 19:41:32 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p440fVTn042976 for ; Tue, 3 May 2011 19:41:32 -0500 X-ASG-Debug-ID: 1304469906-740e03390000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail04.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 534141E1B038 for ; Tue, 3 May 2011 17:45:07 -0700 (PDT) Received: from ipmail04.adl6.internode.on.net (ipmail04.adl6.internode.on.net [150.101.137.141]) by cuda.sgi.com with ESMTP id UP0UmsMiQHJ677Wg for ; Tue, 03 May 2011 17:45:07 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEANydwE15LBza/2dsb2JhbACmH3jFYA6FdASWIoc6 Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail04.adl6.internode.on.net with ESMTP; 04 May 2011 10:15:05 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QHQD0-0004bK-3O; Wed, 04 May 2011 10:44:54 +1000 Date: Wed, 4 May 2011 10:44:54 +1000 From: Dave Chinner To: Alex Elder Cc: xfs@oss.sgi.com, Joe Perches X-ASG-Orig-Subj: Re: [PATCH] xfs: kill off xfs_printk() Subject: Re: [PATCH] xfs: kill off xfs_printk() Message-ID: <20110504004453.GC9114@dastard> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail04.adl6.internode.on.net[150.101.137.141] X-Barracuda-Start-Time: 1304469908 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0209 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No,
SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62701 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, May 03, 2011 at 03:14:44PM -0500, Alex Elder wrote: > From: Joe Perches > > xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid > using xfs_printk() altogether. This is the only remaining use of > xfs_printk(), so changing it this way means xfs_printk() can simply > be eliminated.can simply be eliminated.can simply be eliminated.can > simply be eliminated.can simply be eliminated.can simply be > eliminated.can simply be eliminated.can simply be eliminated.can > simply be eliminated. > > Also add format checking to the non-debug inline function xfs_debug. > Miscellaneous function prototype argument alignment. > > (Updated to delete the definition of xfs_printk(), which is > no longer used or needed.) > > Signed-off-by: Alex Elder If you are going to credit Joe as the original source of the patch in the commit (i.e. via the "From:" tag), you probably should copy in his original Signed-off-by tag as well.... Cheers, Dave. 
-- Dave Chinner david@fromorbit.com From lists@nerdbynature.de Tue May 3 19:42:39 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p440gcdh043480 for ; Tue, 3 May 2011 19:42:39 -0500 X-ASG-Debug-ID: 1304469975-598703990000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from trent.utfs.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B4CEE42A378 for ; Tue, 3 May 2011 17:46:15 -0700 (PDT) Received: from trent.utfs.org (trent.utfs.org [194.246.123.103]) by cuda.sgi.com with ESMTP id 01bUUbs4DprJcBVd for ; Tue, 03 May 2011 17:46:15 -0700 (PDT) Received: by trent.utfs.org (Postfix, from userid 8) id D82C33E5EC; Wed, 4 May 2011 02:46:14 +0200 (CEST) Received: from trent.utfs.org (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 2C46B3E5EA; Wed, 4 May 2011 02:46:14 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 1721F3DBCA; Wed, 4 May 2011 02:46:14 +0200 (CEST) Date: Tue, 3 May 2011 17:46:14 -0700 (PDT) From: Christian Kujau To: Dave Chinner cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks In-Reply-To: Message-ID: References: <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> User-Agent: Alpine 2.01 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AV-Checked: ClamAV using ClamSMTP (127.0.0.1) X-Barracuda-Connect: trent.utfs.org[194.246.123.103] X-Barracuda-Start-Time: 
1304469975 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62701 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean And another one, please see the files marked with 15- here: https://trent.utfs.org/p/bits/2.6.39-rc4/oom/trace/ I tried to have more concise timestamps in each of these files, hope that helps. Sadly though, trace-cmd reports still segfaults on the tracefile. Christian. -- BOFH excuse #263: It's stuck in the Web. From jamie@audible.transient.net Tue May 3 19:54:00 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,J_CHICKENPOX_27, J_CHICKENPOX_28 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p440s05S047627 for ; Tue, 3 May 2011 19:54:00 -0500 X-ASG-Debug-ID: 1304470656-6fea03920000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from audible.transient.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with SMTP id AA1DB1E1BFA2 for ; Tue, 3 May 2011 17:57:36 -0700 (PDT) Received: from audible.transient.net (audible.transient.net [216.254.12.79]) by cuda.sgi.com with SMTP id kJDXG7HSj19Tcr6z for ; Tue, 03 May 2011 17:57:36 -0700 (PDT) Received: (qmail 1296 invoked from network); 4 May 2011 00:57:36 -0000 Received: from cucamonga.audible.transient.net (192.168.2.5) by canarsie.audible.transient.net with QMQP; 4 May 2011 00:57:36 -0000 Received: (nullmailer 
pid 7608 invoked by uid 1000); Wed, 04 May 2011 00:57:36 -0000 Date: Wed, 4 May 2011 00:57:36 +0000 From: Jamie Heilman To: linux-kernel@vger.kernel.org Cc: Dave Chinner , Markus Trippelsdorf , Bruno =?iso-8859-1?Q?Pr=E9mont?= , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110504005736.GA2958@cucamonga.audible.transient.net> Mail-Followup-To: linux-kernel@vger.kernel.org, Dave Chinner , Markus Trippelsdorf , Bruno =?iso-8859-1?Q?Pr=E9mont?= , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110429011929.GA13542@dastard> User-Agent: Mutt/1.5.21 (2010-09-15) X-Barracuda-Connect: audible.transient.net[216.254.12.79] X-Barracuda-Start-Time: 1304470657 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.52 X-Barracuda-Spam-Status: No, SCORE=-1.52 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_RULE7568M X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62703 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.50 BSF_RULE7568M Custom Rule 7568M X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Dave Chinner wrote: > OK, so the common elements here appears to be root filesystems > with small log sizes, which means they are tail pushing all the > time metadata 
operations are in progress. Definitely seems like a > race in the AIL workqueue trigger mechanism. I'll see if I can > reproduce this and cook up a patch to fix it. Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and other assorted feedback wrt this issue? I've had it happen twice now myself in the past week or so, though I have no reliable reproduction technique. Just wondering if more data points will help isolate the cause, and if so, how to be prepared to get them. For whatever its worth, my last lockup was while running 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. root@cucamonga:~# grep xfs /proc/mounts /dev/mapper/S-root / xfs rw,relatime,attr2,delaylog,noquota 0 0 /dev/mapper/S-var /var xfs rw,noatime,attr2,delaylog,inode64,noquota 0 0 root@cucamonga:~# xfs_info /var meta-data=/dev/mapper/S-var isize=256 agcount=4, agsize=6553600 blks = sectsz=512 attr=2 data = bsize=4096 blocks=26214400, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal bsize=4096 blocks=12800, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 root@cucamonga:~# xfs_info / meta-data=/dev/mapper/S-root isize=256 agcount=4, agsize=524288 blks = sectsz=512 attr=2 data = bsize=4096 blocks=2097152, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal bsize=4096 blocks=2560, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 those are lvm volumes on top of a md raid1 partition, though from the looks of everybody else's reports, that's not likely relevant. 
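For reference, the log sizes implied by the two xfs_info listings above follow from blocks times bsize on the "log" line: 2560 x 4096 bytes is 10 MiB for / and 12800 x 4096 bytes is 50 MiB for /var. A minimal sketch of that arithmetic, where log_bytes() and bytes_to_mib() are hypothetical helper names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an XFS log, taken from the "log" line of xfs_info:
 * number of log blocks multiplied by the log block size. */
static uint64_t log_bytes(uint64_t blocks, uint64_t bsize)
{
    /* Guard against 64-bit overflow before multiplying. */
    assert(bsize == 0 || blocks <= UINT64_MAX / bsize);
    return blocks * bsize;
}

/* Convert bytes to whole mebibytes, rounding down. */
static uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}
```

Calling bytes_to_mib(log_bytes(2560, 4096)) gives 10 for the root filesystem and bytes_to_mib(log_bytes(12800, 4096)) gives 50 for /var.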
sysctl-w and l follow: May 3 07:12:28 cucamonga kernel: SysRq : Show Blocked State May 3 07:12:28 cucamonga kernel: task PC stack pid father May 3 07:12:28 cucamonga kernel: mutt D ffff88007bf89f40 0 2938 2921 0x00000000 May 3 07:12:28 cucamonga kernel: ffff88007ac0fb28 0000000000000046 ffff88007ac0fae8 ffffffff81095066 May 3 07:12:28 cucamonga kernel: ffff88007a8c4570 ffff88007ac0ffd8 00000000000112c0 ffff88007ac0ffd8 May 3 07:12:28 cucamonga kernel: ffff88007f0dcbc0 ffff88007a8c4570 ffff88007ac0fbb8 ffffffff81098363 May 3 07:12:28 cucamonga kernel: Call Trace: May 3 07:12:28 cucamonga kernel: [] ? __pagevec_free+0x70/0x82 May 3 07:12:28 cucamonga kernel: [] ? release_pages+0x181/0x193 May 3 07:12:28 cucamonga kernel: [] xlog_wait+0x5b/0x72 [xfs] May 3 07:12:28 cucamonga kernel: [] ? try_to_wake_up+0x1bd/0x1bd May 3 07:12:28 cucamonga kernel: [] xlog_grant_log_space+0x129/0x3d6 [xfs] May 3 07:12:28 cucamonga kernel: [] ? xfs_ail_push+0x3c/0x6b [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_log_reserve+0xe5/0xee [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_trans_reserve+0xcf/0x19b [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_inactive+0x16a/0x39a [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_fs_evict_inode+0xc7/0xcf [xfs] May 3 07:12:28 cucamonga kernel: [] evict+0x81/0x125 May 3 07:12:28 cucamonga kernel: [] iput+0x14b/0x153 May 3 07:12:28 cucamonga kernel: [] dentry_kill+0x127/0x149 May 3 07:12:28 cucamonga kernel: [] dput+0xde/0xee May 3 07:12:28 cucamonga kernel: [] fput+0x192/0x1aa May 3 07:12:28 cucamonga kernel: [] remove_vma+0x3c/0x64 May 3 07:12:28 cucamonga kernel: [] exit_mmap+0xbe/0xd9 May 3 07:12:28 cucamonga kernel: [] mmput+0x5b/0x104 May 3 07:12:28 cucamonga kernel: [] exit_mm+0x125/0x132 May 3 07:12:28 cucamonga kernel: [] ? acct_collect+0x176/0x182 May 3 07:12:28 cucamonga kernel: [] do_exit+0x21d/0x70a May 3 07:12:28 cucamonga kernel: [] ? fsnotify_modify+0x5f/0x67 May 3 07:12:28 cucamonga kernel: [] ? 
kvm_on_user_return+0x4d/0x4f [kvm] May 3 07:12:28 cucamonga kernel: [] ? fire_user_return_notifiers+0x3c/0x65 May 3 07:12:28 cucamonga kernel: [] do_group_exit+0x76/0x9e May 3 07:12:28 cucamonga kernel: [] sys_exit_group+0x17/0x17 May 3 07:12:28 cucamonga kernel: [] system_call_fastpath+0x16/0x1b May 3 07:12:28 cucamonga kernel: kworker/1:0 D ffffffff81341180 0 24351 2 0x00000000 May 3 07:12:28 cucamonga kernel: ffff8800046c1ca0 0000000000000046 ffff880000000000 0000000100000000 May 3 07:12:28 cucamonga kernel: ffff88007c230ca0 ffff8800046c1fd8 00000000000112c0 ffff8800046c1fd8 May 3 07:12:28 cucamonga kernel: ffff88007f0a3f20 ffff88007c230ca0 0000000000000000 0000000100000000 May 3 07:12:28 cucamonga kernel: Call Trace: May 3 07:12:28 cucamonga kernel: [] xlog_wait+0x5b/0x72 [xfs] May 3 07:12:28 cucamonga kernel: [] ? try_to_wake_up+0x1bd/0x1bd May 3 07:12:28 cucamonga kernel: [] xlog_grant_log_space+0x129/0x3d6 [xfs] May 3 07:12:28 cucamonga kernel: [] ? xfs_ail_push+0x3c/0x6b [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_log_reserve+0xe5/0xee [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_trans_reserve+0xcf/0x19b [xfs] May 3 07:12:28 cucamonga kernel: [] ? xfs_reclaim_inode+0x23b/0x23b [xfs] May 3 07:12:28 cucamonga kernel: [] ? xfs_reclaim_inode+0x23b/0x23b [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_fs_log_dummy+0x43/0x7f [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_sync_worker+0x43/0x69 [xfs] May 3 07:12:28 cucamonga kernel: [] process_one_work+0x179/0x295 May 3 07:12:28 cucamonga kernel: [] worker_thread+0xd4/0x158 May 3 07:12:28 cucamonga kernel: [] ? manage_workers.isra.23+0x170/0x170 May 3 07:12:28 cucamonga kernel: [] ? manage_workers.isra.23+0x170/0x170 May 3 07:12:28 cucamonga kernel: [] kthread+0x84/0x8c May 3 07:12:28 cucamonga kernel: [] kernel_thread_helper+0x4/0x10 May 3 07:12:28 cucamonga kernel: [] ? kthread_worker_fn+0x116/0x116 May 3 07:12:28 cucamonga kernel: [] ? 
gs_change+0xb/0xb May 3 07:12:28 cucamonga kernel: dpkg D ffffffff81341180 0 28235 24677 0x00000000 May 3 07:12:28 cucamonga kernel: ffff88001b70dc98 0000000000000082 0000000000000001 ffff880000000000 May 3 07:12:28 cucamonga kernel: ffff88007f0a5eb0 ffff88001b70dfd8 00000000000112c0 ffff88001b70dfd8 May 3 07:12:28 cucamonga kernel: ffffffff81499020 ffff88007f0a5eb0 ffff88001b70dc88 000000011b70dcc8 May 3 07:12:28 cucamonga kernel: Call Trace: May 3 07:12:28 cucamonga kernel: [] xlog_wait+0x5b/0x72 [xfs] May 3 07:12:28 cucamonga kernel: [] ? try_to_wake_up+0x1bd/0x1bd May 3 07:12:28 cucamonga kernel: [] xlog_grant_log_space+0x247/0x3d6 [xfs] May 3 07:12:28 cucamonga kernel: [] ? xfs_ail_push+0x3c/0x6b [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_log_reserve+0xe5/0xee [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_trans_reserve+0xcf/0x19b [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_free_eofblocks+0x153/0x1e2 [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_release+0x178/0x1b0 [xfs] May 3 07:12:28 cucamonga kernel: [] xfs_file_release+0x15/0x19 [xfs] May 3 07:12:28 cucamonga kernel: [] fput+0xfd/0x1aa May 3 07:12:28 cucamonga kernel: [] filp_close+0x6e/0x7a May 3 07:12:28 cucamonga kernel: [] sys_close+0xad/0xef May 3 07:12:28 cucamonga kernel: [] system_call_fastpath+0x16/0x1b May 3 07:22:26 cucamonga kernel: SysRq : Show backtrace of all active CPUs May 3 07:22:26 cucamonga kernel: CPU1: May 3 07:22:26 cucamonga kernel: CPU 1 May 3 07:22:26 cucamonga kernel: Modules linked in: pci_slot fan cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 cpufreq_conservative k May 3 07:22:26 cucamonga kernel: May 3 07:22:26 cucamonga kernel: Pid: 0, comm: kworker/0:0 Not tainted 2.6.39-rc5-00127-g1be6a1f #1 Dell Inc. 
Precision WorkStation T3400 /0TP4 May 3 07:22:26 cucamonga kernel: RIP: 0010:[] [] mwait_idle+0x7c/0x94 May 3 07:22:26 cucamonga kernel: RSP: 0018:ffff88007f0d1ee8 EFLAGS: 00000246 May 3 07:22:26 cucamonga kernel: RAX: 0000000000000000 RBX: ffffffff81592100 RCX: 0000000000000000 May 3 07:22:26 cucamonga kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88007f0d0000 May 3 07:22:26 cucamonga kernel: RBP: ffff88007f0d1ee8 R08: 0000000000000000 R09: 0000000000000000 May 3 07:22:26 cucamonga kernel: R10: 0000000000000000 R11: ffff88007fb0dc50 R12: ffffffff8133468e May 3 07:22:26 cucamonga kernel: R13: ffff88007f0d1e78 R14: 0000000000000086 R15: ffff88007fb11c00 May 3 07:22:26 cucamonga kernel: FS: 0000000000000000(0000) GS:ffff88007fb00000(0000) knlGS:0000000000000000 May 3 07:22:26 cucamonga kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b May 3 07:22:26 cucamonga kernel: CR2: 00007ffec6e368f0 CR3: 0000000004820000 CR4: 00000000000406f0 May 3 07:22:26 cucamonga kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 May 3 07:22:26 cucamonga kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 May 3 07:22:26 cucamonga kernel: Process kworker/0:0 (pid: 0, threadinfo ffff88007f0d0000, task ffff88007f0a3f20) May 3 07:22:26 cucamonga kernel: Stack: May 3 07:22:26 cucamonga kernel: ffff88007f0d1f18 ffffffff810008ad ffff88007f0d1f08 57bb49c37cf88ea3 May 3 07:22:26 cucamonga kernel: 0000000000000001 0000000000000000 ffff88007f0d1f48 ffffffff81539e79 May 3 07:22:26 cucamonga kernel: 0000000000000000 3128fe8622c57963 0000000000000000 0000000000000000 May 3 07:22:26 cucamonga kernel: Call Trace: May 3 07:22:26 cucamonga kernel: [] cpu_idle+0xa3/0xe9 May 3 07:22:26 cucamonga kernel: [] start_secondary+0x1bd/0x1c4 May 3 07:22:26 cucamonga kernel: Code: d2 65 48 8b 04 25 c8 b5 00 00 48 89 d1 48 2d c8 1f 00 00 0f 01 c8 0f ae f0 e8 52 fe ff ff 85 c0 75 0b 31 May 3 07:22:26 cucamonga kernel: Call Trace: May 3 07:22:26 cucamonga 
kernel: [] cpu_idle+0xa3/0xe9 May 3 07:22:26 cucamonga kernel: [] start_secondary+0x1bd/0x1c4 May 3 07:22:26 cucamonga kernel: CPU0: May 3 07:22:26 cucamonga kernel: ffff88007fa03ef0 ffff88007fa03f48 0000000000000046 ffff88007fa03f68 May 3 07:22:26 cucamonga kernel: 0000000000000001 ffff88007aabdc48 0000000000000001 ffff88007fa03f38 May 3 07:22:26 cucamonga kernel: ffffffff810049a6 ffff88007fa03f58 ffffffff811acfb3 dead000000200200 May 3 07:22:26 cucamonga kernel: Call Trace: May 3 07:22:26 cucamonga kernel: [] ? show_stack+0x1c/0x1e May 3 07:22:26 cucamonga kernel: [] ? showacpu+0x4a/0x5d May 3 07:22:26 cucamonga kernel: [] ? generic_smp_call_function_single_interrupt+0xd3/0xf6 May 3 07:22:26 cucamonga kernel: [] ? smp_call_function_single_interrupt+0x18/0x27 May 3 07:22:26 cucamonga kernel: [] ? call_function_single_interrupt+0x13/0x20 May 3 07:22:26 cucamonga kernel: [] ? _raw_spin_unlock_irqrestore+0xd/0x30 May 3 07:22:26 cucamonga kernel: [] ? remove_wait_queue+0x51/0x56 May 3 07:22:26 cucamonga kernel: [] ? poll_freewait+0x3e/0xaa May 3 07:22:26 cucamonga kernel: [] ? schedule_hrtimeout_range+0x13/0x15 May 3 07:22:26 cucamonga kernel: [] ? do_select+0x4df/0x4f4 May 3 07:22:26 cucamonga kernel: [] ? poll_freewait+0xaa/0xaa May 3 07:22:26 cucamonga kernel: [] ? __pollwait+0xcc/0xcc May 3 07:22:26 cucamonga last message repeated 7 times May 3 07:22:26 cucamonga kernel: [] ? core_sys_select+0x177/0x216 May 3 07:22:26 cucamonga kernel: [] ? kmem_cache_free+0x79/0xbc May 3 07:22:26 cucamonga kernel: [] ? sock_destroy_inode+0x32/0x36 May 3 07:22:26 cucamonga kernel: [] ? _raw_spin_unlock+0x23/0x2e May 3 07:22:26 cucamonga kernel: [] ? dput+0xcf/0xee May 3 07:22:26 cucamonga kernel: [] ? sys_select+0x91/0xb9 May 3 07:22:26 cucamonga kernel: [] ? sys_read+0x61/0x6e May 3 07:22:26 cucamonga kernel: [] ? system_call_fastpath+0x16/0x1b I have captures of my process list and slabinfo at the time of the lockup too should they be useful. 
-- Jamie Heilman http://audible.transient.net/~jamie/ "Most people wouldn't know music if it came up and bit them on the ass." -Frank Zappa From stan@hardwarefreak.com Tue May 3 20:06:45 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4416jWq048192 for ; Tue, 3 May 2011 20:06:45 -0500 X-ASG-Debug-ID: 1304471421-31a6002a0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from greer.hardwarefreak.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 7592A42AC45 for ; Tue, 3 May 2011 18:10:21 -0700 (PDT) Received: from greer.hardwarefreak.com (mo-65-41-216-221.sta.embarqhsd.net [65.41.216.221]) by cuda.sgi.com with ESMTP id eSdfqB3hwr5pVOIl for ; Tue, 03 May 2011 18:10:21 -0700 (PDT) Received: from [192.168.100.53] (gffx.hardwarefreak.com [192.168.100.53]) by greer.hardwarefreak.com (Postfix) with ESMTP id AD9ED6C073 for ; Tue, 3 May 2011 20:10:20 -0500 (CDT) Message-ID: <4DC0A779.9030509@hardwarefreak.com> Date: Tue, 03 May 2011 20:10:17 -0500 From: Stan Hoeppner User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: XFS/Linux Sanity check Subject: Re: XFS/Linux Sanity check References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mo-65-41-216-221.sta.embarqhsd.net[65.41.216.221] X-Barracuda-Start-Time: 1304471421 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0463 1.0000 -1.7231 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.12 X-Barracuda-Spam-Status: No, SCORE=-1.12 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 
tests=BSF_SC5_MJ1963, RDNS_DYNAMIC X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62703 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.10 RDNS_DYNAMIC Delivered to trusted network by host with dynamic-looking rDNS 0.50 BSF_SC5_MJ1963 Custom Rule MJ1963 X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/2/2011 10:47 AM, Paul Anderson wrote: Hi Paul, > md apparently does not support barriers, so we are badly exposed in > that manner, I know. As a test, I disabled write cache on all drives, > performance dropped by 30% or so, but since md is apparently the > problem, barriers still didn't work. ... > Ideally, I'd firstly be able to find informed opinions about how I can > improve this arrangement - we are mildly flexible on RAID controllers, I'm not familiar enough with the md driver to address the barrier issue. Try the mdadm mailing list. However... You should be able to solve the barrier issue, and get additional advantages, by simply swapping out the LSI 9200-8E's with the 9285-8E w/cache battery. The 9285 has a dual core 800MHz PowerPC (vs single core 533MHz on the 9280) and 1GB of cache. Configure 3x15 drive hardware RAID6 arrays per controller, then stitch the resulting 9 arrays together with mdraid or LVM striping or concatenation. I'd test both under your normal multistreaming workload to see which works best. A multilevel stripe will show better performance with an artificial single stream test such as dd, but under your operational multiple stream workload, concatenation may have similar performance, while at the same time giving you additional capability, especially if done with LVM instead of mdraid --linear. Using LVM concatenation enables snapshots and the ability to grow and shrink the volume, neither of which you can do with striping (RAID 0). 
The 9285-8E will be pricier than the 9280-8E but it's well worth the extra dollars, given the low overall cost percentage of the HBAs vs total system cost. You'll get better performance and the data safety you're looking for. Just make sure that in addition to BBWC on the HBAs you have good UPS units backing the servers and SC847 chassis. > very flexible on versions of Linux, etc, and can try other OS's as a > last resort (but the leading contender here would be "something" > running ZFS, and though I love ZFS, it really didn't seem to work well > for our needs). Supermicro product is usually pretty decent. However, "DIY" arrays comprised of an inexpensive tier 2/3 vendor drive box/backplane/expander and off the shelf drives, whose firmware may not all match, can often be a recipe for problems that are difficult to troubleshoot. Your problems may not be caused by a kernel issue at all. The kernel may simply be showing the symptoms but not the cause. You've ordered, if my math is correct, 675 'enterprise class' 2TB SATA drives, 45 per chassis, 135 per system, 5 systems. Did you specify/verify with the vendor that all drives must be of the same manufacturing lot and have matching firmware? When building huge storage subsystems it is critical that all drives behave the same, which usually means identical firmware. > Secondly, I welcome suggestions about which version of the linux > kernel you'd prefer to hear bug reports about, as well as what kinds > of output is most useful (we're getting all chassis set up with serial > console so we can do kgdb and also full kernel panic output results). Others are better qualified to answer this. I'm just the lowly hardware guy on the list.
;) -- Stan From lists@nerdbynature.de Tue May 3 20:47:45 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p441ljUB049434 for ; Tue, 3 May 2011 20:47:45 -0500 X-ASG-Debug-ID: 1304473881-4bb000b00000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from trent.utfs.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id C141142A41C for ; Tue, 3 May 2011 18:51:21 -0700 (PDT) Received: from trent.utfs.org (trent.utfs.org [194.246.123.103]) by cuda.sgi.com with ESMTP id YadzULoWPM0JCPNe for ; Tue, 03 May 2011 18:51:21 -0700 (PDT) Received: by trent.utfs.org (Postfix, from userid 8) id DFACE3E5EE; Wed, 4 May 2011 03:51:20 +0200 (CEST) Received: from trent.utfs.org (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 553593E5EA; Wed, 4 May 2011 03:51:19 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 3C6BD3DB7E; Wed, 4 May 2011 03:51:19 +0200 (CEST) Date: Tue, 3 May 2011 18:51:19 -0700 (PDT) From: Christian Kujau To: Dave Chinner cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks In-Reply-To: Message-ID: References: <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> User-Agent: Alpine 2.01 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AV-Checked: ClamAV using ClamSMTP (127.0.0.1) X-Barracuda-Connect: trent.utfs.org[194.246.123.103] X-Barracuda-Start-Time: 1304473881 
X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62707 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, 3 May 2011 at 17:46, Christian Kujau wrote: > And another one, please see the files marked with 15- here: > > https://trent.utfs.org/p/bits/2.6.39-rc4/oom/trace/ > > I tried to have more concise timestamps in each of these files, hope that > helps. Sadly though, trace-cmd reports still segfaults on the tracefile. Running "trace-cmd report" on an i386 machine with those trace.dat files did not segfault. I've uploaded/will upload the reports to the url above. Christian. 
-- BOFH excuse #186: permission denied From achender@linux.vnet.ibm.com Tue May 3 23:47:25 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,J_CHICKENPOX_43, J_CHICKENPOX_64,J_CHICKENPOX_66,J_CHICKENPOX_92 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p444lMHZ055984 for ; Tue, 3 May 2011 23:47:24 -0500 X-ASG-Debug-ID: 1304484659-493b01e80000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from e8.ny.us.ibm.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 1E87842BDAF for ; Tue, 3 May 2011 21:50:59 -0700 (PDT) Received: from e8.ny.us.ibm.com (e8.ny.us.ibm.com [32.97.182.138]) by cuda.sgi.com with ESMTP id QdaoM1qTqP6MRiSl for ; Tue, 03 May 2011 21:50:59 -0700 (PDT) Received: from d01relay03.pok.ibm.com (d01relay03.pok.ibm.com [9.56.227.235]) by e8.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id p444P0Lb021849 for ; Wed, 4 May 2011 00:25:00 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay03.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p444ox6a091886 for ; Wed, 4 May 2011 00:50:59 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p444oxxE005856 for ; Wed, 4 May 2011 01:50:59 -0300 Received: from [9.65.15.245] (sig-9-65-15-245.mts.ibm.com [9.65.15.245]) by d01av02.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id p444ovEp005839; Wed, 4 May 2011 01:50:58 -0300 Message-ID: <4DC0DB30.6060601@linux.vnet.ibm.com> Date: Tue, 03 May 2011 21:50:56 -0700 From: Allison Henderson User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9 MIME-Version: 1.0 To: xfs-oss , Ext4 Developers List , linux-fsdevel X-ASG-Orig-Subj: [XFS Tests Punch 
Hole 1/1 v2] Add Punch Hole Testing to FSX Subject: [XFS Tests Punch Hole 1/1 v2] Add Punch Hole Testing to FSX Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Barracuda-Connect: e8.ny.us.ibm.com[32.97.182.138] X-Barracuda-Start-Time: 1304484660 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62719 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean This patch adds punch hole tests to the fsx stress test. The test is performed through the fallocate call by randomly choosing to use the punch hole flag when running the fallocate test. Regions that have been punched out should contain zeros, so the expected file contents buffer is updated to contain zeros when a hole is punched out. v0 -> v1: Corrections to the Makefile have been backed out. This patch needs to be applied on top of the "xfstests: clean up fallocate configuration tests" patch The punch hole tests can be disabled with the -H flag, and will also be disabled if it is detected that the filesystem does not support punch hole Signed-off-by: Allison Henderson --- :100644 100644 fe072d3... 8978ef1... 
M ltp/fsx.c ltp/fsx.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 files changed, 96 insertions(+), 15 deletions(-) diff --git a/ltp/fsx.c b/ltp/fsx.c index fe072d3..8978ef1 100644 --- a/ltp/fsx.c +++ b/ltp/fsx.c @@ -110,6 +110,7 @@ int randomoplen = 1; /* -O flag disables it */ int seed = 1; /* -S flag */ int mapped_writes = 1; /* -W flag disables */ int fallocate_calls = 1; /* -F flag disables */ +int punch_hole_calls = 1; /* -H flag disables */ int mapped_reads = 1; /* -R flag disables it */ int fsxgoodfd = 0; int o_direct; /* -Z */ @@ -207,7 +208,8 @@ logdump(void) { int i, count, down; struct log_entry *lp; - char *falloc_type[3] = {"PAST_EOF", "EXTENDING", "INTERIOR"}; + char *falloc_type[4] = {"PAST_EOF", "EXTENDING", "INTERIOR", + "PUNCH_HOLE"}; prt("LOG DUMP (%d total operations):\n", logcount); if (logcount < LOGSIZE) { @@ -791,6 +793,11 @@ dofallocate(unsigned offset, unsigned length) { unsigned end_offset; int keep_size; + int max_offset = 0; + int max_len = 0; + int mode = 0; + char *op_name; + int punch_hole = 0; if (length == 0) { if (!quiet && testcalls > simulatedopcount) @@ -799,11 +806,37 @@ dofallocate(unsigned offset, unsigned length) return; } +#ifdef FALLOC_FL_PUNCH_HOLE + if (fallocate_calls && !punch_hole_calls) + punch_hole = 0; + else if (!fallocate_calls && punch_hole_calls) + punch_hole = 1; + else + punch_hole = random() % 2; + + /* Keep size must be set for punch hole */ + if (punch_hole) { + keep_size = 1; + mode = FALLOC_FL_PUNCH_HOLE; + } else + keep_size = random() % 2; +#else keep_size = random() % 2; +#endif + + if (keep_size) + mode |= FALLOC_FL_KEEP_SIZE; + + if (punch_hole && file_size <= (loff_t)offset) { + if (!quiet && testcalls > simulatedopcount) + prt("skipping hole punch off the end of the file\n"); + log4(OP_SKIPPED, OP_FALLOCATE, offset, length); + return; + } end_offset = keep_size ? 
0 : offset + length; - if (end_offset > biggest) { + if ((end_offset > biggest) && !punch_hole) { biggest = end_offset; if (!quiet && testcalls > simulatedopcount) prt("fallocating to largest ever: 0x%x\n", end_offset); @@ -811,13 +844,15 @@ dofallocate(unsigned offset, unsigned length) /* * last arg: - * 1: allocate past EOF - * 2: extending prealloc - * 3: interior prealloc + * 0: allocate past EOF + * 1: extending prealloc + * 2: interior prealloc + * 3: punch hole */ - log4(OP_FALLOCATE, offset, length, (end_offset > file_size) ? (keep_size ? 1 : 2) : 3); + log4(OP_FALLOCATE, offset, length, punch_hole ? 3 : + (end_offset > file_size) ? (keep_size ? 0 : 1) : 2); - if (end_offset > file_size) { + if (((loff_t)end_offset > file_size) && !punch_hole) { memset(good_buf + file_size, '\0', end_offset - file_size); file_size = end_offset; } @@ -827,13 +862,35 @@ dofallocate(unsigned offset, unsigned length) if ((progressinterval && testcalls % progressinterval == 0) || (debug && (monitorstart == -1 || monitorend == -1 || - end_offset <= monitorend))) - prt("%lu falloc\tfrom 0x%x to 0x%x\n", testcalls, offset, length); - if (fallocate(fd, keep_size ? FALLOC_FL_KEEP_SIZE : 0, (loff_t)offset, (loff_t)length) == -1) { - prt("fallocate: %x to %x\n", offset, length); + end_offset <= monitorend))) { +#ifdef FALLOC_FL_PUNCH_HOLE + op_name = (mode & FALLOC_FL_PUNCH_HOLE) ? + "punch hole" : "falloc"; +#else + op_name = "falloc"; +#endif + prt("%lu %s\tfrom 0x%x to 0x%x, (0x%x bytes)\n", testcalls, + op_name, offset, offset+length, length); + } + if (fallocate(fd, mode, (loff_t)offset, (loff_t)length) == -1) { +#ifdef FALLOC_FL_PUNCH_HOLE + op_name = (mode & FALLOC_FL_PUNCH_HOLE) ? + "punch hole" : "fallocate"; +#else + op_name = "fallocate"; +#endif + + prt("%s: %x to %x\n", op_name, offset, length); prterr("dofallocate: fallocate"); report_failure(161); } + + if (punch_hole) { + max_offset = offset < file_size ? 
offset : file_size; + max_len = max_offset + length <= file_size ? length : + file_size - max_offset; + memset(good_buf + max_offset, '\0', max_len); + } } #else void @@ -895,8 +952,8 @@ test(void) unsigned long offset; unsigned long size = maxoplen; unsigned long rv = random(); - unsigned long op = rv % (3 + !lite + mapped_writes + fallocate_calls); - + unsigned long op = rv % (3 + !lite + mapped_writes + + (fallocate_calls || punch_hole_calls)); /* turn off the map read if necessary */ if (op == 2 && !mapped_reads) @@ -1013,6 +1070,9 @@ usage(void) #ifdef FALLOCATE " -F: Do not use fallocate (preallocation) calls\n" #endif +#ifdef FALLOC_FL_PUNCH_HOLE +" -H: Do not use punch hole calls\n" +#endif " -L: fsxLite - no file creations & no file size changes\n\ -N numops: total # operations to do (default infinity)\n\ -O: use oplen (see -o flag) for every op (default random)\n\ @@ -1179,7 +1239,7 @@ main(int argc, char **argv) setvbuf(stdout, (char *)0, _IOLBF, 0); /* line buffered stdout */ - while ((ch = getopt(argc, argv, "b:c:dfl:m:no:p:qr:s:t:w:xyAD:FLN:OP:RS:WZ")) + while ((ch = getopt(argc, argv, "b:c:dfl:m:no:p:qr:s:t:w:xyAD:FHLN:OP:RS:WZ")) != EOF) switch (ch) { case 'b': @@ -1276,6 +1336,9 @@ main(int argc, char **argv) case 'F': fallocate_calls = 0; break; + case 'H': + punch_hole_calls = 0; + break; case 'L': lite = 1; break; @@ -1426,8 +1489,26 @@ main(int argc, char **argv) if (fallocate(fd, 0, 0, 1) && errno == EOPNOTSUPP) { warn("main: filesystem does not support fallocate, disabling"); fallocate_calls = 0; - } else + /* + * punch hole depends on fallocate, + * so turn punch hole off too + */ + punch_hole_calls = 0; + } else { +#ifdef FALLOC_FL_PUNCH_HOLE + if (fallocate(fd, + FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, + 0, 1) && errno == EOPNOTSUPP) { + warn("main: filesystem does not support" + " fallocate punch hole, disabling"); + punch_hole_calls = 0; + } +#else + punch_hole_calls = 0; +#endif + ftruncate(fd, 0); + } } #else /* ! 
FALLOCATE */ fallocate_calls = 0; -- 1.7.1 From arekm@maven.pl Wed May 4 00:18:48 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p445ImlF059431 for ; Wed, 4 May 2011 00:18:48 -0500 X-ASG-Debug-ID: 1304486543-6e7a00ca0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from smtp-relay.maven.pl (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id F0CF9C38046; Tue, 3 May 2011 22:22:23 -0700 (PDT) Received: from smtp-relay.maven.pl (smtp-relay.maven.pl [193.239.45.138]) by cuda.sgi.com with ESMTP id Otv9KDtBgvJdcgzo; Tue, 03 May 2011 22:22:23 -0700 (PDT) Received: from 87-207-113-141.dynamic.chello.pl ([87.207.113.141]:54666 helo=tarm.maven.pl) by smtp-relay.maven.pl with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256) (Exim 4.74) (envelope-from ) id 1QHUXW-0003jJ-7L; Wed, 04 May 2011 07:22:22 +0200 Received: from arekm by tarm.maven.pl with local (Exim 4.75) (envelope-from ) id 1QHUXV-0003wQ-66; Wed, 04 May 2011 07:22:21 +0200 From: Arkadiusz Miskiewicz To: aelder@sgi.com X-ASG-Orig-Subj: Re: [PATCH] [xfsprogs]: Don't translate command name. Subject: Re: [PATCH] [xfsprogs]: Don't translate command name. 
Date: Wed, 4 May 2011 07:22:20 +0200 User-Agent: KMail/1.13.7 (Linux/2.6.38.4; KDE/4.6.2; x86_64; ; ) Cc: xfs@oss.sgi.com References: <1302858438-22215-1-git-send-email-arekm@maven.pl> <1304458906.9841.2.camel@doink> In-Reply-To: <1304458906.9841.2.camel@doink> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <201105040722.21080.arekm@maven.pl> X-Barracuda-Connect: smtp-relay.maven.pl[193.239.45.138] X-Barracuda-Start-Time: 1304486544 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0044 1.0000 -1.9920 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.99 X-Barracuda-Spam-Status: No, SCORE=-1.99 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62720 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Tuesday 03 of May 2011, Alex Elder wrote:
> On Fri, 2011-04-15 at 11:07 +0200, Arkadiusz Miśkiewicz wrote:
> > Command names should never be translated. Currently there is
> > 'xfs_quota -x -c "project"...' in one locale (C) while
> > 'xfs_quota -x -c "projekt"...' in another (pl_PL).
>
> I haven't looked at this closely yet, but it looks
> pretty straightforward.
>
> I have tried to do a little research on localization
> though, so I'm a little better informed. I accept
> your statement that command names should never be
> translated, but is that a well-known convention?

It isn't written anywhere as a "rule" afaik. It simply "makes sense". We don't want documentation to be valid only for C/en_US locale. We want scripts that use xfs tools to be portable and not locale dependent etc, etc.
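
To make the portability argument concrete, here is a minimal sketch of a command table in which only the help text passes through gettext() while the command name is matched literally. The struct, the function names and the help strings are hypothetical and invented for this illustration; they are not the actual xfs_quota code.

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

#define _(s)   gettext(s)   /* translate at run time */
#define N_(s)  (s)          /* mark for message extraction only */

/* Hypothetical command table; the real xfs_quota structures may differ. */
struct cmd {
	const char *name;     /* typed by users and scripts: never translated */
	const char *oneline;  /* shown to humans: translated when printed */
};

static const struct cmd cmds[] = {
	{ "project", N_("check, setup or clear project quota trees") },
	{ "quit",    N_("exit the program") },
};

/* Look a command up by its literal, locale-independent name. */
static const struct cmd *find_cmd(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++)
		if (strcmp(cmds[i].name, name) == 0)
			return &cmds[i];
	return NULL;
}

/* Only the description goes through gettext(). */
static const char *cmd_help(const struct cmd *c)
{
	return _(c->oneline);
}
```

With a table like this, a script calling the "project" command behaves the same under pl_PL as under C, while the help output can still be localized.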
There are cases when translating commands makes sense - MUDs or some text-based games, but surely not admin tools.

> Thanks.
>
> -Alex
>
> > Signed-off-by: Arkadiusz Miśkiewicz

-- 
Arkadiusz Miśkiewicz PLD/Linux Team arekm / maven.pl http://ftp.pld-linux.org/

From stan@hardwarefreak.com Wed May 4 01:14:43 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_43, T_LOTS_OF_MONEY autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p446EhmL061122 for ; Wed, 4 May 2011 01:14:43 -0500 X-ASG-Debug-ID: 1304489900-141200e50000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from greer.hardwarefreak.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D7FAC42C646 for ; Tue, 3 May 2011 23:18:20 -0700 (PDT) Received: from greer.hardwarefreak.com (mo-65-41-216-221.sta.embarqhsd.net [65.41.216.221]) by cuda.sgi.com with ESMTP id 7052bzEBVMyzhk6L for ; Tue, 03 May 2011 23:18:20 -0700 (PDT) Received: from [192.168.100.53] (gffx.hardwarefreak.com [192.168.100.53]) by greer.hardwarefreak.com (Postfix) with ESMTP id EAD466C073 for ; Wed, 4 May 2011 01:18:19 -0500 (CDT) Message-ID: <4DC0EFA8.1000702@hardwarefreak.com> Date: Wed, 04 May 2011 01:18:16 -0500 From: Stan Hoeppner User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: XFS/Linux Sanity check Subject: Re: XFS/Linux Sanity check References: <20110503031856.GA9114@dastard> In-Reply-To: <20110503031856.GA9114@dastard> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mo-65-41-216-221.sta.embarqhsd.net[65.41.216.221] X-Barracuda-Start-Time: 1304489900 X-Barracuda-Bayes:
INNOCENT GLOBAL 0.0007 1.0000 -2.0165 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.42 X-Barracuda-Spam-Status: No, SCORE=-1.42 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_SC5_MJ1963, RDNS_DYNAMIC X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62723 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.10 RDNS_DYNAMIC Delivered to trusted network by host with dynamic-looking rDNS 0.50 BSF_SC5_MJ1963 Custom Rule MJ1963 X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On 5/2/2011 10:18 PM, Dave Chinner wrote:

> Also, knowing how you spread out the disks in each RAID-6 group
> between controllers, trays, etc as that has important performance
> and failure implications.
>
> e.g. I'm guessing that you are taking 6 drives from each enclosure
> for each 18-drive raid-6 group, which would split the RAID-6 group
> across all three SAS controllers and enclosures. That means if you
> lose a SAS controller or enclosure you lose all RAID-6 groups at
> once which is effectively catastrophic from a recovery point of view.
> It also means that one slow controller slows down everything so load
> balancing is difficult.

Assuming Paul's SC847 SAS chassis have the standard EL1 backplanes, his bandwidth profile per chassis is:

24 x 6Gb/s drives on 4 x 6Gb/s host ports via 36 port LSI expander
21 x 6Gb/s drives on 4 x 6Gb/s host ports via 36 port LSI expander

Not balanced but not horribly bad. I recommend using one LSI 9285-8E RAID card per SC847 chassis, one SFF8088 cable connected to the front backplane the other connected to the rear. Create two 21 drive RAID6 arrays, taking care that one array consists only of drives on the front backplane, the other array consisting only of drives on the rear backplane.
Configure the remaining 3 drives on the front backplane as cold spares. Not perfect, but I think the best solution given the unbalanced nature of the chassis backplanes.

> Large stripes might look like a good idea, but when you get to this
> scale concatenation of high throughput LUNs provides better
> throughput because of less contention through the storage
> controllers and enclosures.

Now create an LVM or mdraid concatenated device of the 6 hardware RAID6 LUNs. Format the resulting device with mkfs.xfs defaults allowing XFS allocation groups to drive your parallelism and throughput instead of a big stripe, just as Dave recommends.

Each 9285-8E should be able to pump streaming reads at about 3.2 to 3.5GB/s, a little less than the 38 RAID6 spindle streaming aggregate capability. At this throughput level you're bumping against the PCIe 2.0 x8 one way bandwidth limit after encoding and error correction overhead. So overall I think you're fairly well balanced now, overcoming the slight imbalance of the disk chassis configuration.

Assuming you're able to load balance interrupts and tune things optimally, and assuming the Intel chipset in the R810 is up to the task, the above recommended setup should be capable of 8-10GB/s throughput with a parallel workload.

Newegg carries both the 9285-8E and the cache battery unit, ~$1200 total. So it'll run you about $18,000 for 15 units for 5 servers, about 3x what you spent on the 9200-8E cards, and worth every sweet penny.
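
The back-of-the-envelope figures above can be rechecked with a few lines of C. This is only a sketch: the 5 GT/s per lane rate and the 8b/10b encoding are the published PCIe 2.0 figures, the function names are invented for this illustration, and the result is the ceiling after encoding but before any further protocol overhead, which is why the 3.2 to 3.5GB/s estimate sits below it.

```c
#include <stdio.h>

/* PCIe 2.0: 5 GT/s per lane, 8b/10b encoding (8 data bits per 10 on the wire). */
static double pcie2_oneway_gbytes_per_sec(int lanes)
{
	const double gtransfers_per_lane = 5.0;
	const double encoding_efficiency = 8.0 / 10.0;

	/* Gbit/s of payload across all lanes, converted to GB/s. */
	return lanes * gtransfers_per_lane * encoding_efficiency / 8.0;
}

/* Total card spend: servers x cards per server x unit price in dollars. */
static long hba_cost_dollars(int servers, int cards_per_server, long unit_price)
{
	return (long)servers * cards_per_server * unit_price;
}
```

For an x8 slot the first function gives 4 GB/s one way, and 5 servers with 3 cards each at roughly $1200 per card plus battery gives the $18,000 quoted above.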
-- Stan From david@fromorbit.com Wed May 4 02:32:44 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p447WiS1063698 for ; Wed, 4 May 2011 02:32:44 -0500 X-ASG-Debug-ID: 1304494579-4b3900a90000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail04.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 5154142C743 for ; Wed, 4 May 2011 00:36:20 -0700 (PDT) Received: from ipmail04.adl6.internode.on.net (ipmail04.adl6.internode.on.net [150.101.137.141]) by cuda.sgi.com with ESMTP id BL8aocg8MRSomPrb for ; Wed, 04 May 2011 00:36:20 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAFEAwU15LBza/2dsb2JhbACmIXjEZg6FeQSdeA Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail04.adl6.internode.on.net with ESMTP; 04 May 2011 17:06:17 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QHWd5-0005An-Qx; Wed, 04 May 2011 17:36:15 +1000 Date: Wed, 4 May 2011 17:36:15 +1000 From: Dave Chinner To: Christian Kujau Cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks Message-ID: <20110504073615.GD9114@dastard> References: <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail04.adl6.internode.on.net[150.101.137.141] X-Barracuda-Start-Time: 1304494581 X-Barracuda-Bayes: 
INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62729 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, May 03, 2011 at 05:46:14PM -0700, Christian Kujau wrote: > And another one, please see the files marked with 15- here: > > https://trent.utfs.org/p/bits/2.6.39-rc4/oom/trace/ > > I tried to have more concise timestamps in each of these files, hope that > helps. Sadly though, trace-cmd reports still segfaults on the tracefile. Ok, that will be helpful. Also helpful is that I've (FINALLY!) reproduced this myself, and i think i can now reproduce it at will on a highmem i686 machine. I'll look into it more later tonight.... Cheers, Dave. 
-- Dave Chinner david@fromorbit.com From BATV+75f33b7b1a870dba700e+2810+infradead.org+hch@bombadil.srs.infradead.org Wed May 4 04:08:07 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44985cf066652 for ; Wed, 4 May 2011 04:08:07 -0500 X-ASG-Debug-ID: 1304500303-732602b90000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id CD0A042CD4F for ; Wed, 4 May 2011 02:11:43 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id ItBvwVgXcjp8R2BI for ; Wed, 04 May 2011 02:11:43 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHY7R-0000FE-Qo; Wed, 04 May 2011 09:11:41 +0000 Date: Wed, 4 May 2011 05:11:41 -0400 From: Christoph Hellwig To: Anisse Astier Cc: Eric Sandeen , xfs@oss.sgi.com X-ASG-Orig-Subj: Re: xfs_repair crashing (versions 3.1.4 and 3.1.5) Subject: Re: xfs_repair crashing (versions 3.1.4 and 3.1.5) Message-ID: <20110504091141.GA30330@infradead.org> References: <20110419082705.GI23985@dastard> <20110419130737.45beb611@destiny.ordissimo> <4DB084CE.8020600@sandeen.net> <20110422130920.7be686c6@destiny.ordissimo> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110422130920.7be686c6@destiny.ordissimo> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304500303 
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Fri, Apr 22, 2011 at 01:09:20PM +0200, Anisse Astier wrote:
> Yep, I figured that much, it just took me a while to get up & running
> another system capable of building xfsprogs.
>
> Now that I have that, and that I commented the do_warn, xfs_repair is
> still running after the previous failing point:
> [...]
>         - agno = 17
> bad key in bmbt root (is 73434, would reset to 74194) in inode 2283178100 data fork
> bad fwd (right) sibling pointer (saw 145202888 should be NULLDFSBNO)
> bad data fork in inode 2283178100
> would have cleared inode 2283178100
>         - agno = 18
> [...] (ongoing)
>
> Once this is done, I'll test with %llu instead of %u.
>
> But please be patient, it's a 900GB filesystem (half-full) with just an 800
> MHz ARM9 processor doing the work, so xfs_repair takes hours to complete.
> Plus I won't have time to do many tests before next week.
>
> To be continued.

Any updates?

In the meantime I cooked up a little patch (below) to add format string checking to the repair-internal varargs printing helpers, which produces a lot of warnings. A lot of that is different underlying types for fixed-size 64-bit types, but there's quite a few legit errors there as well.

Index: xfsprogs-dev/repair/err_protos.h
===================================================================
--- xfsprogs-dev.orig/repair/err_protos.h 2011-04-22 12:45:25.018475622 +0200
+++ xfsprogs-dev/repair/err_protos.h 2011-04-22 12:47:22.014508467 +0200
@@ -17,10 +17,14 @@
  */
 
 /* abort, internal error */
-void __attribute__((noreturn)) do_abort(char const *, ...);
+void __attribute__((noreturn)) do_abort(char const *, ...)
+	__attribute__((format(printf,1,2)));
 /* abort, system error */
-void __attribute__((noreturn)) do_error(char const *, ...);
+void __attribute__((noreturn)) do_error(char const *, ...)
+	__attribute__((format(printf,1,2)));
 /* issue warning */
-void do_warn(char const *, ...);
+void do_warn(char const *, ...)
+	__attribute__((format(printf,1,2)));
 /* issue log message */
-void do_log(char const *, ...);
+void do_log(char const *, ...)
+	__attribute__((format(printf,1,2)));

From anisse@astier.eu Wed May 4 05:21:10 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44ALA9r069323 for ; Wed, 4 May 2011 05:21:10 -0500 X-ASG-Debug-ID: 1304504686-4aa900a40000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-bw0-f53.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D03BA42D1C8 for ; Wed, 4 May 2011 03:24:47 -0700 (PDT) Received: from mail-bw0-f53.google.com (mail-bw0-f53.google.com [209.85.214.53]) by cuda.sgi.com with ESMTP id OTJYBGZgMN8Y7A5E for ; Wed, 04 May 2011 03:24:47 -0700 (PDT) Received: by bwg12 with SMTP id 12so880738bwg.26 for ; Wed, 04 May 2011 03:24:46 -0700 (PDT) Received: by 10.204.23.81 with SMTP id q17mr953921bkb.2.1304504663178; Wed, 04 May 2011 03:24:23 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.70.81 with HTTP; Wed, 4 May 2011 03:24:03 -0700 (PDT) In-Reply-To: <20110504091141.GA30330@infradead.org> References: <20110419082705.GI23985@dastard> <20110419130737.45beb611@destiny.ordissimo> <4DB084CE.8020600@sandeen.net> <20110422130920.7be686c6@destiny.ordissimo> <20110504091141.GA30330@infradead.org> From: Anisse Astier Date: Wed, 4 May 2011 12:24:03 +0200 Message-ID: X-ASG-Orig-Subj: Re: xfs_repair crashing (versions 3.1.4 and 3.1.5) Subject: Re: xfs_repair crashing (versions 3.1.4 and 3.1.5) To: Christoph Hellwig Cc: Eric Sandeen , xfs@oss.sgi.com, Dave Chinner Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: mail-bw0-f53.google.com[209.85.214.53] X-Barracuda-Start-Time: 1304504687 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62741 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Wed, May 4, 2011 at 11:11 AM, Christoph Hellwig wrote:
> On Fri, Apr 22, 2011 at 01:09:20PM +0200, Anisse Astier wrote:
>> Yep, I figured that much, it just took me a while to get up & running
>> another system capable of building xfsprogs.
>>
>> Now that I have that, and that I commented the do_warn, xfs_repair is
>> still running after the previous failing point:
>> [...]
>>         - agno = 17
>> bad key in bmbt root (is 73434, would reset to 74194) in inode 2283178100 data fork
>> bad fwd (right) sibling pointer (saw 145202888 should be NULLDFSBNO)
>> bad data fork in inode 2283178100
>> would have cleared inode 2283178100
>>         - agno = 18
>> [...] (ongoing)
>>
>> Once this is done, I'll test with %llu instead of %u.
>>
>> But please be patient, it's a 900GB filesystem (half-full) with just an 800
>> MHz ARM9 processor doing the work, so xfs_repair takes hours to complete.
>> Plus I won't have time to do many tests before next week.
>>
>> To be continued.
>
> Any updates?

Well, Dave had it all figured out, and replacing %u with %llu does indeed fix the problem.
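
For readers who have not hit this class of bug, a minimal sketch follows. It assumes nothing about the real xfs_repair sources beyond the format string visible in the backtrace; warn_fmt() and warn_bad_inode() are hypothetical stand-ins for a do_warn()-style helper. Passing a 64-bit value to "%u" is undefined behaviour, and on a 32-bit ABI the following "%s" can end up reading a garbage pointer.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a do_warn()-style helper; not the xfsprogs code.
 * The format attribute lets GCC and Clang check callers' arguments.
 */
static int warn_fmt(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

static int warn_fmt(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* Format a message about a 64-bit inode number. */
static int warn_bad_inode(char *buf, size_t len, uint64_t ino, const char *fork)
{
	/*
	 * Wrong (undefined behaviour, rejected by -Wformat once the
	 * attribute above is in place):
	 *   warn_fmt(buf, len, "in inode %u (%s fork)\n", ino, fork);
	 * Right: the conversion matches the type actually passed.
	 */
	return warn_fmt(buf, len, "in inode %llu (%s fork)\n",
			(unsigned long long)ino, fork);
}
```

The explicit cast to unsigned long long keeps the call correct regardless of which underlying type a platform uses for its 64-bit integers.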
Just for future reference, the stack of the crashing process:

#0  strlen () at ../ports/sysdeps/arm/strlen.S:29
#1  0x40204f78 in _IO_vfprintf_internal (s=0xbe9a9730, format=0xbe9a7676 "27", ap=) at vfprintf.c:1614
#2  0x40205f70 in buffered_vfprintf (s=0x402f2668, format=0x88168874, args=...) at vfprintf.c:2254
#3  0x40201a44 in _IO_vfprintf_internal (s=0x402f2668, format=0x7c198 "\tin inode %u (%s fork) bmap btree block %llu\n", ap=) at vfprintf.c:1306
#4  0x0003cd48 in do_warn (msg=0x7b4dc "data") at xfs_repair.c:379
#5  0x00017088 in process_btinode (mp=, agno=17, ino=, dip=, type=34387, dirty=0xbe9aa418, tot=0x5, nex=0xbe9aa418, blkmapp=0xbe9aa2d8, whichfork=-1097162040, check_dups=-1097162016) at dinode.c:1284
#6  0x00017a04 in process_inode_data_fork (mp=, agno=17, ino=1476724, dino=0x1db7800, type=5, dirty=0xbe9aa418, totblocks=0xbe9aa2d8, nextents=0xbe9aa2c8, dblkmap=0xbe9aa2e0, check_dups=0) at dinode.c:2048
#7  0x0001a5f0 in process_dinode_int (mp=, dino=0x1db7800, agno=, ino=, was_free=0, dirty=0x1ad34, used=0x0, verify_mode=-1097161704, uncertain=0, ino_discovery=1, check_dups=0, extra_attr_check=1, isa_dir=0x0, parent=0xbe9aa408) at dinode.c:2631
#8  0x0001ad34 in process_dinode (mp=0x7c198, dino=0x1b, agno=2283178100, ino=0, was_free=0, dirty=0xbe9aa418, used=0xbe9aa41c, ino_discovery=1, check_dups=0, extra_attr_check=1, isa_dir=0xbe9aa414, parent=0xbe9aa408) at dinode.c:2773
#9  0x00010630 in process_inode_chunk (mp=0xbe9aa508, agno=17, num_inos=, first_irec=, ino_discovery=1, check_dups=0, extra_attr_check=1, bogus=0x0) at dino_chunks.c:777
#10 0x000110ec in process_aginodes (mp=0xbe9aa508, pf_args=0xed5c8, agno=17, ino_discovery=1, check_dups=0, extra_attr_check=1) at dino_chunks.c:1024
#11 0x00028724 in process_ag_func (wq=0x400608, agno=17, arg=0xed5c8) at phase3.c:154
#12 0x00028e24 in process_ags (mp=0xbe9aa508) at phase3.c:193
#13 phase3 (mp=0xbe9aa508) at phase3.c:232
#14 0x0003ddd8 in main (argc=, argv=) at xfs_repair.c:712

>
> In the meantime I cooked up a little patch (below) to add format string
> checking to the repair-internal varargs printing helpers, which produces
> a lot of warnings.  A lot of that is different underlying types for
> fixed-size 64-bit types, but there's quite a few legit errors there as
> well.
>
>
> Index: xfsprogs-dev/repair/err_protos.h
> ===================================================================
> --- xfsprogs-dev.orig/repair/err_protos.h       2011-04-22 12:45:25.018475622 +0200
> +++ xfsprogs-dev/repair/err_protos.h    2011-04-22 12:47:22.014508467 +0200
> @@ -17,10 +17,14 @@
>  */
>
>  /* abort, internal error */
> -void  __attribute__((noreturn)) do_abort(char const *, ...);
> +void  __attribute__((noreturn)) do_abort(char const *, ...)
> +       __attribute__((format(printf,1,2)));
>  /* abort, system error */
> -void  __attribute__((noreturn)) do_error(char const *, ...);
> +void  __attribute__((noreturn)) do_error(char const *, ...)
> +       __attribute__((format(printf,1,2)));
>  /* issue warning */
> -void do_warn(char const *, ...);
> +void do_warn(char const *, ...)
> +       __attribute__((format(printf,1,2)));
>  /* issue log message */
> -void do_log(char const *, ...);
> +void do_log(char const *, ...)
> +       __attribute__((format(printf,1,2)));
>

I'll give it a try.
Anisse From stan@hardwarefreak.com Wed May 4 05:32:58 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44AWvsl070797 for ; Wed, 4 May 2011 05:32:58 -0500 X-ASG-Debug-ID: 1304505394-4ea0016e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from greer.hardwarefreak.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 170CC11CBB82 for ; Wed, 4 May 2011 03:36:34 -0700 (PDT) Received: from greer.hardwarefreak.com (mo-65-41-216-221.sta.embarqhsd.net [65.41.216.221]) by cuda.sgi.com with ESMTP id IoU5paGEUwcsQQ4C for ; Wed, 04 May 2011 03:36:34 -0700 (PDT) Received: from [192.168.100.53] (gffx.hardwarefreak.com [192.168.100.53]) by greer.hardwarefreak.com (Postfix) with ESMTP id 485416C0F7 for ; Wed, 4 May 2011 05:36:34 -0500 (CDT) Message-ID: <4DC12C2F.8060800@hardwarefreak.com> Date: Wed, 04 May 2011 05:36:31 -0500 From: Stan Hoeppner User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: XFS/Linux Sanity check Subject: Re: XFS/Linux Sanity check References: <20110503031856.GA9114@dastard> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mo-65-41-216-221.sta.embarqhsd.net[65.41.216.221] X-Barracuda-Start-Time: 1304505395 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0208 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.42 X-Barracuda-Spam-Status: No, SCORE=-1.42 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_SC5_MJ1963, RDNS_DYNAMIC X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62742 
    Rule breakdown below
     pts rule name              description
    ---- ---------------------- --------------------------------------------------
    0.10 RDNS_DYNAMIC           Delivered to trusted network by host with dynamic-looking rDNS
    0.50 BSF_SC5_MJ1963         Custom Rule MJ1963
X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com
X-Virus-Status: Clean

On 5/3/2011 11:05 AM, Paul Anderson wrote:

> I'm still perfectly willing to buy good HW RAID cards, don't get me
> wrong, but their main benefit to me will be the battery backed cache,
> not the performance.

Good RAID cards have many more advantages than battery cache and
performance. One is moving a RAID card and its attached arrays from a
failed host to a new one. In the case of the hardware RAID card, usually
all that is required is loading the HBA driver and mounting the
filesystem. Such a move of an mdraid array is usually, well, not nearly
as straightforward, to put it kindly.

> Keep in mind that it is hard to balance a HW RAID card across multiple
> SAS expanders - you can certainly get a -16e card of some sort, but
> then it does ALL of the I/O to those 4 expanders ALL of the time.

I'm not sure I know exactly what you mean here, Paul. You seem to be
talking about RAID card <-> drive chassis cabling flexibility and
symmetrical bandwidth. The following two SAS expander/switch products
are likely worth a quick read:

http://www.intel.com/Products/Server/RAID-controllers/re-res2sv240/RES2SV240-Overview.htm
http://www.lsi.com/channel/products/switch/sas6160/index.html

Using an LSI 9260-4i single 8087 port RAID card, the Intel expander, and
some 8087/8088 panel converters, one could attach *5* x 24 drive LSI 620J
SAS JBOD chassis for a total of 120 drives with equal bandwidth to/from
all drives, about 2GB/s total bandwidth, RAID ASIC limited.
Few would want to connect 120 drives to such a single port RAID controller, but this example demonstrates that symmetry can be achieved across a large number of cascaded SAS expander ASICs (6 total) with a lot of drives. -- Stan From david@fromorbit.com Wed May 4 06:08:39 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44B8cGr071836 for ; Wed, 4 May 2011 06:08:39 -0500 X-ASG-Debug-ID: 1304507534-5455005b0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail04.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 2717A1583572 for ; Wed, 4 May 2011 04:12:15 -0700 (PDT) Received: from ipmail04.adl6.internode.on.net (ipmail04.adl6.internode.on.net [150.101.137.141]) by cuda.sgi.com with ESMTP id KoTFvHZZe55Pow55 for ; Wed, 04 May 2011 04:12:15 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAIQxwU15LBza/2dsb2JhbACmFnjEOw6FeQSeAQ Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail04.adl6.internode.on.net with ESMTP; 04 May 2011 20:42:13 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QHa03-0005Xj-42; Wed, 04 May 2011 21:12:11 +1000 Date: Wed, 4 May 2011 21:12:11 +1000 From: Dave Chinner To: Christian Kujau Cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks Message-ID: <20110504111211.GF9114@dastard> References: <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> <20110504073615.GD9114@dastard> 
MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110504073615.GD9114@dastard> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail04.adl6.internode.on.net[150.101.137.141] X-Barracuda-Start-Time: 1304507536 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62743 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Wed, May 04, 2011 at 05:36:15PM +1000, Dave Chinner wrote: > On Tue, May 03, 2011 at 05:46:14PM -0700, Christian Kujau wrote: > > And another one, please see the files marked with 15- here: > > > > https://trent.utfs.org/p/bits/2.6.39-rc4/oom/trace/ > > > > I tried to have more concise timestamps in each of these files, hope that > > helps. Sadly though, trace-cmd reports still segfaults on the tracefile. > > Ok, that will be helpful. Also helpful is that I've (FINALLY!) > reproduced this myself, and i think i can now reproduce it at will > on a highmem i686 machine. I'll look into it more later tonight.... And here's a patch for you to try. It fixes the problem on my test machine..... Cheers, Dave. -- Dave Chinner david@fromorbit.com xfs: ensure reclaim cursor is reset correctly at end of AG From: Dave Chinner On a 32 bit highmem PowerPC machine, the XFS inode cache was growing without bound and exhausting low memory causing the OOM killer to be triggered. After some effort, the problem was reproduced on a 32 bit x86 highmem machine. 
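
The cursor behaviour this commit message describes can be illustrated
with a toy model. This is only a sketch under stated assumptions: toy_ag,
toy_lookup and toy_reclaim_pass are hypothetical names, a small boolean
array stands in for the radix tree reclaim tag, and none of this is the
actual XFS code. With reset_on_empty false, a pass that runs off the end
leaves the cursor past every slot, so later passes never see newly
reclaimable entries at lower indexes; with it true (the analogue of
setting done = 1), the cursor returns to the start.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8                  /* hypothetical size of the toy "AG" */

struct toy_ag {
    bool reclaimable[NSLOTS];     /* stands in for the reclaim tag */
    size_t cursor;                /* stands in for the per-AG reclaim cursor */
};

/* Return the first reclaimable slot at or after 'start', or NSLOTS if none. */
static size_t toy_lookup(const struct toy_ag *ag, size_t start)
{
    for (size_t i = start; i < NSLOTS; i++)
        if (ag->reclaimable[i])
            return i;
    return NSLOTS;
}

/*
 * One reclaim pass, starting at the saved cursor and reclaiming at most
 * 'budget' slots. Returns the number of slots reclaimed. If the lookup
 * comes back empty and reset_on_empty is set, the pass is marked done and
 * the cursor goes back to 0; otherwise the cursor is left where the scan
 * stopped.
 */
static size_t toy_reclaim_pass(struct toy_ag *ag, size_t budget,
                               bool reset_on_empty)
{
    size_t first = ag->cursor;
    size_t reclaimed = 0;
    bool done = false;

    while (reclaimed < budget) {
        size_t i = toy_lookup(ag, first);

        if (i == NSLOTS) {
            if (reset_on_empty)
                done = true;      /* analogue of the one-line fix */
            break;
        }
        ag->reclaimable[i] = false;
        reclaimed++;
        first = i + 1;
    }
    ag->cursor = done ? 0 : first;
    return reclaimed;
}
```

In this model, once a pass without the reset has scanned past the last
reclaimable slot, the cursor stays beyond every slot and each later pass
reclaims nothing, no matter how many entries become reclaimable below it.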
The problem is that the per-ag inode reclaim index cursor was not
getting reset to the start of the AG if the radix tree tag lookup
found no more reclaimable inodes. Hence every further reclaim
attempt started at the same index beyond where any reclaimable
inodes lay, and no further background reclaim ever occurred from the
AG. Without background inode reclaim the VM driven cache shrinker
simply cannot keep up with cache growth, and OOM is the result.

While the change that exposed the problem was the conversion of the
inode reclaim to use work queues for background reclaim, it was not
the cause of the bug. The bug was introduced when the cursor code
was added, just waiting for some weird configuration to strike....

Signed-off-by: Dave Chinner
---
 fs/xfs/linux-2.6/xfs_sync.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 3253572..4e1f23a 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -936,6 +936,7 @@ restart:
 					XFS_LOOKUP_BATCH,
 					XFS_ICI_RECLAIM_TAG);
 			if (!nr_found) {
+				done = 1;
 				rcu_read_unlock();
 				break;
 			}

From david@fromorbit.com Wed May 4 08:22:29 2011
X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.4.0-r929098
Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44DMTOH077231 for ; Wed, 4 May 2011 08:22:29 -0500
X-ASG-Debug-ID: 1304515565-537400f50000-NocioJ
X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi
Received: from ipmail04.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 4D5861E1CCED; Wed, 4 May 2011 06:26:05 -0700 (PDT)
Received: from ipmail04.adl6.internode.on.net (ipmail04.adl6.internode.on.net [150.101.137.141]) by cuda.sgi.com with ESMTP id Cx9HHhu7dtvJLkOt; Wed, 04 May 2011
06:26:05 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAClRwU15LBza/2dsb2JhbACmFnjFNQ6FeQSeAQ Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail04.adl6.internode.on.net with ESMTP; 04 May 2011 22:56:04 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QHc5Q-0005kw-4a; Wed, 04 May 2011 23:25:52 +1000 Date: Wed, 4 May 2011 23:25:52 +1000 From: Dave Chinner To: linux-kernel@vger.kernel.org, Markus Trippelsdorf , Bruno =?iso-8859-1?Q?Pr=E9mont?= , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110504132552.GG9114@dastard> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110504005736.GA2958@cucamonga.audible.transient.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110504005736.GA2958@cucamonga.audible.transient.net> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail04.adl6.internode.on.net[150.101.137.141] X-Barracuda-Start-Time: 1304515567 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0001 1.0000 -2.0207 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62753 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Wed, May 04, 2011 at 
12:57:36AM +0000, Jamie Heilman wrote: > Dave Chinner wrote: > > OK, so the common elements here appears to be root filesystems > > with small log sizes, which means they are tail pushing all the > > time metadata operations are in progress. Definitely seems like a > > race in the AIL workqueue trigger mechanism. I'll see if I can > > reproduce this and cook up a patch to fix it. > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > other assorted feedback wrt this issue? I've had it happen twice now > myself in the past week or so, though I have no reliable reproduction > technique. Just wondering if more data points will help isolate the > cause, and if so, how to be prepared to get them. Not really. I think I know where the problem lies, but I currently lack a reproducer. There's also been another regression I've only just got to the bottom of, so I haven't really had a chance to focus properly on this one yet. Log space hangs like this have historically been difficult to reproduce reliably.... Cheers, Dave. 
-- Dave Chinner david@fromorbit.com From sandeen@sandeen.net Wed May 4 08:45:06 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44Dj5GE078025 for ; Wed, 4 May 2011 08:45:05 -0500 X-ASG-Debug-ID: 1304516922-054f016d0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.sandeen.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 39299C3D805 for ; Wed, 4 May 2011 06:48:42 -0700 (PDT) Received: from mail.sandeen.net (sandeen.net [63.231.237.45]) by cuda.sgi.com with ESMTP id AwJpdYMnXJ7LuvlN for ; Wed, 04 May 2011 06:48:42 -0700 (PDT) Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sandeen.net (Postfix) with ESMTP id 1F9024964600; Wed, 4 May 2011 08:48:42 -0500 (CDT) Message-ID: <4DC15939.5020508@sandeen.net> Date: Wed, 04 May 2011 08:48:41 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Allison Henderson CC: xfs-oss X-ASG-Orig-Subj: Re: [PATCH] xfstests: clean up fallocate configuration tests Subject: Re: [PATCH] xfstests: clean up fallocate configuration tests References: <4DBF492E.3040400@sandeen.net> <4DC0516F.2040108@linux.vnet.ibm.com> In-Reply-To: <4DC0516F.2040108@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: sandeen.net[63.231.237.45] X-Barracuda-Start-Time: 1304516923 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 
using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62754 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/3/11 2:03 PM, Allison Henderson wrote: > Thanks Eric, > > I tried it out and it looks like it works great. I will back out the changes to the Makefile in my fsx patch. > > Allison Henderson > Thanks; I pushed it to the xfstests-dev git tree, you should be able to pull it and rebase your patches in a few minutes. -Eric From aelder@sgi.com Wed May 4 09:26:23 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44EQNR0079352 for ; Wed, 4 May 2011 09:26:23 -0500 Received: from cas.corp.sgi.com (pv-excas1-dc21-nlb.corp.sgi.com [137.38.102.126]) by relay2.corp.sgi.com (Postfix) with ESMTP id B8920304067; Wed, 4 May 2011 07:29:58 -0700 (PDT) Received: from [127.0.0.1] (128.162.232.50) by xmail.sgi.com (137.38.102.30) with Microsoft SMTP Server (TLS) id 14.1.289.1; Wed, 4 May 2011 09:29:58 -0500 Subject: Re: [PATCH] xfs: kill off xfs_printk() From: Alex Elder Reply-To: To: Dave Chinner CC: , Joe Perches In-Reply-To: <20110504004453.GC9114@dastard> References: <20110504004453.GC9114@dastard> Content-Type: text/plain; charset="UTF-8" Date: Wed, 4 May 2011 09:29:58 -0500 Message-ID: <1304519398.2841.12.camel@doink> MIME-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Originating-IP: [128.162.232.50] X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean 
On Wed, 2011-05-04 at 10:44 +1000, Dave Chinner wrote:
> On Tue, May 03, 2011 at 03:14:44PM -0500, Alex Elder wrote:
> > From: Joe Perches
> >
> > xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid
> > using xfs_printk() altogether. This is the only remaining use of
> > xfs_printk(), so changing it this way means xfs_printk() can simply
> > be eliminated.
> >
> > Also add format checking to the non-debug inline function xfs_debug.
> > Miscellaneous function prototype argument alignment.
> >
> > (Updated to delete the definition of xfs_printk(), which is
> > no longer used or needed.)
> >
> > Signed-off-by: Alex Elder
>
> If you are going to credit Joe as the original source of the patch
> in the commit (i.e. via the "From:" tag), you probably should copy
> in his original Signed-off-by tag as well....

Honestly I wasn't sure about that. Since I changed it I
kind of hoped he'd acknowledge or sign off on it. I guess
I'm still learning about the conventions for signing off
on things. I want to give all credit properly, I just
don't always know the Right Way to do so.
-Alex From joe@perches.com Wed May 4 09:50:35 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44EoZFs080079 for ; Wed, 4 May 2011 09:50:35 -0500 X-ASG-Debug-ID: 1304520853-2c2100b10000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.perches.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 7E7F042E5FF for ; Wed, 4 May 2011 07:54:13 -0700 (PDT) Received: from mail.perches.com (mail.perches.com [173.55.12.10]) by cuda.sgi.com with ESMTP id U6iNpGykaJDEV28t for ; Wed, 04 May 2011 07:54:13 -0700 (PDT) Received: from [192.168.1.162] (unknown [192.168.1.162]) by mail.perches.com (Postfix) with ESMTP id 6650324368; Wed, 4 May 2011 07:54:04 -0700 (PDT) X-ASG-Orig-Subj: Re: [PATCH] xfs: kill off xfs_printk() Subject: Re: [PATCH] xfs: kill off xfs_printk() From: Joe Perches To: aelder@sgi.com Cc: Dave Chinner , xfs@oss.sgi.com In-Reply-To: <1304519398.2841.12.camel@doink> References: <20110504004453.GC9114@dastard> <1304519398.2841.12.camel@doink> Content-Type: text/plain; charset="UTF-8" Date: Wed, 04 May 2011 07:54:11 -0700 Message-ID: <1304520851.1788.96.camel@Joe-Laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mail.perches.com[173.55.12.10] X-Barracuda-Start-Time: 1304520853 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62759 Rule breakdown below pts rule name description ---- ---------------------- 
-------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Wed, 2011-05-04 at 09:29 -0500, Alex Elder wrote: > Honestly I wasn't sure about that. Since I changed it I > kind of hoped he'd acknowledge or sign off on it. I guess > I'm still learning about the conventions for signing off > on things. I want to give all credit properly, I just > don't always know the Right Way to do so. When I modify a patch to my taste I use Original-patch-by: Anyway, no worries to me, whatever you want. Acked-by: Joe Perches Signed-off-by: Joe Perches Original-patch-by: Joe Perches cheers, Joe From BATV+75f33b7b1a870dba700e+2810+infradead.org+hch@bombadil.srs.infradead.org Wed May 4 10:57:47 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44Fvjhp082562 for ; Wed, 4 May 2011 10:57:46 -0500 X-ASG-Debug-ID: 1304524883-391d03110000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id AF19D12298D4 for ; Wed, 4 May 2011 09:01:23 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id 6YRmcMVDHPuw7d5f for ; Wed, 04 May 2011 09:01:23 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHeVv-00023i-0P; Wed, 04 May 2011 16:01:23 +0000 Date: Wed, 4 May 2011 12:01:22 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org X-ASG-Orig-Subj: XFS status update for April 2011 Subject: XFS status 
 update for April 2011
Message-ID: <20110504160122.GA7224@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html
X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34]
X-Barracuda-Start-Time: 1304524883
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com
X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com
X-Virus-Status: Clean

April saw further stabilization work on the Linux 2.6.39 kernel, including
a number of XFS bug fixes. Most importantly, a series of patches fixes
various OOM problems due to bad interactions between the generic writeback
code and XFS inode reclaim, but there were also other patches for various
smaller issues.

In the meantime the XFS development tree saw the addition of the optimized
busy extent tracking, which allows large speedups for multi-threaded,
metadata-heavy workloads and lays the groundwork for discard support on
transaction commit, and a few other smaller patches.

On the user space side the xfsprogs and xfsdump repositories saw a very
quiet month with no applied patches, although a few were posted and
discussed on the mailing list. The xfstests repository on the other hand
saw new test cases exercising the xfs_metadump functionality as well as
fixes to existing tests.

From BATV+75f33b7b1a870dba700e+2810+infradead.org+hch@bombadil.srs.infradead.org Wed May 4 13:56:33 2011
X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098
Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44IuXcA088817 for ; Wed, 4 May 2011 13:56:33 -0500
X-ASG-Debug-ID: 1304535611-051101900000-NocioJ
X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi
Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B8C0943000E for ; Wed, 4 May 2011 12:00:11 -0700 (PDT)
Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id POE3TqnOaL6DNQk4 for ; Wed, 04 May 2011 12:00:11 -0700 (PDT)
X-ASG-Whitelist: Client
X-ASG-Whitelist: Barracuda Reputation
Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHhIx-00023l-6W for xfs@oss.sgi.com; Wed, 04 May 2011 19:00:11 +0000
Message-Id: <20110504190011.156999943@bombadil.infradead.org>
User-Agent: quilt/0.48-1
Date: Wed, 04 May 2011 14:55:14 -0400
From: Christoph Hellwig
To: xfs@oss.sgi.com
X-ASG-Orig-Subj: [PATCH 1/4] xfs: add online discard support
Subject: [PATCH 1/4] xfs: add online discard support
References: <20110504185513.136746538@bombadil.infradead.org>
Content-Disposition: inline; filename=xfs-add-online-discard-support
X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html
X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34]
X-Barracuda-Start-Time: 1304535611 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

Now that we have reliable tracking of deleted extents in a transaction, we can easily implement "online" discard support, which calls blkdev_issue_discard once a transaction commits. The actual discard is a two-stage operation, as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode.

Signed-off-by: Christoph Hellwig

Index: xfs/fs/xfs/linux-2.6/xfs_super.c =================================================================== --- xfs.orig/fs/xfs/linux-2.6/xfs_super.c 2011-05-04 20:44:30.466422727 +0200 +++ xfs/fs/xfs/linux-2.6/xfs_super.c 2011-05-04 20:45:06.302895250 +0200 @@ -112,6 +112,8 @@ mempool_t *xfs_ioend_pool; #define MNTOPT_QUOTANOENF "qnoenforce" /* same as uqnoenforce */ #define MNTOPT_DELAYLOG "delaylog" /* Delayed loging enabled */ #define MNTOPT_NODELAYLOG "nodelaylog" /* Delayed loging disabled */ +#define MNTOPT_DISCARD "discard" /* Discard unused blocks */ +#define MNTOPT_NODISCARD "nodiscard" /* Do not discard unused blocks */ /* * Table driven mount option parser.
@@ -355,6 +357,10 @@ xfs_parseargs( mp->m_flags |= XFS_MOUNT_DELAYLOG; } else if (!strcmp(this_char, MNTOPT_NODELAYLOG)) { mp->m_flags &= ~XFS_MOUNT_DELAYLOG; + } else if (!strcmp(this_char, MNTOPT_DISCARD)) { + mp->m_flags |= XFS_MOUNT_DISCARD; + } else if (!strcmp(this_char, MNTOPT_NODISCARD)) { + mp->m_flags &= ~XFS_MOUNT_DISCARD; } else if (!strcmp(this_char, "ihashsize")) { xfs_warn(mp, "ihashsize no longer used, option is deprecated."); @@ -488,6 +494,7 @@ xfs_showargs( { XFS_MOUNT_FILESTREAMS, "," MNTOPT_FILESTREAM }, { XFS_MOUNT_GRPID, "," MNTOPT_GRPID }, { XFS_MOUNT_DELAYLOG, "," MNTOPT_DELAYLOG }, + { XFS_MOUNT_DISCARD, "," MNTOPT_DISCARD }, { 0, NULL } }; static struct proc_xfs_info xfs_info_unset[] = { Index: xfs/fs/xfs/xfs_mount.h =================================================================== --- xfs.orig/fs/xfs/xfs_mount.h 2011-05-04 20:44:30.356423323 +0200 +++ xfs/fs/xfs/xfs_mount.h 2011-05-04 20:45:06.302895250 +0200 @@ -224,6 +224,7 @@ typedef struct xfs_mount { #define XFS_MOUNT_FS_SHUTDOWN (1ULL << 4) /* atomic stop of all filesystem operations, typically for disk errors in metadata */ +#define XFS_MOUNT_DISCARD (1ULL << 5) /* discard unused blocks */ #define XFS_MOUNT_RETERR (1ULL << 6) /* return alignment errors to user */ #define XFS_MOUNT_NOALIGN (1ULL << 7) /* turn off stripe alignment Index: xfs/fs/xfs/xfs_log_cil.c =================================================================== --- xfs.orig/fs/xfs/xfs_log_cil.c 2011-05-04 20:44:30.369756584 +0200 +++ xfs/fs/xfs/xfs_log_cil.c 2011-05-04 20:45:06.302895250 +0200 @@ -29,6 +29,7 @@ #include "xfs_mount.h" #include "xfs_error.h" #include "xfs_alloc.h" +#include "xfs_discard.h" /* * Perform initial CIL structure initialisation. 
If the CIL is not @@ -361,18 +362,28 @@ xlog_cil_committed( int abort) { struct xfs_cil_ctx *ctx = args; + struct xfs_mount *mp = ctx->cil->xc_log->l_mp; xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, ctx->lv_chain, ctx->start_lsn, abort); xfs_alloc_busy_sort(&ctx->busy_extents); - xfs_alloc_busy_clear(ctx->cil->xc_log->l_mp, &ctx->busy_extents); + xfs_alloc_busy_clear(mp, &ctx->busy_extents, + (mp->m_flags & XFS_MOUNT_DISCARD) && !abort); spin_lock(&ctx->cil->xc_cil_lock); list_del(&ctx->committing); spin_unlock(&ctx->cil->xc_cil_lock); xlog_cil_free_logvec(ctx->lv_chain); + + if (!list_empty(&ctx->busy_extents)) { + ASSERT(mp->m_flags & XFS_MOUNT_DISCARD); + + xfs_discard_extents(mp, &ctx->busy_extents); + xfs_alloc_busy_clear(mp, &ctx->busy_extents, false); + } + kmem_free(ctx); } Index: xfs/fs/xfs/linux-2.6/xfs_discard.c =================================================================== --- xfs.orig/fs/xfs/linux-2.6/xfs_discard.c 2011-05-04 20:44:30.329756801 +0200 +++ xfs/fs/xfs/linux-2.6/xfs_discard.c 2011-05-04 20:45:06.306228566 +0200 @@ -191,3 +191,32 @@ xfs_ioc_trim( return -XFS_ERROR(EFAULT); return 0; } + +int +xfs_discard_extents( + struct xfs_mount *mp, + struct list_head *list) +{ + struct xfs_busy_extent *busyp; + int error = 0; + + list_for_each_entry(busyp, list, list) { + trace_xfs_discard_extent(mp, busyp->agno, busyp->bno, + busyp->length); + + error = -blkdev_issue_discard(mp->m_ddev_targp->bt_bdev, + XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno), + XFS_FSB_TO_BB(mp, busyp->length), + GFP_NOFS, 0); + if (error && error != EOPNOTSUPP) { + xfs_info(mp, + "discard failed for extent [0x%llu,%u], error %d", + (unsigned long long)busyp->bno, + busyp->length, + error); + return error; + } + } + + return 0; +} Index: xfs/fs/xfs/linux-2.6/xfs_discard.h =================================================================== --- xfs.orig/fs/xfs/linux-2.6/xfs_discard.h 2011-05-04 20:44:30.343090061 +0200 +++ xfs/fs/xfs/linux-2.6/xfs_discard.h 2011-05-04 
20:45:06.306228566 +0200 @@ -2,7 +2,9 @@ #define XFS_DISCARD_H 1 struct fstrim_range; +struct list_head; extern int xfs_ioc_trim(struct xfs_mount *, struct fstrim_range __user *); +extern int xfs_discard_extents(struct xfs_mount *, struct list_head *); #endif /* XFS_DISCARD_H */ Index: xfs/fs/xfs/xfs_ag.h =================================================================== --- xfs.orig/fs/xfs/xfs_ag.h 2011-05-04 20:44:30.376423214 +0200 +++ xfs/fs/xfs/xfs_ag.h 2011-05-04 20:45:11.406200936 +0200 @@ -187,6 +187,8 @@ struct xfs_busy_extent { xfs_agnumber_t agno; xfs_agblock_t bno; xfs_extlen_t length; + unsigned int flags; +#define XFS_ALLOC_BUSY_DISCARDED 0x01 /* undergoing a discard op. */ }; /* Index: xfs/fs/xfs/xfs_alloc.c =================================================================== --- xfs.orig/fs/xfs/xfs_alloc.c 2011-05-04 20:44:30.386423159 +0200 +++ xfs/fs/xfs/xfs_alloc.c 2011-05-04 20:45:11.432867459 +0200 @@ -2610,6 +2610,18 @@ xfs_alloc_busy_update_extent( xfs_agblock_t bend = bbno + busyp->length; /* + * This extent is currently being discarded. Give the thread + * performing the discard a chance to mark the extent unbusy + * and retry. + */ + if (busyp->flags & XFS_ALLOC_BUSY_DISCARDED) { + spin_unlock(&pag->pagb_lock); + delay(1); + spin_lock(&pag->pagb_lock); + return false; + } + + /* * If there is a busy extent overlapping a user allocation, we have * no choice but to force the log and retry the search. * @@ -2814,7 +2826,8 @@ restart: * If this is a metadata allocation, try to reuse the busy * extent instead of trimming the allocation. */ - if (!args->userdata) { + if (!args->userdata && + !(busyp->flags & XFS_ALLOC_BUSY_DISCARDED)) { if (!xfs_alloc_busy_update_extent(args->mp, args->pag, busyp, fbno, flen, false)) @@ -2980,10 +2993,16 @@ xfs_alloc_busy_clear_one( kmem_free(busyp); } +/* + * Remove all extents on the passed in list from the busy extents tree.
+ * If do_discard is set skip extents that need to be discarded, and mark + * these as undergoing a discard operation instead. + */ void xfs_alloc_busy_clear( struct xfs_mount *mp, - struct list_head *list) + struct list_head *list, + bool do_discard) { struct xfs_busy_extent *busyp, *n; struct xfs_perag *pag = NULL; @@ -3000,7 +3019,10 @@ xfs_alloc_busy_clear( agno = busyp->agno; } - xfs_alloc_busy_clear_one(mp, pag, busyp); + if (do_discard && busyp->length) + busyp->flags = XFS_ALLOC_BUSY_DISCARDED; + else + xfs_alloc_busy_clear_one(mp, pag, busyp); } if (pag) { Index: xfs/Documentation/filesystems/xfs.txt =================================================================== --- xfs.orig/Documentation/filesystems/xfs.txt 2011-05-04 20:44:30.409756366 +0200 +++ xfs/Documentation/filesystems/xfs.txt 2011-05-04 20:45:06.306228566 +0200 @@ -39,6 +39,12 @@ When mounting an XFS filesystem, the fol drive level write caching to be enabled, for devices that support write barriers. + discard + Issue command to let the block device reclaim space freed by the + filesystem. This is useful for SSD devices, thinly provisioned + LUNs and virtual machine images, but may have a performance + impact. + dmapi Enable the DMAPI (Data Management API) event callouts. Use with the "mtpt" option. 
Index: xfs/fs/xfs/xfs_alloc.h =================================================================== --- xfs.orig/fs/xfs/xfs_alloc.h 2011-05-04 20:44:30.393089791 +0200 +++ xfs/fs/xfs/xfs_alloc.h 2011-05-04 20:45:11.452867351 +0200 @@ -140,7 +140,8 @@ xfs_alloc_busy_insert(struct xfs_trans * xfs_agblock_t bno, xfs_extlen_t len); void -xfs_alloc_busy_clear(struct xfs_mount *mp, struct list_head *list); +xfs_alloc_busy_clear(struct xfs_mount *mp, struct list_head *list, + bool do_discard); int xfs_alloc_busy_search(struct xfs_mount *mp, xfs_agnumber_t agno, Index: xfs/fs/xfs/xfs_trans.c =================================================================== --- xfs.orig/fs/xfs/xfs_trans.c 2011-05-04 20:44:30.399756421 +0200 +++ xfs/fs/xfs/xfs_trans.c 2011-05-04 20:45:06.306228566 +0200 @@ -609,7 +609,7 @@ xfs_trans_free( struct xfs_trans *tp) { xfs_alloc_busy_sort(&tp->t_busy); - xfs_alloc_busy_clear(tp->t_mountp, &tp->t_busy); + xfs_alloc_busy_clear(tp->t_mountp, &tp->t_busy, false); atomic_dec(&tp->t_mountp->m_active_trans); xfs_trans_free_dqinfo(tp); From BATV+75f33b7b1a870dba700e+2810+infradead.org+hch@bombadil.srs.infradead.org Wed May 4 13:56:34 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,J_CHICKENPOX_44, J_CHICKENPOX_45 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44IuX5R088825 for ; Wed, 4 May 2011 13:56:34 -0500 X-ASG-Debug-ID: 1304535611-355500d50000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 0674511CD884 for ; Wed, 4 May 2011 12:00:11 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id w25Ut00pi4PHehze for ; Wed, 04 May 2011 12:00:11 -0700 (PDT) 
X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHhIx-00024J-Bt for xfs@oss.sgi.com; Wed, 04 May 2011 19:00:11 +0000 Message-Id: <20110504190011.319521066@bombadil.infradead.org> User-Agent: quilt/0.48-1 Date: Wed, 04 May 2011 14:55:15 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 2/4] xfs: do not discard alloc btree blocks Subject: [PATCH 2/4] xfs: do not discard alloc btree blocks References: <20110504185513.136746538@bombadil.infradead.org> Content-Disposition: inline; filename=xfs-dont-discard-allocbtblocks X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304535612 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

Blocks for the allocation btree are allocated from and released to the AGFL, and thus frequently reused. Even worse, we do not have an easy way to avoid using an AGFL block while it is being discarded due to the simple FILO list of free blocks, and thus can frequently stall on blocks that are currently undergoing a discard.

Add a flag to the busy extent tracking structure to skip the discard for allocation btree blocks. In normal operation these blocks are reused frequently enough that there is no need to discard them anyway, but if they spill over to the allocation btree as part of a balance we "leak" blocks that we would otherwise discard. We could fix this by adding another flag and keeping these blocks in the rbtree even after they aren't busy any more so that we could discard them when they migrate out of the AGFL. Given that this would cause significant overhead, I don't think it's worthwhile for now.
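For readers following along without the tree at hand, here is a rough userspace model of the decision the new flag feeds into when busy extents are cleared at commit time. It is only a sketch: struct busy_extent, the in_tree field and busy_clear() are simplified stand-ins invented for illustration, not the kernel's xfs_busy_extent or xfs_alloc_busy_clear(), and there is no locking or rbtree here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUSY_DISCARDED		0x01	/* undergoing a discard op. */
#define BUSY_SKIP_DISCARD	0x02	/* do not discard (e.g. alloc btree block) */

/* Simplified, hypothetical stand-in for struct xfs_busy_extent. */
struct busy_extent {
	unsigned int	length;		/* 0 models an extent that was fully reused */
	unsigned int	flags;
	bool		in_tree;	/* still tracked as busy? */
};

/*
 * Model of the first pass at commit time: extents that should be
 * discarded stay busy and are marked BUSY_DISCARDED, everything else
 * (zero length, or flagged BUSY_SKIP_DISCARD) is dropped right away.
 * Returns the number of extents left waiting for a discard.
 */
static size_t busy_clear(struct busy_extent *ext, size_t n, bool do_discard)
{
	size_t pending = 0;

	assert(ext != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (do_discard && ext[i].length &&
		    !(ext[i].flags & BUSY_SKIP_DISCARD)) {
			ext[i].flags = BUSY_DISCARDED;
			pending++;
		} else {
			ext[i].in_tree = false;
		}
	}
	return pending;
}
```

Calling busy_clear() a second time with do_discard set to false models the final pass after the discards have been issued, where every remaining extent is dropped from the busy tracking.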
Signed-off-by: Christoph Hellwig Index: xfs/fs/xfs/xfs_ag.h =================================================================== --- xfs.orig/fs/xfs/xfs_ag.h 2011-05-04 20:45:11.406200936 +0200 +++ xfs/fs/xfs/xfs_ag.h 2011-05-04 20:45:15.309513124 +0200 @@ -189,6 +189,7 @@ struct xfs_busy_extent { xfs_extlen_t length; unsigned int flags; #define XFS_ALLOC_BUSY_DISCARDED 0x01 /* undergoing a discard op. */ +#define XFS_ALLOC_BUSY_SKIP_DISCARD 0x02 /* do not discard */ }; /* Index: xfs/fs/xfs/xfs_alloc_btree.c =================================================================== --- xfs.orig/fs/xfs/xfs_alloc_btree.c 2011-05-04 20:45:11.419534199 +0200 +++ xfs/fs/xfs/xfs_alloc_btree.c 2011-05-04 20:45:15.309513124 +0200 @@ -120,7 +120,8 @@ xfs_allocbt_free_block( if (error) return error; - xfs_alloc_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1); + xfs_alloc_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1, + XFS_ALLOC_BUSY_SKIP_DISCARD); xfs_trans_agbtree_delta(cur->bc_tp, -1); return 0; } Index: xfs/fs/xfs/xfs_alloc.c =================================================================== --- xfs.orig/fs/xfs/xfs_alloc.c 2011-05-04 20:45:11.432867459 +0200 +++ xfs/fs/xfs/xfs_alloc.c 2011-05-04 20:45:15.312846439 +0200 @@ -2470,7 +2470,7 @@ xfs_free_extent( error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0); if (!error) - xfs_alloc_busy_insert(tp, args.agno, args.agbno, len); + xfs_alloc_busy_insert(tp, args.agno, args.agbno, len, 0); error0: xfs_perag_put(args.pag); return error; @@ -2481,7 +2481,8 @@ xfs_alloc_busy_insert( struct xfs_trans *tp, xfs_agnumber_t agno, xfs_agblock_t bno, - xfs_extlen_t len) + xfs_extlen_t len, + unsigned int flags) { struct xfs_busy_extent *new; struct xfs_busy_extent *busyp; @@ -2505,6 +2506,7 @@ xfs_alloc_busy_insert( new->bno = bno; new->length = len; INIT_LIST_HEAD(&new->list); + new->flags = flags; /* trace before insert to be able to see failed inserts */ trace_xfs_alloc_busy(tp->t_mountp, 
agno, bno, len); @@ -3019,7 +3021,8 @@ xfs_alloc_busy_clear( agno = busyp->agno; } - if (do_discard && busyp->length) + if (do_discard && busyp->length && + !(busyp->flags & XFS_ALLOC_BUSY_SKIP_DISCARD)) busyp->flags = XFS_ALLOC_BUSY_DISCARDED; else xfs_alloc_busy_clear_one(mp, pag, busyp); Index: xfs/fs/xfs/xfs_alloc.h =================================================================== --- xfs.orig/fs/xfs/xfs_alloc.h 2011-05-04 20:45:11.452867351 +0200 +++ xfs/fs/xfs/xfs_alloc.h 2011-05-04 20:45:15.312846439 +0200 @@ -137,7 +137,7 @@ xfs_alloc_longest_free_extent(struct xfs #ifdef __KERNEL__ void xfs_alloc_busy_insert(struct xfs_trans *tp, xfs_agnumber_t agno, - xfs_agblock_t bno, xfs_extlen_t len); + xfs_agblock_t bno, xfs_extlen_t len, unsigned int flags); void xfs_alloc_busy_clear(struct xfs_mount *mp, struct list_head *list, From BATV+75f33b7b1a870dba700e+2810+infradead.org+hch@bombadil.srs.infradead.org Wed May 4 13:56:33 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44IuXTe088815 for ; Wed, 4 May 2011 13:56:33 -0500 X-ASG-Debug-ID: 1304535611-1f0602f70000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 78B6B1B83F3D for ; Wed, 4 May 2011 12:00:11 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id e7RIJ0vcEm8j9cAs for ; Wed, 04 May 2011 12:00:11 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHhIx-00023B-0N for xfs@oss.sgi.com; Wed, 04 May 2011 19:00:11 +0000 Message-Id: 
<20110504185513.136746538@bombadil.infradead.org> User-Agent: quilt/0.48-1 Date: Wed, 04 May 2011 14:55:13 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 0/4] online discard support V3 Subject: [PATCH 0/4] online discard support V3 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304535611 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

Add support for discarding unused blocks at CIL commit time. The first patch is the guts of the implementation and relies on the recently included improved busy extent tracking. The second patch is an optimization that helps a lot with performance, and the last two patches submit the discard requests asynchronously to not stall the log I/O completion threads and improve performance, but they currently trip over bugs in the block layer.

The performance with these patches is quite bad on the SATA SSDs I tested, with up to 50% slowdowns on metadata intensive workloads, although ext4 is much worse and btrfs is almost as bad. I've demonstrated a prototype of a vectored discard at LSF that builds on the code submitted here and only changes the internals of xfs_discard_extents, which brings the performance back to acceptable levels.

For now my suggestion is to put patches 1 and 2 in to give people a chance to play with online discard on XFS. If the block layer issues get fixed in time we can add patches 3 and 4 later in the 2.6.40 cycle; if not they will still be needed once we get the proper vectored discard support that I'm going to start working on soon.
And here's the block layer workaround for the discard merge bug: Index: xfs/block/blk-core.c =================================================================== --- xfs.orig/block/blk-core.c 2011-03-30 16:04:45.700659775 +0200 +++ xfs/block/blk-core.c 2011-03-30 16:04:59.775160021 +0200 @@ -1247,7 +1247,7 @@ static int __make_request(struct request */ blk_queue_bounce(q, &bio); - if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) { + if (bio->bi_rw & (REQ_FLUSH | REQ_FUA | REQ_DISCARD)) { spin_lock_irq(q->queue_lock); where = ELEVATOR_INSERT_FLUSH; goto get_rq; From BATV+75f33b7b1a870dba700e+2810+infradead.org+hch@bombadil.srs.infradead.org Wed May 4 13:56:34 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00,LOCAL_GNU_PATCH autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44IuXR2088827 for ; Wed, 4 May 2011 13:56:34 -0500 X-ASG-Debug-ID: 1304535611-1ac901d50000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 357FC11CD885 for ; Wed, 4 May 2011 12:00:12 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id mZAtcXrwXFqtrNTC for ; Wed, 04 May 2011 12:00:12 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHhIx-00024p-IB for xfs@oss.sgi.com; Wed, 04 May 2011 19:00:11 +0000 Message-Id: <20110504190011.508745503@bombadil.infradead.org> User-Agent: quilt/0.48-1 Date: Wed, 04 May 2011 14:55:16 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 3/4] xfs: add a reference count to the CIL context Subject: [PATCH 3/4] xfs: add a reference count to the CIL 
context References: <20110504185513.136746538@bombadil.infradead.org> Content-Disposition: inline; filename=xfs-cil-ctx-refcounting X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304535612 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

For the upcoming asynchronous discard support we need to be able to delay freeing the CIL context until the last discard request that references it has completed. Add a reference count to the CIL context, and only clear the busy extents and free the CIL context structure when it reaches zero. Also add a work item to allow freeing it from irq context. Note that this means delaying the clearing of the busy extents for a little bit even on non-discard mounts, but with the new busy extent trim/reuse code there is no real life impact of this change.
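The lifetime rule described above can be modelled in plain userspace C roughly as follows. This is only a sketch under stated assumptions: struct ctx, ctx_alloc(), ctx_get() and ctx_put() are hypothetical names, C11 atomics stand in for the kernel's atomic_t, and the final put frees the context directly instead of deferring to a work item as the patch does for irq context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for struct xfs_cil_ctx. */
struct ctx {
	atomic_int	ref;	/* reference count */
	int		*freed;	/* set to 1 when the context is torn down */
};

/* Allocate a context holding the initial reference of the commit path. */
static struct ctx *ctx_alloc(int *freed)
{
	struct ctx *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	atomic_init(&c->ref, 1);
	c->freed = freed;
	return c;
}

/* Take one reference per in-flight request. */
static void ctx_get(struct ctx *c)
{
	assert(c != NULL);
	atomic_fetch_add(&c->ref, 1);
}

/*
 * Drop one reference.  Only the caller that drops the last reference
 * tears the context down; returns true in that case.
 */
static bool ctx_put(struct ctx *c)
{
	assert(c != NULL);
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&c->ref, 1) != 1)
		return false;
	*c->freed = 1;
	free(c);
	return true;
}
```

In this model the commit path calls ctx_put() once when it is done, and each request completion calls it once more, so whichever of them runs last performs the teardown.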
Signed-off-by: Christoph Hellwig Index: xfs/fs/xfs/xfs_log_cil.c =================================================================== --- xfs.orig/fs/xfs/xfs_log_cil.c 2011-05-03 12:00:49.000000000 +0200 +++ xfs/fs/xfs/xfs_log_cil.c 2011-05-03 12:17:19.399350953 +0200 @@ -20,7 +20,7 @@ #include "xfs_types.h" #include "xfs_bit.h" #include "xfs_log.h" -#include "xfs_inum.h" +#include "xfs_inum.h" #include "xfs_trans.h" #include "xfs_trans_priv.h" #include "xfs_log_priv.h" @@ -31,6 +31,46 @@ #include "xfs_alloc.h" #include "xfs_discard.h" +static void +xlog_cil_ctx_free( + struct xfs_cil_ctx *ctx) +{ + if (!list_empty(&ctx->busy_extents)) { + xfs_alloc_busy_clear(ctx->cil->xc_log->l_mp, + &ctx->busy_extents, false); + } + kmem_free(ctx); +} + +static void +xlog_cil_ctx_free_work( + struct work_struct *work) +{ + xlog_cil_ctx_free(container_of(work, struct xfs_cil_ctx, work)); +} + +static void +xlog_cil_ctx_init( + struct xfs_cil_ctx *ctx, + struct xfs_cil *cil, + xfs_lsn_t sequence) +{ + INIT_LIST_HEAD(&ctx->committing); + INIT_LIST_HEAD(&ctx->busy_extents); + ctx->sequence = sequence; + ctx->cil = cil; + atomic_set(&ctx->ref, 1); + INIT_WORK(&ctx->work, xlog_cil_ctx_free_work); + cil->xc_ctx = ctx; + + /* + * Mirror the sequence into the cil structure so that we can do + * unlocked checks against the current sequence in log forces without + * risking deferencing a freed context pointer. + */ + cil->xc_current_sequence = ctx->sequence; +} + /* * Perform initial CIL structure initialisation. 
If the CIL is not * enabled in this filesystem, ensure the log->l_cilp is null so @@ -64,12 +104,7 @@ xlog_cil_init( init_rwsem(&cil->xc_ctx_lock); init_waitqueue_head(&cil->xc_commit_wait); - INIT_LIST_HEAD(&ctx->committing); - INIT_LIST_HEAD(&ctx->busy_extents); - ctx->sequence = 1; - ctx->cil = cil; - cil->xc_ctx = ctx; - cil->xc_current_sequence = ctx->sequence; + xlog_cil_ctx_init(ctx, cil, 1); cil->xc_log = log; log->l_cilp = cil; @@ -381,10 +416,10 @@ xlog_cil_committed( ASSERT(mp->m_flags & XFS_MOUNT_DISCARD); xfs_discard_extents(mp, &ctx->busy_extents); - xfs_alloc_busy_clear(mp, &ctx->busy_extents, false); } - kmem_free(ctx); + if (atomic_dec_and_test(&ctx->ref)) + xlog_cil_ctx_free(ctx); } /* @@ -491,18 +526,7 @@ xlog_cil_push( * during log forces to extract the commit lsn of the sequence that * needs to be forced. */ - INIT_LIST_HEAD(&new_ctx->committing); - INIT_LIST_HEAD(&new_ctx->busy_extents); - new_ctx->sequence = ctx->sequence + 1; - new_ctx->cil = cil; - cil->xc_ctx = new_ctx; - - /* - * mirror the new sequence into the cil structure so that we can do - * unlocked checks against the current sequence in log forces without - * risking deferencing a freed context pointer. - */ - cil->xc_current_sequence = new_ctx->sequence; + xlog_cil_ctx_init(new_ctx, cil, ctx->sequence + 1); /* * The switch is now done, so we can drop the context lock and move out Index: xfs/fs/xfs/xfs_log_priv.h =================================================================== --- xfs.orig/fs/xfs/xfs_log_priv.h 2011-05-03 11:41:29.000000000 +0200 +++ xfs/fs/xfs/xfs_log_priv.h 2011-05-03 12:13:49.743820088 +0200 @@ -390,6 +390,8 @@ struct xfs_cil_ctx { struct xfs_log_vec *lv_chain; /* logvecs being pushed */ xfs_log_callback_t log_cb; /* completion callback hook. 
*/ struct list_head committing; /* ctx committing list */ + atomic_t ref; /* reference count */ + struct work_struct work; /* for deferred freeing */ }; /* From BATV+75f33b7b1a870dba700e+2810+infradead.org+hch@bombadil.srs.infradead.org Wed May 4 13:56:34 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00,LOCAL_GNU_PATCH autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44IuYHI088835 for ; Wed, 4 May 2011 13:56:34 -0500 X-ASG-Debug-ID: 1304535612-145b023a0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 7BFBE11CD88E for ; Wed, 4 May 2011 12:00:12 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id CwsKOv6RLpKzVd3K for ; Wed, 04 May 2011 12:00:12 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHhIx-00025L-Od for xfs@oss.sgi.com; Wed, 04 May 2011 19:00:11 +0000 Message-Id: <20110504190011.710110863@bombadil.infradead.org> User-Agent: quilt/0.48-1 Date: Wed, 04 May 2011 14:55:17 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 4/4] xfs: make discard operations asynchronous Subject: [PATCH 4/4] xfs: make discard operations asynchronous References: <20110504185513.136746538@bombadil.infradead.org> Content-Disposition: inline; filename=xfs-async-discard-workqueue X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304535612 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: 
ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

Instead of waiting for each discard request, keep the CIL context alive until all of them are done, at which point we can tear it down completely and remove the busy extents from the rbtree.

Signed-off-by: Christoph Hellwig

Index: xfs/fs/xfs/linux-2.6/xfs_discard.c =================================================================== --- xfs.orig/fs/xfs/linux-2.6/xfs_discard.c 2011-05-03 19:43:13.467745055 +0200 +++ xfs/fs/xfs/linux-2.6/xfs_discard.c 2011-05-03 19:43:14.514406051 +0200 @@ -30,6 +30,7 @@ #include "xfs_inode.h" #include "xfs_alloc.h" #include "xfs_error.h" +#include "xfs_log_priv.h" #include "xfs_discard.h" #include "xfs_trace.h" @@ -192,31 +193,88 @@ xfs_ioc_trim( return 0; } -int -xfs_discard_extents( - struct xfs_mount *mp, - struct list_head *list) +STATIC void +xfs_discard_end_io( + struct bio *bio, + int err) { - struct xfs_busy_extent *busyp; - int error = 0; + struct xfs_cil_ctx *ctx = bio->bi_private; - list_for_each_entry(busyp, list, list) { - trace_xfs_discard_extent(mp, busyp->agno, busyp->bno, - busyp->length); - - error = -blkdev_issue_discard(mp->m_ddev_targp->bt_bdev, - XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno), - XFS_FSB_TO_BB(mp, busyp->length), - GFP_NOFS, 0); - if (error && error != EOPNOTSUPP) { - xfs_info(mp, - "discard failed for extent [0x%llu,%u], error %d", - (unsigned long long)busyp->bno, - busyp->length, - error); - return error; + if (err && err != EOPNOTSUPP) { + xfs_info(ctx->cil->xc_log->l_mp, + "discard failed at sector 0x%llu, error %d", + (unsigned long long)bio->bi_sector, err); + } + + if (atomic_dec_and_test(&ctx->ref)) + queue_work(xfs_discard_workqueue, &ctx->work); + bio_put(bio); +} + +STATIC int +xfs_issue_discard( + struct xfs_cil_ctx *ctx, + struct xfs_busy_extent *busyp) +{ + struct xfs_mount *mp = ctx->cil->xc_log->l_mp; + struct block_device *bdev = mp->m_ddev_targp->bt_bdev; + struct request_queue *q =
bdev_get_queue(bdev); + unsigned int max_discard_sectors; + struct bio *bio; + sector_t sector; + sector_t nr_sects; + + if (!blk_queue_discard(q)) + return -EOPNOTSUPP; + + trace_xfs_discard_extent(mp, busyp->agno, busyp->bno, busyp->length); + + sector = XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno); + nr_sects = XFS_FSB_TO_BB(mp, busyp->length); + + /* + * Ensure that max_discard_sectors is of the proper granularity + */ + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); + if (q->limits.discard_granularity) { + unsigned int disc_sects = q->limits.discard_granularity >> 9; + + max_discard_sectors &= ~(disc_sects - 1); + } + + while (nr_sects) { + bio = bio_alloc(GFP_NOFS, 1); + if (!bio) + return -ENOMEM; + + bio->bi_sector = sector; + bio->bi_end_io = xfs_discard_end_io; + bio->bi_bdev = bdev; + bio->bi_private = ctx; + + if (nr_sects > max_discard_sectors) { + bio->bi_size = max_discard_sectors << 9; + nr_sects -= max_discard_sectors; + sector += max_discard_sectors; + } else { + bio->bi_size = nr_sects << 9; + nr_sects = 0; } + + atomic_inc(&ctx->ref); + submit_bio(REQ_WRITE | REQ_DISCARD, bio); } return 0; } + +int +xfs_discard_extents( + struct xfs_cil_ctx *ctx) +{ + struct xfs_busy_extent *busyp; + + list_for_each_entry(busyp, &ctx->busy_extents, list) + xfs_issue_discard(ctx, busyp); + return 0; +} Index: xfs/fs/xfs/linux-2.6/xfs_buf.c =================================================================== --- xfs.orig/fs/xfs/linux-2.6/xfs_buf.c 2011-05-03 19:43:07.164445869 +0200 +++ xfs/fs/xfs/linux-2.6/xfs_buf.c 2011-05-03 19:43:14.517739367 +0200 @@ -48,6 +48,7 @@ STATIC void xfs_buf_delwri_queue(xfs_buf static struct workqueue_struct *xfslogd_workqueue; struct workqueue_struct *xfsdatad_workqueue; struct workqueue_struct *xfsconvertd_workqueue; +struct workqueue_struct *xfs_discard_workqueue; #ifdef XFS_BUF_LOCK_TRACKING # define XB_SET_OWNER(bp) ((bp)->b_last_holder = current->pid) @@ -1785,6 +1786,7 @@ xfs_flush_buftarg( 
LIST_HEAD(wait_list); struct blk_plug plug; + xfs_buf_runall_queues(xfs_discard_workqueue); xfs_buf_runall_queues(xfsconvertd_workqueue); xfs_buf_runall_queues(xfsdatad_workqueue); xfs_buf_runall_queues(xfslogd_workqueue); @@ -1848,8 +1850,15 @@ xfs_buf_init(void) if (!xfsconvertd_workqueue) goto out_destroy_xfsdatad_workqueue; + xfs_discard_workqueue = alloc_workqueue("xfs_discard", + WQ_MEM_RECLAIM | WQ_HIGHPRI, 1); + if (!xfs_discard_workqueue) + goto out_destroy_xfsconvertd_workqueue; + return 0; + out_destroy_xfsconvertd_workqueue: + destroy_workqueue(xfsconvertd_workqueue); out_destroy_xfsdatad_workqueue: destroy_workqueue(xfsdatad_workqueue); out_destroy_xfslogd_workqueue: @@ -1863,6 +1872,7 @@ xfs_buf_init(void) void xfs_buf_terminate(void) { + destroy_workqueue(xfs_discard_workqueue); destroy_workqueue(xfsconvertd_workqueue); destroy_workqueue(xfsdatad_workqueue); destroy_workqueue(xfslogd_workqueue); Index: xfs/fs/xfs/linux-2.6/xfs_buf.h =================================================================== --- xfs.orig/fs/xfs/linux-2.6/xfs_buf.h 2011-05-03 19:43:07.181112445 +0200 +++ xfs/fs/xfs/linux-2.6/xfs_buf.h 2011-05-03 19:43:14.517739367 +0200 @@ -348,4 +348,6 @@ extern struct list_head *xfs_get_buftarg #define xfs_binval(buftarg) xfs_flush_buftarg(buftarg, 1) #define XFS_bflush(buftarg) xfs_flush_buftarg(buftarg, 1) +extern struct workqueue_struct *xfs_discard_workqueue; + #endif /* __XFS_BUF_H__ */ Index: xfs/fs/xfs/xfs_alloc.c =================================================================== --- xfs.orig/fs/xfs/xfs_alloc.c 2011-05-03 19:43:13.967742345 +0200 +++ xfs/fs/xfs/xfs_alloc.c 2011-05-03 19:43:14.517739367 +0200 @@ -1084,6 +1084,7 @@ restart: if (!forced++) { trace_xfs_alloc_near_busy(args); xfs_log_force(args->mp, XFS_LOG_SYNC); + flush_workqueue(xfs_discard_workqueue); goto restart; } @@ -1243,8 +1244,10 @@ restart: xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); trace_xfs_alloc_size_busy(args); - if (!forced++) + if (!forced++) { 
xfs_log_force(args->mp, XFS_LOG_SYNC); + flush_workqueue(xfs_discard_workqueue); + } goto restart; } } @@ -1314,6 +1317,7 @@ restart: xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); trace_xfs_alloc_size_busy(args); xfs_log_force(args->mp, XFS_LOG_SYNC); + flush_workqueue(xfs_discard_workqueue); goto restart; } goto out_nominleft; @@ -2612,13 +2616,13 @@ xfs_alloc_busy_update_extent( xfs_agblock_t bend = bbno + busyp->length; /* - * This extent is currently beeing discard. Give the thread - * performing the discard a chance to mark the extent unbusy - * and retry. + * This extent is currently beeing discard. Flush the discard + * completion queue and retry the search. */ if (busyp->flags & XFS_ALLOC_BUSY_DISCARDED) { spin_unlock(&pag->pagb_lock); - delay(1); + flush_workqueue(xfs_discard_workqueue); + trace_xfs_alloc_busy_discarded(mp, pag->pag_agno, fbno, flen); spin_lock(&pag->pagb_lock); return false; } Index: xfs/fs/xfs/linux-2.6/xfs_trace.h =================================================================== --- xfs.orig/fs/xfs/linux-2.6/xfs_trace.h 2011-05-03 19:43:07.194445707 +0200 +++ xfs/fs/xfs/linux-2.6/xfs_trace.h 2011-05-03 19:43:14.521072682 +0200 @@ -1183,6 +1183,7 @@ DEFINE_BUSY_EVENT(xfs_alloc_busy_enomem) DEFINE_BUSY_EVENT(xfs_alloc_busy_force); DEFINE_BUSY_EVENT(xfs_alloc_busy_reuse); DEFINE_BUSY_EVENT(xfs_alloc_busy_clear); +DEFINE_BUSY_EVENT(xfs_alloc_busy_discarded); TRACE_EVENT(xfs_alloc_busy_trim, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, Index: xfs/fs/xfs/linux-2.6/xfs_discard.h =================================================================== --- xfs.orig/fs/xfs/linux-2.6/xfs_discard.h 2011-05-03 19:43:13.467745055 +0200 +++ xfs/fs/xfs/linux-2.6/xfs_discard.h 2011-05-03 19:43:14.521072682 +0200 @@ -2,9 +2,9 @@ #define XFS_DISCARD_H 1 struct fstrim_range; -struct list_head; +struct xfs_cil_ctx; extern int xfs_ioc_trim(struct xfs_mount *, struct fstrim_range __user *); -extern int xfs_discard_extents(struct xfs_mount *, struct 
list_head *); +extern int xfs_discard_extents(struct xfs_cil_ctx *); #endif /* XFS_DISCARD_H */ Index: xfs/fs/xfs/xfs_log_cil.c =================================================================== --- xfs.orig/fs/xfs/xfs_log_cil.c 2011-05-03 19:43:14.214407676 +0200 +++ xfs/fs/xfs/xfs_log_cil.c 2011-05-03 19:43:14.521072682 +0200 @@ -415,7 +415,7 @@ xlog_cil_committed( if (!list_empty(&ctx->busy_extents)) { ASSERT(mp->m_flags & XFS_MOUNT_DISCARD); - xfs_discard_extents(mp, &ctx->busy_extents); + xfs_discard_extents(ctx); } if (atomic_dec_and_test(&ctx->ref)) From lists@nerdbynature.de Wed May 4 14:06:29 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44J6Sxn089369 for ; Wed, 4 May 2011 14:06:29 -0500 X-ASG-Debug-ID: 1304536205-266a00360000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from trent.utfs.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 476754302BD for ; Wed, 4 May 2011 12:10:05 -0700 (PDT) Received: from trent.utfs.org (trent.utfs.org [194.246.123.103]) by cuda.sgi.com with ESMTP id gM2zCn7FBK5LFiaE for ; Wed, 04 May 2011 12:10:05 -0700 (PDT) Received: by trent.utfs.org (Postfix, from userid 8) id 2E40D3E5E9; Wed, 4 May 2011 21:10:05 +0200 (CEST) Received: from trent.utfs.org (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id C2AE33DD40; Wed, 4 May 2011 21:10:04 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id ACB383DCEE; Wed, 4 May 2011 21:10:04 +0200 (CEST) Date: Wed, 4 May 2011 12:10:04 -0700 (PDT) From: Christian Kujau To: Dave Chinner cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks 
Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks In-Reply-To: <20110504111211.GF9114@dastard> Message-ID: References: <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> <20110504073615.GD9114@dastard> <20110504111211.GF9114@dastard> User-Agent: Alpine 2.01 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AV-Checked: ClamAV using ClamSMTP (127.0.0.1) X-Barracuda-Connect: trent.utfs.org[194.246.123.103] X-Barracuda-Start-Time: 1304536206 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62775 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Wed, 4 May 2011 at 21:12, Dave Chinner wrote: >> Ok, that will be helpful. Also helpful is that I've (FINALLY!) >> reproduced this myself, and i think i can now reproduce it at will >> on a highmem i686 machine. I'll look into it more later tonight.... I've tried to reproduce it on an SMP i686 with 1024MB RAM, but there was no OOM there. > And here's a patch for you to try. It fixes the problem on my test > machine..... Excellent! After one run with that patch, the machine does not go OOM any more when running du(1) over this XFS filesystem. I've run du(1) login with my oom-debug script to gather sysrq-w and slabinfo, see https://trent.utfs.org/p/bits/2.6.39-rc4/oom/trace/16-* for the details. I'll poke it a bit more over the day, but your patch (one line only! Wow!) seems to help. Thanks so much, Christian. 
-- BOFH excuse #160: non-redundant fan failure From david@fromorbit.com Wed May 4 18:12:18 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p44NCHPb097348 for ; Wed, 4 May 2011 18:12:18 -0500 X-ASG-Debug-ID: 1304550953-77b1037d0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail05.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id DD318C49D0E for ; Wed, 4 May 2011 16:15:53 -0700 (PDT) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [150.101.137.143]) by cuda.sgi.com with ESMTP id iyxWRgsqRwJdUx7F for ; Wed, 04 May 2011 16:15:53 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlQEAJrawU15LBzagWdsb2JhbACmHhUBARYmJcdtDoV5BJ4B Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail05.adl6.internode.on.net with ESMTP; 05 May 2011 08:45:52 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QHlIL-0006sK-JG; Thu, 05 May 2011 09:15:49 +1000 Date: Thu, 5 May 2011 09:15:49 +1000 From: Dave Chinner To: Christian Kujau Cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks Message-ID: <20110504231549.GA25956@dastard> References: <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> <20110504073615.GD9114@dastard> <20110504111211.GF9114@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: 
ipmail05.adl6.internode.on.net[150.101.137.143] X-Barracuda-Start-Time: 1304550954 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62792 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Wed, May 04, 2011 at 12:10:04PM -0700, Christian Kujau wrote: > On Wed, 4 May 2011 at 21:12, Dave Chinner wrote: > >> Ok, that will be helpful. Also helpful is that I've (FINALLY!) > >> reproduced this myself, and i think i can now reproduce it at will > >> on a highmem i686 machine. I'll look into it more later tonight.... > > I've tried to reproduce it on an SMP i686 with 1024MB RAM, but there was > no OOM there. > > > And here's a patch for you to try. It fixes the problem on my test > > machine..... > > Excellent! After one run with that patch, the machine does not go OOM any > more when running du(1) over this XFS filesystem. Ok, That is good to know that we are seeing the same problem ;) I'll push the fix through the XFS tree to mainline. Thanks for all your effort helping me understand and debug the problem, Christian. Cheers, Dave. 
-- Dave Chinner david@fromorbit.com From david@fromorbit.com Wed May 4 19:17:54 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p450Hr5E103014 for ; Wed, 4 May 2011 19:17:54 -0500 X-ASG-Debug-ID: 1304554889-3795026b0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail05.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 2A69AC4E681; Wed, 4 May 2011 17:21:29 -0700 (PDT) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [150.101.137.143]) by cuda.sgi.com with ESMTP id 7FlpZQgzgbsttd3R; Wed, 04 May 2011 17:21:29 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlQEAB3swU15LBzagWdsb2JhbACmHhUBARYmJcgHDoV5BJ4B Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail05.adl6.internode.on.net with ESMTP; 05 May 2011 09:51:28 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QHmJq-0006yk-BT; Thu, 05 May 2011 10:21:26 +1000 Date: Thu, 5 May 2011 10:21:26 +1000 From: Dave Chinner To: linux-kernel@vger.kernel.org, Markus Trippelsdorf , Bruno =?iso-8859-1?Q?Pr=E9mont?= , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110505002126.GA26797@dastard> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> 
<20110504005736.GA2958@cucamonga.audible.transient.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110504005736.GA2958@cucamonga.audible.transient.net> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail05.adl6.internode.on.net[150.101.137.143] X-Barracuda-Start-Time: 1304554891 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62796 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > Dave Chinner wrote: > > OK, so the common elements here appears to be root filesystems > > with small log sizes, which means they are tail pushing all the > > time metadata operations are in progress. Definitely seems like a > > race in the AIL workqueue trigger mechanism. I'll see if I can > > reproduce this and cook up a patch to fix it. > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > other assorted feedback wrt this issue? I've had it happen twice now > myself in the past week or so, though I have no reliable reproduction > technique. Just wondering if more data points will help isolate the > cause, and if so, how to be prepared to get them. > > For whatever its worth, my last lockup was while running > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. Can you all try the patch below? I've managed to trigger a couple of xlog_wait() lockups in some controlled load tests. 
The lockups don't appear to occur with the following patch for the
race condition in the AIL workqueue trigger.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: fix race condition queuing AIL pushes

From: Dave Chinner

The recent conversion of the xfsaild functionality to a work queue
introduced a hard-to-hit log space grant hang. The problem is that the
use of the XFS_AIL_PUSHING_BIT to determine whether a push is currently
in progress is racy.

When the AIL push work completed, it checked whether the target had
changed and, if it had not, cleared the PUSHING bit to allow a new push
to be requeued. The race condition is as follows:

    Thread 1                    push work

    smp_wmb()
                                smp_rmb()
                                check ailp->xa_target unchanged
    update ailp->xa_target
    test/set PUSHING bit
    does not queue
                                clear PUSHING bit
                                does not requeue

Now that the push target is updated, new attempts to push the AIL will
not trigger as the push target will be the same, and hence despite
trying to push the AIL we won't ever wake it again.

The fix is to ensure that the AIL push work clears the PUSHING bit
before it checks if the target is unchanged.

As a result, both push triggers operate on the same test/set bit
criteria, so even if we race in the push work and miss the target
update, the thread requesting the push will still set the PUSHING bit
and queue the push work to occur. For safety's sake, the same queue
check is done if the push work detects the target change, though only
one of the two will queue new work due to the use of
test_and_set_bit() checks.

Signed-off-by: Dave Chinner
--- fs/xfs/xfs_trans_ail.c | 16 ++++++++++------ 1 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index acdb92f..b7606d9 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -486,15 +486,19 @@ xfs_ail_worker( ailp->xa_last_pushed_lsn = 0; /* - * Check for an updated push target before clearing the - * XFS_AIL_PUSHING_BIT. If the target changed, we've got more - * work to do.
Wait a bit longer before starting that work. + * We clear the XFS_AIL_PUSHING_BIT first before checking + * whether the target has changed. If the target has changed, + * this pushes the requeue race directly onto the result of the + * atomic test/set bit, so we are guaranteed that either the + * the pusher that changed the target or ourselves will requeue + * the work (but not both). */ + clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags); smp_rmb(); - if (ailp->xa_target == target) { - clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags); + if (ailp->xa_target == target || + (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))) return; - } + tout = 50; } else if (XFS_LSN_CMP(lsn, target) >= 0) { /* From lists@nerdbynature.de Wed May 4 21:03:51 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4523pD2106600 for ; Wed, 4 May 2011 21:03:51 -0500 X-ASG-Debug-ID: 1304561248-7c8102ea0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from trent.utfs.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 4CF05FBF027 for ; Wed, 4 May 2011 19:07:28 -0700 (PDT) Received: from trent.utfs.org (trent.utfs.org [194.246.123.103]) by cuda.sgi.com with ESMTP id VsftkLgHKOX3WCR3 for ; Wed, 04 May 2011 19:07:28 -0700 (PDT) Received: by trent.utfs.org (Postfix, from userid 8) id ED4733DD40; Thu, 5 May 2011 04:07:27 +0200 (CEST) Received: from trent.utfs.org (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 738153DCEE; Thu, 5 May 2011 04:07:27 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trent.utfs.org (Postfix) with ESMTP id 5EA123DB7E; Thu, 5 May 2011 04:07:27 +0200 (CEST) Date: Wed, 4 May 2011 19:07:27 -0700 (PDT) From: Christian Kujau To: Dave 
Chinner cc: Markus Trippelsdorf , LKML , xfs@oss.sgi.com, minchan.kim@gmail.com X-ASG-Orig-Subj: Re: 2.6.39-rc4+: oom-killer busy killing tasks Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks In-Reply-To: <20110504231549.GA25956@dastard> Message-ID: References: <20110501080149.GD13542@dastard> <20110502121958.GA2978@dastard> <20110503005114.GE2978@dastard> <20110504073615.GD9114@dastard> <20110504111211.GF9114@dastard> <20110504231549.GA25956@dastard> User-Agent: Alpine 2.01 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AV-Checked: ClamAV using ClamSMTP (127.0.0.1) X-Barracuda-Connect: trent.utfs.org[194.246.123.103] X-Barracuda-Start-Time: 1304561249 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62804 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Thu, 5 May 2011 at 09:15, Dave Chinner wrote:
> I'll push the fix through the XFS tree to mainline. Thanks for all
> your effort helping me understand and debug the problem, Christian.

OK, I gave it some more testing and compiled the kernel with UP and
TINY_RCU again, and with your patch applied to -rc6 I could not
reproduce the issue any more. Plain -rc6 still has the OOM problem.

Feel free to add:

   Tested-By: Christian Kujau

Again, thanks so much for staying with me and guiding me through all
the debug steps, and for all your effort fixing such a rare setup. I
hope more people will share the joy :-)

Christian.
-- BOFH excuse #164: root rot From david@fromorbit.com Wed May 4 21:22:41 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p452Me8F107001 for ; Wed, 4 May 2011 21:22:40 -0500 X-ASG-Debug-ID: 1304562376-568d030a0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail05.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D7FD1431566; Wed, 4 May 2011 19:26:17 -0700 (PDT) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [150.101.137.143]) by cuda.sgi.com with ESMTP id lL7FfKGp9a1JtO1t; Wed, 04 May 2011 19:26:17 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlQEAD8Iwk15LBzagWdsb2JhbACmHhUBARYmJcdQDoV5BJ4B Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail05.adl6.internode.on.net with ESMTP; 05 May 2011 11:56:15 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QHoGb-0007AA-I3; Thu, 05 May 2011 12:26:13 +1000 Date: Thu, 5 May 2011 12:26:13 +1000 From: Dave Chinner To: linux-kernel@vger.kernel.org, Markus Trippelsdorf , Bruno =?iso-8859-1?Q?Pr=E9mont?= , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110505022613.GA26837@dastard> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110504005736.GA2958@cucamonga.audible.transient.net> 
<20110505002126.GA26797@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110505002126.GA26797@dastard> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail05.adl6.internode.on.net[150.101.137.143] X-Barracuda-Start-Time: 1304562378 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62805 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote: > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > > Dave Chinner wrote: > > > OK, so the common elements here appears to be root filesystems > > > with small log sizes, which means they are tail pushing all the > > > time metadata operations are in progress. Definitely seems like a > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > reproduce this and cook up a patch to fix it. > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > > other assorted feedback wrt this issue? I've had it happen twice now > > myself in the past week or so, though I have no reliable reproduction > > technique. Just wondering if more data points will help isolate the > > cause, and if so, how to be prepared to get them. > > > > For whatever its worth, my last lockup was while running > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. > > Can you all try the patch below? I've managed to trigger a couple of > xlog_wait() lockups in some controlled load tests. 
The lockups don't
> appear to occur with the following patch for the race condition in
> the AIL workqueue trigger.

They are still there, just harder to hit.

FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches"
gets the system moving again because that changes the push target.

I've found two more bugs, and my test case now reliably reproduces a
5-10s pause at ~1M created 1-byte files and then hangs at about 1.25M
files. So there's yet another problem lurking that I need to get to
the bottom of.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

From judy@yahoo.com Thu May 5 06:24:39 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: *** X-Spam-Status: No, score=3.5 required=5.0 tests=BAYES_50, FREEMAIL_FORGED_REPLYTO,INVALID_MSGID autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p45BOch7129144 for ; Thu, 5 May 2011 06:24:38 -0500 X-ASG-Debug-ID: 1304594894-40eb00400000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from hadrian.lunariffic.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 522E4133CDC3 for ; Thu, 5 May 2011 04:28:14 -0700 (PDT) Received: from hadrian.lunariffic.com (hadrian.lunariffic.com [67.210.122.47]) by cuda.sgi.com with ESMTP id 1rPcyS1y51qS3XPB for ; Thu, 05 May 2011 04:28:14 -0700 (PDT) Received: from hadrian.lunariffic.com (hadrian.lunariffic.com [127.0.0.1]) by hadrian.lunariffic.com (8.13.8/8.13.8) with ESMTP id p45BPSdd000462 for ; Thu, 5 May 2011 04:25:28 -0700 Received: (from remix2@localhost) by hadrian.lunariffic.com (8.13.8/8.13.8/Submit) id p45BPS4Z000461; Thu, 5 May 2011 04:25:28 -0700 Date: Thu, 5 May 2011 04:25:28 -0700 X-Authentication-Warning: hadrian.lunariffic.com: remix2 set sender to judy@yahoo.com using -f To: xfs@oss.sgi.com X-ASG-Orig-Subj: Product Recommended by erven Subject: Product Recommended by erven MIME-Version: 1.0 From: erven X-Mailer: CubeCart Mailer Reply-To: judy@yahoo.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit Message-ID: X-Barracuda-Connect: hadrian.lunariffic.com[67.210.122.47] X-Barracuda-Start-Time: 
1304594894 X-Barracuda-Bayes: INNOCENT GLOBAL 0.1788 1.0000 -0.9411 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: 1.67 X-Barracuda-Spam-Status: No, SCORE=1.67 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=INVALID_MSGID, INVALID_MSGID_2 X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62840 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.01 INVALID_MSGID Message-Id is not valid, according to RFC 2822 2.60 INVALID_MSGID_2 Message-Id is not valid, according to RFC 2822 X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Dear marketer, Hi, Act immediately and check out Mark Hardy’s amazing traffic-driving software. You may already be too late. Take what you deserve right now: http://wealthbuilder014.co.cc/affecc4.php?e=xfs@oss.sgi.com Ciao, Jean G. click the link below to unsubscribe: http://wealthbuilder014.co.cc/un.php?e=xfs@oss.sgi.com . 
From P@draigBrady.com Thu May 5 06:25:58 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p45BPwlj129214 for ; Thu, 5 May 2011 06:25:58 -0500 X-ASG-Debug-ID: 1304594974-18e2035d0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail1.slb.deg.dub.stisp.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with SMTP id BD3B2432330 for ; Thu, 5 May 2011 04:29:35 -0700 (PDT) Received: from mail1.slb.deg.dub.stisp.net (mail1.slb.deg.dub.stisp.net [84.203.253.98]) by cuda.sgi.com with SMTP id jN0PVZ6G6JwPcluP for ; Thu, 05 May 2011 04:29:35 -0700 (PDT) Received: (qmail 29321 invoked from network); 5 May 2011 11:29:33 -0000 Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218) by mail1.slb.deg.dub.stisp.net with SMTP; 5 May 2011 11:29:33 -0000 Message-ID: <4DC28A00.7010309@draigBrady.com> Date: Thu, 05 May 2011 12:29:04 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3 MIME-Version: 1.0 To: Yongqiang Yang CC: Eric Sandeen , xfs-oss , linux-ext4@vger.kernel.org, coreutils@gnu.org, Markus Trippelsdorf X-ASG-Orig-Subj: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?) Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
References: <20110414102608.GA1678@x4.trippels.de> <20110414120635.GB1678@x4.trippels.de> <20110414140222.GB1679@x4.trippels.de> <4DA70BD3.1070409@draigBrady.com> <4DA717B2.3020305@sandeen.net> <4DA7182B.8050409@draigBrady.com> <4DA71920.9@sandeen.net> In-Reply-To: X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Barracuda-Connect: mail1.slb.deg.dub.stisp.net[84.203.253.98] X-Barracuda-Start-Time: 1304594976 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0105 1.0000 -1.9525 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.95 X-Barracuda-Spam-Status: No, SCORE=-1.95 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62840 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On 14/04/11 17:10, Yongqiang Yang wrote:
> Hi,
>
> I am off my working computer. Maybe below fix could fix the problem.
>
> fs/ext4/extent.c
> static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
> 1877         } else if (block >= le32_to_cpu(ex->ee_block)) {
> 1878             /*
> 1879              * some part of requested space is covered
> 1880              * by found extent
> 1881              */
> 1882             start = block;
> 1883             end = le32_to_cpu(ex->ee_block)
> 1884                 + ext4_ext_get_actual_len(ex);
> 1885             if (block + num < end)
> 1886                 end = block + num;
>      +           if (!ext4_ext_is_uninitialized(ex))
> 1887             exists = 1;
> 1888         } else {
> 1889             BUG();
> 1890         }

Hi,

To follow up on the above. I'm under the impression
that ext4 is expected to return extents for what
is written, irrespective of whether it's reached the
disk or not. I.e. the preallocation case where this fails
was an oversight, which the above might fix.

So is the above summary correct, and have there
been any more thoughts on a fix?

cheers, Pádraig. From xiaoqiangnk@gmail.com Thu May 5 06:43:52 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM, MIME_8BIT_HEADER,T_DKIM_INVALID autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p45Bhpg0129984 for ; Thu, 5 May 2011 06:43:52 -0500 X-ASG-Debug-ID: 1304596049-410400800000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-vx0-f181.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 6FCDF11CD5CE for ; Thu, 5 May 2011 04:47:29 -0700 (PDT) Received: from mail-vx0-f181.google.com (mail-vx0-f181.google.com [209.85.220.181]) by cuda.sgi.com with ESMTP id NqUG9UjaAW1FVb0S for ; Thu, 05 May 2011 04:47:29 -0700 (PDT) Received: by vxb39 with SMTP id 39so2420284vxb.26 for ; Thu, 05 May 2011 04:47:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=ZbDGYCdizAn9zmf/T6HYEVlYREv4DCXt7uKJYVEp47I=; b=PnFY3tfV6Bx9OeLR+0hQH1eHaDkVMrEd3QbZykkRAMSrtwppUvh3/cG0ty8+y95PLF X0gQGQ9/4pupqMTFray/fsxvjrYGpWihl7Dncsip72urpaZbscy3oyd53tQJ22d6Op6q hBczAG7BkFRGFU6AJSTfXEf83B0JHmmgC9MmQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=eyaIHyljjvtmtrwdKgS8FnKfydVuLGJYbAFlBwDpyBmPJ32CTit7orXuG4dPC9PjBX LPkcthy5dD3Q/u3cQ4OoP0bRJxKRiM4ZnUdnwYLThf/JQDKmE09rTri034bKCuqep9jZ WuQlvjrFDcXZxRVmH0dt6QouP6kvwhe3wn+eA= MIME-Version: 1.0 Received: by 10.220.162.200 with SMTP id w8mr542802vcx.183.1304596049434; Thu, 05 May 2011 04:47:29 -0700 (PDT) Received: by 10.220.170.141 with HTTP; Thu, 5 May 2011 04:47:29 
-0700 (PDT) In-Reply-To: <4DC28A00.7010309@draigBrady.com> References: <20110414102608.GA1678@x4.trippels.de> <20110414120635.GB1678@x4.trippels.de> <20110414140222.GB1679@x4.trippels.de> <4DA70BD3.1070409@draigBrady.com> <4DA717B2.3020305@sandeen.net> <4DA7182B.8050409@draigBrady.com> <4DA71920.9@sandeen.net> <4DC28A00.7010309@draigBrady.com> Date: Thu, 5 May 2011 19:47:29 +0800 Message-ID: X-ASG-Orig-Subj: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?) Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?) From: Yongqiang Yang To: =?ISO-8859-1?Q?P=E1draig_Brady?= Cc: Eric Sandeen , xfs-oss , linux-ext4@vger.kernel.org, coreutils@gnu.org, Markus Trippelsdorf Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: mail-vx0-f181.google.com[209.85.220.181] X-Barracuda-Start-Time: 1304596050 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0005 1.0000 -2.0175 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62842 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

2011/5/5 Pádraig Brady :
> On 14/04/11 17:10, Yongqiang Yang wrote:
>> Hi,
>>
>> I am off my working computer.  Maybe below fix could fix the problem.
>>
>> fs/ext4/extent.c
>> static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
>> 1877         } else if (block >= le32_to_cpu(ex->ee_block)) {
>> 1878             /*
>> 1879              * some part of requested space is covered
>> 1880              * by found extent
>> 1881              */
>> 1882             start = block;
>> 1883             end = le32_to_cpu(ex->ee_block)
>> 1884                 + ext4_ext_get_actual_len(ex);
>> 1885             if (block + num < end)
>> 1886                 end = block + num;
>>      +           if (!ext4_ext_is_uninitialized(ex))
>> 1887             exists = 1;
>> 1888         } else {
>> 1889             BUG();
>> 1890         }
>
> Hi,
>
> To follow up on the above.  I'm under the impression
> that ext4 is expected to return extents for what
> is written, irrespective of whether it's reached the
> disk or not. I.E. the preallocation case where this fails

No. It just returns extent info now - allocated extents and delayed
extents. In the preallocation case, it returns unwritten extents. And
the code above does not work.

> was an oversite, for which the above might fix.
>
> So is the above summary correct, and has there
> been any more thoughts on a fix?
>
> cheers,
> Pádraig.
> --=20 Best Wishes Yongqiang Yang From david@fromorbit.com Thu May 5 07:17:55 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p45CHsI1131184 for ; Thu, 5 May 2011 07:17:54 -0500 X-ASG-Debug-ID: 1304598090-40f6012d0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail05.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 867CDC4F354; Thu, 5 May 2011 05:21:30 -0700 (PDT) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [150.101.137.143]) by cuda.sgi.com with ESMTP id KMaOyC3ZfxmNNSL9; Thu, 05 May 2011 05:21:30 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AsQDAO2Uwk15LBzagWdsb2JhbACmNBUBARYmJcYpDoV5BJ4r Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail05.adl6.internode.on.net with ESMTP; 05 May 2011 21:51:29 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QHxYT-0007wm-99; Thu, 05 May 2011 22:21:17 +1000 Date: Thu, 5 May 2011 22:21:17 +1000 From: Dave Chinner To: linux-kernel@vger.kernel.org, Markus Trippelsdorf , Bruno =?iso-8859-1?Q?Pr=E9mont?= , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110505122117.GB26837@dastard> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> 
<20110504005736.GA2958@cucamonga.audible.transient.net> <20110505002126.GA26797@dastard> <20110505022613.GA26837@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110505022613.GA26837@dastard> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail05.adl6.internode.on.net[150.101.137.143] X-Barracuda-Start-Time: 1304598092 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.52 X-Barracuda-Spam-Status: No, SCORE=-1.52 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_RULE7568M X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62844 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.50 BSF_RULE7568M Custom Rule 7568M X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Thu, May 05, 2011 at 12:26:13PM +1000, Dave Chinner wrote: > On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote: > > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > > > Dave Chinner wrote: > > > > OK, so the common elements here appears to be root filesystems > > > > with small log sizes, which means they are tail pushing all the > > > > time metadata operations are in progress. Definitely seems like a > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > reproduce this and cook up a patch to fix it. > > > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > > > other assorted feedback wrt this issue? I've had it happen twice now > > > myself in the past week or so, though I have no reliable reproduction > > > technique. Just wondering if more data points will help isolate the > > > cause, and if so, how to be prepared to get them. 
> > >
> > > For whatever its worth, my last lockup was while running
> > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups.
> >
> > Can you all try the patch below? I've managed to trigger a couple of
> > xlog_wait() lockups in some controlled load tests. The lockups don't
> > appear to occur with the following patch to he race condition in
> > the AIL workqueue trigger.
>
> They are still there, just harder to hit.
>
> FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches"
> gets the system moving again because that changes the push target.
>
> I've found two more bugs, and now my test case is now reliably
> reproducably a 5-10s pause at ~1M created 1byte files and then
> hanging at about 1.25M files. So there's yet another problem lurking
> that I need to get to the bottom of.

Which, of course, was the real regression. The patch below has
survived a couple of hours of testing, which fixes all 4 of the
problems I found. Please test.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: fix race conditions and regressions in AIL pushing

From: Dave Chinner

The recent conversion of the xfsaild functionality to a work queue
introduced a hard-to-hit log space grant hang. There are four
problems being hit.

The first problem is that the use of the XFS_AIL_PUSHING_BIT to
determine whether a push is currently in progress is racy. When the
AIL push work completes, it checked whether the target changed and
cleared the PUSHING bit to allow a new push to be requeued. The race
condition is as follows:

	Thread 1		push work

	smp_wmb()
				smp_rmb()
				check ailp->xa_target unchanged
	update ailp->xa_target
	test/set PUSHING bit
	does not queue
				clear PUSHING bit
				does not requeue

Now that the push target is updated, new attempts to push the AIL
will not trigger as the push target will be the same, and hence
despite trying to push the AIL we won't ever wake it again.

The fix is to ensure that the AIL push work clears the PUSHING bit
before it checks if the target is unchanged. As a result, both push
triggers operate on the same test/set bit criteria, so even if we
race in the push work and miss the target update, the thread
requesting the push will still set the PUSHING bit and queue the
push work to occur. For safety's sake, the same queue check is done
if the push work detects the target change, though only one of the
two will queue new work due to the use of test_and_set_bit() checks.

The second problem is a target mismatch between the item pushing
loop and the target itself. The push trigger checks for the target
increasing (i.e. new target > current) while the push loop only
pushes items that have a LSN < current. As a result, we can get the
situation where the push target is X, the items at the tail of the
AIL have LSN X and they don't get pushed. The push work then
completes thinking it is done, and cannot be restarted until the
push target increases to >= X + 1. If the push target then never
increases (because the tail is not moving), then we never run the
push work again and we stall.

The third problem is that updating the push target is not safe on 32
bit machines. We cannot copy a 64 bit LSN without the possibility of
corrupting the result when racing with another updating thread. We
have a function to do this update safely without needing to care
about 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
updating the AIL push target.

The final problem, and the ultimate cause of the regression, is that
there are two exit paths from the AIL push work. One does the right
thing with clearing the PUSHING bit and rechecking the target, while
the other simply returns if the AIL is empty when the push work
starts. This exit path needs to do the same PUSHING bit clearing and
target checking as the other "no more work to be done" path.

Note: this needs to be split into 4 patches to push into mainline.
This is an aggregated patch just for testing.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_trans_ail.c |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index acdb92f..ab0d045 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -368,8 +368,7 @@ xfs_ail_worker(
 		 */
 		xfs_trans_ail_cursor_done(ailp, cur);
 		spin_unlock(&ailp->xa_lock);
-		ailp->xa_last_pushed_lsn = 0;
-		return;
+		goto out_done;
 	}
 
 	XFS_STATS_INC(xs_push_ail);
@@ -387,7 +386,7 @@ xfs_ail_worker(
 	 */
 	lsn = lip->li_lsn;
 	flush_log = stuck = count = 0;
-	while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) {
+	while ((XFS_LSN_CMP(lip->li_lsn, target) <= 0)) {
 		int	lock_result;
 		/*
 		 * If we can lock the item without sleeping, unlock the AIL
@@ -482,19 +481,24 @@ xfs_ail_worker(
 	/* assume we have more work to do in a short while */
 	tout = 10;
 	if (!count) {
+out_done:
 		/* We're past our target or empty, so idle */
 		ailp->xa_last_pushed_lsn = 0;
 
 		/*
-		 * Check for an updated push target before clearing the
-		 * XFS_AIL_PUSHING_BIT. If the target changed, we've got more
-		 * work to do. Wait a bit longer before starting that work.
+		 * We clear the XFS_AIL_PUSHING_BIT first before checking
+		 * whether the target has changed. If the target has changed,
+		 * this pushes the requeue race directly onto the result of the
+		 * atomic test/set bit, so we are guaranteed that either the
+		 * the pusher that changed the target or ourselves will requeue
+		 * the work (but not both).
 		 */
+		clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
 		smp_rmb();
-		if (ailp->xa_target == target) {
-			clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
+		if (ailp->xa_target == target ||
+		    (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags)))
 			return;
-		}
+
 		tout = 50;
 	} else if (XFS_LSN_CMP(lsn, target) >= 0) {
 		/*
@@ -553,7 +557,7 @@ xfs_ail_push(
 	 * the XFS_AIL_PUSHING_BIT.
 	 */
 	smp_wmb();
-	ailp->xa_target = threshold_lsn;
+	xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);
 	if (!test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))
 		queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0);
 }

From BATV+df442501ed93e6cf4413+2811+infradead.org+hch@bombadil.srs.infradead.org Thu May 5 07:36:31 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p45CaUAn132007 for ; Thu, 5 May 2011 07:36:30 -0500 X-ASG-Debug-ID: 1304599208-40b901e70000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 361CD14C66EC; Thu, 5 May 2011 05:40:09 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id eejHeLFos63G8jX7; Thu, 05 May 2011 05:40:09 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QHxqZ-00084A-Ae; Thu, 05 May 2011 12:39:59 +0000 Date: Thu, 5 May 2011 08:39:59 -0400 From: Christoph Hellwig To: Dave Chinner Cc: linux-kernel@vger.kernel.org, Markus Trippelsdorf , Bruno Prémont , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110505123959.GA21098@infradead.org> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard>
<20110504005736.GA2958@cucamonga.audible.transient.net> <20110505002126.GA26797@dastard> <20110505022613.GA26837@dastard> <20110505122117.GB26837@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110505122117.GB26837@dastard> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304599209 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

> The third problem is that updating the push target is not safe on 32
> bit machines. We cannot copy a 64 bit LSN without the possibility of
> corrupting the result when racing with another updating thread. We
> have function to do this update safely without needing to care about
> 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
> updating the AIL push target.

But reading xa_target without xa_lock isn't safe on 32-bit either, is it?
For the first read it can trivially be moved into the critical
section a few lines below, and the second one should probably use
XFS_LSN_CMP.

> @@ -482,19 +481,24 @@ xfs_ail_worker(
>  	/* assume we have more work to do in a short while */
>  	tout = 10;
>  	if (!count) {
> +out_done:

Jumping into conditionals is really ugly. By initializing count a bit
earlier you can just jump in front of the if/else clauses. And while
you're there maybe moving the tout = 10; into an else clause would also
make the code more readable.

an uninitialized use of tout.

> +		if (ailp->xa_target == target ||
> +		    (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags)))

No need for braces around the test_and_set_bit call.

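For readers following the thread, the "clear the PUSHING bit first, then
recheck the target" ordering discussed above can be modelled in userspace.
The block below is only a rough sketch using C11 atomics. It is not the
kernel code: struct push_state and the functions push_state_init(),
request_push() and finish_push() are hypothetical names invented for the
illustration, the sequentially consistent atomics stand in for the
smp_wmb()/smp_rmb() pairs, and a counter stands in for queueing the work.
It also leaves out the "only push if the target increased" check that the
real push trigger performs.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the AIL push state. */
struct push_state {
	_Atomic uint64_t target;	/* models ailp->xa_target */
	atomic_bool      pushing;	/* models XFS_AIL_PUSHING_BIT */
	atomic_int       queued;	/* counts "queue the push work" events */
};

void push_state_init(struct push_state *ps, uint64_t target)
{
	atomic_init(&ps->target, target);
	atomic_init(&ps->pushing, false);
	atomic_init(&ps->queued, 0);
}

/*
 * Requester side: publish the new target, then queue work only if we
 * are the one who flipped the bit from clear to set.
 * Returns true if this call queued the work.
 */
bool request_push(struct push_state *ps, uint64_t new_target)
{
	atomic_store(&ps->target, new_target);
	if (!atomic_exchange(&ps->pushing, true)) {
		atomic_fetch_add(&ps->queued, 1);
		return true;
	}
	return false;
}

/*
 * Worker side, "nothing more to do" exit: clear the bit BEFORE
 * rechecking the target. If the target moved while we were working,
 * the test-and-set decides whether we or the requester requeue, so
 * exactly one of the two does.
 * Returns true if the worker requeued itself.
 */
bool finish_push(struct push_state *ps, uint64_t target_seen)
{
	atomic_store(&ps->pushing, false);
	if (atomic_load(&ps->target) == target_seen)
		return false;		/* target unchanged: go idle */
	if (atomic_exchange(&ps->pushing, true))
		return false;		/* requester won the bit and queued */
	atomic_fetch_add(&ps->queued, 1);
	return true;
}
```

With the old ordering (recheck the target, then clear the bit) a requester
could update the target and lose the test-and-set in the window between the
two steps, leaving nobody to requeue. In this sketch that window is closed
because both sides go through the same atomic exchange.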
From bonbons@linux-vserver.org Thu May 5 15:31:51 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p45KVoQZ156809 for ; Thu, 5 May 2011 15:31:51 -0500 X-ASG-Debug-ID: 1304627728-1fd602140000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from legolas.restena.lu (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id A3EC51B25E64; Thu, 5 May 2011 13:35:28 -0700 (PDT) Received: from legolas.restena.lu (legolas.restena.lu [158.64.1.34]) by cuda.sgi.com with ESMTP id TGf2atpzJoiCcdHV; Thu, 05 May 2011 13:35:28 -0700 (PDT) Received: from legolas.restena.lu (localhost [127.0.0.1]) by legolas.restena.lu (Postfix) with ESMTP id 9E06C9DD0B; Thu, 5 May 2011 22:35:27 +0200 (CEST) Received: from neptune.home (unknown [158.64.15.115]) by legolas.restena.lu (Postfix) with ESMTP id 37D61A98A4; Thu, 5 May 2011 22:35:27 +0200 (CEST) Date: Thu, 5 May 2011 22:35:13 +0200 From: Bruno =?UTF-8?B?UHLDqW1vbnQ=?= To: Dave Chinner Cc: linux-kernel@vger.kernel.org, Markus Trippelsdorf , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110505223513.3654c041@neptune.home> In-Reply-To: <20110505122117.GB26837@dastard> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110504005736.GA2958@cucamonga.audible.transient.net> <20110505002126.GA26797@dastard> <20110505022613.GA26837@dastard> <20110505122117.GB26837@dastard> X-Mailer: 
Claws Mail 3.7.8 (GTK+ 2.22.1; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Scanned: ClamAV X-Barracuda-Connect: legolas.restena.lu[158.64.1.34] X-Barracuda-Start-Time: 1304627729 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62877 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Status: Clean On Thu, 05 May 2011 Dave Chinner wrote: > On Thu, May 05, 2011 at 12:26:13PM +1000, Dave Chinner wrote: > > On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote: > > > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > > > > Dave Chinner wrote: > > > > > OK, so the common elements here appears to be root filesystems > > > > > with small log sizes, which means they are tail pushing all the > > > > > time metadata operations are in progress. Definitely seems like a > > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > > reproduce this and cook up a patch to fix it. > > > > > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > > > > other assorted feedback wrt this issue? I've had it happen twice now > > > > myself in the past week or so, though I have no reliable reproduction > > > > technique. Just wondering if more data points will help isolate the > > > > cause, and if so, how to be prepared to get them. > > > > > > > > For whatever its worth, my last lockup was while running > > > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. 
> > >
> > > Can you all try the patch below? I've managed to trigger a couple of
> > > xlog_wait() lockups in some controlled load tests. The lockups don't
> > > appear to occur with the following patch to he race condition in
> > > the AIL workqueue trigger.
> >
> > They are still there, just harder to hit.
> >
> > FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches"
> > gets the system moving again because that changes the push target.
> >
> > I've found two more bugs, and now my test case is now reliably
> > reproducably a 5-10s pause at ~1M created 1byte files and then
> > hanging at about 1.25M files. So there's yet another problem lurking
> > that I need to get to the bottom of.
>
> Which, of course, was the real regression. The patch below has
> survived a couple of hours of testing, which fixes all 4 of the
> problems I found. Please test.

It successfully survived today's 2-hour session.

I will continue testing over the weekend and see if it also survives the
longer whole-day sessions. I will report results at the end of the weekend
(or earlier in case of trouble).

Thanks,
Bruno

> Cheers,
>
> Dave.

From anisse@astier.eu Thu May 5 17:43:27 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_21 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p45MhRng164791 for ; Thu, 5 May 2011 17:43:27 -0500 X-ASG-Debug-ID: 1304635624-601203c10000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-bw0-f53.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id A51A3435521 for ; Thu, 5 May 2011 15:47:04 -0700 (PDT) Received: from mail-bw0-f53.google.com (mail-bw0-f53.google.com [209.85.214.53]) by cuda.sgi.com with ESMTP id AicEdQuZbap07EbS for ; Thu, 05 May 2011 15:47:04 -0700 (PDT) Received: by bwg12 with SMTP id 12so2351081bwg.26 for ; Thu, 05 May 2011 15:47:04 -0700 (PDT) Received: by 10.204.118.211 with SMTP id w19mr319424bkq.83.1304635624151; Thu, 05 May 2011 15:47:04 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.70.81 with HTTP; Thu, 5 May 2011 15:46:44 -0700 (PDT) In-Reply-To: References: <20110419082705.GI23985@dastard> <20110419130737.45beb611@destiny.ordissimo> <4DB084CE.8020600@sandeen.net> <20110422130920.7be686c6@destiny.ordissimo> <20110504091141.GA30330@infradead.org> From: Anisse Astier Date: Fri, 6 May 2011 00:46:44 +0200 Message-ID: X-ASG-Orig-Subj: Re: xfs_repair crashing (versions 3.1.4 and 3.1.5) Subject: Re: xfs_repair crashing (versions 3.1.4 and 3.1.5) To: Christoph Hellwig Cc: Eric Sandeen , xfs@oss.sgi.com, Dave Chinner Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: mail-bw0-f53.google.com[209.85.214.53] X-Barracuda-Start-Time: 1304635625 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, 
	SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=
X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62885
	Rule breakdown below
	 pts rule name              description
	---- ---------------------- --------------------------------------------------
X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com
X-Virus-Status: Clean

On Wed, May 4, 2011 at 12:24 PM, Anisse Astier wrote:
> On Wed, May 4, 2011 at 11:11 AM, Christoph Hellwig wrote:
>> On Fri, Apr 22, 2011 at 01:09:20PM +0200, Anisse Astier wrote:
>>> Yep, I figured that much, it just took me a while to get up & running
>>> another system capable of building xfsprogs.
>>>
>>> Now that I have that, and that I commented the do_warn, xfs_repair is
>>> still running after the previous failing point:
>>> [???]
>>>         - agno = 17
>>> bad key in bmbt root (is 73434, would reset to 74194) in inode 2283178100 data fork
>>> bad fwd (right) sibling pointer (saw 145202888 should be NULLDFSBNO)
>>> bad data fork in inode 2283178100
>>> would have cleared inode 2283178100
>>>         - agno = 18
>>> [???] (ongoing)
>>>
>>> Once this is done, I'll test with %llu instead of %u.
>>>
>>> But please be patient, it's a 900GB filesystem (half-full) with just an 800
>>> MHz ARM9 processor doing the work, so xfs_repair takes hours to complete.
>>> Plus I won't have time to do many tests before next week.
>>>
>>> To be continued.
>>
>> Any updates?
>
> Well, Dave had it all figured, and replacing %u by %llu fixes indeed
> the problem.
>
> Just for future reference, the stack of crashing process:
> #0  strlen () at ../ports/sysdeps/arm/strlen.S:29
> #1  0x40204f78 in _IO_vfprintf_internal (s=0xbe9a9730,
> format=0xbe9a7676 "27", ap=) at vfprintf.c:1614
> #2  0x40205f70 in buffered_vfprintf (s=0x402f2668, format=0x88168874
> , args=...) at vfprintf.c:2254
> #3  0x40201a44 in _IO_vfprintf_internal (s=0x402f2668, format=0x7c198
> "\tin inode %u (%s fork) bmap btree block %llu\n", ap= out>) at vfprintf.c:1306
> #4  0x0003cd48 in do_warn (msg=0x7b4dc "data") at xfs_repair.c:379
> #5  0x00017088 in process_btinode (mp=, agno=17,
> ino=, dip=, type=34387,
> dirty=0xbe9aa418, tot=0x5,
>    nex=0xbe9aa418, blkmapp=0xbe9aa2d8, whichfork=-1097162040,
> check_dups=-1097162016) at dinode.c:1284
> #6  0x00017a04 in process_inode_data_fork (mp=,
> agno=17, ino=1476724, dino=0x1db7800, type=5, dirty=0xbe9aa418,
> totblocks=0xbe9aa2d8, nextents=0xbe9aa2c8,
>    dblkmap=0xbe9aa2e0, check_dups=0) at dinode.c:2048
> #7  0x0001a5f0 in process_dinode_int (mp=,
> dino=0x1db7800, agno=, ino=,
> was_free=0, dirty=0x1ad34, used=0x0,
>    verify_mode=-1097161704, uncertain=0, ino_discovery=1,
> check_dups=0, extra_attr_check=1, isa_dir=0x0, parent=0xbe9aa408) at
> dinode.c:2631
> #8  0x0001ad34 in process_dinode (mp=0x7c198, dino=0x1b,
> agno=2283178100, ino=0, was_free=0, dirty=0xbe9aa418, used=0xbe9aa41c,
> ino_discovery=1, check_dups=0,
>    extra_attr_check=1, isa_dir=0xbe9aa414, parent=0xbe9aa408) at dinode.c:2773
> #9  0x00010630 in process_inode_chunk (mp=0xbe9aa508, agno=17,
> num_inos=, first_irec=,
> ino_discovery=1, check_dups=0,
>    extra_attr_check=1, bogus=0x0) at dino_chunks.c:777
> #10 0x000110ec in process_aginodes (mp=0xbe9aa508, pf_args=0xed5c8,
> agno=17, ino_discovery=1, check_dups=0, extra_attr_check=1) at
> dino_chunks.c:1024
> #11 0x00028724 in process_ag_func (wq=0x400608, agno=17, arg=0xed5c8)
> at phase3.c:154
> #12 0x00028e24 in process_ags (mp=0xbe9aa508) at phase3.c:193
> #13 phase3 (mp=0xbe9aa508) at phase3.c:232
> #14 0x0003ddd8 in main (argc=, argv= optimized out>) at xfs_repair.c:712
>
>>
>> In the meantime I cooked up a little patch (below) to add format string
>> checking to the repair-internal varargs printing helpers, which produces
>> a lot of warnings.  A lot of that is different underlying types for
>> fixes-size 64-bit types, but there's quite a few legit errors there as
>> well.
>>
>>
>> Index: xfsprogs-dev/repair/err_protos.h
>> ===================================================================
>> --- xfsprogs-dev.orig/repair/err_protos.h       2011-04-22 12:45:25.018475622 +0200
>> +++ xfsprogs-dev/repair/err_protos.h    2011-04-22 12:47:22.014508467 +0200
>> @@ -17,10 +17,14 @@
>>  */
>>
>>  /* abort, internal error */
>> -void  __attribute__((noreturn)) do_abort(char const *, ...);
>> +void  __attribute__((noreturn)) do_abort(char const *, ...)
>> +       __attribute__((format(printf,1,2)));
>>  /* abort, system error */
>> -void  __attribute__((noreturn)) do_error(char const *, ...);
>> +void  __attribute__((noreturn)) do_error(char const *, ...)
>> +       __attribute__((format(printf,1,2)));
>>  /* issue warning */
>> -void do_warn(char const *, ...);
>> +void do_warn(char const *, ...)
>> +       __attribute__((format(printf,1,2)));
>>  /* issue log message */
>> -void do_log(char const *, ...);
>> +void do_log(char const *, ...)
>> +       __attribute__((format(printf,1,2)));
>>
>
> I'll give it a try.

Before: Building repair [DEP] [CC] agheader.o [CC] attr_repair.o [CC] avl.o [CC] avl64.o [CC] bmap.o [CC] btree.o [CC] dino_chunks.o [CC] dinode.o [CC] dir.o [CC] dir2.o [CC] globals.o [CC] incore.o [CC] incore_bmc.o [CC] init.o [CC] incore_ext.o [CC] incore_ino.o [CC] phase1.o [CC] phase2.o [CC] phase3.o [CC] phase4.o [CC] phase5.o [CC] phase6.o [CC] phase7.o [CC] progress.o [CC] prefetch.o [CC] rt.o [CC] sb.o [CC] scan.o [CC] threads.o [CC] versions.o [CC] xfs_repair.o [LD] xfs_repair After: Building repair [DEP] [CC] agheader.o [CC] attr_repair.o [CC] avl.o [CC] avl64.o [CC] bmap.o bmap.c: In function 'blkmap_getn': bmap.c:145: warning: format '%u' expects type 'unsigned int', but argument 2 has type 'xfs_dfilblks_t' [CC] btree.o [CC] dino_chunks.o [CC] dinode.o dinode.c: In function 'process_btinode': dinode.c:1272: warning: format '%d' expects type 'int', but argument 4 has type '__uint64_t' dinode.c:1287: warning: format '%u' expects type 'unsigned int', but argument 2 has type 'long long unsigned int' [CC] dir.o [CC] dir2.o [CC] globals.o [CC] incore.o [CC] incore_bmc.o [CC] init.o [CC] incore_ext.o [CC] incore_ino.o [CC] phase1.o [CC] phase2.o [CC] phase3.o [CC] phase4.o [CC] phase5.o [CC] phase6.o phase6.c: In function 'longform_dir2_entry_check': phase6.c:2479: warning: format '%u' expects type 'unsigned int', but argument 2 has type 'xfs_fsize_t' phase6.c: In function 'shortform_dir_entry_check': phase6.c:2815: warning: too many arguments for format [CC] phase7.o [CC] progress.o [CC] prefetch.o [CC] rt.o [CC] sb.o sb.c: In function 'get_sb': sb.c:491: warning: too many arguments for format [CC] scan.o scan.c: In function 'scanfunc_allocbt': scan.c:567: warning: format '%d' expects type 'int', but argument 4 has type 'const char *' scan.c:573: warning: format '%d' expects type 'int', but argument 4 has type 'const char *' scan.c: In function 'validate_agf': scan.c:1114: warning: format '%u' expects type 'unsigned int', but argument 3 has type '__uint64_t' 
[CC] threads.o [CC] versions.o [CC] xfs_repair.o xfs_repair.c: In function 'calc_mkfs': xfs_repair.c:457: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'xfs_agino_t' xfs_repair.c:462: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'xfs_agino_t' xfs_repair.c:466: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'xfs_agino_t' xfs_repair.c:480: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'xfs_agino_t' xfs_repair.c:485: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'xfs_agino_t' xfs_repair.c:489: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'xfs_agino_t' xfs_repair.c:503: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'xfs_agino_t' xfs_repair.c:508: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'xfs_agino_t' xfs_repair.c:512: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'xfs_agino_t' [LD] xfs_repair Stating the obvious here, but we can see the supposed crash cause in dinode.c (%u instead %llu) is preceded by a similar error (%d instead of %llu). There are also other warnings that need fixing. 
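
For reference, here is a minimal sketch of what the format attribute buys and of the %llu fix. It is not xfs_repair code: warnf() and format_inode() are hypothetical helpers that write into a caller-supplied buffer so the result can be inspected, and the message text is only loosely modelled on the one in the backtrace.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for do_warn(). The format attribute asks gcc to
 * check every conversion in the format string (argument 3) against the
 * type of the matching variadic argument (starting at argument 4).
 */
static int warnf(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

static int warnf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* Format a 64-bit inode number followed by a fork name. */
int format_inode(char *buf, size_t len, uint64_t ino, const char *fork)
{
	/*
	 * With "%u" here, a 32-bit target would consume only part of the
	 * 64-bit argument, and the following "%s" would then treat leftover
	 * integer bits as a pointer. Using %llu with an explicit cast keeps
	 * the argument list and the format string in step.
	 */
	return warnf(buf, len, "in inode %llu (%s fork)",
		     (unsigned long long)ino, fork);
}
```

Changing the conversion back to %u in this sketch should produce the same class of gcc warning as the dinode.c:1287 line in the build log above.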
Anisse From david@fromorbit.com Thu May 5 20:45:46 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p461jjIj170816 for ; Thu, 5 May 2011 20:45:46 -0500 X-ASG-Debug-ID: 1304646560-1ff801800000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail07.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id DC8A412C18EB; Thu, 5 May 2011 18:49:23 -0700 (PDT) Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net [150.101.137.131]) by cuda.sgi.com with ESMTP id QfdCvWkC2GsrjCJE; Thu, 05 May 2011 18:49:23 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Au8DAD9Sw015LBzagWdsb2JhbACmPhUBARYmJYhyvQYOhXkEnis Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail07.adl2.internode.on.net with ESMTP; 06 May 2011 11:19:18 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QIAAE-000134-Oe; Fri, 06 May 2011 11:49:06 +1000 Date: Fri, 6 May 2011 11:49:06 +1000 From: Dave Chinner To: Christoph Hellwig Cc: linux-kernel@vger.kernel.org, Markus Trippelsdorf , Bruno Pr?mont , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110506014906.GF26837@dastard> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110504005736.GA2958@cucamonga.audible.transient.net> <20110505002126.GA26797@dastard> 
<20110505022613.GA26837@dastard> <20110505122117.GB26837@dastard> <20110505123959.GA21098@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110505123959.GA21098@infradead.org> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail07.adl2.internode.on.net[150.101.137.131] X-Barracuda-Start-Time: 1304646564 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62898 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Thu, May 05, 2011 at 08:39:59AM -0400, Christoph Hellwig wrote: > > The third problem is that updating the push target is not safe on 32 > > bit machines. We cannot copy a 64 bit LSN without the possibility of > > corrupting the result when racing with another updating thread. We > > have function to do this update safely without needing to care about > > 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when > > updating the AIL push target. > > But reading xa_target without xa_lock isn't safe on 32-bit either, is it? Not sure - I think it depends on the platform. I don't think we protect LSN reads in any specific way on 32 bit platforms. In this case, I don't think it matters so much on read, because if we get a race with a write that mixes upper/lower words of the target we will eventually hit the stop condition and we won't get a match. That will trigger the requeue code and we'll start the push again. 

The problem with getting such a race on the target write is that we
could get a cycle/block pair that is beyond the current head of the
log and we'd never be able to push the AIL again as all push
thresholds are truncated to the current head LSN on disk...

> For the first read it can trivially be moved into the critical
> section a few lines below, and the second one should probably use
> XFS_LSN_CMP.
>
> > @@ -482,19 +481,24 @@ xfs_ail_worker(
> > /* assume we have more work to do in a short while */
> > tout = 10;
> > if (!count) {
> > +out_done:
>
> Jumping into conditionals is really ugly. By initializing count a bit
> earlier you can just jump in front of the if/else clauses. And while
> you're there maybe moving the tout = 10; into an else clause would
> also make the code more readable.
> an uninitialized use of tout.

Ok, I'll rework that.

> > + if (ailp->xa_target == target ||
> > + (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags)))
>
> no need for braces around the test_and_set_bit call.

*nod*. Left over from developing the fix...

I'll split all these and post them to the xfs-list for review...

Cheers,

Dave.
-- Dave Chinner david@fromorbit.com From dave@fromorbit.com Thu May 5 21:50:45 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00,LOCAL_GNU_PATCH autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p462ojK3173086 for ; Thu, 5 May 2011 21:50:45 -0500 X-ASG-Debug-ID: 1304650462-18ec02c80000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail07.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D8778435964 for ; Thu, 5 May 2011 19:54:22 -0700 (PDT) Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net [150.101.137.131]) by cuda.sgi.com with ESMTP id GjzrkfRAfN4ENN8k for ; Thu, 05 May 2011 19:54:22 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArMGAE1gw015LBzagWdsb2JhbACYWY1lFQEBFiYlxX+GBwSeKw Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail07.adl2.internode.on.net with ESMTP; 06 May 2011 12:24:22 +0930 Received: from chute ([192.168.1.1] helo=disappointment) by dastard with esmtp (Exim 4.72) (envelope-from ) id 1QIBBL-000181-Tj for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:19 +1000 Received: from dave by disappointment with local (Exim 4.75) (envelope-from ) id 1QIBBO-0007PU-O2 for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:22 +1000 From: Dave Chinner To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 4/5] xfs: make AIL target updates and compares 32bit safe. Subject: [PATCH 4/5] xfs: make AIL target updates and compares 32bit safe. 
Date: Fri, 6 May 2011 12:54:07 +1000
Message-Id: <1304650448-28438-5-git-send-email-david@fromorbit.com>
X-Mailer: git-send-email 1.7.4.4
In-Reply-To: <1304650448-28438-1-git-send-email-david@fromorbit.com>
References: <1304650448-28438-1-git-send-email-david@fromorbit.com>
X-Barracuda-Connect: ipmail07.adl2.internode.on.net[150.101.137.131]
X-Barracuda-Start-Time: 1304650463
X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com
X-Barracuda-Spam-Score: -2.02
X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=
X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62903
	Rule breakdown below
	 pts rule name              description
	---- ---------------------- --------------------------------------------------
X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com
X-Virus-Status: Clean

From: Dave Chinner

The recent conversion of the xfsaild functionality to a work queue
introduced a hard-to-hit log space grant hang. One of the problems
noticed was that updates of the push target are not 32 bit safe as
the target is a 64 bit value.

We cannot copy a 64 bit LSN without the possibility of corrupting
the result when racing with another updating thread. We have a
function to do this update safely without needing to care about
32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
updating the AIL push target.

Also move the reading of the target in the push work inside the AIL
lock, and use XFS_LSN_CMP() for the unlocked comparison during work
termination to close read holes as well.

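As an illustration only, the torn-update hazard can be modelled in userspace. The sketch below assumes an LSN packs the cycle number in the upper 32 bits and the block number in the lower 32 bits; lsn_make(), lsn_cmp() and lsn_torn() are hypothetical names for this model, not the kernel helpers.

```c
#include <stdint.h>

typedef int64_t lsn_t;	/* simplified stand-in for xfs_lsn_t */

/* Pack a cycle/block pair into one 64-bit value (assumed layout). */
lsn_t lsn_make(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

uint32_t lsn_cycle(lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

uint32_t lsn_block(lsn_t lsn)
{
	return (uint32_t)lsn;
}

/* Compare cycle first, then block; returns <0, 0 or >0. */
int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/*
 * Model a non-atomic copy done as two 32-bit stores that race with
 * another writer: the upper word comes from the newer value, the
 * lower word is still the older one.
 */
lsn_t lsn_torn(lsn_t older, lsn_t newer)
{
	return lsn_make(lsn_cycle(newer), lsn_block(older));
}
```

With an old target of cycle 5, block 900 and a new target of cycle 6, block 10, the torn value is cycle 6, block 900, which compares greater than both real targets. That is the kind of out-of-range target the locked copy is meant to rule out.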
Signed-off-by: Dave Chinner --- fs/xfs/xfs_trans_ail.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 9f427c2..d7eebbf 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -354,7 +354,7 @@ xfs_ail_worker( struct xfs_ail_cursor *cur = &ailp->xa_cursors; xfs_log_item_t *lip; xfs_lsn_t lsn; - xfs_lsn_t target = ailp->xa_target; + xfs_lsn_t target; long tout = 10; int flush_log = 0; int stuck = 0; @@ -362,6 +362,7 @@ xfs_ail_worker( int push_xfsbufd = 0; spin_lock(&ailp->xa_lock); + target = ailp->xa_target; xfs_trans_ail_cursor_init(ailp, cur); lip = xfs_trans_ail_cursor_first(ailp, cur, ailp->xa_last_pushed_lsn); if (!lip || XFS_FORCED_SHUTDOWN(mp)) { @@ -491,7 +492,7 @@ out_done: * work to do. Wait a bit longer before starting that work. */ smp_rmb(); - if (ailp->xa_target == target) { + if (XFS_LSN_CMP(ailp->xa_target, target) == 0) { clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags); return; } @@ -553,7 +554,7 @@ xfs_ail_push( * the XFS_AIL_PUSHING_BIT. 
*/ smp_wmb(); - ailp->xa_target = threshold_lsn; + xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn); if (!test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags)) queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0); } -- 1.7.4.4 From dave@fromorbit.com Thu May 5 21:50:55 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p462os1V173114 for ; Thu, 5 May 2011 21:50:55 -0500 X-ASG-Debug-ID: 1304650472-1ff303cb0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail07.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B24D3C593DE for ; Thu, 5 May 2011 19:54:33 -0700 (PDT) Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net [150.101.137.131]) by cuda.sgi.com with ESMTP id lRHjQ5DlEuEvPji8 for ; Thu, 05 May 2011 19:54:33 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArMGAE1gw015LBzagWdsb2JhbACYWY1lFQEBFiYlpkefOIYHBJ4r Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail07.adl2.internode.on.net with ESMTP; 06 May 2011 12:24:30 +0930 Received: from chute ([192.168.1.1] helo=disappointment) by dastard with esmtp (Exim 4.72) (envelope-from ) id 1QIBBL-00017w-NW for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:19 +1000 Received: from dave by disappointment with local (Exim 4.75) (envelope-from ) id 1QIBBO-0007PK-Fe for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:22 +1000 From: Dave Chinner To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 0/5] xfs: regression fixes for 2.6.39-rc6 Subject: [PATCH 0/5] xfs: regression fixes for 2.6.39-rc6 Date: Fri, 6 May 2011 12:54:03 +1000 Message-Id: <1304650448-28438-1-git-send-email-david@fromorbit.com> 
X-Mailer: git-send-email 1.7.4.4 X-Barracuda-Connect: ipmail07.adl2.internode.on.net[150.101.137.131] X-Barracuda-Start-Time: 1304650473 X-Barracuda-Bayes: INNOCENT GLOBAL 0.4559 1.0000 0.0000 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: 0.00 X-Barracuda-Spam-Status: No, SCORE=0.00 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62902 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean These are the fixes for recent regressions introduced/exposed by the recent workqueue conversions. Please review. From dave@fromorbit.com Thu May 5 21:50:54 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p462osWe173106 for ; Thu, 5 May 2011 21:50:54 -0500 X-ASG-Debug-ID: 1304650471-0cd303a60001-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail07.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 6EEFA1E1F002 for ; Thu, 5 May 2011 19:54:32 -0700 (PDT) Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net [150.101.137.131]) by cuda.sgi.com with ESMTP id 9kCkuq7jZd1Niug6 for ; Thu, 05 May 2011 19:54:32 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArMGAE1gw015LBzagWdsb2JhbACYWY1lFQEBFiYlxX+GBwSeKw Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail07.adl2.internode.on.net with ESMTP; 06 May 2011 12:24:30 +0930 Received: from chute 
([192.168.1.1] helo=disappointment) by dastard with esmtp (Exim 4.72) (envelope-from ) id 1QIBBV-00017y-P0 for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:29 +1000 Received: from dave by disappointment with local (Exim 4.75) (envelope-from ) id 1QIBBO-0007PO-KO for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:22 +1000 From: Dave Chinner To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 2/5] xfs: exit AIL push work correctly when AIL is empty Subject: [PATCH 2/5] xfs: exit AIL push work correctly when AIL is empty Date: Fri, 6 May 2011 12:54:05 +1000 Message-Id: <1304650448-28438-3-git-send-email-david@fromorbit.com> X-Mailer: git-send-email 1.7.4.4 In-Reply-To: <1304650448-28438-1-git-send-email-david@fromorbit.com> References: <1304650448-28438-1-git-send-email-david@fromorbit.com> X-Barracuda-Connect: ipmail07.adl2.internode.on.net[150.101.137.131] X-Barracuda-Start-Time: 1304650473 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62903 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean From: Dave Chinner The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. The main cause is a regression where a work exit path fails to clear the PUSHING state and recheck the target correctly. Make both exit paths do the same PUSHING bit clearing and target checking when the "no more work to be done" condition is hit. 
Signed-off-by: Dave Chinner --- fs/xfs/xfs_trans_ail.c | 26 +++++++++++++------------- 1 files changed, 13 insertions(+), 13 deletions(-) diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index acdb92f..226c58b 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -346,18 +346,20 @@ xfs_ail_delete( */ STATIC void xfs_ail_worker( - struct work_struct *work) + struct work_struct *work) { - struct xfs_ail *ailp = container_of(to_delayed_work(work), + struct xfs_ail *ailp = container_of(to_delayed_work(work), struct xfs_ail, xa_work); - long tout; - xfs_lsn_t target = ailp->xa_target; - xfs_lsn_t lsn; - xfs_log_item_t *lip; - int flush_log, count, stuck; - xfs_mount_t *mp = ailp->xa_mount; + xfs_mount_t *mp = ailp->xa_mount; struct xfs_ail_cursor *cur = &ailp->xa_cursors; - int push_xfsbufd = 0; + xfs_log_item_t *lip; + xfs_lsn_t lsn; + xfs_lsn_t target = ailp->xa_target; + long tout = 10; + int flush_log = 0; + int stuck = 0; + int count = 0; + int push_xfsbufd = 0; spin_lock(&ailp->xa_lock); xfs_trans_ail_cursor_init(ailp, cur); @@ -368,8 +370,7 @@ xfs_ail_worker( */ xfs_trans_ail_cursor_done(ailp, cur); spin_unlock(&ailp->xa_lock); - ailp->xa_last_pushed_lsn = 0; - return; + goto out_done; } XFS_STATS_INC(xs_push_ail); @@ -386,7 +387,6 @@ xfs_ail_worker( * lots of contention on the AIL lists. 
*/ lsn = lip->li_lsn; - flush_log = stuck = count = 0; while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) { int lock_result; /* @@ -480,7 +480,7 @@ xfs_ail_worker( } /* assume we have more work to do in a short while */ - tout = 10; +out_done: if (!count) { /* We're past our target or empty, so idle */ ailp->xa_last_pushed_lsn = 0; -- 1.7.4.4 From dave@fromorbit.com Thu May 5 21:50:55 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p462otX3173116 for ; Thu, 5 May 2011 21:50:55 -0500 X-ASG-Debug-ID: 1304650471-0cd303a60002-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail07.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 93A271E1F008 for ; Thu, 5 May 2011 19:54:33 -0700 (PDT) Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net [150.101.137.131]) by cuda.sgi.com with ESMTP id H0KhYoEHbvIDFlNq for ; Thu, 05 May 2011 19:54:33 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArMGAE1gw015LBzagWdsb2JhbACYWY1lFQEBFiYlxX+GBwSeKw Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail07.adl2.internode.on.net with ESMTP; 06 May 2011 12:24:30 +0930 Received: from chute ([192.168.1.1] helo=disappointment) by dastard with esmtp (Exim 4.72) (envelope-from ) id 1QIBBV-00017z-Qs for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:29 +1000 Received: from dave by disappointment with local (Exim 4.75) (envelope-from ) id 1QIBBO-0007PR-ME for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:22 +1000 From: Dave Chinner To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 3/5] xfs: always push the AIL to the target Subject: [PATCH 3/5] xfs: always push the AIL to the target Date: 
Fri, 6 May 2011 12:54:06 +1000 Message-Id: <1304650448-28438-4-git-send-email-david@fromorbit.com> X-Mailer: git-send-email 1.7.4.4 In-Reply-To: <1304650448-28438-1-git-send-email-david@fromorbit.com> References: <1304650448-28438-1-git-send-email-david@fromorbit.com> X-Barracuda-Connect: ipmail07.adl2.internode.on.net[150.101.137.131] X-Barracuda-Start-Time: 1304650474 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62903 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean From: Dave Chinner The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems discovered is a target mismatch between the item pushing loop and the target itself. The push trigger checks for the target increasing (i.e. new target > current) while the push loop only pushes items that have a LSN < current. As a result, we can get the situation where the push target is X, the items at the tail of the AIL have LSN X and they don't get pushed. The push work then completes thinking it is done, and cannot be restarted until the push target increases to >= X + 1. If the push target then never increases (because the tail is not moving), then we never run the push work again and we stall. Fix it by making sure log items with a LSN that matches the target exactly are pushed during the loop. 
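
As an illustration only, the boundary condition can be shown with a toy model. push_count() is a hypothetical helper, and plain integer comparison stands in for XFS_LSN_CMP(); the array plays the role of the LSN-sorted AIL.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many items at the tail of a sorted list a push loop would
 * process before stopping. 'inclusive' selects "<= target" (fixed
 * behaviour) instead of "< target" (old behaviour).
 */
size_t push_count(const int64_t *lsns, size_t n, int64_t target, int inclusive)
{
	size_t i = 0;

	while (i < n && (inclusive ? lsns[i] <= target : lsns[i] < target))
		i++;
	return i;
}
```

With items at LSNs {100, 100, 100, 120} and a target of 100, the exclusive loop processes nothing while the inclusive loop processes the three items that match the target.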
Signed-off-by: Dave Chinner --- fs/xfs/xfs_trans_ail.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 226c58b..9f427c2 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -387,7 +387,7 @@ xfs_ail_worker( * lots of contention on the AIL lists. */ lsn = lip->li_lsn; - while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) { + while ((XFS_LSN_CMP(lip->li_lsn, target) <= 0)) { int lock_result; /* * If we can lock the item without sleeping, unlock the AIL -- 1.7.4.4 From dave@fromorbit.com Thu May 5 21:50:56 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p462otoX173130 for ; Thu, 5 May 2011 21:50:55 -0500 X-ASG-Debug-ID: 1304650472-1ff303cb0001-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail07.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id A0819C593DE for ; Thu, 5 May 2011 19:54:33 -0700 (PDT) Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net [150.101.137.131]) by cuda.sgi.com with ESMTP id GyquYSnDJw46pJy4 for ; Thu, 05 May 2011 19:54:33 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArMGAE1gw015LBzagWdsb2JhbACYWY1lFQEBFiYlxX+GBwSeKw Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail07.adl2.internode.on.net with ESMTP; 06 May 2011 12:24:31 +0930 Received: from chute ([192.168.1.1] helo=disappointment) by dastard with esmtp (Exim 4.72) (envelope-from ) id 1QIBBV-00018C-VF for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:29 +1000 Received: from dave by disappointment with local (Exim 4.75) (envelope-from ) id 1QIBBO-0007PX-Pr for 
xfs@oss.sgi.com; Fri, 06 May 2011 12:54:22 +1000 From: Dave Chinner To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 5/5] xfs: fix race condition in AIL push trigger Subject: [PATCH 5/5] xfs: fix race condition in AIL push trigger Date: Fri, 6 May 2011 12:54:08 +1000 Message-Id: <1304650448-28438-6-git-send-email-david@fromorbit.com> X-Mailer: git-send-email 1.7.4.4 In-Reply-To: <1304650448-28438-1-git-send-email-david@fromorbit.com> References: <1304650448-28438-1-git-send-email-david@fromorbit.com> X-Barracuda-Connect: ipmail07.adl2.internode.on.net[150.101.137.131] X-Barracuda-Start-Time: 1304650474 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62902 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean From: Dave Chinner The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems is caused by a race condition in determining whether there is a push in progress or not. The XFS_AIL_PUSHING_BIT is used to determine whether a push is currently in progress. When the AIL push work completed, it checked whether the target had changed and, if it had not, cleared the PUSHING bit to allow a new push to be queued.
The race condition is as follows:

	Thread 1		push work

	smp_wmb()
				smp_rmb()
				check ailp->xa_target unchanged
	update ailp->xa_target
	test/set PUSHING bit
	does not queue
				clear PUSHING bit
				does not requeue

Now that the push target is updated, new attempts to push the AIL will not trigger as the push target will be the same, and hence despite trying to push the AIL we won't ever wake it again. The fix is to ensure that the AIL push work clears the PUSHING bit before it checks if the target is unchanged. As a result, both push triggers operate on the same test/set bit criteria, so even if we race in the push work and miss the target update, the thread requesting the push will still set the PUSHING bit and queue the push work to occur. For safety's sake, the same queue check is done if the push work detects the target change, though only one of the two will queue new work due to the use of test_and_set_bit() checks. Signed-off-by: Dave Chinner --- fs/xfs/xfs_trans_ail.c | 16 ++++++++++------ 1 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index d7eebbf..5fc2380 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -487,15 +487,19 @@ out_done: ailp->xa_last_pushed_lsn = 0; /* - * Check for an updated push target before clearing the - * XFS_AIL_PUSHING_BIT. If the target changed, we've got more - * work to do. Wait a bit longer before starting that work. + * We clear the XFS_AIL_PUSHING_BIT first before checking + * whether the target has changed. If the target has changed, + * this pushes the requeue race directly onto the result of the + * atomic test/set bit, so we are guaranteed that either the + * the pusher that changed the target or ourselves will requeue + * the work (but not both).
*/ + clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags); smp_rmb(); - if (XFS_LSN_CMP(ailp->xa_target, target) == 0) { - clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags); + if (XFS_LSN_CMP(ailp->xa_target, target) == 0 || + test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags)) return; - } + tout = 50; } else if (XFS_LSN_CMP(lsn, target) >= 0) { /* -- 1.7.4.4 From dave@fromorbit.com Thu May 5 21:50:58 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p462owtm173104 for ; Thu, 5 May 2011 21:50:58 -0500 X-ASG-Debug-ID: 1304650471-0cd303a60000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail07.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 3929E1E1F002 for ; Thu, 5 May 2011 19:54:31 -0700 (PDT) Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net [150.101.137.131]) by cuda.sgi.com with ESMTP id lVOBCqhQ4AhvBqQ9 for ; Thu, 05 May 2011 19:54:31 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArMGAE1gw015LBzagWdsb2JhbACYWY1lFQEBFiYlxX+GBwSOCJAj Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail07.adl2.internode.on.net with ESMTP; 06 May 2011 12:24:30 +0930 Received: from chute ([192.168.1.1] helo=disappointment) by dastard with esmtp (Exim 4.72) (envelope-from ) id 1QIBBV-00017x-NL for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:29 +1000 Received: from dave by disappointment with local (Exim 4.75) (envelope-from ) id 1QIBBO-0007PM-Hg for xfs@oss.sgi.com; Fri, 06 May 2011 12:54:22 +1000 From: Dave Chinner To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 1/5] xfs: ensure reclaim cursor is reset correctly at end of AG Subject: [PATCH 1/5] xfs: 
ensure reclaim cursor is reset correctly at end of AG Date: Fri, 6 May 2011 12:54:04 +1000 Message-Id: <1304650448-28438-2-git-send-email-david@fromorbit.com> X-Mailer: git-send-email 1.7.4.4 In-Reply-To: <1304650448-28438-1-git-send-email-david@fromorbit.com> References: <1304650448-28438-1-git-send-email-david@fromorbit.com> X-Barracuda-Connect: ipmail07.adl2.internode.on.net[150.101.137.131] X-Barracuda-Start-Time: 1304650472 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62903 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean From: Dave Chinner On a 32 bit highmem PowerPC machine, the XFS inode cache was growing without bound and exhausting low memory causing the OOM killer to be triggered. After some effort, the problem was reproduced on a 32 bit x86 highmem machine. The problem is that the per-ag inode reclaim index cursor was not getting reset to the start of the AG if the radix tree tag lookup found no more reclaimable inodes. Hence every further reclaim attempt started at the same index beyond where any reclaimable inodes lay, and no further background reclaim ever occurred from the AG. Without background inode reclaim the VM driven cache shrinker simply cannot keep up with cache growth, and OOM is the result. While the change that exposed the problem was the conversion of the inode reclaim to use work queues for background reclaim, it was not the cause of the bug. The bug was introduced when the cursor code was added, just waiting for some weird configuration to strike.... 
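The stuck-cursor behaviour described above can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the xfs_sync.c code: struct ag_model, AG_INODES and reclaim_pass() are hypothetical, the radix tree tag lookup is replaced by a linear scan, and the reset_on_empty flag stands in for the added "done = 1".

```c
#include <assert.h>
#include <stddef.h>

#define AG_INODES 8

/* Hypothetical model of one allocation group's reclaim state. */
struct ag_model {
    int reclaimable[AG_INODES];  /* 1 = inode is tagged for reclaim */
    size_t cursor;               /* index where the next pass starts */
};

/*
 * One background reclaim pass. It scans from the saved cursor and
 * reclaims at most 'batch' inodes. With reset_on_empty == 0 it mimics
 * the old behaviour (an empty lookup leaves the cursor where it is);
 * with reset_on_empty != 0 an empty lookup marks the AG as done so the
 * cursor goes back to the start. Returns the number reclaimed.
 */
static size_t reclaim_pass(struct ag_model *ag, size_t batch,
                           int reset_on_empty)
{
    size_t idx = ag->cursor;
    size_t reclaimed = 0;
    int done = 0;

    assert(batch > 0);
    while (reclaimed < batch) {
        /* Stand-in for the tag lookup: next reclaimable inode. */
        while (idx < AG_INODES && !ag->reclaimable[idx])
            idx++;
        if (idx == AG_INODES) {
            /* Lookup found nothing more in this AG. */
            if (reset_on_empty)
                done = 1;
            break;
        }
        ag->reclaimable[idx++] = 0;
        reclaimed++;
    }
    /* Save the restart point; only a completed AG resets to 0. */
    ag->cursor = done ? 0 : idx;
    return reclaimed;
}
```

In this model, once a pass runs off the end of the AG without the reset, every later pass starts past the last inode and reclaims nothing, even after inodes at lower indexes become reclaimable again. With the reset, the next pass starts from index 0 and picks them up.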
Signed-off-by: Dave Chinner Tested-By: Christian Kujau --- fs/xfs/linux-2.6/xfs_sync.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c index e0da841..cb1bb20 100644 --- a/fs/xfs/linux-2.6/xfs_sync.c +++ b/fs/xfs/linux-2.6/xfs_sync.c @@ -936,6 +936,7 @@ restart: XFS_LOOKUP_BATCH, XFS_ICI_RECLAIM_TAG); if (!nr_found) { + done = 1; rcu_read_unlock(); break; } -- 1.7.4.4 From stefanx@lrz.uni-muenchen.de Fri May 6 07:07:42 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p46C7f9f195588 for ; Fri, 6 May 2011 07:07:41 -0500 X-ASG-Debug-ID: 1304683879-54f6027d0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from acheron.ifi.lmu.de (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 1F5F815D3616 for ; Fri, 6 May 2011 05:11:19 -0700 (PDT) Received: from acheron.ifi.lmu.de (acheron.ifi.lmu.de [129.187.214.135]) by cuda.sgi.com with ESMTP id gk8PdRdfvrqXuCAi for ; Fri, 06 May 2011 05:11:19 -0700 (PDT) Received: from [10.153.74.164] (aukena.pms.ifi.lmu.de [10.153.74.164]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: josko) by acheron.ifi.lmu.de (Postfix) with ESMTP id 8E4E494A089 for ; Fri, 6 May 2011 14:11:18 +0200 (CEST) Message-ID: <4DC3E566.8070601@lrz.uni-muenchen.de> Date: Fri, 06 May 2011 14:11:18 +0200 From: stefanx User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8 MIME-Version: 1.0 To: xfs@oss.sgi.com X-ASG-Orig-Subj: __write_lock_failed Subject: __write_lock_failed Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: 
acheron.ifi.lmu.de[129.187.214.135] X-Barracuda-Start-Time: 1304683880 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.52 X-Barracuda-Spam-Status: No, SCORE=-1.52 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_RULE7568M X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.62940 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.50 BSF_RULE7568M Custom Rule 7568M X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Hello, some of my virtual machines (kvm/qemu) sometimes crash (xfs, Ubuntu 10.04, 2.6.32-31-x86_64). I think it happens while taking LVM-snapshots from the XFS-Filesystem of that machines: BUG: soft lockup - CPU#7 stuck for 61s! [kswapd0:84] Modules linked in: reiserfs ipt_ULOG ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm xfs exportfs bridge stp ppdev fbcon tileblit font bitblit softcursor parport_pc psmouse serio_raw i5000_edac bnx2 vga16fb vgastate edac_core i5k_amb shpchp lp parport raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 floppy multipath ahci e1000e linear CPU 7: Modules linked in: reiserfs ipt_ULOG ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm xfs exportfs bridge stp ppdev fbcon tileblit font bitblit softcursor parport_pc psmouse serio_raw i5000_edac bnx2 vga16fb vgastate edac_core i5k_amb shpchp lp parport raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 floppy multipath ahci e1000e linear Pid: 84, comm: kswapd0 Not tainted 2.6.32-27-server #49-Ubuntu 
PRIMERGY TX200 S4 RIP: 0010:[] [] __write_lock_failed+0x9/0x20 RSP: 0018:ffff88062237dbc8 EFLAGS: 00000206 RAX: 0000000000000000 RBX: ffff88062237dbd0 RCX: 0000000000000000 RDX: ffffffffa01c3a00 RSI: 0000000000000000 RDI: ffff8806190c6048 RBP: ffffffff81012cae R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 000000000000007c R12: ffff88061ae424d0 R13: ffff8802b2c18c00 R14: ffff88061ae424d0 R15: 0000000000000202 FS: 0000000000000000(0000) GS:ffff880016fc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000026e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [] ? _write_lock+0x17/0x20 [] ? xfs_inode_ag_walk+0x68/0x140 [xfs] [] ? xfs_reclaim_inode+0x0/0x120 [xfs] [] ? xfs_inode_ag_iterator+0x99/0xe0 [xfs] [] ? xfs_reclaim_inode+0x0/0x120 [xfs] [] ? xfs_reclaim_inode_shrink+0xfc/0x140 [xfs] [] ? shrink_slab+0x125/0x190 [] ? balance_pgdat+0x526/0x6d0 [] ? isolate_pages_global+0x0/0x50 [] ? kswapd+0xfe/0x150 [] ? autoremove_wake_function+0x0/0x40 [] ? kswapd+0x0/0x150 [] ? kthread+0x96/0xa0 [] ? child_rip+0xa/0x20 This link seems to describe the same problem with Xen: http://lists.xensource.com/archives/html/xen-users/2010-04/msg01135.html Any ideas ? 
Thanks Stefan From wkendall@sgi.com Fri May 6 11:39:39 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_42 autolearn=no version=3.4.0-r929098 Received: from relay.sgi.com (relay1.corp.sgi.com [137.38.102.111]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p46Gddot203751 for ; Fri, 6 May 2011 11:39:39 -0500 Received: from estes.americas.sgi.com (estes.americas.sgi.com [128.162.236.10]) by relay1.corp.sgi.com (Postfix) with ESMTP id 2EB8B8F8094 for ; Fri, 6 May 2011 09:43:16 -0700 (PDT) Received: from augusta (augusta.americas.sgi.com [128.162.233.117]) by estes.americas.sgi.com (Postfix) with ESMTP id 21BAC70001DF; Fri, 6 May 2011 11:43:16 -0500 (CDT) Received: by augusta (Postfix, from userid 2022) id D32C81400A16; Fri, 6 May 2011 11:43:15 -0500 (CDT) From: Bill Kendall To: xfs@oss.sgi.com Cc: Bill Kendall Subject: [PATCH] xfsprogs: fix open_by_handle memory leak Date: Fri, 6 May 2011 11:42:57 -0500 Message-Id: <1304700177-8505-1-git-send-email-wkendall@sgi.com> X-Mailer: git-send-email 1.7.0.4 X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean open_by_handle() calls handle_to_fshandle() which allocates an fshandle on the heap, which is never freed by open_by_handle(). There is no need to call handle_to_fshandle() though, just pass the fhandle (rather than fshandle) to handle_to_fsfd(), like the other *_by_handle() functions do. 
Signed-off-by: Bill Kendall --- libhandle/handle.c | 7 +------ 1 files changed, 1 insertions(+), 6 deletions(-) diff --git a/libhandle/handle.c b/libhandle/handle.c index c3a6129..b1ec5f2 100644 --- a/libhandle/handle.c +++ b/libhandle/handle.c @@ -292,14 +292,9 @@ open_by_handle( { int fsfd; char *path; - void *fshanp; - size_t fshlen; xfs_fsop_handlereq_t hreq; - if (handle_to_fshandle(hanp, hlen, &fshanp, &fshlen) != 0) - return -1; - - if ((fsfd = handle_to_fsfd(fshanp, &path)) < 0) + if ((fsfd = handle_to_fsfd(hanp, &path)) < 0) return -1; hreq.fd = 0; -- 1.7.0.4 From jpiszcz@lucidpixels.com Sat May 7 11:06:07 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p47G67sQ249337 for ; Sat, 7 May 2011 11:06:07 -0500 X-ASG-Debug-ID: 1304784586-0de201230000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from lucidpixels.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 0807A164BEF2 for ; Sat, 7 May 2011 09:09:46 -0700 (PDT) Received: from lucidpixels.com (lucidpixels.com [75.144.35.66]) by cuda.sgi.com with ESMTP id O3DyKQZFKjZvX0tM for ; Sat, 07 May 2011 09:09:46 -0700 (PDT) Received: by lucidpixels.com (Postfix, from userid 1001) id 0BB8A12028E; Sat, 7 May 2011 12:09:46 -0400 (EDT) Date: Sat, 7 May 2011 12:09:46 -0400 (EDT) From: Justin Piszcz To: xfs@oss.sgi.com cc: linux-kernel@vger.kernel.org X-ASG-Orig-Subj: 2.6.38.4: xfs speed problem? Subject: 2.6.38.4: xfs speed problem? 
Message-ID: User-Agent: Alpine 2.02 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Barracuda-Connect: lucidpixels.com[75.144.35.66] X-Barracuda-Start-Time: 1304784587 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63052 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Hello, Using 2.6.38.4 on two hosts: Host 1: $ /usr/bin/time find geocities.data 1> /dev/null 80.92user 417.93system 2:19:07elapsed 5%CPU (0avgtext+0avgdata 105520maxresident)k 0inputs+0outputs (0major+73373minor)pagefaults 0swaps # xfs_db -c frag -f /dev/sda1 actual 40203982, ideal 40088075, fragmentation factor 0.29% meta-data=/dev/sda1 isize=256 agcount=44, agsize=268435455 blks = sectsz=512 attr=2 data = bsize=4096 blocks=11718704640, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal bsize=4096 blocks=521728, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 -- Host 2: $ /usr/bin/time find geocities.data 1>/dev/null 54.60user 337.20system 48:42.71elapsed 13%CPU (0avgtext+0avgdata 105632maxresident)k 0inputs+0outputs (1major+72981minor)pagefaults 0swaps # xfs_db -c frag -f /dev/sdb1 actual 37998306, ideal 37939331, fragmentation factor 0.16% meta-data=/dev/sdb1 isize=256 agcount=10, agsize=268435455 blks = sectsz=512 attr=2 data = bsize=4096 blocks=2441379328, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal bsize=4096 blocks=521728, version=2 = sectsz=512 sunit=0 
blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 -- Host 1: RAID-6 (7200 RPM Drives, 18+1 hot spare) Host 2: RAID-6 (7200 RPM Drives, 12) Each system uses a 3ware 9750-24i4e controller, same settings. Any thoughts why one is > 2x faster than the other? Justin. From david@fromorbit.com Sat May 7 19:29:46 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p480Tk60001858 for ; Sat, 7 May 2011 19:29:46 -0500 X-ASG-Debug-ID: 1304814804-50d003590000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 9BC611D66CD7 for ; Sat, 7 May 2011 17:33:25 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id MpCP3LTH1tifoq1y for ; Sat, 07 May 2011 17:33:25 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8DAMnhxU15LBzagWdsb2JhbACmGxUBARYmJYhxuWwOhX4EnlQ Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail06.adl6.internode.on.net with ESMTP; 08 May 2011 10:03:23 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QIrw2-00053A-2h; Sun, 08 May 2011 10:33:22 +1000 Date: Sun, 8 May 2011 10:33:22 +1000 From: Dave Chinner To: Justin Piszcz Cc: xfs@oss.sgi.com, linux-kernel@vger.kernel.org X-ASG-Orig-Subj: Re: 2.6.38.4: xfs speed problem? Subject: Re: 2.6.38.4: xfs speed problem? 
Message-ID: <20110508003321.GI26837@dastard> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1304814806 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63085 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Sat, May 07, 2011 at 12:09:46PM -0400, Justin Piszcz wrote: > Hello, > > Using 2.6.38.4 on two hosts: > > Host 1: > $ /usr/bin/time find geocities.data 1> /dev/null > 80.92user 417.93system 2:19:07elapsed 5%CPU (0avgtext+0avgdata 105520maxresident)k > 0inputs+0outputs (0major+73373minor)pagefaults 0swaps > > # xfs_db -c frag -f /dev/sda1 > actual 40203982, ideal 40088075, fragmentation factor 0.29% > > meta-data=/dev/sda1 isize=256 agcount=44, agsize=268435455 blks > = sectsz=512 attr=2 > data = bsize=4096 blocks=11718704640, imaxpct=5 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal bsize=4096 blocks=521728, version=2 > = sectsz=512 sunit=0 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > > -- > > Host 2: > $ /usr/bin/time find geocities.data 1>/dev/null > 54.60user 337.20system 48:42.71elapsed 13%CPU (0avgtext+0avgdata 105632maxresident)k > 0inputs+0outputs (1major+72981minor)pagefaults 0swaps > > # xfs_db -c frag -f /dev/sdb1 > actual 37998306, ideal 37939331, fragmentation factor 0.16% > > meta-data=/dev/sdb1 isize=256 agcount=10, agsize=268435455 
blks > = sectsz=512 attr=2 > data = bsize=4096 blocks=2441379328, imaxpct=5 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal bsize=4096 blocks=521728, version=2 > = sectsz=512 sunit=0 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > > > -- > > Host 1: RAID-6 (7200 RPM Drives, 18+1 hot spare) Those will be 3TB drives > Host 2: RAID-6 (7200 RPM Drives, 12) and those are 1TB drives. Different hardware is guaranteed to give you different performance, especially from a seek capability perspective. > Each system uses a 3ware 9750-24i4e controller, same settings. > > Any thoughts why one is > 2x faster than the other? Different filesystem sizes mean different directory, inode and data layouts, especially if you are using inode64. Cheers, Dave. -- Dave Chinner david@fromorbit.com From jamie@audible.transient.net Sun May 8 00:07:35 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4857ZMA013295 for ; Sun, 8 May 2011 00:07:35 -0500 X-ASG-Debug-ID: 1304831474-4772025e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from audible.transient.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with SMTP id 995F7C691E9 for ; Sat, 7 May 2011 22:11:14 -0700 (PDT) Received: from audible.transient.net (audible.transient.net [216.254.12.79]) by cuda.sgi.com with SMTP id ae9ZcYZ9OOUydboy for ; Sat, 07 May 2011 22:11:14 -0700 (PDT) Received: (qmail 29161 invoked from network); 8 May 2011 05:11:13 -0000 Received: from cucamonga.audible.transient.net (192.168.2.5) by canarsie.audible.transient.net with QMQP; 8 May 2011 05:11:13 -0000 Received: (nullmailer pid 19254 invoked by uid 1000); Sun, 08 May 2011 05:11:13 -0000 Date: Sun, 8 May 2011 
05:11:13 +0000 From: Jamie Heilman To: Dave Chinner Cc: linux-kernel@vger.kernel.org, Markus Trippelsdorf , Bruno =?iso-8859-1?Q?Pr=E9mont?= , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110508051113.GH2934@cucamonga.audible.transient.net> Mail-Followup-To: Dave Chinner , linux-kernel@vger.kernel.org, Markus Trippelsdorf , Bruno =?iso-8859-1?Q?Pr=E9mont?= , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110504005736.GA2958@cucamonga.audible.transient.net> <20110505002126.GA26797@dastard> <20110505022613.GA26837@dastard> <20110505122117.GB26837@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110505122117.GB26837@dastard> User-Agent: Mutt/1.5.21 (2010-09-15) X-Barracuda-Connect: audible.transient.net[216.254.12.79] X-Barracuda-Start-Time: 1304831475 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63104 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Dave Chinner wrote: > On Thu, May 05, 2011 at 12:26:13PM +1000, Dave Chinner wrote: > > On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote: > > 
> On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > > > > Dave Chinner wrote: > > > > > OK, so the common elements here appears to be root filesystems > > > > > with small log sizes, which means they are tail pushing all the > > > > > time metadata operations are in progress. Definitely seems like a > > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > > reproduce this and cook up a patch to fix it. > > > > > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > > > > other assorted feedback wrt this issue? I've had it happen twice now > > > > myself in the past week or so, though I have no reliable reproduction > > > > technique. Just wondering if more data points will help isolate the > > > > cause, and if so, how to be prepared to get them. > > > > > > > > For whatever its worth, my last lockup was while running > > > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. > > > > > > Can you all try the patch below? I've managed to trigger a couple of > > > xlog_wait() lockups in some controlled load tests. The lockups don't > > > appear to occur with the following patch to he race condition in > > > the AIL workqueue trigger. > > > > They are still there, just harder to hit. > > > > FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches" > > gets the system moving again because that changes the push target. > > > > I've found two more bugs, and now my test case is now reliably > > reproducably a 5-10s pause at ~1M created 1byte files and then > > hanging at about 1.25M files. So there's yet another problem lurking > > that I need to get to the bottom of. > > Which, of course, was the real regression. The patch below has > survived a couple of hours of testing, which fixes all 4 of the > problems I found. Please test. Well, 61 hours in now, and no lockups. 
I've written ~204GiB to my xfs volumes in that time, much of which was audacity temp files which are 1037kB each, so not as metadata intensive as your test case, but it's more or less what I'd been doing in the past when the lockups happened. Looks pretty promising at this point. -- Jamie Heilman http://audible.transient.net/~jamie/ From stan@hardwarefreak.com Sun May 8 12:14:53 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p48HErUm039224 for ; Sun, 8 May 2011 12:14:53 -0500 X-ASG-Debug-ID: 1304875113-546200030000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from greer.hardwarefreak.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 73BC4159CD36 for ; Sun, 8 May 2011 10:18:34 -0700 (PDT) Received: from greer.hardwarefreak.com (mo-65-41-216-221.sta.embarqhsd.net [65.41.216.221]) by cuda.sgi.com with ESMTP id ZR0UymAoX0nVdA7N for ; Sun, 08 May 2011 10:18:34 -0700 (PDT) Received: from [192.168.100.53] (gffx.hardwarefreak.com [192.168.100.53]) by greer.hardwarefreak.com (Postfix) with ESMTP id 978016C0B2; Sun, 8 May 2011 12:18:32 -0500 (CDT) Message-ID: <4DC6D067.1080208@hardwarefreak.com> Date: Sun, 08 May 2011 12:18:31 -0500 From: Stan Hoeppner User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Dave Chinner CC: Justin Piszcz , linux-kernel@vger.kernel.org, xfs@oss.sgi.com X-ASG-Orig-Subj: Re: 2.6.38.4: xfs speed problem? Subject: Re: 2.6.38.4: xfs speed problem? 
References: <20110508003321.GI26837@dastard> In-Reply-To: <20110508003321.GI26837@dastard> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mo-65-41-216-221.sta.embarqhsd.net[65.41.216.221] X-Barracuda-Start-Time: 1304875114 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.92 X-Barracuda-Spam-Status: No, SCORE=-1.92 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=RDNS_DYNAMIC X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63151 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.10 RDNS_DYNAMIC Delivered to trusted network by host with dynamic-looking rDNS X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/7/2011 7:33 PM, Dave Chinner wrote: > On Sat, May 07, 2011 at 12:09:46PM -0400, Justin Piszcz wrote: >> Hello, >> >> Using 2.6.38.4 on two hosts: >> >> Host 1: >> $ /usr/bin/time find geocities.data 1> /dev/null >> 80.92user 417.93system 2:19:07elapsed 5%CPU (0avgtext+0avgdata 105520maxresident)k >> 0inputs+0outputs (0major+73373minor)pagefaults 0swaps >> >> # xfs_db -c frag -f /dev/sda1 >> actual 40203982, ideal 40088075, fragmentation factor 0.29% >> >> meta-data=/dev/sda1 isize=256 agcount=44, agsize=268435455 blks >> = sectsz=512 attr=2 >> data = bsize=4096 blocks=11718704640, imaxpct=5 >> = sunit=0 swidth=0 blks >> naming =version 2 bsize=4096 ascii-ci=0 >> log =internal bsize=4096 blocks=521728, version=2 >> = sectsz=512 sunit=0 blks, lazy-count=1 >> realtime =none extsz=4096 blocks=0, rtextents=0 >> >> -- >> >> Host 2: >> $ /usr/bin/time find geocities.data 1>/dev/null >> 54.60user 337.20system 48:42.71elapsed 13%CPU (0avgtext+0avgdata 105632maxresident)k >> 0inputs+0outputs (1major+72981minor)pagefaults 0swaps >> >> # 
xfs_db -c frag -f /dev/sdb1 >> actual 37998306, ideal 37939331, fragmentation factor 0.16% >> >> meta-data=/dev/sdb1 isize=256 agcount=10, agsize=268435455 blks >> = sectsz=512 attr=2 >> data = bsize=4096 blocks=2441379328, imaxpct=5 >> = sunit=0 swidth=0 blks >> naming =version 2 bsize=4096 ascii-ci=0 >> log =internal bsize=4096 blocks=521728, version=2 >> = sectsz=512 sunit=0 blks, lazy-count=1 >> realtime =none extsz=4096 blocks=0, rtextents=0 How much would it help, if any, with this specific 'test', or with overall XFS performance, if Justin were to... >> Host 1: RAID-6 (7200 RPM Drives, 18+1 hot spare) remake the fs on the above device with 'sw=16' or remount with appropriate sunit and swidth values? >> Host 2: RAID-6 (7200 RPM Drives, 12) remake the fs on the above device with 'sw=10' or remount with appropriate sunit and swidth values? -- Stan From bonbons@linux-vserver.org Mon May 9 00:53:28 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p495rSpA066727 for ; Mon, 9 May 2011 00:53:28 -0500 X-ASG-Debug-ID: 1304920627-365402510000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from smtprelay.restena.lu (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B8EB843C8C6; Sun, 8 May 2011 22:57:08 -0700 (PDT) Received: from smtprelay.restena.lu (smtprelay.restena.lu [158.64.1.62]) by cuda.sgi.com with ESMTP id Tz00TVBvLBypKeRD; Sun, 08 May 2011 22:57:08 -0700 (PDT) Received: from smtprelay.restena.lu (localhost [127.0.0.1]) by smtprelay.restena.lu (Postfix) with ESMTP id 7DC8E109FA; Mon, 9 May 2011 07:57:06 +0200 (CEST) Received: from pluto.restena.lu (pluto.restena.lu [IPv6:2001:a18:1:8:230:5ff:fefe:5152]) by smtprelay.restena.lu (Postfix) with 
ESMTPS id 5718D106CB; Mon, 9 May 2011 07:57:06 +0200 (CEST) Date: Mon, 9 May 2011 07:57:09 +0200 From: Bruno Prémont To: Dave Chinner Cc: linux-kernel@vger.kernel.org, Markus Trippelsdorf , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner X-ASG-Orig-Subj: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Message-ID: <20110509075709.3c527fd2@pluto.restena.lu> In-Reply-To: <20110505223513.3654c041@neptune.home> References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110504005736.GA2958@cucamonga.audible.transient.net> <20110505002126.GA26797@dastard> <20110505022613.GA26837@dastard> <20110505122117.GB26837@dastard> <20110505223513.3654c041@neptune.home> X-Mailer: Claws Mail 3.7.8 (GTK+ 2.22.1; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Scanned: ClamAV X-Barracuda-Connect: smtprelay.restena.lu[158.64.1.62] X-Barracuda-Start-Time: 1304920628 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63203 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Status: Clean On Thu, 5 May 2011 22:35:13 Bruno Prémont wrote: > On Thu, 05 May 2011 Dave Chinner wrote: > > On Thu, May 05, 2011 at 12:26:13PM +1000, Dave Chinner wrote: > > > On Thu, May 05, 2011 at 10:21:26AM
+1000, Dave Chinner wrote: > > > > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > > > > > Dave Chinner wrote: > > > > > > OK, so the common element here appears to be root filesystems > > > > > > with small log sizes, which means they are tail pushing all the > > > > > > time metadata operations are in progress. Definitely seems like a > > > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > > > reproduce this and cook up a patch to fix it. > > > > > > > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > > > > > other assorted feedback wrt this issue? I've had it happen twice now > > > > > myself in the past week or so, though I have no reliable reproduction > > > > > technique. Just wondering if more data points will help isolate the > > > > > cause, and if so, how to be prepared to get them. > > > > > > > > > > For whatever it's worth, my last lockup was while running > > > > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. > > > > > > > > Can you all try the patch below? I've managed to trigger a couple of > > > > xlog_wait() lockups in some controlled load tests. The lockups don't > > > > appear to occur with the following patch to the race condition in > > > > the AIL workqueue trigger. > > > > > > They are still there, just harder to hit. > > > > > > FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches" > > > gets the system moving again because that changes the push target. > > > > > > I've found two more bugs, and now my test case reliably > > > reproduces a 5-10s pause at ~1M created 1byte files and then > > > hangs at about 1.25M files. So there's yet another problem lurking > > > that I need to get to the bottom of. > > > > Which, of course, was the real regression. The patch below has > > survived a couple of hours of testing, which fixes all 4 of the > > problems I found. Please test.
> > Successfully survived today's 2-hour session. Will continue testing > during the week-end and see if it also survives the longer whole-day sessions. > > Will report results at end of week-end (or earlier in case of trouble). Also survived the whole week-end (at least twice 10 hours) with normal desktop work as well as a few hours of software compilation. (without the patch it would probably have frozen at least twice a day) So looks really good! Thanks, Bruno From michael.monnerie@is.it-management.at Mon May 9 02:50:06 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p497o2Ek070983 for ; Mon, 9 May 2011 02:50:04 -0500 X-ASG-Debug-ID: 1304927620-76ed01e90000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mailsrv14.zmi.at (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 8D2BE15A0911 for ; Mon, 9 May 2011 00:53:41 -0700 (PDT) Received: from mailsrv14.zmi.at (mailsrv1.zmi.at [212.69.164.54]) by cuda.sgi.com with ESMTP id 6UG1rnPZNaaxuOPA for ; Mon, 09 May 2011 00:53:41 -0700 (PDT) Received: from mailsrv.i.zmi.at (h081217106033.dyn.cm.kabsi.at [81.217.106.33]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailsrv2.i.zmi.at", Issuer "power4u.zmi.at" (not verified)) by mailsrv14.zmi.at (Postfix) with ESMTPSA id B6BE0522 for ; Mon, 9 May 2011 09:53:38 +0200 (CEST) Received: from saturn.localnet (saturn.i.zmi.at [10.72.27.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mailsrv.i.zmi.at (Postfix) with ESMTPSA id 4794A401C3A for ; Mon, 9 May 2011 09:53:35 +0200 (CEST) From: Michael Monnerie Organization: it-management http://it-management.at To: xfs@oss.sgi.com
X-ASG-Orig-Subj: Re: 2.6.38.4: xfs speed problem? Subject: Re: 2.6.38.4: xfs speed problem? Date: Mon, 9 May 2011 09:53:34 +0200 User-Agent: KMail/1.13.6 (Linux/2.6.37.6-0.5-desktop; KDE/4.6.0; x86_64; ; ) References: <20110508003321.GI26837@dastard> <4DC6D067.1080208@hardwarefreak.com> In-Reply-To: <4DC6D067.1080208@hardwarefreak.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1586613.nOVb0Pkeg8"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <201105090953.35090@zmi.at> X-Barracuda-Connect: mailsrv1.zmi.at[212.69.164.54] X-Barracuda-Start-Time: 1304927621 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63211 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean --nextPart1586613.nOVb0Pkeg8 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable [removed some recipients] On Sunday, 8 May 2011 Stan Hoeppner wrote: > remake the fs on the above device with 'sw=16' or remount with > appropriate sunit and swidth values? A remount wouldn't help the existing metadata layout. Would it be sufficient to remount with sw=16 and then create a top-level dir in which you recreate all existing dirs, then hard-link each file and remove the old directory structure? Or would the files need to be copied too, to take advantage of the new sw? -- Best regards, Michael Monnerie, Ing.
BSc it-management Internet Services: Protéger http://proteger.at [pronounced: Prot-e-schee] Tel: +43 660 / 415 6531 // ****** Radio interview on the topic of spam ****** // http://www.it-podcast.at/archiv.html#podcast-100716 // // House for sale: http://zmi.at/langegg/ --nextPart1586613.nOVb0Pkeg8 Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) iEYEABECAAYFAk3HnX8ACgkQzhSR9xwSCbR+xACbBRgo5xZgFVrkQm0R45Db9Pqb mn0AoOvVtgxw8Wd3A3v+JhfQsss5J7c+ =CN/n -----END PGP SIGNATURE----- --nextPart1586613.nOVb0Pkeg8-- From BATV+d760e6d74090d2e56591+2815+infradead.org+hch@bombadil.srs.infradead.org Mon May 9 09:07:01 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p49E6wTh012520 for ; Mon, 9 May 2011 09:07:01 -0500 X-ASG-Debug-ID: 1304950017-282b01900000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id AC0011E22D9C; Mon, 9 May 2011 07:06:57 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id BVaIF7cVLHnk9Q28; Mon, 09 May 2011 07:06:57 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QJR6v-0003Q7-91; Mon, 09 May 2011 14:06:57 +0000 Date: Mon, 9 May 2011 10:06:57 -0400 From: Christoph Hellwig To: Bill Kendall Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: [PATCH] xfsprogs: fix open_by_handle memory leak Subject: Re: [PATCH] xfsprogs: fix open_by_handle memory leak Message-ID:
<20110509140656.GA12262@infradead.org> References: <1304700177-8505-1-git-send-email-wkendall@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1304700177-8505-1-git-send-email-wkendall@sgi.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304950017 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Fri, May 06, 2011 at 11:42:57AM -0500, Bill Kendall wrote: > open_by_handle() calls handle_to_fshandle() which > allocates an fshandle on the heap, which is never > freed by open_by_handle(). There is no need to > call handle_to_fshandle() though, just pass the > fhandle (rather than fshandle) to handle_to_fsfd(), > like the other *_by_handle() functions do. 
> > Signed-off-by: Bill Kendall Looks good, Reviewed-by: Christoph Hellwig From BATV+d760e6d74090d2e56591+2815+infradead.org+hch@bombadil.srs.infradead.org Mon May 9 09:07:24 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p49E7OlA012543 for ; Mon, 9 May 2011 09:07:24 -0500 X-ASG-Debug-ID: 1304950043-341e01690000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id C909543D707 for ; Mon, 9 May 2011 07:07:23 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id btCJq3GReGh61lt7 for ; Mon, 09 May 2011 07:07:23 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QJR7J-0003S4-Vu; Mon, 09 May 2011 14:07:22 +0000 Date: Mon, 9 May 2011 10:07:21 -0400 From: Christoph Hellwig To: Dave Chinner Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: [PATCH 1/5] xfs: ensure reclaim cursor is reset correctly at end of AG Subject: Re: [PATCH 1/5] xfs: ensure reclaim cursor is reset correctly at end of AG Message-ID: <20110509140721.GB12262@infradead.org> References: <1304650448-28438-1-git-send-email-david@fromorbit.com> <1304650448-28438-2-git-send-email-david@fromorbit.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1304650448-28438-2-git-send-email-david@fromorbit.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] 
X-Barracuda-Start-Time: 1304950043 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Fri, May 06, 2011 at 12:54:04PM +1000, Dave Chinner wrote: > From: Dave Chinner > > On a 32 bit highmem PowerPC machine, the XFS inode cache was growing > without bound and exhausting low memory causing the OOM killer to be > triggered. After some effort, the problem was reproduced on a 32 bit > x86 highmem machine. > > The problem is that the per-ag inode reclaim index cursor was not > getting reset to the start of the AG if the radix tree tag lookup > found no more reclaimable inodes. Hence every further reclaim > attempt started at the same index beyond where any reclaimable > inodes lay, and no further background reclaim ever occurred from the > AG. > > Without background inode reclaim the VM driven cache shrinker > simply cannot keep up with cache growth, and OOM is the result. > > While the change that exposed the problem was the conversion of the > inode reclaim to use work queues for background reclaim, it was not > the cause of the bug. The bug was introduced when the cursor code > was added, just waiting for some weird configuration to strike.... 
Looks good, Reviewed-by: Christoph Hellwig From BATV+d760e6d74090d2e56591+2815+infradead.org+hch@bombadil.srs.infradead.org Mon May 9 09:08:13 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p49E8Dpc012579 for ; Mon, 9 May 2011 09:08:13 -0500 X-ASG-Debug-ID: 1304950092-284001990000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id C46BB1E22BE5 for ; Mon, 9 May 2011 07:08:12 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id oONIc4BJSTnMYWyf for ; Mon, 09 May 2011 07:08:12 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QJR88-0003Uw-C8; Mon, 09 May 2011 14:08:12 +0000 Date: Mon, 9 May 2011 10:08:12 -0400 From: Christoph Hellwig To: Dave Chinner Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: [PATCH 2/5] xfs: exit AIL push work correctly when AIL is empty Subject: Re: [PATCH 2/5] xfs: exit AIL push work correctly when AIL is empty Message-ID: <20110509140812.GC12262@infradead.org> References: <1304650448-28438-1-git-send-email-david@fromorbit.com> <1304650448-28438-3-git-send-email-david@fromorbit.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1304650448-28438-3-git-send-email-david@fromorbit.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304950092 
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Fri, May 06, 2011 at 12:54:05PM +1000, Dave Chinner wrote: > From: Dave Chinner > > The recent conversion of the xfsaild functionality to a work queue > introduced a hard-to-hit log space grant hang. The main cause is a > regression where a work exit path fails to clear the PUSHING state > and recheck the target correctly. > > Make both exit paths do the same PUSHING bit clearing and target > checking when the "no more work to be done" condition is hit. > > Signed-off-by: Dave Chinner Looks good, Reviewed-by: Christoph Hellwig From BATV+d760e6d74090d2e56591+2815+infradead.org+hch@bombadil.srs.infradead.org Mon May 9 09:13:42 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p49EDgc6012736 for ; Mon, 9 May 2011 09:13:42 -0500 X-ASG-Debug-ID: 1304950421-282d01cc0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id C59301E230C1 for ; Mon, 9 May 2011 07:13:41 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id HnMxX276f431tLc0 for ; Mon, 09 May 2011 07:13:41 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QJRDR-0004vW-Cx; Mon, 09 May 2011 14:13:41 +0000 Date: Mon, 9 May 2011 10:13:41 -0400 From: Christoph Hellwig To: Dave Chinner Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: [PATCH 3/5] xfs: always push the AIL to the target Subject: Re: [PATCH 3/5] xfs: 
always push the AIL to the target Message-ID: <20110509141341.GD12262@infradead.org> References: <1304650448-28438-1-git-send-email-david@fromorbit.com> <1304650448-28438-4-git-send-email-david@fromorbit.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1304650448-28438-4-git-send-email-david@fromorbit.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304950421 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Fri, May 06, 2011 at 12:54:06PM +1000, Dave Chinner wrote: > From: Dave Chinner > > The recent conversion of the xfsaild functionality to a work queue > introduced a hard-to-hit log space grant hang. One of the problems > discovered is a target mismatch between the item pushing loop and > the target itself. > > The push trigger checks for the target increasing (i.e. new target > > current) while the push loop only pushes items that have a LSN < > current. As a result, we can get the situation where the push target > is X, the items at the tail of the AIL have LSN X and they don't get > pushed. The push work then completes thinking it is done, and cannot > be restarted until the push target increases to >= X + 1. If the > push target then never increases (because the tail is not moving), > then we never run the push work again and we stall. > > Fix it by making sure log items with a LSN that matches the target > exactly are pushed during the loop. 
> > Signed-off-by: Dave Chinner Looks good, Reviewed-by: Christoph Hellwig From BATV+d760e6d74090d2e56591+2815+infradead.org+hch@bombadil.srs.infradead.org Mon May 9 09:14:03 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p49EE3N1012763 for ; Mon, 9 May 2011 09:14:03 -0500 X-ASG-Debug-ID: 1304950442-3403019e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 93ADB43DB95 for ; Mon, 9 May 2011 07:14:02 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id kdHQGP9RC0AxnoHw for ; Mon, 09 May 2011 07:14:02 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QJRDm-0004yZ-6W; Mon, 09 May 2011 14:14:02 +0000 Date: Mon, 9 May 2011 10:14:02 -0400 From: Christoph Hellwig To: Dave Chinner Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: [PATCH 4/5] xfs: make AIL target updates and compares 32bit safe. Subject: Re: [PATCH 4/5] xfs: make AIL target updates and compares 32bit safe. 
Message-ID: <20110509141402.GE12262@infradead.org> References: <1304650448-28438-1-git-send-email-david@fromorbit.com> <1304650448-28438-5-git-send-email-david@fromorbit.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1304650448-28438-5-git-send-email-david@fromorbit.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304950442 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Looks good, Reviewed-by: Christoph Hellwig From BATV+d760e6d74090d2e56591+2815+infradead.org+hch@bombadil.srs.infradead.org Mon May 9 09:16:31 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p49EGUHK012867 for ; Mon, 9 May 2011 09:16:31 -0500 X-ASG-Debug-ID: 1304950590-3415017e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B15AE43D22F for ; Mon, 9 May 2011 07:16:30 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id jDmWBGnYpFNkJhEQ for ; Mon, 09 May 2011 07:16:30 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QJRGA-0005mL-Bg; Mon, 09 May 2011 14:16:30 +0000 Date: Mon, 9 May 2011 10:16:30 -0400 From: Christoph Hellwig To: Dave Chinner Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: 
[PATCH 5/5] xfs: fix race condition in AIL push trigger Subject: Re: [PATCH 5/5] xfs: fix race condition in AIL push trigger Message-ID: <20110509141630.GF12262@infradead.org> References: <1304650448-28438-1-git-send-email-david@fromorbit.com> <1304650448-28438-6-git-send-email-david@fromorbit.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1304650448-28438-6-git-send-email-david@fromorbit.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304950590 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Looks good, Reviewed-by: Christoph Hellwig From BATV+d760e6d74090d2e56591+2815+infradead.org+hch@bombadil.srs.infradead.org Mon May 9 09:18:48 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p49EIm5J012964 for ; Mon, 9 May 2011 09:18:48 -0500 X-ASG-Debug-ID: 1304950727-6fc2014a0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 2F59CC68B9C for ; Mon, 9 May 2011 07:18:47 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id 8wNoKp7LLuMMBXq3 for ; Mon, 09 May 2011 07:18:47 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QJRIM-0006Cw-Ac; Mon, 09 May 2011 14:18:46 
+0000 Date: Mon, 9 May 2011 10:18:46 -0400 From: Christoph Hellwig To: stefanx Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: __write_lock_failed Subject: Re: __write_lock_failed Message-ID: <20110509141846.GG12262@infradead.org> References: <4DC3E566.8070601@lrz.uni-muenchen.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DC3E566.8070601@lrz.uni-muenchen.de> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1304950728 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Fri, May 06, 2011 at 02:11:18PM +0200, stefanx wrote: > > Hello, > > some of my virtual machines (kvm/qemu) sometimes crash (xfs, Ubuntu 10.04, 2.6.32-31-x86_64). > I think it happens while taking LVM-snapshots from the XFS-Filesystem of that machines: This area has gotten a lot of changes since 2.6.32, including removing the lock that's causing softlockups for you. I don't think there's much of a chance for help with a kernel that old as people's minds have moved on to much more recent codebases. 
From aelder@oss.sgi.com Mon May 9 19:29:20 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from oss.sgi.com (localhost [127.0.0.1]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4A0TKLx034830 for ; Mon, 9 May 2011 19:29:20 -0500 Received: (from aelder@localhost) by oss.sgi.com (8.14.3/8.14.3/Submit) id p4A0TJfe034782; Mon, 9 May 2011 19:29:19 -0500 Date: Mon, 9 May 2011 19:29:19 -0500 Message-Id: <201105100029.p4A0TJfe034782@oss.sgi.com> From: xfs@oss.sgi.com To: xfs@oss.sgi.com Subject: [XFS updates] XFS development tree branch, master, updated. v2.6.38-10128-ge4d3c4a X-Git-Refname: refs/heads/master X-Git-Reftype: branch X-Git-Oldrev: 8c1fdd0be5498f852e00c5fbd9cb0c3969e46cc6 X-Git-Newrev: e4d3c4a43b595d5124ae824d300626e6489ae857 This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "XFS development tree". The branch, master has been updated e4d3c4a xfs: fix race condition in AIL push trigger fd5670f xfs: make AIL target updates and compares 32bit safe. cb64026 xfs: always push the AIL to the target ea35a20 xfs: exit AIL push work correctly when AIL is empty b223221 xfs: ensure reclaim cursor is reset correctly at end of AG from 8c1fdd0be5498f852e00c5fbd9cb0c3969e46cc6 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit e4d3c4a43b595d5124ae824d300626e6489ae857 Author: Dave Chinner Date: Fri May 6 02:54:08 2011 +0000 xfs: fix race condition in AIL push trigger The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. 
One cause is a race condition in determining whether there is a push in progress or not. The XFS_AIL_PUSHING_BIT is used to determine whether a push is currently in progress. When the AIL push work completes, it checked whether the target changed and cleared the PUSHING bit to allow a new push to be requeued. The race condition is as follows:

    Thread 1                    push work
    smp_wmb()
                                smp_rmb()
                                check ailp->xa_target unchanged
    update ailp->xa_target
    test/set PUSHING bit
    does not queue
                                clear PUSHING bit
                                does not requeue

Now that the push target is updated, new attempts to push the AIL will not trigger as the push target will be the same, and hence despite trying to push the AIL we won't ever wake it again. The fix is to ensure that the AIL push work clears the PUSHING bit before it checks if the target is unchanged. As a result, both push triggers operate on the same test/set bit criteria, so even if we race in the push work and miss the target update, the thread requesting the push will still set the PUSHING bit and queue the push work to occur. For safety's sake, the same queue check is done if the push work detects the target change, though only one of the two will queue new work due to the use of test_and_set_bit() checks. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder commit fd5670f22fce247754243cf2ed41941e5762d990 Author: Dave Chinner Date: Fri May 6 02:54:07 2011 +0000 xfs: make AIL target updates and compares 32bit safe. The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems noticed was that updates of the push target are not 32 bit safe as the target is a 64 bit value. We cannot copy a 64 bit LSN without the possibility of corrupting the result when racing with another updating thread. We have a function to do this update safely without needing to care about 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when updating the AIL push target.
Also move the reading of the target in the push work inside the AIL lock, and use XFS_LSN_CMP() for the unlocked comparison during work termination to close read holes as well. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder commit cb64026b6e8af50db598ec7c3f59d504259b00bb Author: Dave Chinner Date: Fri May 6 02:54:06 2011 +0000 xfs: always push the AIL to the target The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems discovered is a target mismatch between the item pushing loop and the target itself. The push trigger checks for the target increasing (i.e. new target > current) while the push loop only pushes items that have a LSN < current. As a result, we can get the situation where the push target is X, the items at the tail of the AIL have LSN X and they don't get pushed. The push work then completes thinking it is done, and cannot be restarted until the push target increases to >= X + 1. If the push target then never increases (because the tail is not moving), then we never run the push work again and we stall. Fix it by making sure log items with a LSN that matches the target exactly are pushed during the loop. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder commit ea35a20021f8497390d05b93271b4d675516c654 Author: Dave Chinner Date: Fri May 6 02:54:05 2011 +0000 xfs: exit AIL push work correctly when AIL is empty The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. The main cause is a regression where a work exit path fails to clear the PUSHING state and recheck the target correctly. Make both exit paths do the same PUSHING bit clearing and target checking when the "no more work to be done" condition is hit. 
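The target mismatch described in "xfs: always push the AIL to the target" above can be illustrated with a small sketch. This is not the kernel code: lsn_t, count_pushed_strict() and count_pushed_inclusive() are hypothetical names, and the AIL is modelled as a plain array sorted by ascending LSN. It only shows why a strict "<" comparison leaves items at LSN X unpushed when the target is X, while "<=" pushes them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; a real XFS LSN packs a cycle and a block number. */
typedef int64_t lsn_t;

/* Pre-fix loop condition: only items strictly below the target are pushed. */
size_t count_pushed_strict(const lsn_t *items, size_t n, lsn_t target)
{
    size_t pushed = 0;

    /* items[] is sorted by ascending LSN, like the AIL, so stop at the first miss. */
    for (size_t i = 0; i < n && items[i] < target; i++)
        pushed++;
    return pushed;
}

/* Post-fix loop condition: items whose LSN equals the target are pushed too. */
size_t count_pushed_inclusive(const lsn_t *items, size_t n, lsn_t target)
{
    size_t pushed = 0;

    for (size_t i = 0; i < n && items[i] <= target; i++)
        pushed++;
    return pushed;
}
```

With a tail of items at LSN 7 and a target of 7, the strict variant pushes nothing, which in this model corresponds to the push work finishing with the tail unmoved; only a target of at least 8 would make progress.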
Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder commit b223221956675ce8a7b436d198ced974bb388571 Author: Dave Chinner Date: Fri May 6 02:54:04 2011 +0000 xfs: ensure reclaim cursor is reset correctly at end of AG On a 32 bit highmem PowerPC machine, the XFS inode cache was growing without bound and exhausting low memory causing the OOM killer to be triggered. After some effort, the problem was reproduced on a 32 bit x86 highmem machine. The problem is that the per-ag inode reclaim index cursor was not getting reset to the start of the AG if the radix tree tag lookup found no more reclaimable inodes. Hence every further reclaim attempt started at the same index beyond where any reclaimable inodes lay, and no further background reclaim ever occurred from the AG. Without background inode reclaim the VM driven cache shrinker simply cannot keep up with cache growth, and OOM is the result. While the change that exposed the problem was the conversion of the inode reclaim to use work queues for background reclaim, it was not the cause of the bug. The bug was introduced when the cursor code was added, just waiting for some weird configuration to strike.... 
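The cursor problem described above can be sketched with a toy model. This is not the kernel code: the real reclaim walks a tagged radix tree per AG, while this sketch uses a plain array of flags, and struct ag_model and reclaim_one() are hypothetical names. The reset_on_miss argument switches between the behaviour described as buggy (cursor left where it was after an empty scan) and the fixed one (cursor reset to the start of the AG).

```c
#include <stdbool.h>
#include <stddef.h>

struct ag_model {
    bool   *reclaimable;   /* one flag per inode slot in this AG */
    size_t  nslots;
    size_t  cursor;        /* slot at which the next scan starts */
};

/*
 * Reclaim at most one inode, scanning forward from the cursor.
 * Returns the reclaimed slot, or -1 if the scan found nothing.
 */
static long reclaim_one(struct ag_model *ag, bool reset_on_miss)
{
    for (size_t i = ag->cursor; i < ag->nslots; i++) {
        if (ag->reclaimable[i]) {
            ag->reclaimable[i] = false;
            ag->cursor = i + 1;
            return (long)i;
        }
    }
    if (reset_on_miss)
        ag->cursor = 0;    /* wrap back to the start of the AG */
    return -1;
}
```

In the model, once the cursor has moved past slot 0, an inode that becomes reclaimable at slot 0 is never seen again unless an empty scan resets the cursor.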
Signed-off-by: Dave Chinner Tested-By: Christian Kujau Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder ----------------------------------------------------------------------- Summary of changes: fs/xfs/linux-2.6/xfs_sync.c | 1 + fs/xfs/xfs_trans_ail.c | 47 +++++++++++++++++++++++------------------- 2 files changed, 27 insertions(+), 21 deletions(-) hooks/post-receive -- XFS development tree From aelder@oss.sgi.com Mon May 9 19:29:41 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from oss.sgi.com (localhost [127.0.0.1]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4A0Tf6f034950 for ; Mon, 9 May 2011 19:29:41 -0500 Received: (from aelder@localhost) by oss.sgi.com (8.14.3/8.14.3/Submit) id p4A0Tdoa034902; Mon, 9 May 2011 19:29:39 -0500 Date: Mon, 9 May 2011 19:29:39 -0500 Message-Id: <201105100029.p4A0Tdoa034902@oss.sgi.com> From: xfs@oss.sgi.com To: xfs@oss.sgi.com Subject: [XFS updates] XFS development tree branch, for-linus, updated. v2.6.38-10750-g7ac9565 X-Git-Refname: refs/heads/for-linus X-Git-Reftype: branch X-Git-Oldrev: 3eff1268994f72266b660782e87f215720c29639 X-Git-Newrev: 7ac956576d0ce8f97450a39c2f304db8eea01647 This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "XFS development tree". The branch, for-linus has been updated 7ac9565 xfs: fix race condition in AIL push trigger fe0da76 xfs: make AIL target updates and compares 32bit safe. 
50e8668 xfs: always push the AIL to the target 9e7004e xfs: exit AIL push work correctly when AIL is empty 228d62d xfs: ensure reclaim cursor is reset correctly at end of AG from 3eff1268994f72266b660782e87f215720c29639 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- commit 7ac956576d0ce8f97450a39c2f304db8eea01647 Author: Dave Chinner Date: Fri May 6 02:54:08 2011 +0000 xfs: fix race condition in AIL push trigger The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One is caused by a race condition in determining whether there is a push in progress or not. The XFS_AIL_PUSHING_BIT is used to determine whether a push is currently in progress. When the AIL push work completes, it checked whether the target changed and cleared the PUSHING bit to allow a new push to be requeued. The race condition is as follows:

    Thread 1                    push work

    smp_wmb()
                                smp_rmb()
                                check ailp->xa_target unchanged
    update ailp->xa_target
    test/set PUSHING bit
    does not queue
                                clear PUSHING bit
                                does not requeue

Now that the push target is updated, new attempts to push the AIL will not trigger as the push target will be the same, and hence despite trying to push the AIL we won't ever wake it again. The fix is to ensure that the AIL push work clears the PUSHING bit before it checks if the target is unchanged. As a result, both push triggers operate on the same test/set bit criteria, so even if we race in the push work and miss the target update, the thread requesting the push will still set the PUSHING bit and queue the push work to occur. For safety's sake, the same queue check is done if the push work detects the target change, though only one of the two will queue new work due to the use of test_and_set_bit() checks.
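The fixed ordering described in this commit message can be sketched in userspace with C11 atomics. This is a model under stated assumptions, not the XFS implementation: struct ail_model and all ail_*() functions are hypothetical, an atomic_flag stands in for XFS_AIL_PUSHING_BIT, and a counter stands in for queueing the work item. The push work's exit path is split into two steps so that a caller can place a push request between them.

```c
#include <stdatomic.h>
#include <stdint.h>

struct ail_model {
    atomic_flag      pushing;  /* stands in for XFS_AIL_PUSHING_BIT */
    _Atomic uint64_t target;   /* stands in for ailp->xa_target */
    atomic_int       queued;   /* how many times push work was queued */
};

static void ail_init(struct ail_model *ail, uint64_t target)
{
    atomic_flag_clear(&ail->pushing);
    atomic_init(&ail->target, target);
    atomic_init(&ail->queued, 0);
}

/* Queue push work only if no push is marked as in progress. */
static void ail_queue_if_idle(struct ail_model *ail)
{
    if (!atomic_flag_test_and_set(&ail->pushing))
        atomic_fetch_add(&ail->queued, 1);
}

/* Requesting thread: publish a higher target, then try to queue work. */
static void ail_request_push(struct ail_model *ail, uint64_t new_target)
{
    if (new_target <= atomic_load(&ail->target))
        return;
    atomic_store(&ail->target, new_target);
    ail_queue_if_idle(ail);
}

/* Push work exit, step 1 (fixed ordering): clear PUSHING first. */
static void ail_work_clear_pushing(struct ail_model *ail)
{
    atomic_flag_clear(&ail->pushing);
}

/* Push work exit, step 2: only now check whether the target moved. */
static void ail_work_recheck(struct ail_model *ail, uint64_t pushed_to)
{
    if (atomic_load(&ail->target) != pushed_to)
        ail_queue_if_idle(ail);
}
```

If a request arrives between the two exit steps, the requester wins the test-and-set and queues the work, and the recheck then finds the flag already set. If the request arrives before step 1, the requester does not queue, but the recheck sees the changed target and queues instead. Either way the work is queued exactly once.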
Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder (cherry picked from commit e4d3c4a43b595d5124ae824d300626e6489ae857) commit fe0da767311933d1c1907cb8d326beea7a3cbd9c Author: Dave Chinner Date: Fri May 6 02:54:07 2011 +0000 xfs: make AIL target updates and compares 32bit safe. The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems noticed was that updates of the push target are not 32 bit safe as the target is a 64 bit value. We cannot copy a 64 bit LSN without the possibility of corrupting the result when racing with another updating thread. We have a function to do this update safely without needing to care about 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when updating the AIL push target. Also move the reading of the target in the push work inside the AIL lock, and use XFS_LSN_CMP() for the unlocked comparison during work termination to close read holes as well. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder (cherry picked from commit fd5670f22fce247754243cf2ed41941e5762d990) commit 50e86686dfb287d720af8b0f977202d205c04215 Author: Dave Chinner Date: Fri May 6 02:54:06 2011 +0000 xfs: always push the AIL to the target The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems discovered is a target mismatch between the item pushing loop and the target itself. The push trigger checks for the target increasing (i.e. new target > current) while the push loop only pushes items that have a LSN < current. As a result, we can get the situation where the push target is X, the items at the tail of the AIL have LSN X and they don't get pushed. The push work then completes thinking it is done, and cannot be restarted until the push target increases to >= X + 1.
If the push target then never increases (because the tail is not moving), then we never run the push work again and we stall. Fix it by making sure log items with a LSN that matches the target exactly are pushed during the loop. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder (cherry picked from commit cb64026b6e8af50db598ec7c3f59d504259b00bb) commit 9e7004e741de0b2daabbbadafbaf11ff1a94e00c Author: Dave Chinner Date: Fri May 6 02:54:05 2011 +0000 xfs: exit AIL push work correctly when AIL is empty The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. The main cause is a regression where a work exit path fails to clear the PUSHING state and recheck the target correctly. Make both exit paths do the same PUSHING bit clearing and target checking when the "no more work to be done" condition is hit. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder (cherry picked from commit ea35a20021f8497390d05b93271b4d675516c654) commit 228d62dd3f74734b9801c789b5addc57fdfc208f Author: Dave Chinner Date: Fri May 6 02:54:04 2011 +0000 xfs: ensure reclaim cursor is reset correctly at end of AG On a 32 bit highmem PowerPC machine, the XFS inode cache was growing without bound and exhausting low memory causing the OOM killer to be triggered. After some effort, the problem was reproduced on a 32 bit x86 highmem machine. The problem is that the per-ag inode reclaim index cursor was not getting reset to the start of the AG if the radix tree tag lookup found no more reclaimable inodes. Hence every further reclaim attempt started at the same index beyond where any reclaimable inodes lay, and no further background reclaim ever occurred from the AG. Without background inode reclaim the VM driven cache shrinker simply cannot keep up with cache growth, and OOM is the result. 
While the change that exposed the problem was the conversion of the inode reclaim to use work queues for background reclaim, it was not the cause of the bug. The bug was introduced when the cursor code was added, just waiting for some weird configuration to strike.... Signed-off-by: Dave Chinner Tested-By: Christian Kujau Reviewed-by: Christoph Hellwig Reviewed-by: Alex Elder (cherry picked from commit b223221956675ce8a7b436d198ced974bb388571) ----------------------------------------------------------------------- Summary of changes: fs/xfs/linux-2.6/xfs_sync.c | 1 + fs/xfs/xfs_trans_ail.c | 47 +++++++++++++++++++++++------------------- 2 files changed, 27 insertions(+), 21 deletions(-) hooks/post-receive -- XFS development tree From dave@fromorbit.com Mon May 9 21:05:59 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4A25wbx039263 for ; Mon, 9 May 2011 21:05:59 -0500 X-ASG-Debug-ID: 1304993156-131502850000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail07.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 1107944013A for ; Mon, 9 May 2011 19:05:56 -0700 (PDT) Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net [150.101.137.131]) by cuda.sgi.com with ESMTP id cMcmMUTyxqMmisDA for ; Mon, 09 May 2011 19:05:56 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AuQFAMKbyE15LBzagWdsb2JhbACYEo1vFQEBFiYlpjCfWoYMBJ5U Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail07.adl2.internode.on.net with ESMTP; 10 May 2011 11:35:55 +0930 Received: from chute ([192.168.1.1] helo=disappointment) by dastard with esmtp (Exim 4.72) (envelope-from 
) id 1QJcKb-0001AP-Pw for xfs@oss.sgi.com; Tue, 10 May 2011 12:05:49 +1000 Received: from dave by disappointment with local (Exim 4.75) (envelope-from ) id 1QJcKc-0003zl-UU for xfs@oss.sgi.com; Tue, 10 May 2011 12:05:50 +1000 From: Dave Chinner To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH] xfs: fix duplicate workqueue initialisation Subject: [PATCH] xfs: fix duplicate workqueue initialisation Date: Tue, 10 May 2011 12:05:50 +1000 Message-Id: <1304993150-15327-1-git-send-email-david@fromorbit.com> X-Mailer: git-send-email 1.7.4.4 X-Barracuda-Connect: ipmail07.adl2.internode.on.net[150.101.137.131] X-Barracuda-Start-Time: 1304993158 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.52 X-Barracuda-Spam-Status: No, SCORE=-1.52 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_RULE7568M X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63283 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.50 BSF_RULE7568M Custom Rule 7568M X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean From: Dave Chinner The workqueue initialisation function is called twice when initialising the XFS subsystem. Remove the second initialisation call. 
Signed-off-by: Dave Chinner
---
 fs/xfs/linux-2.6/xfs_super.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 29c0ccb..365375c 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1800,10 +1800,6 @@ init_xfs_fs(void)
 	if (error)
 		goto out_cleanup_procfs;
 
-	error = xfs_init_workqueues();
-	if (error)
-		goto out_sysctl_unregister;
-
 	vfs_initquota();
 
 	error = register_filesystem(&xfs_fs_type);
-- 
1.7.4.4

From BATV+a3100e3fc17e61e197a9+2816+infradead.org+hch@bombadil.srs.infradead.org Tue May 10 00:33:20 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4A5XKZP048887 for ; Tue, 10 May 2011 00:33:20 -0500 X-ASG-Debug-ID: 1305005599-0d79036a0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 259AF1E244C7 for ; Mon, 9 May 2011 22:33:20 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id bvfSVdNISPIpDHct for ; Mon, 09 May 2011 22:33:20 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QJfZP-0001kV-Gv; Tue, 10 May 2011 05:33:19 +0000 Date: Tue, 10 May 2011 01:33:19 -0400 From: Christoph Hellwig To: Dave Chinner Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: [PATCH] xfs: fix duplicate workqueue initialisation Subject: Re: [PATCH] xfs: fix duplicate workqueue initialisation Message-ID: <20110510053319.GA3047@infradead.org> References: <1304993150-15327-1-git-send-email-david@fromorbit.com> MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1304993150-15327-1-git-send-email-david@fromorbit.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1305005600 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, May 10, 2011 at 12:05:50PM +1000, Dave Chinner wrote: > From: Dave Chinner > > The workqueue initialisation function is called twice when > initialising the XFS subsystem. Remove the second initialisation > call. Not sure how we got there, but it looks indeed buggy. I'm kinda surprised the workqueue code didn't blow up with the duplicate names. From u-kusaka@wm.jp.nec.com Tue May 10 00:43:11 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4A5hAQF049184 for ; Tue, 10 May 2011 00:43:11 -0500 X-ASG-Debug-ID: 1305006188-1925028f0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from tyo201.gate.nec.co.jp (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 65F711E2452E for ; Mon, 9 May 2011 22:43:08 -0700 (PDT) Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [202.32.8.193]) by cuda.sgi.com with ESMTP id rXwkahees5o5Sdfw for ; Mon, 09 May 2011 22:43:08 -0700 (PDT) Received: from mailgate3.nec.co.jp ([10.7.69.195]) by tyo201.gate.nec.co.jp (8.13.8/8.13.4) with ESMTP id p4A5h6Cx003968 for ; Tue, 10 May 2011 14:43:06 +0900 (JST) Received: (from root@localhost) by mailgate3.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id 
p4A5h6b28725 for xfs@oss.sgi.com; Tue, 10 May 2011 14:43:06 +0900 (JST) Received: from mail02.kamome.nec.co.jp (mail02.kamome.nec.co.jp [10.25.43.5]) by mailsv3.nec.co.jp (8.13.8/8.13.4) with ESMTP id p4A5h6Wv027445 for ; Tue, 10 May 2011 14:43:06 +0900 (JST) Received: from kaishu.jp.nec.com ([10.26.220.5] [10.26.220.5]) by mail03.kamome.nec.co.jp with ESMTP id BT-MMP-913295; Tue, 10 May 2011 14:41:52 +0900 Received: from [10.64.168.199] ([10.64.168.199] [10.64.168.199]) by mail.jp.nec.com with ESMTP; Tue, 10 May 2011 14:41:51 +0900 Message-ID: <4DC8D01F.5060704@wm.jp.nec.com> Date: Tue, 10 May 2011 14:41:51 +0900 From: Utako Kusaka User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; ja; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: xfs X-ASG-Orig-Subj: direct IO question Subject: direct IO question Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit X-Barracuda-Connect: TYO201.gate.nec.co.jp[202.32.8.193] X-Barracuda-Start-Time: 1305006189 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=ISO2022JP_CHARSET X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63297 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 ISO2022JP_CHARSET ISO-2022-JP message X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Hi, When I tested concurrent mmap write and direct IO to the same file, it was corrupted. Kernel version is 2.6.39-rc4. I have two questions concerning xfs direct IO. The first is dirty pages are released in direct read. xfs direct IO uses xfs_flushinval_pages(), which writes out and releases dirty pages. 
If pages are marked as dirty after filemap_write_and_wait_range(), they will be released in truncate_inode_pages_range() without being written out.

sys_read()
  vfs_read()
    do_sync_read()
      xfs_file_aio_read()
        xfs_flushinval_pages()
          filemap_write_and_wait_range()
          truncate_inode_pages_range() <---
        generic_file_aio_read()
          filemap_write_and_wait_range()
          xfs_vm_direct_IO()

ext3 calls generic_file_aio_read() only and does not call truncate_inode_pages_range().

sys_read()
  vfs_read()
    do_sync_read()
      generic_file_aio_read()
        filemap_write_and_wait_range()
        ext3_direct_IO()

xfs_file_aio_read() and xfs_file_dio_aio_write() call the generic functions, and both the xfs functions and the generic functions call filemap_write_and_wait_range(). So I wonder whether xfs_flushinval_pages() is necessary.

The second is that the write range in xfs_flushinval_pages() called from direct IO is from start pos to -1, or LLONG_MAX, and is not the IO range. Is there any reason? In generic_file_aio_read() and generic_file_direct_write(), it is from start pos to (pos + len - 1). I think xfs_flushinval_pages() should be called with the same range.
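The difference between the two ranges can be worked through with a little arithmetic. The sketch below is only an illustration: struct byte_range and the three helpers are hypothetical names, and a 4 KiB page size is assumed. It compares the inclusive range that covers exactly the IO (pos to pos + len - 1) with the open-ended range (pos to LLONG_MAX), counting how many page cache pages each one touches.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

struct byte_range {
    int64_t start;
    int64_t end;              /* inclusive */
};

/* Range covering exactly the IO: pos .. pos + len - 1 (len must be > 0). */
static struct byte_range io_range(int64_t pos, int64_t len)
{
    struct byte_range r = { pos, pos + len - 1 };
    return r;
}

/* Range from pos to the largest offset, written as "-1" in the email. */
static struct byte_range to_eof_range(int64_t pos)
{
    struct byte_range r = { pos, INT64_MAX };
    return r;
}

/* Number of page cache pages that a byte range touches. */
static int64_t pages_touched(struct byte_range r)
{
    return (r.end >> MODEL_PAGE_SHIFT) - (r.start >> MODEL_PAGE_SHIFT) + 1;
}
```

For an 8192 byte IO at offset 4096, the exact range ends at byte 12287 and touches two pages, whereas the open-ended range covers every cached page from offset 4096 to the end of the file.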
Regards, Utako From ms@citd.de Tue May 10 05:57:29 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4AAvSdu061899 for ; Tue, 10 May 2011 05:57:29 -0500 X-ASG-Debug-ID: 1305025047-1c4d011e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from enyo.dsw2k3.info (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 75C4D440F68 for ; Tue, 10 May 2011 03:57:27 -0700 (PDT) Received: from enyo.dsw2k3.info (enyo.dsw2k3.info [195.71.86.239]) by cuda.sgi.com with ESMTP id pqLsODuYog7kbVMl for ; Tue, 10 May 2011 03:57:27 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by enyo.dsw2k3.info (Postfix) with ESMTP id 3C34EA65C35 for ; Tue, 10 May 2011 12:57:26 +0200 (CEST) X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Scanned: Debian amavisd-new at enyo.dsw2k3.info Received: from enyo.dsw2k3.info ([127.0.0.1]) by localhost (enyo.dsw2k3.info [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 7FYt-f5-4x9M for ; Tue, 10 May 2011 12:57:10 +0200 (CEST) Received: from citd.de (p5B05D88B.dip.t-dialin.net [91.5.216.139]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (Client did not present a certificate) by enyo.dsw2k3.info (Postfix) with ESMTPSA id 41D75A65C34 for ; Tue, 10 May 2011 12:57:09 +0200 (CEST) Date: Tue, 10 May 2011 12:57:00 +0200 From: Matthias Schniedermeyer To: xfs@oss.sgi.com X-ASG-Orig-Subj: Files appear too big in `du` Subject: Files appear too big in `du` Message-ID: <20110510105700.GA20307@citd.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Barracuda-Connect: enyo.dsw2k3.info[195.71.86.239] 
X-Barracuda-Start-Time: 1305025048 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-ASG-Whitelist: HEADER (^X-Barracuda-Connect: [^ ]+\.dsw2k3\.info\[) X-Virus-Status: Clean Hi, For a few weeks I've been experiencing an annoying 'thing' where files are often too big in `du` and directory totals are too high in `ls -l`. It appears that files which are in the process of being copied/downloaded/whatever grow in large chunks ahead of time, while the actual file content is being copied into the files. And then it appears that the last chunk isn't shrunk after the process is finished. Neither xfs_bmap (version 3.1.5) nor filefrag shows anything beyond the extent that comprises the actual file content. I've noticed this at least with:
- "git gc"
- cp -a
- rsync
- downloads with firefox (technically Iceweasel)
The kernel is currently 2.6.38.5, the distribution is an up-to-date Debian sid. The filesystem is mounted with default parameters, except "noatime". Any idea how to debug this, or is this a known bug that waiting a few days for 2.6.39 should fix? Bis denn -- Real Programmers consider "what you see is what you get" to be just as bad a concept in Text Editors as it is in women. No, the Real Programmer wants a "you asked for it, you got it" text editor -- complicated, cryptic, powerful, unforgiving, dangerous.
From david@fromorbit.com Tue May 10 08:17:12 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4ADHBRE067213 for ; Tue, 10 May 2011 08:17:12 -0500 X-ASG-Debug-ID: 1305033429-786b003c0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D651AC7A8D9 for ; Tue, 10 May 2011 06:17:09 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id yNqgFF8Z4TWDSvWy for ; Tue, 10 May 2011 06:17:09 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjEEABw4yU15LBzagWdsb2JhbAClfRUBARYmJYhwvGEOhgEEnmA Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail06.adl6.internode.on.net with ESMTP; 10 May 2011 22:47:08 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QJmoE-0002JS-33; Tue, 10 May 2011 23:17:06 +1000 Date: Tue, 10 May 2011 23:17:06 +1000 From: Dave Chinner To: Matthias Schniedermeyer Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: Files appear too big in `du` Subject: Re: Files appear too big in `du` Message-ID: <20110510131705.GE19446@dastard> References: <20110510105700.GA20307@citd.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110510105700.GA20307@citd.de> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1305033430 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 
X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63328 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, May 10, 2011 at 12:57:00PM +0200, Matthias Schniedermeyer wrote: > Hi > > > Since a few weeks i'm experiencing an annoying 'thing' where files are > often too big in `du` and directory totals are to high in `ls -l`. > > I appears that files, which are in the process of beeing > copied/downloaded/whatever, grow in large chunks ahead of time, while > the actual file-content is beeing copied into the files. It's supposed to work like this. It's called speculative allocation beyond end of file. XFS has always done this, but we've recently made it more aggressive to prevent excessive fragmentation on concurrent large file workloads when there is lots of disk space free. > And then it > appears that the last chunk isn't shrunk after the process is finished. It should be truncated away when the file descriptor is closed and the last reference goes away. > Neither xfs_bmap (Version 3.1.5) nor filefrag show anything beyond the > extent that compromises the actual file-content. What is the output of xfs_bmap -vvp on a file that apparently hasn't been shrunk? How do you know it hasn't been shrunk? Does it persist forever in this state, or does doing something like dropping caches (echo 3 > /proc/sys/vm/drop_caches) cause the speculative preallocation to disappear? > Any idea how to debug this, or is this a known bug and waiting a few > days for 2.6.39 should fix this? It doesn't appear to be doing anything wrong from your description.
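One way to approach the "how do you know it hasn't been shrunk?" question above is to compare a file's apparent size with the space allocated to it, since `du` and the `total` line of `ls -l` are based on allocated blocks rather than on the file length. The sketch below is an illustration only: file_usage() and print_usage() are hypothetical helpers, and it assumes Linux, where st_blocks is counted in 512 byte units.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Fill in the apparent size (st_size) and the allocated size (st_blocks)
 * of a file, both in bytes. Returns 0 on success, -1 if stat() fails.
 */
static int file_usage(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    *allocated = (long long)st.st_blocks * 512;   /* Linux: 512 byte units */
    return 0;
}

/* Print both figures for one file; returns the result of file_usage(). */
static int print_usage(const char *path)
{
    long long apparent, allocated;

    if (file_usage(path, &apparent, &allocated) != 0) {
        perror(path);
        return -1;
    }
    printf("%s: apparent %lld bytes, allocated %lld bytes\n",
           path, apparent, allocated);
    return 0;
}
```

An allocated size well above the apparent size on a file that is no longer open would be consistent with preallocation beyond end of file that has not been trimmed yet; running it again after dropping caches shows whether the difference goes away.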
Remember that XFS is optimised for high end storage and server configurations and workloads, not typical desktop usage... Cheers, Dave. -- Dave Chinner david@fromorbit.com From david@fromorbit.com Tue May 10 08:23:05 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4ADN5Lm067459 for ; Tue, 10 May 2011 08:23:05 -0500 X-ASG-Debug-ID: 1305033782-2e0700730000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 79B344415D4 for ; Tue, 10 May 2011 06:23:03 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id vCmu4Y1PBp4cbYTb for ; Tue, 10 May 2011 06:23:03 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjEEAJc7yU15LBzagWdsb2JhbAClfRUBARYmJYhwvQYOhgEEnmA Received: from ppp121-44-28-218.lns20.syd6.internode.on.net (HELO dastard) ([121.44.28.218]) by ipmail06.adl6.internode.on.net with ESMTP; 10 May 2011 22:53:01 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QJmtx-0002KL-0y; Tue, 10 May 2011 23:23:01 +1000 Date: Tue, 10 May 2011 23:23:00 +1000 From: Dave Chinner To: Utako Kusaka Cc: xfs X-ASG-Orig-Subj: Re: direct IO question Subject: Re: direct IO question Message-ID: <20110510132300.GF19446@dastard> References: <4DC8D01F.5060704@wm.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DC8D01F.5060704@wm.jp.nec.com> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1305033784 X-Barracuda-Bayes: 
INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63329 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, May 10, 2011 at 02:41:51PM +0900, Utako Kusaka wrote: > Hi, > > When I tested concurrent mmap write and direct IO to the same file, > it was corrupted. Kernel version is 2.6.39-rc4. This is a long-standing problem: the mmap_sem is held while .page_mkwrite is called, which means we can't use the i_mutex or xfs inode iolock for serialisation against reads and writes because the mmap_sem can be taken on page faults during read or write. Hence we've got the choice of deadlocks or no serialisation between direct IO and mmap... > I have two questions concerning xfs direct IO. > > The first is dirty pages are released in direct read. xfs direct IO uses > xfs_flushinval_pages(), which writes out and releases dirty pages. Yup - once you bypass the page cache, it is stale and needs to be removed from memory so it can be reread from disk when the next buffered IO occurs. > If pages are marked as dirty after filemap_write_and_wait_range(), > they will be released in truncate_inode_pages_range() without writing out. If .page_mkwrite could take either the iolock or the i_mutex, it would be protected against this like all other operations are.
> > sys_read() > vfs_read() > do_sync_read() > xfs_file_aio_read() > xfs_flushinval_pages() > filemap_write_and_wait_range() > truncate_inode_pages_range() <--- > generic_file_aio_read() > filemap_write_and_wait_range() > xfs_vm_direct_IO() > > ext3 calls generic_file_aio_read() only and does not call > truncate_inode_pages_range(). > > sys_read() > vfs_read() > do_sync_read() > generic_file_aio_read() > filemap_write_and_wait_range() > ext3_direct_IO() ext3 is vastly different w.r.t. direct IO functionality, and so can't be directly compared against XFS behaviour. > xfs_file_aio_read() and xfs_file_dio_aio_write() call generic function. And > both xfs functions and generic functions call filemap_write_and_wait_range(). > So I wonder whether xfs_flushinval_pages() is necessary. The data corruption it fixed long ago would probably return in some form... > Then, the write range in xfs_flushinval_pages() called from direct IO is > from start pos to -1, or LLONG_MAX, and is not IO range. Is there any reason? > In generic_file_aio_read and generic_file_direct_write(), it is from start pos > to (pos + len - 1). > I think xfs_flushinval_pages() should be called with same range. Probably should be, but it will need significant testing to ensure that it doesn't introduce a new coherency/corruption corner case... Cheers, Dave.
-- Dave Chinner david@fromorbit.com From aelder@sgi.com Tue May 10 09:43:31 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4AEhVox070702 for ; Tue, 10 May 2011 09:43:31 -0500 Received: from cas.corp.sgi.com (pv-excas2-dc21-nlb.corp.sgi.com [137.38.102.197]) by relay2.corp.sgi.com (Postfix) with ESMTP id 4C3547F0F8D; Tue, 10 May 2011 07:43:28 -0700 (PDT) Received: from [127.0.0.1] (198.149.20.12) by xmail.sgi.com (137.38.102.30) with Microsoft SMTP Server (TLS) id 14.1.289.1; Tue, 10 May 2011 09:43:28 -0500 Subject: Re: [PATCH] xfs: fix duplicate workqueue initialisation From: Alex Elder Reply-To: To: Dave Chinner CC: , Christoph Hellwig In-Reply-To: <20110510053319.GA3047@infradead.org> References: <1304993150-15327-1-git-send-email-david@fromorbit.com> <20110510053319.GA3047@infradead.org> Content-Type: text/plain; charset="UTF-8" Date: Tue, 10 May 2011 09:43:27 -0500 Message-ID: <1305038607.2962.10.camel@doink> MIME-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Originating-IP: [198.149.20.12] X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, 2011-05-10 at 01:33 -0400, Christoph Hellwig wrote: > On Tue, May 10, 2011 at 12:05:50PM +1000, Dave Chinner wrote: > > From: Dave Chinner > > > > The workqueue initialisation function is called twice when > > initialising the XFS subsystem. Remove the second initialisation > > call. > > Not sure how we got there, but it looks indeed buggy. I'm kinda > surprised the workqueue code didn't blow up with the duplicate names. Maybe duplicated by a merge along the way? The second set of workqueues prevails, leaving the originals unused and leaked. 
In fact, an xfs module can probably even unload cleanly because
it doesn't really "own" the workqueue, which by itself is an
interesting artifact.

This is clearly a bug but I don't think it is truly harmful,
so I am not going to send it to Linus for 2.6.39. If you
disagree, let me know and I'll send it separately.

Reviewed-by: Alex Elder

From ms@citd.de Tue May 10 10:33:19 2011
Date: Tue, 10 May 2011 17:33:00 +0200
From: Matthias Schniedermeyer
To: Dave Chinner
Cc: xfs@oss.sgi.com
Subject: Re: Files appear too big in `du`
Message-ID: <20110510153300.GA5764@citd.de>
References: <20110510105700.GA20307@citd.de> <20110510131705.GE19446@dastard>
In-Reply-To: <20110510131705.GE19446@dastard>

On 10.05.2011 23:17, Dave Chinner wrote:
> On Tue, May 10, 2011 at 12:57:00PM +0200, Matthias Schniedermeyer wrote:
> > Hi
> >
> >
> > Since a few weeks i'm experiencing an annoying 'thing' where files are
> > often too big in `du` and directory totals are too high in `ls -l`.
> >
> > It appears that files, which are in the process of being
> > copied/downloaded/whatever, grow in large chunks ahead of time, while
> > the actual file-content is being copied into the files.
>
> It's supposed to work like this. It's called speculative allocation
> beyond end of file. XFS has always done this, but we've recently
> made it more aggressive to prevent excessive fragmentation on
> concurrent large file workloads when there is lots of disk space
> free.

OK.

> > And then it
> > appears that the last chunk isn't shrunk after the process is finished.
>
> It should be truncated away when the file descriptor is closed and
> the last reference goes away.
>
> > Neither xfs_bmap (Version 3.1.5) nor filefrag show anything beyond the
> > extent that comprises the actual file-content.
>
> what is the output of xfs_bmap -vvp on a file that apparently hasn't
> been shrunk? How do you know it hasn't been shrunk?
> Does it persist

du

> forever in this state, or does doing something like dropping caches
> (echo 3 > /proc/sys/vm/drop_caches) cause the speculative
> preallocation to disappear?

This works:
sync ; echo 3 > /proc/sys/vm/drop_caches

At least in several tries the `du` output shrunk to the size of the
original.

> > Any idea how to debug this, or is this a known bug and waiting a few
> > days for 2.6.39 should fix this?
>
> It doesn't appear to be doing anything wrong from your description.
> Remember that XFS is optimised for high end storage and server
> configurations and workloads, not typical desktop usage...

I would call it a regression.

I regularly follow copying/downloading with `du`, and the speculative
preallocation makes that more or less useless. Especially when downloading
something big from the internet, which @ 231kb/s isn't exactly fast and
shows identical `du`s for increasingly longer periods of time.

(Or "--apparent-size" should be made default, but that falls short with
sparse-files)

IMHO `du`/`ls -l` should not be able to 'see' the speculative
preallocation.


Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.
From aelder@sgi.com Tue May 10 13:48:47 2011
From: Alex Elder
Message-Id: <201105101848.p4AImgJF021856@stout.americas.sgi.com>
Date: Tue, 10 May 2011 13:48:41 -0500
To: torvalds@linux-foundation.org
Subject: [GIT PULL] XFS update for 2.6.39
Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com, akpm@linux-foundation.org

Linus, I'm very sorry I couldn't provide you these regression fixes
before yesterday. They do address regressions that are pretty important
so I hope you'll consider pulling them for inclusion in 2.6.39.

The first one fixes a problem that can cause reclaim to stall on an
XFS allocation group, which prevents inode cache shrinking from making
progress and ultimately can cause the OOM killer to kick in.
The other four address small but distinct problems that can lead to
filesystem hangs when attempting to grant log space:
  - The only real change in the first of these is the "goto out_done"
    that causes a block of common code to be used for a case that was
    not clearing a flag as it should.
  - The second of these fixes an off-by-one condition check problem.
  - The third addresses a few problems that can arise due to unsafe
    assignment of 64-bit values on 32-bit architectures.
  - The last one is the most subtle. Previously a flag bit was getting
    cleared conditionally but not atomically. A second location used
    test_and_set_bit() to conditionally set the flag. The flag could
    get cleared in the one spot just after a successful test and set
    in another; the net result basically hung the affected filesystem.
    The fix modifies the non-atomic spot to clear the flag
    unconditionally, then re-set it using atomic test-and-set if
    appropriate.

Dave Chinner deserves a lot of credit for sorting through
these--especially the intersecting causes of the common "filesystem
hang" symptom that the last four address.

Thanks a lot.

					-Alex

The following changes since commit 693d92a1bbc9e42681c42ed190bd42b636ca876f:

  Linux 2.6.39-rc7 (2011-05-09 19:33:54 -0700)

are available in the git repository at:

  git://oss.sgi.com/xfs/xfs for-linus

Dave Chinner (5):
      xfs: ensure reclaim cursor is reset correctly at end of AG
      xfs: exit AIL push work correctly when AIL is empty
      xfs: always push the AIL to the target
      xfs: make AIL target updates and compares 32bit safe.
      xfs: fix race condition in AIL push trigger

 fs/xfs/linux-2.6/xfs_sync.c |    1 +
 fs/xfs/xfs_trans_ail.c      |   47 +++++++++++++++++++++++------------------
 2 files changed, 27 insertions(+), 21 deletions(-)

From aelder@sgi.com Tue May 10 16:41:14 2011
Subject: Re: [PATCH] [xfsprogs]: Don't translate command name.
From: Alex Elder
Reply-To:
To: Arkadiusz Miśkiewicz
CC:
In-Reply-To: <1302858438-22215-1-git-send-email-arekm@maven.pl>
References: <1302858438-22215-1-git-send-email-arekm@maven.pl>
Date: Tue, 10 May 2011 16:41:09 -0500
Message-ID: <1305063669.2962.20.camel@doink>

On Fri, 2011-04-15 at 11:07 +0200, Arkadiusz Miśkiewicz wrote:
> Command names should never be translated. Currently there is
> 'xfs_quota -x -c "project"...' in one locale (C) while
> 'xfs_quota -x -c "projekt"...' in another (pl_PL).
>
> Signed-off-by: Arkadiusz Miśkiewicz

Looks good. I will commit this for you. Thanks.
Reviewed-by: Alex Elder

From lists@nabble.com Wed May 11 00:22:02 2011
Message-ID: <31591438.post@talk.nabble.com>
Date: Tue, 10 May 2011 22:22:00 -0700 (PDT)
From: stress_buster
To: linux-xfs@oss.sgi.com
Subject: corrupt inode

I see the following traces in dmesg:

Filesystem "cciss/c0d0": corrupt inode 3758098465 (bad size 460626 for local inode). Unmount and run xfs_repair.
00000000: 49 4e 41 c9 01 01 00 03 00 00 00 00 00 00 00 00  INAÉ............
Filesystem "cciss/c0d0": XFS internal error xfs_iformat(5) at line 419 of file fs/xfs/xfs_inode.c. Caller 0xffffffff881d4026

Call Trace:
 [] :xfs:xfs_iformat+0xe6/0x3d7
 [] :xfs:xfs_iread+0xe4/0x1ea
 [] :xfs:xfs_iread+0xe4/0x1ea
 [] :xfs:xfs_iget_core+0x2f2/0x563
 [] alloc_inode+0xeb/0x192
 [] :xfs:xfs_iget+0xd2/0x17a
 [] :xfs:xfs_lookup+0x76/0xa8
 [] dput+0x2c/0x114
 [] :xfs:xfs_vn_lookup+0x3d/0x7b
 [] do_lookup+0xe5/0x1e6
 [] __link_path_walk+0xa01/0xf42
 [] link_path_walk+0x42/0xb2
 [] do_path_lookup+0x275/0x2f1
 [] getname+0x15b/0x1c2
 [] __user_walk_fd+0x37/0x4c
 [] vfs_stat_fd+0x1b/0x4a
 [] sys_newstat+0x19/0x31
 [] system_call+0x7e/0x83

Filesystem "cciss/c0d0": XFS internal error xfs_iformat(7) at line 440 of file fs/xfs/xfs_inode.c. Caller 0xffffffff881d4026

Call Trace:
 [] :xfs:xfs_iformat+0x2ef/0x3d7
 [] :xfs:xfs_iread+0xe4/0x1ea
 [] :xfs:xfs_iget_core+0x2f2/0x563
 [] alloc_inode+0xeb/0x192
 [] :xfs:xfs_iget+0xd2/0x17a
 [] :xfs:xfs_lookup+0x76/0xa8
 [] dput+0x2c/0x114
 [] :xfs:xfs_vn_lookup+0x3d/0x7b
 [] do_lookup+0xe5/0x1e6
 [] __link_path_walk+0xa01/0xf42
 [] link_path_walk+0x42/0xb2
 [] do_path_lookup+0x275/0x2f1
 [] getname+0x15b/0x1c2
 [] __user_walk_fd+0x37/0x4c
 [] vfs_stat_fd+0x1b/0x4a
 [] sys_newstat+0x19/0x31
 [] system_call+0x7e/0x83

I'm losing connectivity to iscsi targets randomly. This may be unrelated to
the fs errors though.

Any thoughts on what the above backtraces are pointing to?
Thanks in advance
--
View this message in context: http://old.nabble.com/corrupt-inode-tp31591438p31591438.html
Sent from the linux-xfs mailing list archive at Nabble.com.

From branto@redhat.com Wed May 11 09:00:03 2011
Subject: xfstests: print the message that fallocate is not supported to stdout
From: Boris Ranto
To: xfs
Date: Wed, 11 May 2011 15:59:59 +0200
Message-ID: <1305122399.22267.28.camel@dhcp-31-190.brq.redhat.com>
ltp/fsx.c tests whether the filesystem it is run on supports fallocate.
If it is not supported, fsx prints a warning to stderr. This causes
tests 075, 112 and 127 to fail on filesystems that do not support
fallocate: the tests use ltp/fsx but do not filter out stderr.
Since ltp/fsx.c can work without fallocate support, I propose to move
this message to stdout.

This simple patch fixes the issue for me:

Signed-off-by: Boris Ranto

diff --git a/ltp/fsx.c b/ltp/fsx.c
index fe072d3..d45e8dd 100644
--- a/ltp/fsx.c
+++ b/ltp/fsx.c
@@ -1424,7 +1424,7 @@ main(int argc, char **argv)
 #ifdef FALLOCATE
 	if (!lite && fallocate_calls) {
 		if (fallocate(fd, 0, 0, 1) && errno == EOPNOTSUPP) {
-			warn("main: filesystem does not support fallocate, disabling");
+			prt("fsx: main: filesystem does not support fallocate, disabling \n");
 			fallocate_calls = 0;
 		} else
 			ftruncate(fd, 0);

From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:13 2011
Message-Id: <20110511150712.830693893@bombadil.infradead.org>
Date: Wed, 11 May 2011 11:04:10 -0400
From: Christoph Hellwig
To: xfs@oss.sgi.com
Subject: [PATCH 8/9] xfs: fix up asserts in xfs_iflush_fork
References: <20110511150402.258164661@bombadil.infradead.org>

Remove asserts in xfs_iflush_fork that would call xfs_iext_get_ext
with a potentially invalid extent buffer index.

Based on an earlier patch from Lachlan McIlroy.
Signed-off-by: Christoph Hellwig

Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-05-11 10:18:39.555233397 +0200
+++ xfs/fs/xfs/xfs_inode.c	2011-05-11 12:04:24.099733330 +0200
@@ -2557,12 +2557,9 @@ xfs_iflush_fork(
 	case XFS_DINODE_FMT_EXTENTS:
 		ASSERT((ifp->if_flags & XFS_IFEXTENTS) ||
 		       !(iip->ili_format.ilf_fields & extflag[whichfork]));
-		ASSERT((xfs_iext_get_ext(ifp, 0) != NULL) ||
-			(ifp->if_bytes == 0));
-		ASSERT((xfs_iext_get_ext(ifp, 0) == NULL) ||
-			(ifp->if_bytes > 0));
 		if ((iip->ili_format.ilf_fields & extflag[whichfork]) &&
 		    (ifp->if_bytes > 0)) {
+			ASSERT(xfs_iext_get_ext(ifp, 0));
 			ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) > 0);
 			(void)xfs_iextents_copy(ip, (xfs_bmbt_rec_t *)cp,
 				whichfork);

From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:14 2011
Message-Id: <20110511150712.651478046@bombadil.infradead.org>
Date: Wed, 11 May 2011 11:04:09 -0400
From: Christoph Hellwig
To: xfs@oss.sgi.com
Subject: [PATCH 7/9] xfs: do not do pointer arithmetics on extent records
References: <20110511150402.258164661@bombadil.infradead.org>

We need to call xfs_iext_get_ext for the previous extent to get a valid
pointer, and can't just do pointer arithmetics as they might be in
different pages.

Signed-off-by: Christoph Hellwig

Index: xfs/fs/xfs/xfs_bmap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.c	2011-05-11 10:16:58.847733078 +0200
+++ xfs/fs/xfs/xfs_bmap.c	2011-05-11 10:17:04.803235692 +0200
@@ -5145,9 +5145,12 @@ xfs_bunmapi(
 				 */
 				ASSERT(bno >= del.br_blockcount);
 				bno -= del.br_blockcount;
-				if (bno < got.br_startoff) {
-					if (--lastx >= 0)
-						xfs_bmbt_get_all(--ep, &got);
+				if (got.br_startoff > bno) {
+					if (--lastx >= 0) {
+						ep = xfs_iext_get_ext(ifp,
+								      lastx);
+						xfs_bmbt_get_all(ep, &got);
+					}
 				}
 				continue;
 			} else if (del.br_state == XFS_EXT_UNWRITTEN) {

From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:14 2011
Message-Id: <20110511150711.786279651@bombadil.infradead.org>
Date: Wed, 11 May 2011 11:04:05 -0400
From: Christoph Hellwig
To: xfs@oss.sgi.com
Subject: [PATCH 3/9] xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent
References: <20110511150402.258164661@bombadil.infradead.org>

The code in xfs_bmap_del_extent does not correctly decrement the extent
buffer index when deleting a whole extent. Most of the time this gets
caught by checks in xfs_bmapi that work around it and decrement the
index manually, and thus it wasn't noticed so far.

Based on an earlier patch from Lachlan McIlroy.
Signed-off-by: Christoph Hellwig

Index: xfs/fs/xfs/xfs_bmap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.c	2011-05-10 17:11:21.212901236 +0200
+++ xfs/fs/xfs/xfs_bmap.c	2011-05-10 17:13:36.177399627 +0200
@@ -2916,8 +2916,10 @@ xfs_bmap_del_extent(
 		 */
 		xfs_iext_remove(ip, *idx, 1,
 				whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0);
+		--*idx;
 		if (delay)
 			break;
+
 		XFS_IFORK_NEXT_SET(ip, whichfork,
 			XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
 		flags |= XFS_ILOG_CORE;

From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:14 2011
Message-Id: <20110511150713.039506186@bombadil.infradead.org>
Date: Wed, 11 May 2011 11:04:11 -0400
From: Christoph Hellwig
To: xfs@oss.sgi.com
Subject: [PATCH 9/9] xfs: check for valid indices in xfs_iext_get_ext and xfs_iext_idx_to_irec
References: <20110511150402.258164661@bombadil.infradead.org>

Based on an earlier patch from Lachlan McIlroy.

Signed-off-by: Christoph Hellwig

Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-05-11 12:05:12.943735034 +0200
+++ xfs/fs/xfs/xfs_inode.c	2011-05-11 12:05:28.327733646 +0200
@@ -3108,6 +3108,8 @@ xfs_iext_get_ext(
 	xfs_extnum_t	idx)		/* index of target extent */
 {
 	ASSERT(idx >= 0);
+	ASSERT(idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t));
+
 	if ((ifp->if_flags & XFS_IFEXTIREC) && (idx == 0)) {
 		return ifp->if_u1.if_ext_irec->er_extbuf;
 	} else if (ifp->if_flags & XFS_IFEXTIREC) {
@@ -3881,8 +3883,10 @@ xfs_iext_idx_to_irec(
 	xfs_extnum_t	page_idx = *idxp; /* extent index in target list */
 
 	ASSERT(ifp->if_flags & XFS_IFEXTIREC);
-	ASSERT(page_idx >= 0 && page_idx <=
-		ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t));
+	ASSERT(page_idx >= 0);
+	ASSERT(page_idx <= ifp->if_bytes / sizeof(xfs_bmbt_rec_t));
+	ASSERT(page_idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t) || realloc);
+
 	nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
 	erp_idx = 0;
 	low = 0;

From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:14 2011
Message-Id: <20110511150712.421348825@bombadil.infradead.org>
Date: Wed, 11 May 2011 11:04:08 -0400
From: Christoph Hellwig
To: xfs@oss.sgi.com
Subject: [PATCH 6/9] xfs: do not use unchecked extent indices in xfs_bunmapi
References: <20110511150402.258164661@bombadil.infradead.org>

Make sure to only call xfs_iext_get_ext after we've validated the extent
index when moving on to the next index in xfs_bunmapi.

Also remove the old workaround for too large indices that has been
superseded by the proper fix in xfs_bmap_del_extent.

Based on an earlier patch from Lachlan McIlroy.
Signed-off-by: Christoph Hellwig Index: xfs/fs/xfs/xfs_bmap.c =================================================================== --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-11 10:17:04.803235692 +0200 +++ xfs/fs/xfs/xfs_bmap.c 2011-05-11 10:17:06.432734169 +0200 @@ -5247,17 +5247,17 @@ xfs_bunmapi( nodelete: /* * If not done go on to the next (previous) record. - * Reset ep in case the extents array was re-alloced. */ - ep = xfs_iext_get_ext(ifp, lastx); if (bno != (xfs_fileoff_t)-1 && bno >= start) { - if (lastx >= XFS_IFORK_NEXTENTS(ip, whichfork) || - xfs_bmbt_get_startoff(ep) > bno) { - if (--lastx >= 0) - ep = xfs_iext_get_ext(ifp, lastx); - } - if (lastx >= 0) + if (lastx >= 0) { + ep = xfs_iext_get_ext(ifp, lastx); + if (xfs_bmbt_get_startoff(ep) > bno) { + if (--lastx >= 0) + ep = xfs_iext_get_ext(ifp, + lastx); + } xfs_bmbt_get_all(ep, &got); + } extno++; } } From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:14 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4BF7D3b128058 for ; Wed, 11 May 2011 10:07:13 -0500 X-ASG-Debug-ID: 1305126433-0c47038c0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 5578C1EABFBC for ; Wed, 11 May 2011 08:07:13 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id busvqgcoqf070rUc for ; Wed, 11 May 2011 08:07:13 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QKB0J-0005ci-7W for xfs@oss.sgi.com; Wed, 11 May 2011 15:07:11 
+0000 Message-Id: <20110511150402.258164661@bombadil.infradead.org> User-Agent: quilt/0.48-1 Date: Wed, 11 May 2011 11:04:02 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 0/9] extent buffer indexing fixes Subject: [PATCH 0/9] extent buffer indexing fixes X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1305126433 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean I recently ran into some extent buffer indexing issues which turned out to be my fault in code I was working on. But while looking into these I found an old patch from Lachlan McIlroy that tried to fix various issues in that area. I went through them slowly to understand what's going on and ended up with this series. The first patch is not actually related but touches the area and was in my queue, so I've decided to include it. The second patch removes the if_lastex field in struct xfs_ifork as it's not actually needed and just makes the code using it confusing. The following patches fix various places that feed too large indices into xfs_iext_get_ext or xfs_iext_idx_to_irec, and the last patch finally adds asserts into these to catch the incorrect accesses.
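For readers following the series, here is a minimal sketch of the check-then-fetch pattern described above. It is not the XFS code: struct ext_rec, struct ext_list, ext_get and ext_next are hypothetical stand-ins for the extent record, the inode fork, xfs_iext_get_ext and the "go on to the next record" step, and the real types and helpers are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real XFS structures. */
struct ext_rec {
	unsigned long	startoff;	/* starting file offset */
	unsigned long	blockcount;	/* length in blocks */
};

struct ext_list {
	struct ext_rec	*recs;		/* array of extent records */
	int		nextents;	/* number of valid records */
};

/*
 * Lookup helper carrying the kind of asserts the last patch is
 * described as adding: an out-of-range index trips immediately.
 */
static struct ext_rec *
ext_get(struct ext_list *l, int idx)
{
	assert(idx >= 0);
	assert(idx < l->nextents);
	return &l->recs[idx];
}

/*
 * Move on to the next record.  The index is validated before the
 * lookup, so ext_get() is never handed an index past the end.
 * Returns 1 at end of list (got is left untouched), 0 otherwise.
 */
static int
ext_next(struct ext_list *l, int *lastx, struct ext_rec *got)
{
	if (++*lastx < l->nextents) {
		*got = *ext_get(l, *lastx);
		return 0;
	}
	return 1;
}
```

The buggy ordering this replaces would call the lookup first and test the index afterwards, which computes a pointer past the end of the array even when the result ends up unused.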
From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:16 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4BF7Fdr128125 for ; Wed, 11 May 2011 10:07:16 -0500 X-ASG-Debug-ID: 1305126435-1ab203900000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 85868445778 for ; Wed, 11 May 2011 08:07:15 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id m58tvoebVUYfZAE2 for ; Wed, 11 May 2011 08:07:15 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QKB0K-0005ew-1c for xfs@oss.sgi.com; Wed, 11 May 2011 15:07:12 +0000 Message-Id: <20110511150711.989383617@bombadil.infradead.org> User-Agent: quilt/0.48-1 Date: Wed, 11 May 2011 11:04:06 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 4/9] xfs: do not use unchecked extent indices in xfs_bmap_add_extent_* Subject: [PATCH 4/9] xfs: do not use unchecked extent indices in xfs_bmap_add_extent_* References: <20110511150402.258164661@bombadil.infradead.org> Content-Disposition: inline; filename=xfs-fix-ep-access-1 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1305126435 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Make sure to only call xfs_iext_get_ext 
after we've validated the extent index in the various xfs_bmap_add_extent_* helpers. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig Index: xfs/fs/xfs/xfs_bmap.c =================================================================== --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-10 13:57:12.297088697 +0200 +++ xfs/fs/xfs/xfs_bmap.c 2011-05-10 14:00:16.405087271 +0200 @@ -1629,7 +1629,6 @@ xfs_bmap_add_extent_hole_delay( xfs_bmbt_irec_t *new, /* new data to add to file extents */ int *logflagsp) /* inode logging flags */ { - xfs_bmbt_rec_host_t *ep; /* extent record for idx */ xfs_ifork_t *ifp; /* inode fork pointer */ xfs_bmbt_irec_t left; /* left neighbor extent entry */ xfs_filblks_t newlen=0; /* new indirect size */ @@ -1639,7 +1638,6 @@ xfs_bmap_add_extent_hole_delay( xfs_filblks_t temp=0; /* temp for indirect calculations */ ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); - ep = xfs_iext_get_ext(ifp, *idx); state = 0; ASSERT(isnullstartblock(new->br_startblock)); @@ -1660,7 +1658,7 @@ xfs_bmap_add_extent_hole_delay( */ if (*idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) { state |= BMAP_RIGHT_VALID; - xfs_bmbt_get_all(ep, &right); + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx), &right); if (isnullstartblock(right.br_startblock)) state |= BMAP_RIGHT_DELAY; @@ -1740,7 +1738,8 @@ xfs_bmap_add_extent_hole_delay( oldlen = startblockval(new->br_startblock) + startblockval(right.br_startblock); newlen = xfs_bmap_worst_indlen(ip, temp); - xfs_bmbt_set_allf(ep, new->br_startoff, + xfs_bmbt_set_allf(xfs_iext_get_ext(ifp, *idx), + new->br_startoff, nullstartblock((int)newlen), temp, right.br_state); trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); break; @@ -1780,7 +1779,6 @@ xfs_bmap_add_extent_hole_real( int *logflagsp, /* inode logging flags */ int whichfork) /* data or attr fork */ { - xfs_bmbt_rec_host_t *ep; /* pointer to extent entry ins.
point */ int error; /* error return value */ int i; /* temp state */ xfs_ifork_t *ifp; /* inode fork pointer */ @@ -1791,7 +1789,6 @@ xfs_bmap_add_extent_hole_real( ifp = XFS_IFORK_PTR(ip, whichfork); ASSERT(*idx <= ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)); - ep = xfs_iext_get_ext(ifp, *idx); state = 0; if (whichfork == XFS_ATTR_FORK) @@ -1813,7 +1810,7 @@ xfs_bmap_add_extent_hole_real( */ if (*idx < ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) { state |= BMAP_RIGHT_VALID; - xfs_bmbt_get_all(ep, &right); + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx), &right); if (isnullstartblock(right.br_startblock)) state |= BMAP_RIGHT_DELAY; } @@ -1925,7 +1922,8 @@ xfs_bmap_add_extent_hole_real( * Merge the new allocation with the right neighbor. */ trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); - xfs_bmbt_set_allf(ep, new->br_startoff, new->br_startblock, + xfs_bmbt_set_allf(xfs_iext_get_ext(ifp, *idx), + new->br_startoff, new->br_startblock, new->br_blockcount + right.br_blockcount, right.br_state); trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:14 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,J_CHICKENPOX_63, J_CHICKENPOX_65,J_CHICKENPOX_66,LOCAL_GNU_PATCH autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4BF7DAM128047 for ; Wed, 11 May 2011 10:07:14 -0500 X-ASG-Debug-ID: 1305126432-27a702470000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B60A814CA3BA for ; Wed, 11 May 2011 08:07:12 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id r06KxEBMesJUqAdd 
for ; Wed, 11 May 2011 08:07:12 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QKB0J-0005dI-DI for xfs@oss.sgi.com; Wed, 11 May 2011 15:07:11 +0000 Message-Id: <20110511150711.367977044@bombadil.infradead.org> User-Agent: quilt/0.48-1 Date: Wed, 11 May 2011 11:04:03 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 1/9] xfs: remove the unused XFS_BMAPI_RSVBLOCKS flag Subject: [PATCH 1/9] xfs: remove the unused XFS_BMAPI_RSVBLOCKS flag References: <20110511150402.258164661@bombadil.infradead.org> Content-Disposition: inline; filename=xfs-kill-XFS_BMAPI_RSVBLOCKS X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1305126432 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean The XFS_BMAPI_RSVBLOCKS is unused, and as far as I can see has always been. Remove it to simplify the bmapi implementation and conserve stack space. 
Signed-off-by: Christoph Hellwig Index: xfs/fs/xfs/xfs_bmap.c =================================================================== --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-10 12:05:00.662952211 +0200 +++ xfs/fs/xfs/xfs_bmap.c 2011-05-10 12:05:19.866950315 +0200 @@ -101,8 +101,7 @@ xfs_bmap_add_extent( xfs_fsblock_t *first, /* pointer to firstblock variable */ xfs_bmap_free_t *flist, /* list of extents to be freed */ int *logflagsp, /* inode logging flags */ - int whichfork, /* data or attr fork */ - int rsvd); /* OK to allocate reserved blocks */ + int whichfork); /* data or attr fork */ /* * Called by xfs_bmap_add_extent to handle cases converting a delayed @@ -117,8 +116,7 @@ xfs_bmap_add_extent_delay_real( xfs_filblks_t *dnew, /* new delayed-alloc indirect blocks */ xfs_fsblock_t *first, /* pointer to firstblock variable */ xfs_bmap_free_t *flist, /* list of extents to be freed */ - int *logflagsp, /* inode logging flags */ - int rsvd); /* OK to allocate reserved blocks */ + int *logflagsp); /* inode logging flags */ /* * Called by xfs_bmap_add_extent to handle cases converting a hole @@ -129,8 +127,7 @@ xfs_bmap_add_extent_hole_delay( xfs_inode_t *ip, /* incore inode pointer */ xfs_extnum_t idx, /* extent number to update/insert */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ - int *logflagsp,/* inode logging flags */ - int rsvd); /* OK to allocate reserved blocks */ + int *logflagsp); /* inode logging flags */ /* * Called by xfs_bmap_add_extent to handle cases converting a hole @@ -180,22 +177,6 @@ xfs_bmap_btree_to_extents( int whichfork); /* data or attr fork */ /* - * Called by xfs_bmapi to update file extent records and the btree - * after removing space (or undoing a delayed allocation). 
- */ -STATIC int /* error */ -xfs_bmap_del_extent( - xfs_inode_t *ip, /* incore inode pointer */ - xfs_trans_t *tp, /* current trans pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ - xfs_bmap_free_t *flist, /* list of extents to be freed */ - xfs_btree_cur_t *cur, /* if null, not a btree */ - xfs_bmbt_irec_t *new, /* new data to add to file extents */ - int *logflagsp,/* inode logging flags */ - int whichfork, /* data or attr fork */ - int rsvd); /* OK to allocate reserved blocks */ - -/* * Remove the entry "free" from the free item list. Prev points to the * previous entry, unless "free" is the head of the list. */ @@ -480,8 +461,7 @@ xfs_bmap_add_extent( xfs_fsblock_t *first, /* pointer to firstblock variable */ xfs_bmap_free_t *flist, /* list of extents to be freed */ int *logflagsp, /* inode logging flags */ - int whichfork, /* data or attr fork */ - int rsvd) /* OK to use reserved data blocks */ + int whichfork) /* data or attr fork */ { xfs_btree_cur_t *cur; /* btree cursor or null */ xfs_filblks_t da_new; /* new count del alloc blocks used */ @@ -522,8 +502,8 @@ xfs_bmap_add_extent( if (cur) ASSERT((cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL) == 0); - if ((error = xfs_bmap_add_extent_hole_delay(ip, idx, new, - &logflags, rsvd))) + error = xfs_bmap_add_extent_hole_delay(ip, idx, new, &logflags); + if (error) goto done; } /* @@ -557,9 +537,10 @@ xfs_bmap_add_extent( if (cur) ASSERT(cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL); - if ((error = xfs_bmap_add_extent_delay_real(ip, - idx, &cur, new, &da_new, first, flist, - &logflags, rsvd))) + error = xfs_bmap_add_extent_delay_real(ip, idx, + &cur, new, &da_new, first, + flist, &logflags); + if (error) goto done; } else if (new->br_state == XFS_EXT_NORM) { ASSERT(new->br_state == XFS_EXT_NORM); @@ -615,7 +596,7 @@ xfs_bmap_add_extent( ASSERT(nblks <= da_old); if (nblks < da_old) xfs_icsb_modify_counters(ip->i_mount, XFS_SBS_FDBLOCKS, - (int64_t)(da_old - nblks), rsvd); + 
(int64_t)(da_old - nblks), 0); } /* * Clear out the allocated field, done with it now in any case. @@ -646,8 +627,7 @@ xfs_bmap_add_extent_delay_real( xfs_filblks_t *dnew, /* new delayed-alloc indirect blocks */ xfs_fsblock_t *first, /* pointer to firstblock variable */ xfs_bmap_free_t *flist, /* list of extents to be freed */ - int *logflagsp, /* inode logging flags */ - int rsvd) /* OK to use reserved data block allocation */ + int *logflagsp) /* inode logging flags */ { xfs_btree_cur_t *cur; /* btree cursor */ int diff; /* temp value */ @@ -1097,7 +1077,7 @@ xfs_bmap_add_extent_delay_real( (cur ? cur->bc_private.b.allocated : 0)); if (diff > 0 && xfs_icsb_modify_counters(ip->i_mount, XFS_SBS_FDBLOCKS, - -((int64_t)diff), rsvd)) { + -((int64_t)diff), 0)) { /* * Ick gross gag me with a spoon. */ @@ -1109,7 +1089,7 @@ xfs_bmap_add_extent_delay_real( if (!diff || !xfs_icsb_modify_counters(ip->i_mount, XFS_SBS_FDBLOCKS, - -((int64_t)diff), rsvd)) + -((int64_t)diff), 0)) break; } if (temp2) { @@ -1118,7 +1098,7 @@ xfs_bmap_add_extent_delay_real( if (!diff || !xfs_icsb_modify_counters(ip->i_mount, XFS_SBS_FDBLOCKS, - -((int64_t)diff), rsvd)) + -((int64_t)diff), 0)) break; } } @@ -1652,8 +1632,7 @@ xfs_bmap_add_extent_hole_delay( xfs_inode_t *ip, /* incore inode pointer */ xfs_extnum_t idx, /* extent number to update/insert */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ - int *logflagsp, /* inode logging flags */ - int rsvd) /* OK to allocate reserved blocks */ + int *logflagsp) /* inode logging flags */ { xfs_bmbt_rec_host_t *ep; /* extent record for idx */ xfs_ifork_t *ifp; /* inode fork pointer */ @@ -1787,7 +1766,7 @@ xfs_bmap_add_extent_hole_delay( if (oldlen != newlen) { ASSERT(oldlen > newlen); xfs_icsb_modify_counters(ip->i_mount, XFS_SBS_FDBLOCKS, - (int64_t)(oldlen - newlen), rsvd); + (int64_t)(oldlen - newlen), 0); /* * Nothing to do for disk quota accounting here. 
*/ @@ -2838,8 +2817,7 @@ xfs_bmap_del_extent( xfs_btree_cur_t *cur, /* if null, not a btree */ xfs_bmbt_irec_t *del, /* data to remove from extents */ int *logflagsp, /* inode logging flags */ - int whichfork, /* data or attr fork */ - int rsvd) /* OK to allocate reserved blocks */ + int whichfork) /* data or attr fork */ { xfs_filblks_t da_new; /* new delay-alloc indirect blocks */ xfs_filblks_t da_old; /* old delay-alloc indirect blocks */ @@ -3142,7 +3120,7 @@ xfs_bmap_del_extent( ASSERT(da_old >= da_new); if (da_old > da_new) { xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, - (int64_t)(da_old - da_new), rsvd); + (int64_t)(da_old - da_new), 0); } done: *logflagsp = flags; @@ -4562,29 +4540,24 @@ xfs_bmapi( if (rt) { error = xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS, - -((int64_t)extsz), (flags & - XFS_BMAPI_RSVBLOCKS)); + -((int64_t)extsz), 0); } else { error = xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, - -((int64_t)alen), (flags & - XFS_BMAPI_RSVBLOCKS)); + -((int64_t)alen), 0); } if (!error) { error = xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, - -((int64_t)indlen), (flags & - XFS_BMAPI_RSVBLOCKS)); + -((int64_t)indlen), 0); if (error && rt) xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS, - (int64_t)extsz, (flags & - XFS_BMAPI_RSVBLOCKS)); + (int64_t)extsz, 0); else if (error) xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, - (int64_t)alen, (flags & - XFS_BMAPI_RSVBLOCKS)); + (int64_t)alen, 0); } if (error) { @@ -4703,7 +4676,7 @@ xfs_bmapi( } error = xfs_bmap_add_extent(ip, lastx, &cur, &got, firstblock, flist, &tmp_logflags, - whichfork, (flags & XFS_BMAPI_RSVBLOCKS)); + whichfork); logflags |= tmp_logflags; if (error) goto error0; @@ -4805,7 +4778,7 @@ xfs_bmapi( : XFS_EXT_UNWRITTEN; error = xfs_bmap_add_extent(ip, lastx, &cur, mval, firstblock, flist, &tmp_logflags, - whichfork, (flags & XFS_BMAPI_RSVBLOCKS)); + whichfork); logflags |= tmp_logflags; if (error) goto error0; @@ -5026,7 +4999,6 @@ xfs_bunmapi( int tmp_logflags; /* partial logging flags */ int 
wasdel; /* was a delayed alloc extent */ int whichfork; /* data or attribute fork */ - int rsvd; /* OK to allocate reserved blocks */ xfs_fsblock_t sum; trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_); @@ -5044,7 +5016,7 @@ xfs_bunmapi( mp = ip->i_mount; if (XFS_FORCED_SHUTDOWN(mp)) return XFS_ERROR(EIO); - rsvd = (flags & XFS_BMAPI_RSVBLOCKS) != 0; + ASSERT(len > 0); ASSERT(nexts >= 0); ASSERT(ifp->if_ext_max == @@ -5162,7 +5134,7 @@ xfs_bunmapi( del.br_state = XFS_EXT_UNWRITTEN; error = xfs_bmap_add_extent(ip, lastx, &cur, &del, firstblock, flist, &logflags, - XFS_DATA_FORK, 0); + XFS_DATA_FORK); if (error) goto error0; goto nodelete; @@ -5216,7 +5188,7 @@ xfs_bunmapi( prev.br_state = XFS_EXT_UNWRITTEN; error = xfs_bmap_add_extent(ip, lastx - 1, &cur, &prev, firstblock, flist, &logflags, - XFS_DATA_FORK, 0); + XFS_DATA_FORK); if (error) goto error0; goto nodelete; @@ -5225,7 +5197,7 @@ xfs_bunmapi( del.br_state = XFS_EXT_UNWRITTEN; error = xfs_bmap_add_extent(ip, lastx, &cur, &del, firstblock, flist, &logflags, - XFS_DATA_FORK, 0); + XFS_DATA_FORK); if (error) goto error0; goto nodelete; @@ -5240,13 +5212,13 @@ xfs_bunmapi( rtexts = XFS_FSB_TO_B(mp, del.br_blockcount); do_div(rtexts, mp->m_sb.sb_rextsize); xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS, - (int64_t)rtexts, rsvd); + (int64_t)rtexts, 0); (void)xfs_trans_reserve_quota_nblks(NULL, ip, -((long)del.br_blockcount), 0, XFS_QMOPT_RES_RTBLKS); } else { xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, - (int64_t)del.br_blockcount, rsvd); + (int64_t)del.br_blockcount, 0); (void)xfs_trans_reserve_quota_nblks(NULL, ip, -((long)del.br_blockcount), 0, XFS_QMOPT_RES_REGBLKS); @@ -5278,7 +5250,7 @@ xfs_bunmapi( goto error0; } error = xfs_bmap_del_extent(ip, tp, lastx, flist, cur, &del, - &tmp_logflags, whichfork, rsvd); + &tmp_logflags, whichfork); logflags |= tmp_logflags; if (error) goto error0; Index: xfs/fs/xfs/xfs_bmap.h =================================================================== --- xfs.orig/fs/xfs/xfs_bmap.h 
2011-04-24 20:52:34.171232490 +0200 +++ xfs/fs/xfs/xfs_bmap.h 2011-05-10 12:05:19.870955293 +0200 @@ -69,7 +69,6 @@ typedef struct xfs_bmap_free #define XFS_BMAPI_ENTIRE 0x004 /* return entire extent, not trimmed */ #define XFS_BMAPI_METADATA 0x008 /* mapping metadata not user data */ #define XFS_BMAPI_ATTRFORK 0x010 /* use attribute fork not data */ -#define XFS_BMAPI_RSVBLOCKS 0x020 /* OK to alloc. reserved data blocks */ #define XFS_BMAPI_PREALLOC 0x040 /* preallocation op: unwritten space */ #define XFS_BMAPI_IGSTATE 0x080 /* Ignore state - */ /* combine contig. space */ @@ -87,7 +86,6 @@ typedef struct xfs_bmap_free { XFS_BMAPI_ENTIRE, "ENTIRE" }, \ { XFS_BMAPI_METADATA, "METADATA" }, \ { XFS_BMAPI_ATTRFORK, "ATTRFORK" }, \ - { XFS_BMAPI_RSVBLOCKS, "RSVBLOCKS" }, \ { XFS_BMAPI_PREALLOC, "PREALLOC" }, \ { XFS_BMAPI_IGSTATE, "IGSTATE" }, \ { XFS_BMAPI_CONTIG, "CONTIG" }, \ From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:14 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4BF7D6i128048 for ; Wed, 11 May 2011 10:07:13 -0500 X-ASG-Debug-ID: 1305126432-186603cc0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id E02A714CA448 for ; Wed, 11 May 2011 08:07:12 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id 2ThvtI3tQwTqOzj2 for ; Wed, 11 May 2011 08:07:12 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QKB0K-0005fT-8g for xfs@oss.sgi.com; Wed, 11 
May 2011 15:07:12 +0000 Message-Id: <20110511150712.222802741@bombadil.infradead.org> User-Agent: quilt/0.48-1 Date: Wed, 11 May 2011 11:04:07 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 5/9] xfs: do not use unchecked extent indices in xfs_bmapi Subject: [PATCH 5/9] xfs: do not use unchecked extent indices in xfs_bmapi References: <20110511150402.258164661@bombadil.infradead.org> Content-Disposition: inline; filename=xfs-fix-ep-access-2 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1305126432 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Make sure to only call xfs_iext_get_ext after we've validated the extent index when moving on to the next index in xfs_bmapi. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig Index: xfs/fs/xfs/xfs_bmap.c =================================================================== --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-11 10:16:58.831733512 +0200 +++ xfs/fs/xfs/xfs_bmap.c 2011-05-11 10:16:58.847733078 +0200 @@ -4827,12 +4827,13 @@ xfs_bmapi( /* * Else go on to the next record.
*/ - ep = xfs_iext_get_ext(ifp, ++lastx); prev = got; - if (lastx >= nextents) - eof = 1; - else + if (++lastx < nextents) { + ep = xfs_iext_get_ext(ifp, lastx); xfs_bmbt_get_all(ep, &got); + } else { + eof = 1; + } } *nmap = n; /* From BATV+396e5a4f0004ff61be6e+2817+infradead.org+hch@bombadil.srs.infradead.org Wed May 11 10:07:14 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-4.7 required=5.0 tests=BAYES_00,J_CHICKENPOX_62, J_CHICKENPOX_63,LOCAL_GNU_PATCH autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4BF7Cee128045 for ; Wed, 11 May 2011 10:07:14 -0500 X-ASG-Debug-ID: 1305126431-1e5503520000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 7E9DC445774 for ; Wed, 11 May 2011 08:07:12 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id vmtLJbjCp3jUU6XH for ; Wed, 11 May 2011 08:07:12 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.72 #1 (Red Hat Linux)) id 1QKB0J-0005dt-JY for xfs@oss.sgi.com; Wed, 11 May 2011 15:07:11 +0000 Message-Id: <20110511150711.549194744@bombadil.infradead.org> User-Agent: quilt/0.48-1 Date: Wed, 11 May 2011 11:04:04 -0400 From: Christoph Hellwig To: xfs@oss.sgi.com X-ASG-Orig-Subj: [PATCH 2/9] xfs: remove if_lastex Subject: [PATCH 2/9] xfs: remove if_lastex References: <20110511150402.258164661@bombadil.infradead.org> Content-Disposition: inline; filename=xfs-kill-if_lastex X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1305126432 
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean The if_lastex field in struct xfs_ifork is only used as a temporary index during xfs_bmapi and xfs_bunmapi. Instead of using the inode fork to store it, keep it local in the callchain. Fortunately this is very easy as we already pass a stack copy of it down the whole chain, which can simply be changed to be passed by reference. Signed-off-by: Christoph Hellwig Index: xfs/fs/xfs/xfs_bmap.c =================================================================== --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-10 13:20:21.182349200 +0200 +++ xfs/fs/xfs/xfs_bmap.c 2011-05-10 13:29:40.618349159 +0200 @@ -89,28 +89,13 @@ xfs_bmap_add_attrfork_local( int *flags); /* inode logging flags */ /* - * Called by xfs_bmapi to update file extent records and the btree - * after allocating space (or doing a delayed allocation). - */ -STATIC int /* error */ -xfs_bmap_add_extent( - xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ - xfs_btree_cur_t **curp, /* if *curp is null, not a btree */ - xfs_bmbt_irec_t *new, /* new data to add to file extents */ - xfs_fsblock_t *first, /* pointer to firstblock variable */ - xfs_bmap_free_t *flist, /* list of extents to be freed */ - int *logflagsp, /* inode logging flags */ - int whichfork); /* data or attr fork */ - -/* * Called by xfs_bmap_add_extent to handle cases converting a delayed * allocation to a real allocation.
*/ STATIC int /* error */ xfs_bmap_add_extent_delay_real( xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ + xfs_extnum_t *idx, /* extent number to update/insert */ xfs_btree_cur_t **curp, /* if *curp is null, not a btree */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ xfs_filblks_t *dnew, /* new delayed-alloc indirect blocks */ @@ -125,7 +110,7 @@ xfs_bmap_add_extent_delay_real( STATIC int /* error */ xfs_bmap_add_extent_hole_delay( xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ + xfs_extnum_t *idx, /* extent number to update/insert */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ int *logflagsp); /* inode logging flags */ @@ -136,7 +121,7 @@ xfs_bmap_add_extent_hole_delay( STATIC int /* error */ xfs_bmap_add_extent_hole_real( xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ + xfs_extnum_t *idx, /* extent number to update/insert */ xfs_btree_cur_t *cur, /* if null, not a btree */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ int *logflagsp, /* inode logging flags */ @@ -149,7 +134,7 @@ xfs_bmap_add_extent_hole_real( STATIC int /* error */ xfs_bmap_add_extent_unwritten_real( xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ + xfs_extnum_t *idx, /* extent number to update/insert */ xfs_btree_cur_t **curp, /* if *curp is null, not a btree */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ int *logflagsp); /* inode logging flags */ @@ -455,7 +440,7 @@ xfs_bmap_add_attrfork_local( STATIC int /* error */ xfs_bmap_add_extent( xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ + xfs_extnum_t *idx, /* extent number to update/insert */ xfs_btree_cur_t **curp, /* if *curp is null, not a btree */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ 
xfs_fsblock_t *first, /* pointer to firstblock variable */ @@ -472,23 +457,27 @@ xfs_bmap_add_extent( xfs_extnum_t nextents; /* number of extents in file now */ XFS_STATS_INC(xs_add_exlist); + cur = *curp; ifp = XFS_IFORK_PTR(ip, whichfork); nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t); - ASSERT(idx <= nextents); da_old = da_new = 0; error = 0; + + ASSERT(*idx >= 0); + ASSERT(*idx <= nextents); + /* * This is the first extent added to a new/empty file. * Special case this one, so other routines get to assume there are * already extents in the list. */ if (nextents == 0) { - xfs_iext_insert(ip, 0, 1, new, + xfs_iext_insert(ip, *idx, 1, new, whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0); ASSERT(cur == NULL); - ifp->if_lastex = 0; + if (!isnullstartblock(new->br_startblock)) { XFS_IFORK_NEXT_SET(ip, whichfork, 1); logflags = XFS_ILOG_CORE | xfs_ilog_fext(whichfork); @@ -502,27 +491,25 @@ xfs_bmap_add_extent( if (cur) ASSERT((cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL) == 0); - error = xfs_bmap_add_extent_hole_delay(ip, idx, new, &logflags); - if (error) - goto done; + error = xfs_bmap_add_extent_hole_delay(ip, idx, new, + &logflags); } /* * Real allocation off the end of the file. */ - else if (idx == nextents) { + else if (*idx == nextents) { if (cur) ASSERT((cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL) == 0); - if ((error = xfs_bmap_add_extent_hole_real(ip, idx, cur, new, - &logflags, whichfork))) - goto done; + error = xfs_bmap_add_extent_hole_real(ip, idx, cur, new, + &logflags, whichfork); } else { xfs_bmbt_irec_t prev; /* old extent at offset idx */ /* * Get the record referred to by idx. 
*/ - xfs_bmbt_get_all(xfs_iext_get_ext(ifp, idx), &prev); + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx), &prev); /* * If it's a real allocation record, and the new allocation ends * after the start of the referred to record, then we're filling @@ -537,23 +524,18 @@ xfs_bmap_add_extent( if (cur) ASSERT(cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL); - error = xfs_bmap_add_extent_delay_real(ip, idx, - &cur, new, &da_new, first, - flist, &logflags); - if (error) - goto done; - } else if (new->br_state == XFS_EXT_NORM) { - ASSERT(new->br_state == XFS_EXT_NORM); - if ((error = xfs_bmap_add_extent_unwritten_real( - ip, idx, &cur, new, &logflags))) - goto done; + error = xfs_bmap_add_extent_delay_real(ip, + idx, &cur, new, &da_new, + first, flist, &logflags); } else { - ASSERT(new->br_state == XFS_EXT_UNWRITTEN); - if ((error = xfs_bmap_add_extent_unwritten_real( - ip, idx, &cur, new, &logflags))) + ASSERT(new->br_state == XFS_EXT_NORM || + new->br_state == XFS_EXT_UNWRITTEN); + + error = xfs_bmap_add_extent_unwritten_real(ip, + idx, &cur, new, &logflags); + if (error) goto done; } - ASSERT(*curp == cur || *curp == NULL); } /* * Otherwise we're filling in a hole with an allocation. @@ -562,13 +544,15 @@ xfs_bmap_add_extent( if (cur) ASSERT((cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL) == 0); - if ((error = xfs_bmap_add_extent_hole_real(ip, idx, cur, - new, &logflags, whichfork))) - goto done; + error = xfs_bmap_add_extent_hole_real(ip, idx, cur, + new, &logflags, whichfork); } } + if (error) + goto done; ASSERT(*curp == cur || *curp == NULL); + /* * Convert to a btree if necessary. 
*/ @@ -621,7 +605,7 @@ done: STATIC int /* error */ xfs_bmap_add_extent_delay_real( xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ + xfs_extnum_t *idx, /* extent number to update/insert */ xfs_btree_cur_t **curp, /* if *curp is null, not a btree */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ xfs_filblks_t *dnew, /* new delayed-alloc indirect blocks */ @@ -653,7 +637,7 @@ xfs_bmap_add_extent_delay_real( */ cur = *curp; ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); - ep = xfs_iext_get_ext(ifp, idx); + ep = xfs_iext_get_ext(ifp, *idx); xfs_bmbt_get_all(ep, &PREV); new_endoff = new->br_startoff + new->br_blockcount; ASSERT(PREV.br_startoff <= new->br_startoff); @@ -672,9 +656,9 @@ xfs_bmap_add_extent_delay_real( * Check and set flags if this segment has a left neighbor. * Don't set contiguous if the combined extent would be too large. */ - if (idx > 0) { + if (*idx > 0) { state |= BMAP_LEFT_VALID; - xfs_bmbt_get_all(xfs_iext_get_ext(ifp, idx - 1), &LEFT); + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx - 1), &LEFT); if (isnullstartblock(LEFT.br_startblock)) state |= BMAP_LEFT_DELAY; @@ -692,9 +676,9 @@ xfs_bmap_add_extent_delay_real( * Don't set contiguous if the combined extent would be too large. * Also check for all-three-contiguous being too large. */ - if (idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t) - 1) { + if (*idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t) - 1) { state |= BMAP_RIGHT_VALID; - xfs_bmbt_get_all(xfs_iext_get_ext(ifp, idx + 1), &RIGHT); + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx + 1), &RIGHT); if (isnullstartblock(RIGHT.br_startblock)) state |= BMAP_RIGHT_DELAY; @@ -725,14 +709,13 @@ xfs_bmap_add_extent_delay_real( * Filling in all of a previously delayed allocation extent. * The left and right neighbors are both contiguous with new. 
*/ - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), + trace_xfs_bmap_pre_update(ip, *idx - 1, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx - 1), LEFT.br_blockcount + PREV.br_blockcount + RIGHT.br_blockcount); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx - 1, state, _THIS_IP_); - xfs_iext_remove(ip, idx, 2, state); - ip->i_df.if_lastex = idx - 1; + xfs_iext_remove(ip, *idx, 2, state); ip->i_d.di_nextents--; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -756,6 +739,8 @@ xfs_bmap_add_extent_delay_real( RIGHT.br_blockcount, LEFT.br_state))) goto done; } + + --*idx; *dnew = 0; break; @@ -764,13 +749,12 @@ xfs_bmap_add_extent_delay_real( * Filling in all of a previously delayed allocation extent. * The left neighbor is contiguous, the right is not. */ - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), + trace_xfs_bmap_pre_update(ip, *idx - 1, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx - 1), LEFT.br_blockcount + PREV.br_blockcount); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx - 1, state, _THIS_IP_); - ip->i_df.if_lastex = idx - 1; - xfs_iext_remove(ip, idx, 1, state); + xfs_iext_remove(ip, *idx, 1, state); if (cur == NULL) rval = XFS_ILOG_DEXT; else { @@ -786,6 +770,8 @@ xfs_bmap_add_extent_delay_real( PREV.br_blockcount, LEFT.br_state))) goto done; } + + --*idx; *dnew = 0; break; @@ -794,14 +780,13 @@ xfs_bmap_add_extent_delay_real( * Filling in all of a previously delayed allocation extent. * The right neighbor is contiguous, the left is not. 
*/ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_startblock(ep, new->br_startblock); xfs_bmbt_set_blockcount(ep, PREV.br_blockcount + RIGHT.br_blockcount); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - ip->i_df.if_lastex = idx; - xfs_iext_remove(ip, idx + 1, 1, state); + xfs_iext_remove(ip, *idx + 1, 1, state); if (cur == NULL) rval = XFS_ILOG_DEXT; else { @@ -817,6 +802,7 @@ xfs_bmap_add_extent_delay_real( RIGHT.br_blockcount, PREV.br_state))) goto done; } + *dnew = 0; break; @@ -826,11 +812,10 @@ xfs_bmap_add_extent_delay_real( * Neither the left nor right neighbors are contiguous with * the new one. */ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_startblock(ep, new->br_startblock); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - ip->i_df.if_lastex = idx; ip->i_d.di_nextents++; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -846,6 +831,7 @@ xfs_bmap_add_extent_delay_real( goto done; XFS_WANT_CORRUPTED_GOTO(i == 1, done); } + *dnew = 0; break; @@ -854,17 +840,16 @@ xfs_bmap_add_extent_delay_real( * Filling in the first part of a previous delayed allocation. * The left neighbor is contiguous. 
*/ - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), + trace_xfs_bmap_pre_update(ip, *idx - 1, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx - 1), LEFT.br_blockcount + new->br_blockcount); xfs_bmbt_set_startoff(ep, PREV.br_startoff + new->br_blockcount); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx - 1, state, _THIS_IP_); temp = PREV.br_blockcount - new->br_blockcount; - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_blockcount(ep, temp); - ip->i_df.if_lastex = idx - 1; if (cur == NULL) rval = XFS_ILOG_DEXT; else { @@ -884,7 +869,9 @@ xfs_bmap_add_extent_delay_real( temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp), startblockval(PREV.br_startblock)); xfs_bmbt_set_startblock(ep, nullstartblock((int)temp)); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + + --*idx; *dnew = temp; break; @@ -893,12 +880,11 @@ xfs_bmap_add_extent_delay_real( * Filling in the first part of a previous delayed allocation. * The left neighbor is not contiguous. */ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_startoff(ep, new_endoff); temp = PREV.br_blockcount - new->br_blockcount; xfs_bmbt_set_blockcount(ep, temp); - xfs_iext_insert(ip, idx, 1, new, state); - ip->i_df.if_lastex = idx; + xfs_iext_insert(ip, *idx, 1, new, state); ip->i_d.di_nextents++; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -926,9 +912,10 @@ xfs_bmap_add_extent_delay_real( temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp), startblockval(PREV.br_startblock) - (cur ? 
cur->bc_private.b.allocated : 0)); - ep = xfs_iext_get_ext(ifp, idx + 1); + ep = xfs_iext_get_ext(ifp, *idx + 1); xfs_bmbt_set_startblock(ep, nullstartblock((int)temp)); - trace_xfs_bmap_post_update(ip, idx + 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx + 1, state, _THIS_IP_); + *dnew = temp; break; @@ -938,15 +925,14 @@ xfs_bmap_add_extent_delay_real( * The right neighbor is contiguous with the new allocation. */ temp = PREV.br_blockcount - new->br_blockcount; - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); - trace_xfs_bmap_pre_update(ip, idx + 1, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx + 1, state, _THIS_IP_); xfs_bmbt_set_blockcount(ep, temp); - xfs_bmbt_set_allf(xfs_iext_get_ext(ifp, idx + 1), + xfs_bmbt_set_allf(xfs_iext_get_ext(ifp, *idx + 1), new->br_startoff, new->br_startblock, new->br_blockcount + RIGHT.br_blockcount, RIGHT.br_state); - trace_xfs_bmap_post_update(ip, idx + 1, state, _THIS_IP_); - ip->i_df.if_lastex = idx + 1; + trace_xfs_bmap_post_update(ip, *idx + 1, state, _THIS_IP_); if (cur == NULL) rval = XFS_ILOG_DEXT; else { @@ -966,7 +952,9 @@ xfs_bmap_add_extent_delay_real( temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp), startblockval(PREV.br_startblock)); xfs_bmbt_set_startblock(ep, nullstartblock((int)temp)); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + + ++*idx; *dnew = temp; break; @@ -976,10 +964,9 @@ xfs_bmap_add_extent_delay_real( * The right neighbor is not contiguous. 
*/ temp = PREV.br_blockcount - new->br_blockcount; - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_blockcount(ep, temp); - xfs_iext_insert(ip, idx + 1, 1, new, state); - ip->i_df.if_lastex = idx + 1; + xfs_iext_insert(ip, *idx + 1, 1, new, state); ip->i_d.di_nextents++; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -1007,9 +994,11 @@ xfs_bmap_add_extent_delay_real( temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp), startblockval(PREV.br_startblock) - (cur ? cur->bc_private.b.allocated : 0)); - ep = xfs_iext_get_ext(ifp, idx); + ep = xfs_iext_get_ext(ifp, *idx); xfs_bmbt_set_startblock(ep, nullstartblock((int)temp)); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + + ++*idx; *dnew = temp; break; @@ -1036,7 +1025,7 @@ xfs_bmap_add_extent_delay_real( */ temp = new->br_startoff - PREV.br_startoff; temp2 = PREV.br_startoff + PREV.br_blockcount - new_endoff; - trace_xfs_bmap_pre_update(ip, idx, 0, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, 0, _THIS_IP_); xfs_bmbt_set_blockcount(ep, temp); /* truncate PREV */ LEFT = *new; RIGHT.br_state = PREV.br_state; @@ -1045,8 +1034,7 @@ xfs_bmap_add_extent_delay_real( RIGHT.br_startoff = new_endoff; RIGHT.br_blockcount = temp2; /* insert LEFT (r[0]) and RIGHT (r[1]) at the same time */ - xfs_iext_insert(ip, idx + 1, 2, &LEFT, state); - ip->i_df.if_lastex = idx + 1; + xfs_iext_insert(ip, *idx + 1, 2, &LEFT, state); ip->i_d.di_nextents++; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -1103,13 +1091,15 @@ xfs_bmap_add_extent_delay_real( } } } - ep = xfs_iext_get_ext(ifp, idx); + ep = xfs_iext_get_ext(ifp, *idx); xfs_bmbt_set_startblock(ep, nullstartblock((int)temp)); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); - trace_xfs_bmap_pre_update(ip, idx + 2, state, _THIS_IP_); - xfs_bmbt_set_startblock(xfs_iext_get_ext(ifp, idx + 2), + 
trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx + 2, state, _THIS_IP_); + xfs_bmbt_set_startblock(xfs_iext_get_ext(ifp, *idx + 2), nullstartblock((int)temp2)); - trace_xfs_bmap_post_update(ip, idx + 2, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx + 2, state, _THIS_IP_); + + ++*idx; *dnew = temp + temp2; break; @@ -1141,7 +1131,7 @@ done: STATIC int /* error */ xfs_bmap_add_extent_unwritten_real( xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ + xfs_extnum_t *idx, /* extent number to update/insert */ xfs_btree_cur_t **curp, /* if *curp is null, not a btree */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ int *logflagsp) /* inode logging flags */ @@ -1168,7 +1158,7 @@ xfs_bmap_add_extent_unwritten_real( error = 0; cur = *curp; ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); - ep = xfs_iext_get_ext(ifp, idx); + ep = xfs_iext_get_ext(ifp, *idx); xfs_bmbt_get_all(ep, &PREV); newext = new->br_state; oldext = (newext == XFS_EXT_UNWRITTEN) ? @@ -1191,9 +1181,9 @@ xfs_bmap_add_extent_unwritten_real( * Check and set flags if this segment has a left neighbor. * Don't set contiguous if the combined extent would be too large. */ - if (idx > 0) { + if (*idx > 0) { state |= BMAP_LEFT_VALID; - xfs_bmbt_get_all(xfs_iext_get_ext(ifp, idx - 1), &LEFT); + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx - 1), &LEFT); if (isnullstartblock(LEFT.br_startblock)) state |= BMAP_LEFT_DELAY; @@ -1211,9 +1201,9 @@ xfs_bmap_add_extent_unwritten_real( * Don't set contiguous if the combined extent would be too large. * Also check for all-three-contiguous being too large. 
*/ - if (idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t) - 1) { + if (*idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t) - 1) { state |= BMAP_RIGHT_VALID; - xfs_bmbt_get_all(xfs_iext_get_ext(ifp, idx + 1), &RIGHT); + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx + 1), &RIGHT); if (isnullstartblock(RIGHT.br_startblock)) state |= BMAP_RIGHT_DELAY; } @@ -1242,14 +1232,15 @@ xfs_bmap_add_extent_unwritten_real( * Setting all of a previous oldext extent to newext. * The left and right neighbors are both contiguous with new. */ - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), + --*idx; + + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx), LEFT.br_blockcount + PREV.br_blockcount + RIGHT.br_blockcount); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - xfs_iext_remove(ip, idx, 2, state); - ip->i_df.if_lastex = idx - 1; + xfs_iext_remove(ip, *idx + 1, 2, state); ip->i_d.di_nextents -= 2; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -1285,13 +1276,14 @@ xfs_bmap_add_extent_unwritten_real( * Setting all of a previous oldext extent to newext. * The left neighbor is contiguous, the right is not. 
*/ - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), + --*idx; + + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx), LEFT.br_blockcount + PREV.br_blockcount); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - ip->i_df.if_lastex = idx - 1; - xfs_iext_remove(ip, idx, 1, state); + xfs_iext_remove(ip, *idx + 1, 1, state); ip->i_d.di_nextents--; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -1321,13 +1313,12 @@ xfs_bmap_add_extent_unwritten_real( * Setting all of a previous oldext extent to newext. * The right neighbor is contiguous, the left is not. */ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_blockcount(ep, PREV.br_blockcount + RIGHT.br_blockcount); xfs_bmbt_set_state(ep, newext); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); - ip->i_df.if_lastex = idx; - xfs_iext_remove(ip, idx + 1, 1, state); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + xfs_iext_remove(ip, *idx + 1, 1, state); ip->i_d.di_nextents--; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -1358,11 +1349,10 @@ xfs_bmap_add_extent_unwritten_real( * Neither the left nor right neighbors are contiguous with * the new one. */ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_state(ep, newext); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - ip->i_df.if_lastex = idx; if (cur == NULL) rval = XFS_ILOG_DEXT; else { @@ -1384,21 +1374,22 @@ xfs_bmap_add_extent_unwritten_real( * Setting the first part of a previous oldext extent to newext. * The left neighbor is contiguous. 
*/ - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), + trace_xfs_bmap_pre_update(ip, *idx - 1, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx - 1), LEFT.br_blockcount + new->br_blockcount); xfs_bmbt_set_startoff(ep, PREV.br_startoff + new->br_blockcount); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx - 1, state, _THIS_IP_); - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_startblock(ep, new->br_startblock + new->br_blockcount); xfs_bmbt_set_blockcount(ep, PREV.br_blockcount - new->br_blockcount); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + + --*idx; - ip->i_df.if_lastex = idx - 1; if (cur == NULL) rval = XFS_ILOG_DEXT; else { @@ -1429,17 +1420,16 @@ xfs_bmap_add_extent_unwritten_real( * Setting the first part of a previous oldext extent to newext. * The left neighbor is not contiguous. */ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); ASSERT(ep && xfs_bmbt_get_state(ep) == oldext); xfs_bmbt_set_startoff(ep, new_endoff); xfs_bmbt_set_blockcount(ep, PREV.br_blockcount - new->br_blockcount); xfs_bmbt_set_startblock(ep, new->br_startblock + new->br_blockcount); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - xfs_iext_insert(ip, idx, 1, new, state); - ip->i_df.if_lastex = idx; + xfs_iext_insert(ip, *idx, 1, new, state); ip->i_d.di_nextents++; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -1468,17 +1458,19 @@ xfs_bmap_add_extent_unwritten_real( * Setting the last part of a previous oldext extent to newext. * The right neighbor is contiguous with the new allocation. 
*/ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); - trace_xfs_bmap_pre_update(ip, idx + 1, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_blockcount(ep, PREV.br_blockcount - new->br_blockcount); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); - xfs_bmbt_set_allf(xfs_iext_get_ext(ifp, idx + 1), + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + + ++*idx; + + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); + xfs_bmbt_set_allf(xfs_iext_get_ext(ifp, *idx), new->br_startoff, new->br_startblock, new->br_blockcount + RIGHT.br_blockcount, newext); - trace_xfs_bmap_post_update(ip, idx + 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - ip->i_df.if_lastex = idx + 1; if (cur == NULL) rval = XFS_ILOG_DEXT; else { @@ -1508,13 +1500,14 @@ xfs_bmap_add_extent_unwritten_real( * Setting the last part of a previous oldext extent to newext. * The right neighbor is not contiguous. */ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_blockcount(ep, PREV.br_blockcount - new->br_blockcount); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + + ++*idx; + xfs_iext_insert(ip, *idx, 1, new, state); - xfs_iext_insert(ip, idx + 1, 1, new, state); - ip->i_df.if_lastex = idx + 1; ip->i_d.di_nextents++; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -1548,10 +1541,10 @@ xfs_bmap_add_extent_unwritten_real( * newext. Contiguity is impossible here. * One extent becomes three extents. 
*/ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_blockcount(ep, new->br_startoff - PREV.br_startoff); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); r[0] = *new; r[1].br_startoff = new_endoff; @@ -1559,8 +1552,10 @@ xfs_bmap_add_extent_unwritten_real( PREV.br_startoff + PREV.br_blockcount - new_endoff; r[1].br_startblock = new->br_startblock + new->br_blockcount; r[1].br_state = oldext; - xfs_iext_insert(ip, idx + 1, 2, &r[0], state); - ip->i_df.if_lastex = idx + 1; + + ++*idx; + xfs_iext_insert(ip, *idx, 2, &r[0], state); + ip->i_d.di_nextents += 2; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; @@ -1630,7 +1625,7 @@ done: STATIC int /* error */ xfs_bmap_add_extent_hole_delay( xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ + xfs_extnum_t *idx, /* extent number to update/insert */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ int *logflagsp) /* inode logging flags */ { @@ -1644,16 +1639,16 @@ xfs_bmap_add_extent_hole_delay( xfs_filblks_t temp=0; /* temp for indirect calculations */ ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); - ep = xfs_iext_get_ext(ifp, idx); + ep = xfs_iext_get_ext(ifp, *idx); state = 0; ASSERT(isnullstartblock(new->br_startblock)); /* * Check and set flags if this segment has a left neighbor */ - if (idx > 0) { + if (*idx > 0) { state |= BMAP_LEFT_VALID; - xfs_bmbt_get_all(xfs_iext_get_ext(ifp, idx - 1), &left); + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx - 1), &left); if (isnullstartblock(left.br_startblock)) state |= BMAP_LEFT_DELAY; @@ -1663,7 +1658,7 @@ xfs_bmap_add_extent_hole_delay( * Check and set flags if the current (right) segment exists. * If it doesn't exist, we're converting the hole at end-of-file. 
*/ - if (idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) { + if (*idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) { state |= BMAP_RIGHT_VALID; xfs_bmbt_get_all(ep, &right); @@ -1698,21 +1693,21 @@ xfs_bmap_add_extent_hole_delay( * on the left and on the right. * Merge all three into a single extent record. */ + --*idx; temp = left.br_blockcount + new->br_blockcount + right.br_blockcount; - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), temp); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx), temp); oldlen = startblockval(left.br_startblock) + startblockval(new->br_startblock) + startblockval(right.br_startblock); newlen = xfs_bmap_worst_indlen(ip, temp); - xfs_bmbt_set_startblock(xfs_iext_get_ext(ifp, idx - 1), + xfs_bmbt_set_startblock(xfs_iext_get_ext(ifp, *idx), nullstartblock((int)newlen)); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - xfs_iext_remove(ip, idx, 1, state); - ip->i_df.if_lastex = idx - 1; + xfs_iext_remove(ip, *idx + 1, 1, state); break; case BMAP_LEFT_CONTIG: @@ -1721,17 +1716,17 @@ xfs_bmap_add_extent_hole_delay( * on the left. * Merge the new allocation with the left neighbor. 
*/ + --*idx; temp = left.br_blockcount + new->br_blockcount; - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), temp); + + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx), temp); oldlen = startblockval(left.br_startblock) + startblockval(new->br_startblock); newlen = xfs_bmap_worst_indlen(ip, temp); - xfs_bmbt_set_startblock(xfs_iext_get_ext(ifp, idx - 1), + xfs_bmbt_set_startblock(xfs_iext_get_ext(ifp, *idx), nullstartblock((int)newlen)); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); - - ip->i_df.if_lastex = idx - 1; + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); break; case BMAP_RIGHT_CONTIG: @@ -1740,16 +1735,14 @@ xfs_bmap_add_extent_hole_delay( * on the right. * Merge the new allocation with the right neighbor. */ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); temp = new->br_blockcount + right.br_blockcount; oldlen = startblockval(new->br_startblock) + startblockval(right.br_startblock); newlen = xfs_bmap_worst_indlen(ip, temp); xfs_bmbt_set_allf(ep, new->br_startoff, nullstartblock((int)newlen), temp, right.br_state); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); - - ip->i_df.if_lastex = idx; + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); break; case 0: @@ -1759,8 +1752,7 @@ xfs_bmap_add_extent_hole_delay( * Insert a new entry. 
*/ oldlen = newlen = 0; - xfs_iext_insert(ip, idx, 1, new, state); - ip->i_df.if_lastex = idx; + xfs_iext_insert(ip, *idx, 1, new, state); break; } if (oldlen != newlen) { @@ -1782,7 +1774,7 @@ xfs_bmap_add_extent_hole_delay( STATIC int /* error */ xfs_bmap_add_extent_hole_real( xfs_inode_t *ip, /* incore inode pointer */ - xfs_extnum_t idx, /* extent number to update/insert */ + xfs_extnum_t *idx, /* extent number to update/insert */ xfs_btree_cur_t *cur, /* if null, not a btree */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ int *logflagsp, /* inode logging flags */ @@ -1798,8 +1790,8 @@ xfs_bmap_add_extent_hole_real( int state; /* state bits, accessed thru macros */ ifp = XFS_IFORK_PTR(ip, whichfork); - ASSERT(idx <= ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)); - ep = xfs_iext_get_ext(ifp, idx); + ASSERT(*idx <= ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)); + ep = xfs_iext_get_ext(ifp, *idx); state = 0; if (whichfork == XFS_ATTR_FORK) @@ -1808,9 +1800,9 @@ xfs_bmap_add_extent_hole_real( /* * Check and set flags if this segment has a left neighbor. */ - if (idx > 0) { + if (*idx > 0) { state |= BMAP_LEFT_VALID; - xfs_bmbt_get_all(xfs_iext_get_ext(ifp, idx - 1), &left); + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx - 1), &left); if (isnullstartblock(left.br_startblock)) state |= BMAP_LEFT_DELAY; } @@ -1819,7 +1811,7 @@ xfs_bmap_add_extent_hole_real( * Check and set flags if this segment has a current value. * Not true if we're inserting into the "hole" at eof. */ - if (idx < ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) { + if (*idx < ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) { state |= BMAP_RIGHT_VALID; xfs_bmbt_get_all(ep, &right); if (isnullstartblock(right.br_startblock)) @@ -1858,14 +1850,15 @@ xfs_bmap_add_extent_hole_real( * left and on the right. * Merge all three into a single extent record. 
*/ - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), + --*idx; + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx), left.br_blockcount + new->br_blockcount + right.br_blockcount); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + + xfs_iext_remove(ip, *idx + 1, 1, state); - xfs_iext_remove(ip, idx, 1, state); - ifp->if_lastex = idx - 1; XFS_IFORK_NEXT_SET(ip, whichfork, XFS_IFORK_NEXTENTS(ip, whichfork) - 1); if (cur == NULL) { @@ -1900,12 +1893,12 @@ xfs_bmap_add_extent_hole_real( * on the left. * Merge the new allocation with the left neighbor. */ - trace_xfs_bmap_pre_update(ip, idx - 1, state, _THIS_IP_); - xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, idx - 1), + --*idx; + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); + xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx), left.br_blockcount + new->br_blockcount); - trace_xfs_bmap_post_update(ip, idx - 1, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - ifp->if_lastex = idx - 1; if (cur == NULL) { rval = xfs_ilog_fext(whichfork); } else { @@ -1931,13 +1924,12 @@ xfs_bmap_add_extent_hole_real( * on the right. * Merge the new allocation with the right neighbor. */ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_allf(ep, new->br_startoff, new->br_startblock, new->br_blockcount + right.br_blockcount, right.br_state); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); - ifp->if_lastex = idx; if (cur == NULL) { rval = xfs_ilog_fext(whichfork); } else { @@ -1963,8 +1955,7 @@ xfs_bmap_add_extent_hole_real( * real allocation. * Insert a new entry. 
*/ - xfs_iext_insert(ip, idx, 1, new, state); - ifp->if_lastex = idx; + xfs_iext_insert(ip, *idx, 1, new, state); XFS_IFORK_NEXT_SET(ip, whichfork, XFS_IFORK_NEXTENTS(ip, whichfork) + 1); if (cur == NULL) { @@ -2812,7 +2803,7 @@ STATIC int /* error */ xfs_bmap_del_extent( xfs_inode_t *ip, /* incore inode pointer */ xfs_trans_t *tp, /* current transaction pointer */ - xfs_extnum_t idx, /* extent number to update/delete */ + xfs_extnum_t *idx, /* extent number to update/delete */ xfs_bmap_free_t *flist, /* list of extents to be freed */ xfs_btree_cur_t *cur, /* if null, not a btree */ xfs_bmbt_irec_t *del, /* data to remove from extents */ @@ -2848,10 +2839,10 @@ xfs_bmap_del_extent( mp = ip->i_mount; ifp = XFS_IFORK_PTR(ip, whichfork); - ASSERT((idx >= 0) && (idx < ifp->if_bytes / + ASSERT((*idx >= 0) && (*idx < ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))); ASSERT(del->br_blockcount > 0); - ep = xfs_iext_get_ext(ifp, idx); + ep = xfs_iext_get_ext(ifp, *idx); xfs_bmbt_get_all(ep, &got); ASSERT(got.br_startoff <= del->br_startoff); del_endoff = del->br_startoff + del->br_blockcount; @@ -2925,9 +2916,8 @@ xfs_bmap_del_extent( /* * Matches the whole extent. Delete the entry. */ - xfs_iext_remove(ip, idx, 1, + xfs_iext_remove(ip, *idx, 1, whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0); - ifp->if_lastex = idx; if (delay) break; XFS_IFORK_NEXT_SET(ip, whichfork, @@ -2946,21 +2936,20 @@ xfs_bmap_del_extent( /* * Deleting the first part of the extent. 
*/ - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_startoff(ep, del_endoff); temp = got.br_blockcount - del->br_blockcount; xfs_bmbt_set_blockcount(ep, temp); - ifp->if_lastex = idx; if (delay) { temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp), da_old); xfs_bmbt_set_startblock(ep, nullstartblock((int)temp)); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); da_new = temp; break; } xfs_bmbt_set_startblock(ep, del_endblock); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); if (!cur) { flags |= xfs_ilog_fext(whichfork); break; @@ -2976,18 +2965,17 @@ xfs_bmap_del_extent( * Deleting the last part of the extent. */ temp = got.br_blockcount - del->br_blockcount; - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_blockcount(ep, temp); - ifp->if_lastex = idx; if (delay) { temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp), da_old); xfs_bmbt_set_startblock(ep, nullstartblock((int)temp)); - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); da_new = temp; break; } - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); if (!cur) { flags |= xfs_ilog_fext(whichfork); break; @@ -3004,7 +2992,7 @@ xfs_bmap_del_extent( * Deleting the middle of the extent. 
*/ temp = del->br_startoff - got.br_startoff; - trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); + trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); xfs_bmbt_set_blockcount(ep, temp); new.br_startoff = del_endoff; temp2 = got_endoff - del_endoff; @@ -3091,9 +3079,9 @@ xfs_bmap_del_extent( } } } - trace_xfs_bmap_post_update(ip, idx, state, _THIS_IP_); - xfs_iext_insert(ip, idx + 1, 1, &new, state); - ifp->if_lastex = idx + 1; + trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); + xfs_iext_insert(ip, *idx + 1, 1, &new, state); + ++*idx; break; } /* @@ -4674,13 +4662,12 @@ xfs_bmapi( if (!wasdelay && (flags & XFS_BMAPI_PREALLOC)) got.br_state = XFS_EXT_UNWRITTEN; } - error = xfs_bmap_add_extent(ip, lastx, &cur, &got, + error = xfs_bmap_add_extent(ip, &lastx, &cur, &got, firstblock, flist, &tmp_logflags, whichfork); logflags |= tmp_logflags; if (error) goto error0; - lastx = ifp->if_lastex; ep = xfs_iext_get_ext(ifp, lastx); nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t); xfs_bmbt_get_all(ep, &got); @@ -4776,13 +4763,12 @@ xfs_bmapi( mval->br_state = (mval->br_state == XFS_EXT_UNWRITTEN) ? XFS_EXT_NORM : XFS_EXT_UNWRITTEN; - error = xfs_bmap_add_extent(ip, lastx, &cur, mval, + error = xfs_bmap_add_extent(ip, &lastx, &cur, mval, firstblock, flist, &tmp_logflags, whichfork); logflags |= tmp_logflags; if (error) goto error0; - lastx = ifp->if_lastex; ep = xfs_iext_get_ext(ifp, lastx); nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t); xfs_bmbt_get_all(ep, &got); @@ -4848,7 +4834,6 @@ xfs_bmapi( else xfs_bmbt_get_all(ep, &got); } - ifp->if_lastex = lastx; *nmap = n; /* * Transform from btree to extents, give it cur. 
@@ -4957,7 +4942,6 @@ xfs_bmapi_single(
 	ASSERT(!isnullstartblock(got.br_startblock));
 	ASSERT(bno < got.br_startoff + got.br_blockcount);
 	*fsb = got.br_startblock + (bno - got.br_startoff);
-	ifp->if_lastex = lastx;
 	return 0;
 }
 
@@ -5132,7 +5116,7 @@ xfs_bunmapi(
 					del.br_blockcount = mod;
 				}
 				del.br_state = XFS_EXT_UNWRITTEN;
-				error = xfs_bmap_add_extent(ip, lastx, &cur, &del,
+				error = xfs_bmap_add_extent(ip, &lastx, &cur, &del,
 					firstblock, flist, &logflags,
 					XFS_DATA_FORK);
 				if (error)
@@ -5186,7 +5170,8 @@ xfs_bunmapi(
 					prev.br_startoff = start;
 				}
 				prev.br_state = XFS_EXT_UNWRITTEN;
-				error = xfs_bmap_add_extent(ip, lastx - 1, &cur,
+				lastx--;
+				error = xfs_bmap_add_extent(ip, &lastx, &cur,
 					&prev, firstblock, flist, &logflags,
 					XFS_DATA_FORK);
 				if (error)
@@ -5195,7 +5180,7 @@ xfs_bunmapi(
 			} else {
 				ASSERT(del.br_state == XFS_EXT_NORM);
 				del.br_state = XFS_EXT_UNWRITTEN;
-				error = xfs_bmap_add_extent(ip, lastx, &cur,
+				error = xfs_bmap_add_extent(ip, &lastx, &cur,
 					&del, firstblock, flist, &logflags,
 					XFS_DATA_FORK);
 				if (error)
@@ -5249,14 +5234,13 @@ xfs_bunmapi(
 			error = XFS_ERROR(ENOSPC);
 			goto error0;
 		}
-		error = xfs_bmap_del_extent(ip, tp, lastx, flist, cur, &del,
+		error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
 				&tmp_logflags, whichfork);
 		logflags |= tmp_logflags;
 		if (error)
 			goto error0;
 		bno = del.br_startoff - 1;
 nodelete:
-		lastx = ifp->if_lastex;
 		/*
 		 * If not done go on to the next (previous) record.
 		 * Reset ep in case the extents array was re-alloced.
@@ -5273,7 +5257,6 @@ nodelete:
 			extno++;
 		}
 	}
-	ifp->if_lastex = lastx;
 	*done = bno == (xfs_fileoff_t)-1 || bno < start || lastx < 0;
 	ASSERT(ifp->if_ext_max ==
 	       XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c 2011-05-10 13:29:36.446348941 +0200
+++ xfs/fs/xfs/xfs_inode.c 2011-05-10 13:29:40.622349777 +0200
@@ -920,7 +920,6 @@ xfs_iread_extents(
 	/*
 	 * We know that the size is valid (it's checked in iformat_btree)
 	 */
-	ifp->if_lastex = NULLEXTNUM;
 	ifp->if_bytes = ifp->if_real_bytes = 0;
 	ifp->if_flags |= XFS_IFEXTENTS;
 	xfs_iext_add(ifp, 0, nextents);
@@ -3191,7 +3190,6 @@ xfs_iext_add(
 		}
 		ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext;
 		ifp->if_real_bytes = 0;
-		ifp->if_lastex = nextents + ext_diff;
 	}
 	/*
 	 * Otherwise use a linear (direct) extent list.
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h 2011-05-10 13:29:36.462348766 +0200
+++ xfs/fs/xfs/xfs_inode.h 2011-05-10 13:29:40.622349777 +0200
@@ -67,7 +67,6 @@ typedef struct xfs_ifork {
 	short			if_broot_bytes;	/* bytes allocated for root */
 	unsigned char		if_flags;	/* per-fork flags */
 	unsigned char		if_ext_max;	/* max # of extent records */
-	xfs_extnum_t		if_lastex;	/* last if_extents used */
 	union {
 		xfs_bmbt_rec_host_t *if_extents;/* linear map file exts */
 		xfs_ext_irec_t	*if_ext_irec;	/* irec map file exts */

From branto@redhat.com Wed May 11 10:07:29 2011
X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,J_CHICKENPOX_43, J_CHICKENPOX_92 autolearn=no version=3.4.0-r929098
Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4BF7TOx128196 for ; Wed, 11 May 2011 10:07:29 -0500
X-ASG-Debug-ID: 1305126448-0c5303850000-NocioJ
X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi
Received: from mx1.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D929D1EAC3DB for ; Wed, 11 May 2011 08:07:28 -0700 (PDT)
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by cuda.sgi.com with ESMTP id twNm3ZkEmEP6mIL9 for ; Wed, 11 May 2011 08:07:28 -0700 (PDT)
X-ASG-Whitelist: Barracuda Reputation
Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p4BF7RC2030496 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Wed, 11 May 2011 11:07:27 -0400
Received: from [10.34.26.208] (dhcp-26-208.brq.redhat.com [10.34.26.208]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p4BF7QNn006116 for ; Wed, 11 May 2011 11:07:27 -0400
X-ASG-Orig-Subj: xfstests: print the message that fallocate is not supported to stdout unless quiet output
Subject: xfstests: print the message that fallocate is not supported to stdout unless quiet output
From: Boris Ranto
To: xfs
Content-Type: text/plain; charset="UTF-8"
Date: Wed, 11 May 2011 17:07:25 +0200
Message-ID: <1305126445.22267.37.camel@dhcp-31-190.brq.redhat.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25
X-Barracuda-Connect: mx1.redhat.com[209.132.183.28]
X-Barracuda-Start-Time: 1305126448
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com
X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com
X-Virus-Status: Clean

ltp/fsx.c tests whether the filesystem it is run on supports fallocate. If it is not supported, fsx prints a warning to stderr.
This causes tests 075, 112 and 127 to fail on filesystems that do not support fallocate. The tests use ltp/fsx but do not filter out stderr.
Since ltp/fsx.c can work without fallocate support, I propose moving this message to stdout and printing it only when quiet output has not been requested.
The previous patch printed the message even if the -q flag was used. This patch honours the flag.

This simple patch fixes the issue for me, tested on all the mentioned tests:

Signed-off-by: Boris Ranto

diff --git a/ltp/fsx.c b/ltp/fsx.c
index fe072d3..43e7efe 100644
--- a/ltp/fsx.c
+++ b/ltp/fsx.c
@@ -1424,7 +1424,8 @@ main(int argc, char **argv)
 #ifdef FALLOCATE
 	if (!lite && fallocate_calls) {
 		if (fallocate(fd, 0, 0, 1) && errno == EOPNOTSUPP) {
-			warn("main: filesystem does not support fallocate, disabling");
+			if(!quiet)
+				prt("fsx: main: filesystem does not support fallocate, disabling\n");
 			fallocate_calls = 0;
 		} else
 			ftruncate(fd, 0);

From sandeen@sandeen.net Wed May 11 11:55:35 2011
X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,J_CHICKENPOX_43, J_CHICKENPOX_92 autolearn=no version=3.4.0-r929098
Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4BGtYCO131975 for ; Wed, 11 May 2011 11:55:35 -0500
X-ASG-Debug-ID: 1305132933-2b2801a20000-NocioJ
X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi
Received: from mail.sandeen.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id EA7781BDCE68 for ; Wed, 11 May 2011 09:55:33 -0700 (PDT)
Received: from mail.sandeen.net (sandeen.net [63.231.237.45]) by cuda.sgi.com with ESMTP id kx6rZ7XwkZHvBCfA for ; Wed, 11 May 2011 09:55:33 -0700 (PDT)
Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sandeen.net (Postfix) with ESMTP id 0B4BD4964600; Wed, 11 May 2011 11:55:33 -0500 (CDT)
Message-ID: <4DCABF85.4080705@sandeen.net>
Date: Wed, 11 May 2011 11:55:33 -0500
From: Eric Sandeen
User-Agent:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Boris Ranto CC: xfs X-ASG-Orig-Subj: Re: xfstests: print the message that fallocate is not supported to stdout unless quiet output Subject: Re: xfstests: print the message that fallocate is not supported to stdout unless quiet output References: <1305126445.22267.37.camel@dhcp-31-190.brq.redhat.com> In-Reply-To: <1305126445.22267.37.camel@dhcp-31-190.brq.redhat.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: sandeen.net[63.231.237.45] X-Barracuda-Start-Time: 1305132933 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63439 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/11/11 10:07 AM, Boris Ranto wrote: > ltp/fsx.c tests whether the filesystem it is run on supports fallocate. If it is not supported the fsx will print warning to stderr. > This leads to fails of tests 075, 112, 127 for the filesystems that do not support fallocate. The tests use ltp/fsx but do not filter out stderr. > Since ltp/fsx.c can work without fallocate support I propose to move this message to stdout unless quiet output is not requested. > Previous patch printed the message even if -q flag was used. This patch honours the flag. > > This simple patch fixes the issue for me, tested on all the mentioned tests: > > Signed-off-by: Boris Ranto Good idea, sorry about that! I'll commit this with my signoff. 
-Eric > diff --git a/ltp/fsx.c b/ltp/fsx.c > index fe072d3..43e7efe 100644 > --- a/ltp/fsx.c > +++ b/ltp/fsx.c > @@ -1424,7 +1424,8 @@ main(int argc, char **argv) > #ifdef FALLOCATE > if (!lite && fallocate_calls) { > if (fallocate(fd, 0, 0, 1) && errno == EOPNOTSUPP) { > - warn("main: filesystem does not support fallocate, disabling"); > + if(!quiet) > + prt("fsx: main: filesystem does not support fallocate, disabling\n"); > fallocate_calls = 0; > } else > ftruncate(fd, 0); > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs > From lmcilroy@redhat.com Thu May 12 01:50:20 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4C6oKk4162406 for ; Thu, 12 May 2011 01:50:20 -0500 X-ASG-Debug-ID: 1305183018-576500960000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx3-phx2.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id A81161D674F7 for ; Wed, 11 May 2011 23:50:18 -0700 (PDT) Received: from mx3-phx2.redhat.com (mx3-phx2.redhat.com [209.132.183.24]) by cuda.sgi.com with ESMTP id AtmHKT4bCOvQfl5l for ; Wed, 11 May 2011 23:50:18 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from mail05.corp.redhat.com (zmail05.collab.prod.int.phx2.redhat.com [10.5.5.46]) by mx3-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4C6oA5e020214; Thu, 12 May 2011 02:50:10 -0400 Date: Thu, 12 May 2011 02:50:10 -0400 (EDT) From: Lachlan McIlroy Reply-To: Lachlan McIlroy To: Christoph Hellwig Cc: xfs@oss.sgi.com Message-ID: <2082652758.471187.1305183010394.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> In-Reply-To: <20110511150711.786279651@bombadil.infradead.org> X-ASG-Orig-Subj: 
Re: [PATCH 3/9] xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent
Subject: Re: [PATCH 3/9] xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.5.5.71]
X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - FF3.0 (Linux)/6.0.9_GA_2686)
X-Barracuda-Connect: mx3-phx2.redhat.com[209.132.183.24]
X-Barracuda-Start-Time: 1305183019
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com
X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com
X-Virus-Status: Clean

----- Original Message -----
> The code in xfs_bmap_del_extent does not correctly decrement the extent buffer
> index when deleting a whole extent. Most of the time this gets caught by
> checks in xfs_bmapi that work around it and decrement it manually and thus
> wasn't noticed so far.
>
> Based on an earlier patch from Lachlan McIlroy.
>
> Signed-off-by: Christoph Hellwig
>
> Index: xfs/fs/xfs/xfs_bmap.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-10 17:11:21.212901236 +0200
> +++ xfs/fs/xfs/xfs_bmap.c 2011-05-10 17:13:36.177399627 +0200
> @@ -2916,8 +2916,10 @@ xfs_bmap_del_extent(
>  		 */
>  		xfs_iext_remove(ip, *idx, 1,
>  				whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0);
> +		--*idx;

I can see why this is needed, but if we remove the extent at idx 0, won't
this go negative and confuse the next call to xfs_iext_get_ext()?

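To make the concern concrete, here is a minimal standalone sketch. It is not the XFS code: a flat array stands in for the extent list, and struct ext, ext_remove(), ext_get_checked() and ext_del_whole() are hypothetical names made up for illustration. It only shows that stepping the index back after deleting record 0 yields -1, so any later lookup has to check the index first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an in-core extent record; not the XFS type. */
struct ext {
	long startoff;
	long blockcount;
};

/* Remove the record at idx from a flat array holding *count records. */
static void ext_remove(struct ext *recs, int *count, int idx)
{
	memmove(&recs[idx], &recs[idx + 1],
		(size_t)(*count - idx - 1) * sizeof(recs[0]));
	(*count)--;
}

/* Return the record at idx, or NULL if idx is outside [0, count). */
static struct ext *ext_get_checked(struct ext *recs, int count, int idx)
{
	if (idx < 0 || idx >= count)
		return NULL;
	return &recs[idx];
}

/*
 * Delete the whole record at *idx and step the index back by one,
 * in the spirit of the "--*idx" added by the hunk above.
 */
static void ext_del_whole(struct ext *recs, int *count, int *idx)
{
	ext_remove(recs, count, *idx);
	--*idx;	/* ends up at -1 when the first record was removed */
}
```

With three records and idx 0, ext_del_whole() leaves idx at -1, and only a range-checked lookup such as ext_get_checked() is safe to call with it afterwards.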
> if (delay) > break; > + > XFS_IFORK_NEXT_SET(ip, whichfork, > XFS_IFORK_NEXTENTS(ip, whichfork) - 1); > flags |= XFS_ILOG_CORE; > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From lmcilroy@redhat.com Thu May 12 01:54:45 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4C6siwj162625 for ; Thu, 12 May 2011 01:54:45 -0500 X-ASG-Debug-ID: 1305183283-4ff700b00000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx3-phx2.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 8EE794492CD for ; Wed, 11 May 2011 23:54:44 -0700 (PDT) Received: from mx3-phx2.redhat.com (mx3-phx2.redhat.com [209.132.183.24]) by cuda.sgi.com with ESMTP id mGM1Rh232xUapstC for ; Wed, 11 May 2011 23:54:44 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from mail05.corp.redhat.com (zmail05.collab.prod.int.phx2.redhat.com [10.5.5.46]) by mx3-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4C6sfXS020688; Thu, 12 May 2011 02:54:41 -0400 Date: Thu, 12 May 2011 02:54:41 -0400 (EDT) From: Lachlan McIlroy Reply-To: Lachlan McIlroy To: Lachlan McIlroy Cc: xfs@oss.sgi.com, Christoph Hellwig Message-ID: <591073632.471215.1305183281646.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> In-Reply-To: <2082652758.471187.1305183010394.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> X-ASG-Orig-Subj: Re: [PATCH 3/9] xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent Subject: Re: [PATCH 3/9] xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: 
[10.5.5.72] X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - FF3.0 (Linux)/6.0.9_GA_2686) X-Barracuda-Connect: mx3-phx2.redhat.com[209.132.183.24] X-Barracuda-Start-Time: 1305183284 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean ----- Original Message ----- > ----- Original Message ----- > > The code in xfs_bmap_del_extent does not correctly decrement the > > extent buffer > > index when deleting a whole extent. Most of the time this gets > > caught > > by > > checks in xfs_bmapi that work around it and decrement it manually > > and > > thus > > wasn't noticed so far. > > > > Based on an earlier patch from Lachlan McIlroy. > > > > Signed-off-by: Christoph Hellwig > > > > Index: xfs/fs/xfs/xfs_bmap.c > > =================================================================== > > --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-10 17:11:21.212901236 +0200 > > +++ xfs/fs/xfs/xfs_bmap.c 2011-05-10 17:13:36.177399627 +0200 > > @@ -2916,8 +2916,10 @@ xfs_bmap_del_extent( > > */ > > xfs_iext_remove(ip, *idx, 1, > > whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0); > > + --*idx; > > I can see why this is needed but if we remove extent at > idx 0 then wont this go negative and confuse the next > call to xfs_iext_get_ext()? Ignore this comment - I see you've fixed that case in patch 6. Guess I should have looked ahead. 
> > > if (delay) > > break; > > + > > XFS_IFORK_NEXT_SET(ip, whichfork, > > XFS_IFORK_NEXTENTS(ip, whichfork) - 1); > > flags |= XFS_ILOG_CORE; > > > > _______________________________________________ > > xfs mailing list > > xfs@oss.sgi.com > > http://oss.sgi.com/mailman/listinfo/xfs > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From lmcilroy@redhat.com Thu May 12 02:17:45 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4C7HjIo163478 for ; Thu, 12 May 2011 02:17:45 -0500 X-ASG-Debug-ID: 1305184663-04f901220000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx3-phx2.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 9A76A14CB9AF for ; Thu, 12 May 2011 00:17:43 -0700 (PDT) Received: from mx3-phx2.redhat.com (mx3-phx2.redhat.com [209.132.183.24]) by cuda.sgi.com with ESMTP id BOfG9ew2CPv6g5pE for ; Thu, 12 May 2011 00:17:43 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from mail05.corp.redhat.com (zmail05.collab.prod.int.phx2.redhat.com [10.5.5.46]) by mx3-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4C7HdOn023111; Thu, 12 May 2011 03:17:39 -0400 Date: Thu, 12 May 2011 03:17:39 -0400 (EDT) From: Lachlan McIlroy Reply-To: Lachlan McIlroy To: Christoph Hellwig Cc: xfs@oss.sgi.com Message-ID: <1419180781.471454.1305184659046.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> In-Reply-To: <20110511150711.786279651@bombadil.infradead.org> X-ASG-Orig-Subj: Re: [PATCH 3/9] xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent Subject: Re: [PATCH 3/9] xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [10.5.5.71] X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - FF3.0 (Linux)/6.0.9_GA_2686) X-Barracuda-Connect: mx3-phx2.redhat.com[209.132.183.24] X-Barracuda-Start-Time: 1305184664 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Looks good. ----- Original Message ----- > The code in xfs_bmap_del_extent does not correctly decrement the > extent buffer > index when deleting a whole extent. Most of the time this gets caught > by > checks in xfs_bmapi that work around it and decrement it manually and > thus > wasn't noticed so far. > > Based on an earlier patch from Lachlan McIlroy. > > Signed-off-by: Christoph Hellwig > > Index: xfs/fs/xfs/xfs_bmap.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-10 17:11:21.212901236 +0200 > +++ xfs/fs/xfs/xfs_bmap.c 2011-05-10 17:13:36.177399627 +0200 > @@ -2916,8 +2916,10 @@ xfs_bmap_del_extent( > */ > xfs_iext_remove(ip, *idx, 1, > whichfork == XFS_ATTR_FORK ? 
BMAP_ATTRFORK : 0); > + --*idx; > if (delay) > break; > + > XFS_IFORK_NEXT_SET(ip, whichfork, > XFS_IFORK_NEXTENTS(ip, whichfork) - 1); > flags |= XFS_ILOG_CORE; > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From lmcilroy@redhat.com Thu May 12 02:21:03 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4C7L38O163603 for ; Thu, 12 May 2011 02:21:03 -0500 X-ASG-Debug-ID: 1305184862-576d01290000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx4-phx2.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 178111BDDE90 for ; Thu, 12 May 2011 00:21:02 -0700 (PDT) Received: from mx4-phx2.redhat.com (mx4-phx2.redhat.com [209.132.183.25]) by cuda.sgi.com with ESMTP id aNVyScZQ84hiScYA for ; Thu, 12 May 2011 00:21:02 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from mail05.corp.redhat.com (zmail05.collab.prod.int.phx2.redhat.com [10.5.5.46]) by mx4-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4C7KwMp008085; Thu, 12 May 2011 03:20:58 -0400 Date: Thu, 12 May 2011 03:20:58 -0400 (EDT) From: Lachlan McIlroy Reply-To: Lachlan McIlroy To: Christoph Hellwig Cc: xfs@oss.sgi.com Message-ID: <1935452012.471482.1305184858098.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> In-Reply-To: <20110511150712.222802741@bombadil.infradead.org> X-ASG-Orig-Subj: Re: [PATCH 5/9] xfs: do not use unchecked extent indices in xfs_bmapi Subject: Re: [PATCH 5/9] xfs: do not use unchecked extent indices in xfs_bmapi MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [10.5.5.71] X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - 
FF3.0 (Linux)/6.0.9_GA_2686) X-Barracuda-Connect: mx4-phx2.redhat.com[209.132.183.25] X-Barracuda-Start-Time: 1305184863 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Looks good. ----- Original Message ----- > Make sure to only call xfs_iext_get_ext after we've validate the > extent index > when moving on to the next index in xfs_bmapi. > > Based on an earlier patch from Lachlan McIlroy. > > Signed-off-by: Christoph Hellwig > > Index: xfs/fs/xfs/xfs_bmap.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-11 10:16:58.831733512 +0200 > +++ xfs/fs/xfs/xfs_bmap.c 2011-05-11 10:16:58.847733078 +0200 > @@ -4827,12 +4827,13 @@ xfs_bmapi( > /* > * Else go on to the next record. > */ > - ep = xfs_iext_get_ext(ifp, ++lastx); > prev = got; > - if (lastx >= nextents) > - eof = 1; > - else > + if (++lastx < nextents) { > + ep = xfs_iext_get_ext(ifp, lastx); > xfs_bmbt_get_all(ep, &got); > + } else { > + eof = 1; > + } > } > *nmap = n; > /* > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From lmcilroy@redhat.com Thu May 12 02:23:00 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4C7N09c163790 for ; Thu, 12 May 2011 02:23:00 -0500 X-ASG-Debug-ID: 1305184979-4fee011b0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx4-phx2.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id EBEC94493F2 for ; Thu, 12 May 2011 00:22:59 -0700 (PDT) Received: from mx4-phx2.redhat.com (mx4-phx2.redhat.com [209.132.183.25]) by 
cuda.sgi.com with ESMTP id yAavw19FxtstJyJL for ; Thu, 12 May 2011 00:22:59 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from mail05.corp.redhat.com (zmail05.collab.prod.int.phx2.redhat.com [10.5.5.46]) by mx4-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4C7MsGX008181; Thu, 12 May 2011 03:22:54 -0400 Date: Thu, 12 May 2011 03:22:54 -0400 (EDT) From: Lachlan McIlroy Reply-To: Lachlan McIlroy To: Christoph Hellwig Cc: xfs@oss.sgi.com Message-ID: <86249815.471503.1305184974655.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> In-Reply-To: <20110511150712.421348825@bombadil.infradead.org> X-ASG-Orig-Subj: Re: [PATCH 6/9] xfs: do not use unchecked extent indices in xfs_bunmapi Subject: Re: [PATCH 6/9] xfs: do not use unchecked extent indices in xfs_bunmapi MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [10.5.5.71] X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - FF3.0 (Linux)/6.0.9_GA_2686) X-Barracuda-Connect: mx4-phx2.redhat.com[209.132.183.25] X-Barracuda-Start-Time: 1305184979 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Looks good. ----- Original Message ----- > Make sure to only call xfs_iext_get_ext after we've validate the > extent index > when moving on to the next index in xfs_bunmapi. Also remove the old > workaround for too large indices that has been superceeded by the > proper > fix in xfs_bmap_del_extent. > > Based on an earlier patch from Lachlan McIlroy. > > Signed-off-by: Christoph Hellwig > > Index: xfs/fs/xfs/xfs_bmap.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-11 10:17:04.803235692 +0200 > +++ xfs/fs/xfs/xfs_bmap.c 2011-05-11 10:17:06.432734169 +0200 > @@ -5247,17 +5247,17 @@ xfs_bunmapi( > nodelete: > /* > * If not done go on to the next (previous) record. 
> - * Reset ep in case the extents array was re-alloced. > */ > - ep = xfs_iext_get_ext(ifp, lastx); > if (bno != (xfs_fileoff_t)-1 && bno >= start) { > - if (lastx >= XFS_IFORK_NEXTENTS(ip, whichfork) || > - xfs_bmbt_get_startoff(ep) > bno) { > - if (--lastx >= 0) > - ep = xfs_iext_get_ext(ifp, lastx); > - } > - if (lastx >= 0) > + if (lastx >= 0) { > + ep = xfs_iext_get_ext(ifp, lastx); > + if (xfs_bmbt_get_startoff(ep) > bno) { > + if (--lastx >= 0) > + ep = xfs_iext_get_ext(ifp, > + lastx); > + } > xfs_bmbt_get_all(ep, &got); > + } > extno++; > } > } > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From lmcilroy@redhat.com Thu May 12 02:23:42 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_63 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4C7Nf9C163842 for ; Thu, 12 May 2011 02:23:42 -0500 X-ASG-Debug-ID: 1305185021-04f7016b0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx4-phx2.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 767F914CB8EB for ; Thu, 12 May 2011 00:23:41 -0700 (PDT) Received: from mx4-phx2.redhat.com (mx4-phx2.redhat.com [209.132.183.25]) by cuda.sgi.com with ESMTP id 4IvBdCh8r2NDQg9j for ; Thu, 12 May 2011 00:23:41 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from mail05.corp.redhat.com (zmail05.collab.prod.int.phx2.redhat.com [10.5.5.46]) by mx4-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4C7NcBe008216; Thu, 12 May 2011 03:23:38 -0400 Date: Thu, 12 May 2011 03:23:38 -0400 (EDT) From: Lachlan McIlroy Reply-To: Lachlan McIlroy To: Christoph Hellwig Cc: xfs@oss.sgi.com Message-ID: 
<77245965.471510.1305185018312.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> In-Reply-To: <20110511150712.651478046@bombadil.infradead.org> X-ASG-Orig-Subj: Re: [PATCH 7/9] xfs: do not do pointer arithmetics on extent records Subject: Re: [PATCH 7/9] xfs: do not do pointer arithmetics on extent records MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [10.5.5.71] X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - FF3.0 (Linux)/6.0.9_GA_2686) X-Barracuda-Connect: mx4-phx2.redhat.com[209.132.183.25] X-Barracuda-Start-Time: 1305185021 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Looks good. Nice catch too. ----- Original Message ----- > We need to call xfs_iext_get_ext for the previous extent to get a > valid > pointer, and can't just do pointer arithmetics as they might be in > different pages. > > Signed-off-by: Christoph Hellwig > > Index: xfs/fs/xfs/xfs_bmap.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-11 10:16:58.847733078 +0200 > +++ xfs/fs/xfs/xfs_bmap.c 2011-05-11 10:17:04.803235692 +0200 > @@ -5145,9 +5145,12 @@ xfs_bunmapi( > */ > ASSERT(bno >= del.br_blockcount); > bno -= del.br_blockcount; > - if (bno < got.br_startoff) { > - if (--lastx >= 0) > - xfs_bmbt_get_all(--ep, &got); > + if (got.br_startoff > bno) { > + if (--lastx >= 0) { > + ep = xfs_iext_get_ext(ifp, > + lastx); > + xfs_bmbt_get_all(ep, &got); > + } > } > continue; > } else if (del.br_state == XFS_EXT_UNWRITTEN) { > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From lmcilroy@redhat.com Thu May 12 02:24:06 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 
autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4C7O6kj163865 for ; Thu, 12 May 2011 02:24:06 -0500 X-ASG-Debug-ID: 1305185045-576d01380000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx4-phx2.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D258C1D67DC6 for ; Thu, 12 May 2011 00:24:05 -0700 (PDT) Received: from mx4-phx2.redhat.com (mx4-phx2.redhat.com [209.132.183.25]) by cuda.sgi.com with ESMTP id I9LmjlwDXchnnYLM for ; Thu, 12 May 2011 00:24:05 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from mail05.corp.redhat.com (zmail05.collab.prod.int.phx2.redhat.com [10.5.5.46]) by mx4-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4C7O3uc008228; Thu, 12 May 2011 03:24:03 -0400 Date: Thu, 12 May 2011 03:24:03 -0400 (EDT) From: Lachlan McIlroy Reply-To: Lachlan McIlroy To: Christoph Hellwig Cc: xfs@oss.sgi.com Message-ID: <1953221482.471512.1305185043208.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> In-Reply-To: <20110511150712.830693893@bombadil.infradead.org> X-ASG-Orig-Subj: Re: [PATCH 8/9] xfs: fix up asserts in xfs_iflush_fork Subject: Re: [PATCH 8/9] xfs: fix up asserts in xfs_iflush_fork MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [10.5.5.71] X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - FF3.0 (Linux)/6.0.9_GA_2686) X-Barracuda-Connect: mx4-phx2.redhat.com[209.132.183.25] X-Barracuda-Start-Time: 1305185045 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Looks good. ----- Original Message ----- > Remove asserts in xfs_iflush_fork that would call xfs_iext_get_ext > with > a potentially invalid extent buffer index. > > Based on an earlier patch from Lachlan McIlroy. 
> > Signed-off-by: Christoph Hellwig > > Index: xfs/fs/xfs/xfs_inode.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_inode.c 2011-05-11 10:18:39.555233397 +0200 > +++ xfs/fs/xfs/xfs_inode.c 2011-05-11 12:04:24.099733330 +0200 > @@ -2557,12 +2557,9 @@ xfs_iflush_fork( > case XFS_DINODE_FMT_EXTENTS: > ASSERT((ifp->if_flags & XFS_IFEXTENTS) || > !(iip->ili_format.ilf_fields & extflag[whichfork])); > - ASSERT((xfs_iext_get_ext(ifp, 0) != NULL) || > - (ifp->if_bytes == 0)); > - ASSERT((xfs_iext_get_ext(ifp, 0) == NULL) || > - (ifp->if_bytes > 0)); > if ((iip->ili_format.ilf_fields & extflag[whichfork]) && > (ifp->if_bytes > 0)) { > + ASSERT(xfs_iext_get_ext(ifp, 0)); > ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) > 0); > (void)xfs_iextents_copy(ip, (xfs_bmbt_rec_t *)cp, > whichfork); > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From lmcilroy@redhat.com Thu May 12 02:26:53 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_63 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4C7Qr1L163950 for ; Thu, 12 May 2011 02:26:53 -0500 X-ASG-Debug-ID: 1305185212-313401f10000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx4-phx2.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 0206C449581 for ; Thu, 12 May 2011 00:26:52 -0700 (PDT) Received: from mx4-phx2.redhat.com (mx4-phx2.redhat.com [209.132.183.25]) by cuda.sgi.com with ESMTP id 5BIFLWAOzoR0foaF for ; Thu, 12 May 2011 00:26:52 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from mail05.corp.redhat.com (zmail05.collab.prod.int.phx2.redhat.com [10.5.5.46]) by mx4-phx2.redhat.com (8.13.8/8.13.8) with 
ESMTP id p4C7QmJ1008633; Thu, 12 May 2011 03:26:49 -0400 Date: Thu, 12 May 2011 03:26:48 -0400 (EDT) From: Lachlan McIlroy Reply-To: Lachlan McIlroy To: Christoph Hellwig Cc: xfs@oss.sgi.com Message-ID: <805441546.471537.1305185208927.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> In-Reply-To: <20110511150713.039506186@bombadil.infradead.org> X-ASG-Orig-Subj: Re: [PATCH 9/9] xfs: check for valid indices in xfs_iext_get_ext and xfs_iext_idx_to_irec Subject: Re: [PATCH 9/9] xfs: check for valid indices in xfs_iext_get_ext and xfs_iext_idx_to_irec MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [10.5.5.72] X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - FF3.0 (Linux)/6.0.9_GA_2686) X-Barracuda-Connect: mx4-phx2.redhat.com[209.132.183.25] X-Barracuda-Start-Time: 1305185213 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Looks good too. Christoph, thanks for following up on these fixes - I didn't know they didn't make it in. ----- Original Message ----- > Based on an earlier patch from Lachlan McIlroy. 
> > Signed-off-by: Christoph Hellwig > > Index: xfs/fs/xfs/xfs_inode.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_inode.c 2011-05-11 12:05:12.943735034 +0200 > +++ xfs/fs/xfs/xfs_inode.c 2011-05-11 12:05:28.327733646 +0200 > @@ -3108,6 +3108,8 @@ xfs_iext_get_ext( > xfs_extnum_t idx) /* index of target extent */ > { > ASSERT(idx >= 0); > + ASSERT(idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t)); > + > if ((ifp->if_flags & XFS_IFEXTIREC) && (idx == 0)) { > return ifp->if_u1.if_ext_irec->er_extbuf; > } else if (ifp->if_flags & XFS_IFEXTIREC) { > @@ -3881,8 +3883,10 @@ xfs_iext_idx_to_irec( > xfs_extnum_t page_idx = *idxp; /* extent index in target list */ > > ASSERT(ifp->if_flags & XFS_IFEXTIREC); > - ASSERT(page_idx >= 0 && page_idx <= > - ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)); > + ASSERT(page_idx >= 0); > + ASSERT(page_idx <= ifp->if_bytes / sizeof(xfs_bmbt_rec_t)); > + ASSERT(page_idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t) || > realloc); > + > nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ; > erp_idx = 0; > low = 0; > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From lmcilroy@redhat.com Thu May 12 02:31:45 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4C7VinH164117 for ; Thu, 12 May 2011 02:31:45 -0500 X-ASG-Debug-ID: 1305185503-607e02400000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx4-phx2.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D320014CB530 for ; Thu, 12 May 2011 00:31:43 -0700 (PDT) Received: from mx4-phx2.redhat.com (mx4-phx2.redhat.com [209.132.183.25]) by cuda.sgi.com with ESMTP 
id VwDsVbjLIMTUj5ik for ; Thu, 12 May 2011 00:31:43 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from mail05.corp.redhat.com (zmail05.collab.prod.int.phx2.redhat.com [10.5.5.46]) by mx4-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4C7VeDo009143; Thu, 12 May 2011 03:31:40 -0400 Date: Thu, 12 May 2011 03:31:40 -0400 (EDT) From: Lachlan McIlroy Reply-To: Lachlan McIlroy To: Christoph Hellwig Cc: xfs@oss.sgi.com Message-ID: <1158296793.471594.1305185500356.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> In-Reply-To: <20110511150711.989383617@bombadil.infradead.org> X-ASG-Orig-Subj: Re: [PATCH 4/9] xfs: do not use unchecked extent indices in xfs_bmap_add_extent_* Subject: Re: [PATCH 4/9] xfs: do not use unchecked extent indices in xfs_bmap_add_extent_* MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [10.5.5.72] X-Mailer: Zimbra 6.0.9_GA_2686 (ZimbraWebClient - FF3.0 (Linux)/6.0.9_GA_2686) X-Barracuda-Connect: mx4-phx2.redhat.com[209.132.183.25] X-Barracuda-Start-Time: 1305185503 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Looks good. ----- Original Message ----- > Make sure to only call xfs_iext_get_ext after we've validated the > extent index > in the various xfs_bmap_add_extent_* helpers. > > Based on an earlier patch from Lachlan McIlroy. 
> > Signed-off-by: Christoph Hellwig > > Index: xfs/fs/xfs/xfs_bmap.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_bmap.c 2011-05-10 13:57:12.297088697 +0200 > +++ xfs/fs/xfs/xfs_bmap.c 2011-05-10 14:00:16.405087271 +0200 > @@ -1629,7 +1629,6 @@ xfs_bmap_add_extent_hole_delay( > xfs_bmbt_irec_t *new, /* new data to add to file extents */ > int *logflagsp) /* inode logging flags */ > { > - xfs_bmbt_rec_host_t *ep; /* extent record for idx */ > xfs_ifork_t *ifp; /* inode fork pointer */ > xfs_bmbt_irec_t left; /* left neighbor extent entry */ > xfs_filblks_t newlen=0; /* new indirect size */ > @@ -1639,7 +1638,6 @@ xfs_bmap_add_extent_hole_delay( > xfs_filblks_t temp=0; /* temp for indirect calculations */ > > ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); > - ep = xfs_iext_get_ext(ifp, *idx); > state = 0; > ASSERT(isnullstartblock(new->br_startblock)); > > @@ -1660,7 +1658,7 @@ xfs_bmap_add_extent_hole_delay( > */ > if (*idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) { > state |= BMAP_RIGHT_VALID; > - xfs_bmbt_get_all(ep, &right); > + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx), &right); > > if (isnullstartblock(right.br_startblock)) > state |= BMAP_RIGHT_DELAY; > @@ -1740,7 +1738,8 @@ xfs_bmap_add_extent_hole_delay( > oldlen = startblockval(new->br_startblock) + > startblockval(right.br_startblock); > newlen = xfs_bmap_worst_indlen(ip, temp); > - xfs_bmbt_set_allf(ep, new->br_startoff, > + xfs_bmbt_set_allf(xfs_iext_get_ext(ifp, *idx), > + new->br_startoff, > nullstartblock((int)newlen), temp, right.br_state); > trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); > break; > @@ -1780,7 +1779,6 @@ xfs_bmap_add_extent_hole_real( > int *logflagsp, /* inode logging flags */ > int whichfork) /* data or attr fork */ > { > - xfs_bmbt_rec_host_t *ep; /* pointer to extent entry ins. 
point */ > int error; /* error return value */ > int i; /* temp state */ > xfs_ifork_t *ifp; /* inode fork pointer */ > @@ -1791,7 +1789,6 @@ xfs_bmap_add_extent_hole_real( > > ifp = XFS_IFORK_PTR(ip, whichfork); > ASSERT(*idx <= ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)); > - ep = xfs_iext_get_ext(ifp, *idx); > state = 0; > > if (whichfork == XFS_ATTR_FORK) > @@ -1813,7 +1810,7 @@ xfs_bmap_add_extent_hole_real( > */ > if (*idx < ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) { > state |= BMAP_RIGHT_VALID; > - xfs_bmbt_get_all(ep, &right); > + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx), &right); > if (isnullstartblock(right.br_startblock)) > state |= BMAP_RIGHT_DELAY; > } > @@ -1925,7 +1922,8 @@ xfs_bmap_add_extent_hole_real( > * Merge the new allocation with the right neighbor. > */ > trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_); > - xfs_bmbt_set_allf(ep, new->br_startoff, new->br_startblock, > + xfs_bmbt_set_allf(xfs_iext_get_ext(ifp, *idx), > + new->br_startoff, new->br_startblock, > new->br_blockcount + right.br_blockcount, > right.br_state); > trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_); > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs From ms@citd.de Thu May 12 05:02:08 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_00,RCVD_IN_NJABL_PROXY autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4CA27Io169290 for ; Thu, 12 May 2011 05:02:08 -0500 X-ASG-Debug-ID: 1305194526-446e039b0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from enyo.dsw2k3.info (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 790AD44ACB2 for ; Thu, 12 May 2011 03:02:06 -0700 (PDT) Received: from enyo.dsw2k3.info (enyo.dsw2k3.info 
[195.71.86.239]) by cuda.sgi.com with ESMTP id jZdWAnMgDaKxSz85 for ; Thu, 12 May 2011 03:02:06 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by enyo.dsw2k3.info (Postfix) with ESMTP id 5B5E298C7AF; Thu, 12 May 2011 12:02:05 +0200 (CEST) X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Scanned: Debian amavisd-new at enyo.dsw2k3.info Received: from enyo.dsw2k3.info ([127.0.0.1]) by localhost (enyo.dsw2k3.info [127.0.0.1]) (amavisd-new, port 10024) with LMTP id zVdvxq0CIvXd; Thu, 12 May 2011 12:01:56 +0200 (CEST) Received: from citd.de (pD9FF3ACC.dip.t-dialin.net [217.255.58.204]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (Client did not present a certificate) by enyo.dsw2k3.info (Postfix) with ESMTPSA id D56ED98C6D0; Thu, 12 May 2011 12:01:55 +0200 (CEST) Date: Thu, 12 May 2011 12:01:53 +0200 From: Matthias Schniedermeyer To: Dave Chinner Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: Files appear too big in `du` Subject: Re: Files appear too big in `du` Message-ID: <20110512100153.GA19381@citd.de> References: <20110510105700.GA20307@citd.de> <20110510131705.GE19446@dastard> <20110510153300.GA5764@citd.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110510153300.GA5764@citd.de> User-Agent: Mutt/1.5.21 (2010-09-15) X-Barracuda-Connect: enyo.dsw2k3.info[195.71.86.239] X-Barracuda-Start-Time: 1305194527 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-ASG-Whitelist: HEADER (^X-Barracuda-Connect: [^ ]+\.dsw2k3\.info\[) X-Virus-Status: Clean On 10.05.2011 17:33, Matthias Schniedermeyer wrote: > > > > Any idea how to debug this, or is this a known bug and waiting a few > > > days for 2.6.39 should fix this? > > > > It doesn't appear to be doing anything wrong from your description. > > Remember that XFS is optimised for high end storage and server > > configurations and workloads, not typical desktop usage... 
> > I would call it a regression. > I regularly follow copying/downloading with `du`, the speculative > preallocation makes that more or less useless. Especially downloading > something big from the internet which @ 231kb/s isn't exactly fast and > shows identical `du`s for increasingly longer periods of time. > (Or "--apparent-size" should be made default, but that falls short with > sparse files) > > IMHO `du`/`ls -l` should not be able to 'see' the speculative > preallocation. After digging into the log of v2.6.37..v2.6.38 I stumbled upon: - snip - The allocsize mount option turns off the dynamic behaviour and fixes the prealloc size to whatever the mount option specifies. i.e. the behaviour is unchanged. - snip - I think Documentation/filesystems/xfs.txt is in need of an update. All that information in the commit-log is a little "out-of-reach" for most people. See you -- Real Programmers consider "what you see is what you get" to be just as bad a concept in Text Editors as it is in women. No, the Real Programmer wants a "you asked for it, you got it" text editor -- complicated, cryptic, powerful, unforgiving, dangerous. 
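The `du` figures discussed in this thread count allocated blocks rather than file length, which is why speculative preallocation becomes visible in them. A minimal sketch for printing both values side by side, assuming GNU coreutils `stat`; the helper name `alloc_report` is made up for illustration and is not part of any XFS tooling:

```shell
#!/bin/sh
# alloc_report FILE: print the apparent size and the allocated size in bytes.
# Hypothetical helper for illustration only. Relies on GNU stat's -c format:
#   %s = file length, %b = number of allocated blocks, %B = bytes per %b block.
alloc_report() {
    f=$1
    # Assign first so a stat failure is not masked by a later command.
    info=$(stat -c '%s %b %B' -- "$f") || return 1
    # Split the three numeric fields into $1 $2 $3 (function-local).
    set -- $info
    apparent=$1
    allocated=$(( $2 * $3 ))
    echo "$f: apparent=$apparent allocated=$allocated"
}
```

Calling `alloc_report` repeatedly on a file that is still being written should show the apparent size growing steadily, while the allocated figure may move in larger steps whenever preallocation kicks in.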
From sandeen@redhat.com Thu May 12 09:23:06 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4CEN52L180712 for ; Thu, 12 May 2011 09:23:06 -0500 X-ASG-Debug-ID: 1305210184-5d7e03850000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx1.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 69FFA14CD3D7 for ; Thu, 12 May 2011 07:23:04 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by cuda.sgi.com with ESMTP id tCL0EOeFum6tyJza for ; Thu, 12 May 2011 07:23:04 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p4CEMu5i005757 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 12 May 2011 10:22:56 -0400 Received: from liberator.sandeen.net (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p4CEMsT9015713 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 12 May 2011 10:22:55 -0400 Message-ID: <4DCBED3E.4020302@redhat.com> Date: Thu, 12 May 2011 09:22:54 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Surbhi Palande CC: jack@suse.cz, marco.stornelli@gmail.com, adilger.kernel@dilger.ca, toshi.okajima@jp.fujitsu.com, tytso@mit.edu, m.mizuma@jp.fujitsu.com, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, xfs-oss X-ASG-Orig-Subj: Re: [PATCH] Attempt to sync the fsstress writes to a frozen F.S Subject: Re: [PATCH] Attempt to 
sync the fsstress writes to a frozen F.S References: <4DCA3583.7010904@canonical.com> <1305097841-2308-1-git-send-email-surbhi.palande@canonical.com> In-Reply-To: <1305097841-2308-1-git-send-email-surbhi.palande@canonical.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 X-Barracuda-Connect: mx1.redhat.com[209.132.183.28] X-Barracuda-Start-Time: 1305210185 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/11/11 2:10 AM, Surbhi Palande wrote: > While the fsstress background writes are busy dirtying the page cache, if a > fsfreeze happens then the background writes should stall. A sync should then > not have any data to sync to the FS. If it does have any data to sync then > sync will cause a deadlock by holding the s_umount write semaphore and waiting > in the wait queue for the FS to thaw, whereas the F.S can never thaw without > getting the s_umount write semaphore. > > Signed-off-by: Surbhi Palande Seems ok to me. In the future, when sending xfstests patches, if you can add "xfstests" to the subject, and cc: the xfs list, it'd be great. I presume that this test does fail for you without your fixes? I'll see if anyone on the xfs list has comments and if not, I can check this in. Thanks, -Eric > --- > 068 | 5 +++++ > 1 files changed, 5 insertions(+), 0 deletions(-) > > diff --git a/068 b/068 > index 82c1a4e..b9ac58d 100755 > --- a/068 > +++ b/068 > @@ -101,6 +101,11 @@ do > tee -a $seq.full > sleep 2 > > + # there should be nothing to sync at this point. This may hang in case > + # of fsstress background writes dirtying the page cache while the F.S is frozen > + sync & > + sleep 2 > + > echo "*** thawing \$SCRATCH_MNT" | tee -a $seq.full > xfs_freeze -u "$SCRATCH_MNT" | tee -a $seq.full > [ $? 
!= 0 ] && echo xfs_freeze -u "$SCRATCH_MNT" failed | \ From sandeen@redhat.com Fri May 13 16:35:58 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_43 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4DLZwjZ251811 for ; Fri, 13 May 2011 16:35:58 -0500 X-ASG-Debug-ID: 1305322557-17f603a70000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx1.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id BB47415D873E for ; Fri, 13 May 2011 14:35:57 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by cuda.sgi.com with ESMTP id bA5t2dnzAcE9y6OV for ; Fri, 13 May 2011 14:35:57 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p4DLZuKk001865 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Fri, 13 May 2011 17:35:56 -0400 Received: from liberator.sandeen.net (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p4DLZsBf007321 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 13 May 2011 17:35:56 -0400 Message-ID: <4DCDA43A.30502@redhat.com> Date: Fri, 13 May 2011 16:35:54 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: xfs-oss CC: =?windows-1252?Q?Luk=E1=9A_Czerner?= X-ASG-Orig-Subj: [PATCH] xfstests 251: fix fitrim support test Subject: [PATCH] xfstests 251: fix fitrim support test Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Scanned-By: 
MIMEDefang 2.68 on 10.5.11.25 X-Barracuda-Connect: mx1.redhat.com[209.132.183.28] X-Barracuda-Start-Time: 1305322557 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On my ext4 filesystem, the simple "did fstrim work" test passes, because it asks to free all blocks in the first 10m of the fs, and those 10m are full of filesystem metadata. Because no blocks are free, no blocks are trimmed, and we get success returned. But then when the test runs I'm flooded with error messages, because it's a hard drive not an ssd... So we need to step through the fs until we either free a block, or encounter an error. I think this is ugly bash, if anyone has a better plan I'm all ears. (also change FSTRIM to FITRIM in the failure message, it seems to be intended to print the ioctl name ...) Signed-off-by: Eric Sandeen --- diff --git a/251 b/251 index fa3d74a..5ab0a87 100755 --- a/251 +++ b/251 @@ -73,7 +73,19 @@ _fail() _check_fstrim_support() { - $here/src/fstrim -l 10M $SCRATCH_MNT &> /dev/null + # Go until error or until something gets trimmed + step=1048576 + start=0 + retval=0 + nonetrimmed=1 + + while [ $retval -eq 0 ] && [ $nonetrimmed -ne 0 ]; do + result=`$here/src/fstrim -v -s $start -l $step $SCRATCH_MNT 2>&1` + retval=$? 
+ [ "${result:0:1}" -eq "0" ] && nonetrimmed=1 + start=$(( $start + $step )) + done + return $retval } ## diff --git a/src/fstrim.c b/src/fstrim.c index f1f37ec..ad7fd6a 100644 --- a/src/fstrim.c +++ b/src/fstrim.c @@ -236,7 +236,7 @@ int main(int argc, char **argv) } if (ioctl(fd, FITRIM, opts->range)) { - fprintf(stderr, "%s: FSTRIM: %s\n", program_name, + fprintf(stderr, "%s: FITRIM %s\n", program_name, strerror(errno)); goto free_opts; } From xiaoqiangnk@gmail.com Fri May 13 22:47:26 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,FREEMAIL_FROM, J_CHICKENPOX_52,J_CHICKENPOX_84,T_DKIM_INVALID autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4E3lQuu002242 for ; Fri, 13 May 2011 22:47:26 -0500 X-ASG-Debug-ID: 1305344845-21e101700000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-vx0-f181.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id A05DA4517BC for ; Fri, 13 May 2011 20:47:25 -0700 (PDT) Received: from mail-vx0-f181.google.com (mail-vx0-f181.google.com [209.85.220.181]) by cuda.sgi.com with ESMTP id JPzwDF1XhbbuQfyh for ; Fri, 13 May 2011 20:47:25 -0700 (PDT) Received: by vxb39 with SMTP id 39so2526729vxb.26 for ; Fri, 13 May 2011 20:47:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:date:message-id:subject:from:to:cc :content-type; bh=CeACp7zBdgAxxmcGF+2hAql8+ckxTLgS/XWtIHdiNAM=; b=uMQ+XBCcs+SGO0QYQDHYR/CnPnW0uh+ld/1uUTFWdlNTNehTvV0xO1yZ8b08i2At4p FFZF+Af+K2c+Fq1N72UYNCx6CGNcZMJfon+AouLCEpmTyV8fBLX5vSVg1VmbZH7BwHcv RTqzygjWzH6Iw858jt+SGjbl8cNJcg+dhuNZk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:cc:content-type; 
b=bCa8fMh8QEmg4qXDMQVNqxudzqvLG/ksxIcHnCMXKiXZk/i69tR3xd1Sph1diOMgN4 BFV1zbuIi/2Q7Jt9+n3iq7wQQ4zDc9vHExnPHoTtARYUep9yopSpConpVPT+4ZnRQuZJ V+xSta4ALjFz4eQN2Zr6ySHu/i0Gy9B77ydJo= MIME-Version: 1.0 Received: by 10.220.68.229 with SMTP id w37mr586878vci.148.1305344844822; Fri, 13 May 2011 20:47:24 -0700 (PDT) Received: by 10.220.170.141 with HTTP; Fri, 13 May 2011 20:47:24 -0700 (PDT) Date: Sat, 14 May 2011 11:47:24 +0800 Message-ID: X-ASG-Orig-Subj: [PATCH] xfstests:Make 225 compare map and fiemap at each block. Subject: [PATCH] xfstests:Make 225 compare map and fiemap at each block. From: Yongqiang Yang To: Eric Sandeen , josef@redhat.com Cc: Ext4 Developers List , xfs@oss.sgi.com Content-Type: text/plain; charset=ISO-8859-1 X-Barracuda-Connect: mail-vx0-f181.google.com[209.85.220.181] X-Barracuda-Start-Time: 1305344845 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63673 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Hi All, Due to my carelessness, I introduced an ugly patch to ext4's fiemap, but 225 could not catch it. So I looked into 225 and could not figure out the logic in compare_fiemap_and_map(), which seemed to mix extents with blocks. I then made 225 compare map and fiemap at each block, and the new 225 can find another bug in ext4's fiemap. The new 225 works well on ext3 and ext4 with both 1K and 4K blocks. 
However, it reports a fiemap error on xfs with 4K blocks. My working tree is 2.6.39-rc3 pulled from Ted's tree. The error message is as follows. QA output created by 225 fiemap run without preallocation, with sync +map is 'DDHDHHDHHDHDDHDDHDDHHDHDDHDDDDDDHHDDDHHHHDH DDDDDDDDHDDHHHDDDHDDHHDDDDDDHHHHHHDDHHHHHDHDHDHDD DHDDHD' +logical: [ 0.. 15] phys: 12.. 27 flags: 0x000 tot: 16 +logical: [ 17.. 31] phys: 29.. 43 flags: 0x000 tot: 15 +logical: [ 34.. 63] phys: 46.. 75 flags: 0x000 tot: 30 +logical: [ 65.. 95] phys: 77.. 107 flags: 0x001 tot: 31 +Problem comparing fiemap and map fiemap run without preallocation or sync +map is 'DDHDHHDHHDHDDHDDHDDHHDHDDHDDDDDDHHDDDHHHHDH DDDDDDDDHDDHHHDDDHDDHHDDDDDDHHHHHHDDHHHHHDHDHDHDD DHDDHD' +logical: [ 0.. 15] phys: 0.. 15 flags: 0x006 tot: 16 +Problem comparing fiemap and map Ran: 225 Failures: 225 Failed 1 of 1 tests I am not sure whether this is a bug in the new 225 or in xfs. Yongqiang. Signed-off-by: Yongqiang Yang --- src/fiemap-tester.c | 223 ++++++++++++++++++++++++++++---------------------- 1 files changed, 125 insertions(+), 98 deletions(-) diff --git a/src/fiemap-tester.c b/src/fiemap-tester.c index 1663f84..99bb5ce 100644 --- a/src/fiemap-tester.c +++ b/src/fiemap-tester.c @@ -14,6 +14,9 @@ * You should have received a copy of the GNU General Public License * along with this program; if not, write the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + * + * Compare map and fiemap at each block, + * Yongqiang Yang , 2011 */ #include @@ -57,7 +60,7 @@ generate_file_mapping(int blocks, int prealloc) int num_types = 2, cur_block = 0; int i = 0; - map = malloc(sizeof(char) * blocks); + map = malloc(sizeof(char) * (blocks + 1)); if (!map) return NULL; @@ -80,7 +83,8 @@ generate_file_mapping(int blocks, int prealloc) } cur_block++; } - + + map[blocks] = 0; return map; } @@ -247,55 +251,36 @@ check_flags(struct fiemap *fiemap, int blocksize) } static int -check_data(struct fiemap *fiemap, __u64 
logical_offset, int blocksize, +check_data(struct fiemap_extent * extent , __u64 logical_offset, int blocksize, int last, int prealloc) { - struct fiemap_extent *extent; - __u64 orig_offset = logical_offset; - int c, found = 0; - - for (c = 0; c < fiemap->fm_mapped_extents; c++) { - __u64 start, end; - extent = &fiemap->fm_extents[c]; - - start = extent->fe_logical; - end = extent->fe_logical + extent->fe_length; - - if (logical_offset > end) - continue; - - if (logical_offset + blocksize < start) - break; - - if (logical_offset >= start && - logical_offset < end) { - if (prealloc && - !(extent->fe_flags & FIEMAP_EXTENT_UNWRITTEN)) { - printf("ERROR: preallocated extent is not " - "marked with FIEMAP_EXTENT_UNWRITTEN: " - "%llu\n", - (unsigned long long) - (start / blocksize)); - return -1; - } - - if (logical_offset + blocksize > end) { - logical_offset = end+1; - continue; - } else { - found = 1; - break; - } + int found = 0; + __u64 start, end; + + start = extent->fe_logical; + end = extent->fe_logical + extent->fe_length; + + if (logical_offset >= start && + logical_offset < end) { + if (prealloc && + !(extent->fe_flags & FIEMAP_EXTENT_UNWRITTEN)) { + printf("ERROR: preallocated extent is not " + "marked with FIEMAP_EXTENT_UNWRITTEN: " + "%llu\n", + (unsigned long long) + (start / blocksize)); + return -1; } + found = 1; } - + if (!found) { printf("ERROR: couldn't find extent at %llu\n", - (unsigned long long)(orig_offset / blocksize)); + (unsigned long long)(logical_offset / blocksize)); } else if (last && - !(fiemap->fm_extents[c].fe_flags & FIEMAP_EXTENT_LAST)) { + !(extent->fe_flags & FIEMAP_EXTENT_LAST)) { printf("ERROR: last extent not marked as last: %llu\n", - (unsigned long long)(orig_offset / blocksize)); + (unsigned long long)(logical_offset / blocksize)); found = 0; } @@ -370,37 +355,26 @@ check_weird_fs_hole(int fd, __u64 logical_offset, int blocksize) } static int -check_hole(struct fiemap *fiemap, int fd, __u64 logical_offset, int blocksize) 
+check_hole(struct fiemap_extent *extent, int fd, __u64 logical_offset, int blocksize) { - struct fiemap_extent *extent; - int c; + __u64 start, end; - for (c = 0; c < fiemap->fm_mapped_extents; c++) { - __u64 start, end; - extent = &fiemap->fm_extents[c]; + start = extent->fe_logical; + end = extent->fe_logical + extent->fe_length; - start = extent->fe_logical; - end = extent->fe_logical + extent->fe_length; + if (logical_offset >= start && + logical_offset < end) { - if (logical_offset > end) - continue; - if (logical_offset + blocksize < start) - break; + if (check_weird_fs_hole(fd, logical_offset, + blocksize) == 0) + return 0; - if (logical_offset >= start && - logical_offset < end) { - - if (check_weird_fs_hole(fd, logical_offset, - blocksize) == 0) - break; - - printf("ERROR: found an allocated extent where a hole " - "should be: %llu\n", - (unsigned long long)(start / blocksize)); - return -1; - } + printf("ERROR: found an allocated extent where a hole " + "should be: %llu\n", + (unsigned long long)(start / blocksize)); + return -1; } - + return 0; } @@ -423,9 +397,11 @@ compare_fiemap_and_map(int fd, char *map, int blocks, int blocksize, int syncfil { struct fiemap *fiemap; char *fiebuf; - int blocks_to_map, ret, cur_extent = 0, last_data; + int blocks_to_map, ret, last_data = -1; __u64 map_start, map_length; int i, c; + int cur_block = 0; + int last_found = 0; if (query_fiemap_count(fd, blocks, blocksize) < 0) return -1; @@ -451,8 +427,11 @@ compare_fiemap_and_map(int fd, char *map, int blocks, int blocksize, int syncfil fiemap->fm_extent_count = blocks_to_map; fiemap->fm_mapped_extents = 0; + /* check fiemap by looking at each block. 
*/ do { - fiemap->fm_start = map_start; + int nr_extents; + + fiemap->fm_start = cur_block * blocksize; fiemap->fm_length = map_length; ret = ioctl(fd, FS_IOC_FIEMAP, (unsigned long)fiemap); @@ -465,45 +444,93 @@ compare_fiemap_and_map(int fd, char *map, int blocks, int blocksize, int syncfil if (check_flags(fiemap, blocksize)) goto error; - for (i = cur_extent, c = 1; i < blocks; i++, c++) { - __u64 logical_offset = i * blocksize; + nr_extents = fiemap->fm_mapped_extents; + if (nr_extents == 0) { + int block = cur_block + (map_length - 1)/ blocksize; + for (; cur_block <= block && cur_block < blocks; cur_block++) { + /* check hole */ + if (map[cur_block] != 'H') { + printf("ERROR: map[%d] should not be " + "a hole\n", cur_block); + goto error; + } + } + continue; + } - if (c > fiemap->fm_mapped_extents) { - i++; - break; + for (c = 0; c < nr_extents; c++) { + __u64 offset; + int block; + struct fiemap_extent *extent; + + if (last_found) { + printf("ERROR: there is an extent after " + "the last extent\n"); + goto error; + } - switch (map[i]) { - case 'D': - if (check_data(fiemap, logical_offset, - blocksize, last_data == i, 0)) - goto error; - break; - case 'H': - if (check_hole(fiemap, fd, logical_offset, - blocksize)) - goto error; - break; - case 'P': - if (check_data(fiemap, logical_offset, - blocksize, last_data == i, 1)) + extent = &fiemap->fm_extents[c]; + offset = extent->fe_logical; + block = offset / blocksize; + + /* check hole. 
*/ + for (; cur_block < block; cur_block++) { + if (map[cur_block] != 'H') { + printf("ERROR: map[%d] should not be " + "a hole\n", cur_block); + goto error; + } + } + + offset = extent->fe_logical + extent->fe_length; + block = offset / blocksize; + + if (block > blocks) { + printf("ERROR: there are extents beyond EOF\n"); + goto error; + } + + /* check data */ + for (; cur_block < block; cur_block++) { + offset = (__u64)cur_block * blocksize; + last_found = (last_data == cur_block); + switch (map[cur_block]) { + case 'D': + if (check_data(extent, offset, + blocksize, last_found, 0)) + goto error; + break; + case 'H': + if (check_hole(extent, fd, offset, + blocksize)) + goto error; + break; + + case 'P': + if (check_data(extent, offset, + blocksize, last_found, 1)) + goto error; + break; + default: + printf("ERROR: weird value in map: %c\n", + map[cur_block]); + goto error; + } + } } - cur_extent = i; - map_start = i * blocksize; - } while (cur_extent < blocks); + } while (cur_block < blocks); - ret = 0; - return ret; + if (!last_found && last_data != -1) { + printf("ERROR: find no last extent\n"); + goto error; + } + + free(fiebuf); + return 0; error: printf("map is '%s'\n", map); show_extents(fiemap, blocksize); + free(fiebuf); return -1; } -- 1.7.5.1 -- Best Wishes Yongqiang Yang From googlepromo.online@london.com Sat May 14 11:15:45 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: **** X-Spam-Status: No, score=4.3 required=5.0 tests=BAYES_99,FREEMAIL_FROM, MIME_8BIT_HEADER,T_LOTS_OF_MONEY,T_TO_NO_BRKTS_FREEMAIL autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4EGFjri029744 for ; Sat, 14 May 2011 11:15:45 -0500 X-ASG-Debug-ID: 1305389733-70e1007f0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.ird.gov.br 
(localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with SMTP id 95C5215E6AC1 for ; Sat, 14 May 2011 09:15:33 -0700 (PDT) Received: from mail.ird.gov.br (mail.ird.gov.br [200.20.227.66]) by cuda.sgi.com with SMTP id 8OONmOrlO67FnWG3 for ; Sat, 14 May 2011 09:15:33 -0700 (PDT) Received: from ird.gov.br (ird02.ird.gov.br [127.0.0.1]) by mail.ird.gov.br (Postfix) with ESMTP id 8C49E2B00D6; Sat, 14 May 2011 12:59:45 -0300 (BRST) From: "Promo Announcer" X-ASG-Orig-Subj: =?ISO-8859-1?Q?=A92011_Google_12th_?=Anniversary Celebration Promo (Thanks for contributing to our Success, Congrats!) Subject: =?ISO-8859-1?Q?=A92011_Google_12th_?=Anniversary Celebration Promo (Thanks for contributing to our Success, Congrats!) Date: Sat, 14 May 2011 13:59:45 -0200 Message-Id: <20110514154402.M66948@london.com> X-Mailer: OpenWebMail 2.52 20060502 X-OriginatingIP: 91.121.23.20 (rgadelha) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 To: undisclosed-recipients:; X-Barracuda-Connect: mail.ird.gov.br[200.20.227.66] X-Barracuda-Start-Time: 1305389744 X-Barracuda-Bayes: INNOCENT GLOBAL 0.5923 1.0000 0.7500 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: 0.75 X-Barracuda-Spam-Status: No, SCORE=0.75 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63723 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean We urgently wish to inform you that your e-mail has won the sum of £ 850,000 UK Pounds {with L u c k y # :12-12-23-35-40-41(12) & T i c k e t # :008 695 757 336 64} in our on-going 12th Anniversary Giveaway Online promotions. For further information to be provided, you have to reply this notice immediately to confirm this email account is still active. 
Sincerely!, MRS LISA (P r o m o Announcer) From googlepromo.online@london.com Sat May 14 11:44:50 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: **** X-Spam-Status: No, score=4.3 required=5.0 tests=BAYES_99,FREEMAIL_FROM, MIME_8BIT_HEADER,T_LOTS_OF_MONEY,T_TO_NO_BRKTS_FREEMAIL autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4EGinMq030711 for ; Sat, 14 May 2011 11:44:50 -0500 X-ASG-Debug-ID: 1305391478-0c3300bc0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.ird.gov.br (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with SMTP id E1ECB15CDCDE for ; Sat, 14 May 2011 09:44:38 -0700 (PDT) Received: from mail.ird.gov.br (mail.ird.gov.br [200.20.227.66]) by cuda.sgi.com with SMTP id AjVQ1bUE0LeTRYkx for ; Sat, 14 May 2011 09:44:38 -0700 (PDT) Received: from ird.gov.br (ird02.ird.gov.br [127.0.0.1]) by mail.ird.gov.br (Postfix) with ESMTP id 061AC803F; Sat, 14 May 2011 13:13:20 -0300 (BRST) From: "Promo Announcer" X-ASG-Orig-Subj: =?ISO-8859-1?Q?=A92011_Google_12th_?=Anniversary Celebration Promo (Thanks for contributing to our Success, Congrats!) Subject: =?ISO-8859-1?Q?=A92011_Google_12th_?=Anniversary Celebration Promo (Thanks for contributing to our Success, Congrats!) 
Date: Sat, 14 May 2011 14:13:19 -0200 Message-Id: <20110514160755.M69220@london.com> X-Mailer: OpenWebMail 2.52 20060502 X-OriginatingIP: 91.121.23.20 (rgadelha) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 To: undisclosed-recipients:; X-Barracuda-Connect: mail.ird.gov.br[200.20.227.66] X-Barracuda-Start-Time: 1305391488 X-Barracuda-Bayes: INNOCENT GLOBAL 0.5923 1.0000 0.7500 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: 0.75 X-Barracuda-Spam-Status: No, SCORE=0.75 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63725 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean We urgently wish to inform you that your e-mail has won the sum of £ 850,000 UK Pounds {with L u c k y # :12-12-23-35-40-41(12) & T i c k e t # :008 695 757 336 64} in our on-going 12th Anniversary Giveaway Online promotions. For further information to be provided, you have to reply this notice immediately to confirm this email account is still active. 
Sincerely!, MRS LISA (P r o m o Announcer) From googlepromo.online@london.com Sat May 14 11:45:06 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: **** X-Spam-Status: No, score=4.3 required=5.0 tests=BAYES_99,FREEMAIL_FROM, MIME_8BIT_HEADER,T_LOTS_OF_MONEY,T_TO_NO_BRKTS_FREEMAIL autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4EGj6Op030762 for ; Sat, 14 May 2011 11:45:06 -0500 X-ASG-Debug-ID: 1305391495-155403260000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.ird.gov.br (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with SMTP id 4A0EB454044 for ; Sat, 14 May 2011 09:44:55 -0700 (PDT) Received: from mail.ird.gov.br (mail.ird.gov.br [200.20.227.66]) by cuda.sgi.com with SMTP id e3O4o005StBhxZsH for ; Sat, 14 May 2011 09:44:55 -0700 (PDT) Received: from ird.gov.br (ird02.ird.gov.br [127.0.0.1]) by mail.ird.gov.br (Postfix) with ESMTP id 61AB68023; Sat, 14 May 2011 13:13:51 -0300 (BRST) From: "Promo Announcer" X-ASG-Orig-Subj: =?ISO-8859-1?Q?=A92011_Google_12th_?=Anniversary Celebration Promo (Thanks for contributing to our Success, Congrats!) Subject: =?ISO-8859-1?Q?=A92011_Google_12th_?=Anniversary Celebration Promo (Thanks for contributing to our Success, Congrats!) 
Date: Sat, 14 May 2011 14:13:51 -0200 Message-Id: <20110514160757.M4104@london.com> X-Mailer: OpenWebMail 2.52 20060502 X-OriginatingIP: 91.121.23.20 (rgadelha) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 To: undisclosed-recipients:; X-Barracuda-Connect: mail.ird.gov.br[200.20.227.66] X-Barracuda-Start-Time: 1305391506 X-Barracuda-Bayes: INNOCENT GLOBAL 0.5169 1.0000 0.7500 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: 0.75 X-Barracuda-Spam-Status: No, SCORE=0.75 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63725 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean We urgently wish to inform you that your e-mail has won the sum of £ 850,000 UK Pounds {with L u c k y # :12-12-23-35-40-41(12) & T i c k e t # :008 695 757 336 64} in our on-going 12th Anniversary Giveaway Online promotions. For further information to be provided, you have to reply this notice immediately to confirm this email account is still active. 
Sincerely!, MRS LISA (P r o m o Announcer) From mprobst@zmcconsulting.com Sun May 15 22:20:26 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_45 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4G3KP88112552 for ; Sun, 15 May 2011 22:20:26 -0500 X-ASG-Debug-ID: 1305516023-1327018b0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mprobst.securesites.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 4F2A7457489 for ; Sun, 15 May 2011 20:20:23 -0700 (PDT) Received: from mprobst.securesites.net (mprobst2.securesites.net [198.173.85.114]) by cuda.sgi.com with ESMTP id 1PkHREMxhtVqhJWo for ; Sun, 15 May 2011 20:20:23 -0700 (PDT) Received: from [192.168.102.101] (photos.bigmama.probst.org [71.195.218.88]) (authenticated bits=0) by mprobst.securesites.net (8.14.4/8.14.4) with ESMTP id p4G3KJvF050323 for ; Sun, 15 May 2011 21:20:22 -0600 (MDT) (envelope-from mprobst@zmcconsulting.com) Message-ID: <4DD097F9.6070205@zmcconsulting.com> Date: Sun, 15 May 2011 21:20:25 -0600 From: "Matthew J. 
Probst" User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10 MIME-Version: 1.0 To: xfs@oss.sgi.com X-ASG-Orig-Subj: XFS_WANT_CORRUPTED_GOTO on repair of large myisam (mysql) table Subject: XFS_WANT_CORRUPTED_GOTO on repair of large myisam (mysql) table Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mprobst2.securesites.net[198.173.85.114] X-Barracuda-Start-Time: 1305516024 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63865 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

Hi,

I've run into the infamous XFS_WANT_CORRUPTED_GOTO error.... twice in the same day.... with an xfs_repair in-between.

Both times I hit this error, I was doing a myisamchk to repair a large corrupted mysql table (with a 20GB data file and a 17GB index file).

I've run this database for years on this file system without a problem..... then in one day, both times when I attempted to repair this table, xfs crashed on me. I believe this is the largest table I've attempted to repair on this file system.

After the first crash, the file system refused to mount.... The repair was refused as well, saying that there were entries in the metadata log that needed replaying... Given the problem mounting the file system, I ended up clearing the metadata log (xfs_repair -L).

The system came back online.... but when I attempted to repair the same table, the same XFS_WANT_CORRUPTED_GOTO error occurred.
This time, I was able to simply remount the fs w/o clearing the log and w/o an explicit repair...

Since then I've avoided repairing this table.. and instead I restored a backup from a replication slave. The system has been stable in the two days since the crash (though I've avoided all myisamchk attempts).

Any guidance would greatly be appreciated.... Given how mission critical this db is, I need to either find a root cause for the error or consider migration to an alternate filesystem.

Below is information on:
The storage hardware.
The software used
The kernel error seen (from dmesg).
The output of the xfs_repair -L command (the one time I was forced to run it).
Output of xfs_info.

##############################################################
Storage hardware:
##############################################################
Multipath 3Gbps sas connection to a redundant external sas array (dual HA controllers), Raid-10 on 10x 15Krpm sas drives.

8GB of ram. I've run a memtest over it after the failure for 12+ hours and did not find any problems.

##########################
Software:
##########################
xfs on lvm2 on dm-multipath

Kernel: 2.6.18-238.9.1.el5 (from RH/Centos 5.6)
kmod-xfs version 0.4-2
xfsprogs version 2.9.4-1
lvm2 version: 2.02.74-5
device mapper multipath version: 0.4.7-42
Mysql version 5.1 56

##############################################################
Text of kernel error:
##############################################################
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1572 of file fs/xfs/xfs_alloc.c.
Caller 0xffffffff88730969

Call Trace:
 [] :xfs:xfs_free_ag_extent+0x19e/0x67e
 [] :xfs:xfs_free_extent+0xa9/0xc9
 [] :xfs:xfs_trans_log_efd_extent+0x1c/0x48
 [] :xfs:xlog_recover_process_efi+0x112/0x16c
 [] :xfs:xfs_fs_fill_super+0x0/0x3dc
 [] :xfs:xlog_recover_process_efis+0x4f/0x8d
 [] :xfs:xlog_recover_finish+0x14/0xad
 [] :xfs:xfs_fs_fill_super+0x0/0x3dc
 [] :xfs:xfs_mountfs+0x498/0x5e2
 [] :xfs:xfs_mru_cache_create+0x113/0x143
 [] :xfs:xfs_fs_fill_super+0x203/0x3dc
 [] get_sb_bdev+0x10a/0x16c
 [] selinux_sb_copy_data+0x1a1/0x1c5
 [] vfs_kern_mount+0x93/0x11a
 [] do_kern_mount+0x36/0x4d
 [] do_mount+0x6a9/0x719
 [] _atomic_dec_and_lock+0x39/0x57
 [] mntput_no_expire+0x19/0x89
 [] find_get_page+0x21/0x51
 [] filemap_nopage+0x193/0x360
 [] __handle_mm_fault+0x5f3/0x1039
 [] zone_statistics+0x3e/0x6d
 [] __alloc_pages+0x78/0x308
 [] sys_mount+0x8a/0xcd
 [] tracesys+0xd5/0xe0

Filesystem "dm-1": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller 0xffffffff887612d7

Call Trace:
 [] :xfs:xfs_trans_cancel+0x55/0xfa
 [] :xfs:xlog_recover_process_efi+0x15e/0x16c
 [] :xfs:xfs_fs_fill_super+0x0/0x3dc
 [] :xfs:xlog_recover_process_efis+0x4f/0x8d
 [] :xfs:xlog_recover_finish+0x14/0xad
 [] :xfs:xfs_fs_fill_super+0x0/0x3dc
 [] :xfs:xfs_mountfs+0x498/0x5e2
 [] :xfs:xfs_mru_cache_create+0x113/0x143
 [] :xfs:xfs_fs_fill_super+0x203/0x3dc
 [] get_sb_bdev+0x10a/0x16c
 [] selinux_sb_copy_data+0x1a1/0x1c5
 [] vfs_kern_mount+0x93/0x11a
 [] do_kern_mount+0x36/0x4d
 [] do_mount+0x6a9/0x719
 [] _atomic_dec_and_lock+0x39/0x57
 [] mntput_no_expire+0x19/0x89
 [] find_get_page+0x21/0x51
 [] filemap_nopage+0x193/0x360
 [] __handle_mm_fault+0x5f3/0x1039
 [] zone_statistics+0x3e/0x6d
 [] __alloc_pages+0x78/0x308
 [] sys_mount+0x8a/0xcd
 [] tracesys+0xd5/0xe0

xfs_force_shutdown(dm-1,0x8) called from line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff88769704
Filesystem "dm-1": Corruption of in-memory data detected.
Shutting down filesystem: dm-1
Please umount the filesystem, and rectify the problem(s)

###########################################################################
output from: "xfs_repair -L -v /dev/primary_vg/master"
############################################################################
Phase 1 - find and verify superblock...
        - block cache size set to 763768 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 28095 tail block 26697
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
44334940: Badness in key lookup (length)
bp=(bno 167738016, len 16384 bytes) key=(bno 167738016, len 8192 bytes)
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 9
        - agno = 6
        - agno = 7
        - agno = 10
        - agno = 8
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 20
        - agno = 19
        - agno = 22
        - agno = 21
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 335476063, moving to lost+found
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Sat May 14 07:38:17 2011

Phase      Start            End              Duration
Phase 1:   05/14 07:38:06   05/14 07:38:06
Phase 2:   05/14 07:38:06   05/14 07:38:07   1 second
Phase 3:   05/14 07:38:07   05/14 07:38:17   10 seconds
Phase 4:   05/14 07:38:17   05/14 07:38:17
Phase 5:   05/14 07:38:17   05/14 07:38:17
Phase 6:   05/14 07:38:17   05/14 07:38:17
Phase 7:   05/14 07:38:17   05/14 07:38:17

Total run time: 11 seconds
done

##############################################################################
xfs_info /dev/primary_vg/master
##############################################################################
# xfs_info /dev/primary_vg/master
meta-data=/dev/primary_vg/master isize=256    agcount=23, agsize=2097152 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=46661632, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=16384, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

From sandeen@sandeen.net Sun May 15 22:29:15 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_32 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4G3TFeb112949 for ; Sun, 15 May 2011 22:29:15 -0500 X-ASG-Debug-ID: 1305516554-764e006f0000-NocioJ X-Barracuda-URL:
http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.sandeen.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 920A61E29FD3 for ; Sun, 15 May 2011 20:29:14 -0700 (PDT) Received: from mail.sandeen.net (sandeen.net [63.231.237.45]) by cuda.sgi.com with ESMTP id U6WLDL7apli3DgPY for ; Sun, 15 May 2011 20:29:14 -0700 (PDT) Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sandeen.net (Postfix) with ESMTP id DC19E4964601; Sun, 15 May 2011 22:29:13 -0500 (CDT) Message-ID: <4DD09A09.30807@sandeen.net> Date: Sun, 15 May 2011 22:29:13 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: "Matthew J. Probst" CC: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: XFS_WANT_CORRUPTED_GOTO on repair of large myisam (mysql) table Subject: Re: XFS_WANT_CORRUPTED_GOTO on repair of large myisam (mysql) table References: <4DD097F9.6070205@zmcconsulting.com> In-Reply-To: <4DD097F9.6070205@zmcconsulting.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: sandeen.net[63.231.237.45] X-Barracuda-Start-Time: 1305516554 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63865 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/15/11 10:20 PM, Matthew J. 
Probst wrote:

> ##########################
> Software:
> ##########################
> xfs on lvm2 on dm-multipath
>
> Kernel: 2.6.18-238.9.1.el5 (from RH/Centos 5.6)
> kmod-xfs version 0.4-2

Please try removing kmod-xfs; that is an extremely old xfs codebase. The kernel above comes with xfs.ko already and is much more up to date. When kmod-xfs is installed, it overrides the one shipped with the kernel.

Hopefully you'll have better luck with the newer code.

-Eric

> xfsprogs version 2.9.4-1
> lvm2 version: 2.02.74-5
> device mapper multipath version: 0.4.7-42
> Mysql version 5.1 56

From lczerner@redhat.com Mon May 16 04:19:48 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_43 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4G9JlEm126337 for ; Mon, 16 May 2011 04:19:47 -0500 X-ASG-Debug-ID: 1305537586-7c8803a60000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx1.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id A5417CBB482 for ; Mon, 16 May 2011 02:19:46 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by cuda.sgi.com with ESMTP id vJpxBqtQO3i805MF for ; Mon, 16 May 2011 02:19:46 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p4G9Jjdt009332 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 16 May 2011 05:19:45 -0400 Received: from dhcp-1-233.brq.redhat.com (dhcp-1-233.brq.redhat.com [10.34.1.233]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p4G9Jggc008072 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 16 May
2011 05:19:44 -0400 Date: Mon, 16 May 2011 11:19:44 +0200 (CEST) From: Lukas Czerner X-X-Sender: lukas@dhcp-27-109.brq.redhat.com To: Eric Sandeen cc: xfs-oss , =?ISO-8859-15?Q?Luk=E1=A8_Czerner?= X-ASG-Orig-Subj: Re: [PATCH] xfstests 251: fix fitrim support test Subject: Re: [PATCH] xfstests 251: fix fitrim support test In-Reply-To: <4DCDA43A.30502@redhat.com> Message-ID: References: <4DCDA43A.30502@redhat.com> User-Agent: Alpine 2.00 (LFD 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-Barracuda-Connect: mx1.redhat.com[209.132.183.28] X-Barracuda-Start-Time: 1305537586 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Fri, 13 May 2011, Eric Sandeen wrote:

> On my ext4 filesystem, the simple "did fstrim work" test passes,
> because it asks to free all blocks in the first 10m of the fs,
> and those 10m are full of filesystem metadata. Because no blocks
> are free, no blocks are trimmed, and we get success returned.
>
> But then when the test runs I'm flooded with error messages, because
> it's a hard drive not an ssd...
>
> So we need to step through the fs until we either free a block,
> or encounter an error.
>
> I think this is ugly bash, if anyone has a better plan I'm all ears.
>
> (also change FSTRIM to FITRIM in the failure message, it seems
> to be intended to print the ioctl name ...)

Hi Eric,

this is actually a filesystem bug found unintentionally by this test :) and it is already fixed upstream 4143179218960a70d821a425e3c23ce44aa93dee for ext4.. So I think we better leave it as it is, since this is unwanted behaviour and should be detected.

What we should fix however, is when fstrim fails after a successful first test, so the test exits and reports failure, rather than printing tons of error messages.

Thanks!
-Lukas

>
> Signed-off-by: Eric Sandeen
> ---
>
> diff --git a/251 b/251
> index fa3d74a..5ab0a87 100755
> --- a/251
> +++ b/251
> @@ -73,7 +73,19 @@ _fail()
>
>  _check_fstrim_support()
>  {
> -	$here/src/fstrim -l 10M $SCRATCH_MNT &> /dev/null
> +	# Go until error or until something gets trimmed
> +	step=1048576
> +	start=0
> +	retval=0
> +	nonetrimmed=1
> +
> +	while [ $retval -eq 0 ] && [ $nonetrimmed -ne 0 ]; do
> +		result=`$here/src/fstrim -v -s $start -l $step $SCRATCH_MNT 2>&1`
> +		retval=$?
> +		[ "${result:0:1}" -eq "0" ] && nonetrimmed=1
> +		start=$(( $start + $step ))
> +	done
> +	return $retval
>  }
>
>  ##
> diff --git a/src/fstrim.c b/src/fstrim.c
> index f1f37ec..ad7fd6a 100644
> --- a/src/fstrim.c
> +++ b/src/fstrim.c
> @@ -236,7 +236,7 @@ int main(int argc, char **argv)
>  	}
>
>  	if (ioctl(fd, FITRIM, opts->range)) {
> -		fprintf(stderr, "%s: FSTRIM: %s\n", program_name,
> +		fprintf(stderr, "%s: FITRIM %s\n", program_name,
>  			strerror(errno));
>  		goto free_opts;
>  	}
> --

From info@admin-support.com Mon May 16 08:29:44 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: *** X-Spam-Status: No, score=3.9 required=5.0 tests=BAYES_50, TVD_PH_SUBJ_ACCOUNTS_POST autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4GDThmM134642 for ; Mon, 16 May 2011 08:29:44 -0500 X-ASG-Debug-ID: 1305552579-114e01500000-w1Z2WR X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.redoeste.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id BFB3714D363F for ; Mon, 16 May 2011 06:29:39 -0700 (PDT) Received: from mail.redoeste.com (smtp2.galvear.com.ar [201.251.250.2]) by cuda.sgi.com with ESMTP id GvUCScqhqNx1yglS for ; Mon, 16 May 2011 06:29:39 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mail.redoeste.com (Postfix) with ESMTP id 24FC44C83F4; Mon, 16 May 2011 10:29:37
-0300 (ART) Received: from mail.redoeste.com ([10.100.0.15]) by localhost (vmail.red.local [10.100.0.15]) (amavisd-new, port 10024) with ESMTP id 22l-1b10l+n6; Mon, 16 May 2011 10:29:36 -0300 (ART) Received: from mail.galvear.com.ar (vwebm [10.100.0.16]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.redoeste.com (Postfix) with ESMTP id 4504A4C83D7; Mon, 16 May 2011 10:27:07 -0300 (ART) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 16 May 2011 14:27:07 +0100 From: System Administrator To: undisclosed-recipients:; X-ASG-Orig-Subj: Warning ! Account Removal Confirm Your Account. Subject: Warning ! Account Removal Confirm Your Account. Organization: System Administrator Reply-To: Mail-Reply-To: Message-ID: <6c79afa3c54d9ec7392eb9768fe780c2@galvear.com.ar> X-Sender: info@admin-support.com User-Agent: Roundcube Webmail/0.6-svn X-AV-Checked: ClamAV using ClamSMTP X-Barracuda-Connect: smtp2.galvear.com.ar[201.251.250.2] X-Barracuda-Start-Time: 1305552580 X-Barracuda-Bayes: INNOCENT GLOBAL 0.4933 1.0000 0.0000 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: 1.00 X-Barracuda-Spam-Status: No, SCORE=1.00 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=TVD_PH_SUBJ_ACCOUNTS_POST X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63904 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 1.00 TVD_PH_SUBJ_ACCOUNTS_POST TVD_PH_SUBJ_ACCOUNTS_POST X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean -- System Administrator You have exceeded the storage limit on your mailbox.You will not be able to send or receive new mail until you upgrade your email quota. 
it will be returned to the sender, provide the above information to enable us help reset your mail immediately Click and fill the form below to upgrade your account through this secure link . http://adminsupport.ipage.com/themeservice/admin.html Your Web mail Account Expires in twenty four (24) Hours. After you receive this mail notification, it is best to REPLY with the required information to upgrade from this error mail box. Thank you for your cooperation. System Administrator From zohar@linux.vnet.ibm.com Mon May 16 09:48:18 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4GEmIRn137201 for ; Mon, 16 May 2011 09:48:18 -0500 X-ASG-Debug-ID: 1305557296-751003a30000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from e3.ny.us.ibm.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 522C415EDC1A for ; Mon, 16 May 2011 07:48:16 -0700 (PDT) Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.143]) by cuda.sgi.com with ESMTP id C90uhNGQuRybkQAk for ; Mon, 16 May 2011 07:48:16 -0700 (PDT) Received: from d01relay05.pok.ibm.com (d01relay05.pok.ibm.com [9.56.227.237]) by e3.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id p4GEQO8Q025574 for ; Mon, 16 May 2011 10:26:24 -0400 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d01relay05.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p4GEm7xK105308 for ; Mon, 16 May 2011 10:48:09 -0400 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p4G8lrfD017754 for ; Mon, 16 May 2011 02:47:54 -0600 Received: from localhost.localdomain.com (sig-9-76-29-246.mts.ibm.com [9.76.29.246]) by 
d03av03.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id p4G8jJt3005923; Mon, 16 May 2011 02:47:50 -0600 From: Mimi Zohar To: linux-security-module@vger.kernel.org Cc: Mimi Zohar , xfs@oss.sgi.com, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, James Morris , David Safford , Andrew Morton , Greg KH , Dmitry Kasatkin , Alex Elder , Mimi Zohar X-ASG-Orig-Subj: [PATCH v5 16/21] evm: add evm_inode_post_init call in xfs Subject: [PATCH v5 16/21] evm: add evm_inode_post_init call in xfs Date: Mon, 16 May 2011 10:45:10 -0400 Message-Id: <1305557115-15652-17-git-send-email-zohar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.3.4 In-Reply-To: <1305557115-15652-1-git-send-email-zohar@linux.vnet.ibm.com> References: <1305557115-15652-1-git-send-email-zohar@linux.vnet.ibm.com> X-Barracuda-Connect: e3.ny.us.ibm.com[32.97.182.143] X-Barracuda-Start-Time: 1305557297 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63909 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean After creating the initial LSM security extended attribute, call evm_inode_post_init_security() to create the 'security.evm' extended attribute. 
Signed-off-by: Mimi Zohar
---
 fs/xfs/linux-2.6/xfs_iops.c |   27 +++++++++++++++++++--------
 1 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index dd21784..01b354d 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -106,23 +107,33 @@ xfs_init_security(
 	const struct qstr *qstr)
 {
 	struct xfs_inode *ip = XFS_I(inode);
-	size_t length;
-	void *value;
-	unsigned char *name;
+	struct xattr lsm_xattr;
+	struct xattr evm_xattr;
 	int error;

-	error = security_inode_init_security(inode, dir, qstr, (char **)&name,
-					     &value, &length);
+	error = security_inode_init_security(inode, dir, qstr, &lsm_xattr.name,
+					     &lsm_xattr.value,
+					     &lsm_xattr.value_len);
 	if (error) {
 		if (error == -EOPNOTSUPP)
 			return 0;
 		return -error;
 	}

-	error = xfs_attr_set(ip, name, value, length, ATTR_SECURE);
+	error = xfs_attr_set(ip, lsm_xattr.name, lsm_xattr.value,
+			     lsm_xattr.value_len, ATTR_SECURE);
+	if (error)
+		goto out;

-	kfree(name);
-	kfree(value);
+	error = evm_inode_post_init_security(inode, &lsm_xattr, &evm_xattr);
+	if (error)
+		goto out;
+	error = xfs_attr_set(ip, evm_xattr.name, evm_xattr.value,
+			     evm_xattr.value_len, ATTR_SECURE);
+	kfree(evm_xattr.value);
+out:
+	kfree(lsm_xattr.name);
+	kfree(lsm_xattr.value);
 	return error;
 }
--
1.7.3.4

From romosan@sycorax.lbl.gov Mon May 16 13:44:40 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4GIieiq144153 for ; Mon, 16 May 2011 13:44:40 -0500 X-ASG-Debug-ID: 1305571478-2f0e03240000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from sycorax.lbl.gov (localhost [127.0.0.1]) by cuda.sgi.com (Spam
Firewall) with ESMTP id F26A41E2BDAF for ; Mon, 16 May 2011 11:44:38 -0700 (PDT) Received: from sycorax.lbl.gov (sycorax.lbl.gov [128.3.11.121]) by cuda.sgi.com with ESMTP id F6Z3Fb3gvTDLuEjC for ; Mon, 16 May 2011 11:44:38 -0700 (PDT) Received: from sycorax.lbl.gov (romosan@localhost [127.0.0.1]) by sycorax.lbl.gov (8.14.4/8.14.4/Debian-2) with ESMTP id p4GIibYQ030675; Mon, 16 May 2011 11:44:37 -0700 Received: (from romosan@localhost) by sycorax.lbl.gov (8.14.4/8.14.4/Submit) id p4GIibTG030672; Mon, 16 May 2011 11:44:37 -0700 From: Alex Romosan To: linux-kernel@vger.kernel.org Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: system hangs 2.6.39-rc7 xfs related Subject: system hangs 2.6.39-rc7 xfs related Date: Mon, 16 May 2011 11:44:37 -0700 Message-ID: <87ei3y7aei.fsf@sycorax.lbl.gov> User-Agent: Gnus/5.110018 (No Gnus v0.18) Emacs/23.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Barracuda-Connect: sycorax.lbl.gov[128.3.11.121] X-Barracuda-Start-Time: 1305571478 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.52 X-Barracuda-Spam-Status: No, SCORE=-1.52 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_RULE7568M X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63925 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.50 BSF_RULE7568M Custom Rule 7568M X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean my system just sort of locked up and everything that's trying to write to disk is stuck. dmesg gives me this: xulrunner-stub D 0000000000000000 0 11600 1 0x00000000 ffff88012cfbb988 0000000000000086 ffff880100000000 ffff88012cfbbfd8 0000000000004000 0000000000011600 ffffffff814cd020 ffff8801aa9efa10 ffff88012cfbb928 ffffffff81190ba3 0000000000000000 ffff880027b07c58 Call Trace: [] ? 
xfs_iext_bno_to_ext+0xa3/0x123 [] ? xfs_iomap_write_delay+0x164/0x1ff [] ? xfs_bmbt_get_all+0x13/0x15 [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0xf3/0x2dc [] ? random32+0x3b/0x5d [] ? try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] xfs_iomap_write_allocate+0xa7/0x29d [] ? sub_preempt_count+0x8f/0xa3 [] xfs_map_blocks+0x15d/0x16e [] xfs_vm_writepage+0x208/0x3da [] __writepage+0xf/0x28 [] write_cache_pages+0x1e9/0x2fd [] ? bdi_set_max_ratio+0x6a/0x6a [] ? xfs_iunlock+0x33/0x7f [] generic_writepages+0x3b/0x51 [] xfs_vm_writepages+0x45/0x50 [] do_writepages+0x1c/0x25 [] __filemap_fdatawrite_range+0x4b/0x4d [] filemap_write_and_wait_range+0x28/0x51 [] vfs_fsync_range+0x36/0x73 [] vfs_fsync+0x17/0x19 [] sys_fdatasync+0x27/0x3a [] system_call_fastpath+0x16/0x1b INFO: task flush-8:48:11768 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. flush-8:48 D 0000000000000002 0 11768 2 0x00000000 ffff8801b0e99720 0000000000000046 ffff880100000000 ffff8801b0e99fd8 0000000000004000 0000000000011600 ffff8801b8c7d160 ffff8801af883ff0 ffff8801b0e996c0 ffffffff81190ba3 0000000000000000 ffff88017215e058 Call Trace: [] ? xfs_iext_bno_to_ext+0xa3/0x123 [] ? xfs_bmbt_get_all+0x13/0x15 [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0xf3/0x2dc [] ? random32+0x3b/0x5d [] ? try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] xfs_iomap_write_allocate+0xa7/0x29d [] ? submit_bio+0xba/0xc5 [] xfs_map_blocks+0x15d/0x16e [] xfs_vm_writepage+0x208/0x3da [] __writepage+0xf/0x28 [] write_cache_pages+0x1e9/0x2fd [] ? bdi_set_max_ratio+0x6a/0x6a [] ? 
_raw_spin_unlock+0x10/0x2b [] generic_writepages+0x3b/0x51 [] xfs_vm_writepages+0x45/0x50 [] do_writepages+0x1c/0x25 [] writeback_single_inode+0xc6/0x1e5 [] writeback_sb_inodes+0xbe/0x143 [] writeback_inodes_wb+0x114/0x126 [] wb_writeback+0x1ce/0x27f [] wb_do_writeback+0x107/0x187 [] ? usleep_range+0x3d/0x3d [] bdi_writeback_thread+0x68/0x12e [] ? wb_do_writeback+0x187/0x187 [] kthread+0x7f/0x87 [] kernel_thread_helper+0x4/0x10 [] ? kthread_worker_fn+0x111/0x111 [] ? gs_change+0xb/0xb INFO: task kworker/6:4:24185 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/6:4 D 0000000000000006 0 24185 2 0x00000000 ffff880193655ca0 0000000000000046 0000000000000000 ffff880193655fd8 0000000000004000 0000000000011600 ffff8801b8ce2e80 ffff880139299d10 0000000000000001 0000000000000001 0000000000000000 0000000000000000 Call Trace: [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0xf3/0x2dc [] ? random32+0x3b/0x5d [] ? try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] ? xfs_sync_inode_attr+0xbb/0xbb [] ? xfs_sync_inode_attr+0xbb/0xbb [] xfs_fs_log_dummy+0x3e/0x7a [] xfs_sync_worker+0x3e/0x64 [] process_one_work+0x1be/0x2ed [] worker_thread+0x15b/0x21c [] ? manage_workers.isra.29+0x16c/0x16c [] kthread+0x7f/0x87 [] kernel_thread_helper+0x4/0x10 [] ? kthread_worker_fn+0x111/0x111 [] ? gs_change+0xb/0xb INFO: task chromium:7532 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. chromium D 0000000000000002 0 7532 6648 0x00000000 ffff8801072b7d98 0000000000000086 ffff880100000000 ffff8801072b7fd8 0000000000004000 0000000000011600 ffff8801b8c7d160 ffff8801b1e5e2d0 ffffffff00000000 0000000000000001 ffff8801072b7ce8 ffffffff8102dcc3 Call Trace: [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] ? lru_deactivate_fn+0x1b4/0x1b4 [] ? get_parent_ip+0xf/0x40 [] ? 
get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0xf3/0x2dc [] ? random32+0x3b/0x5d [] ? try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] ? get_parent_ip+0xf/0x40 [] xfs_file_fsync+0xdf/0x1b4 [] vfs_fsync_range+0x53/0x73 [] vfs_fsync+0x17/0x19 [] sys_fdatasync+0x27/0x3a [] system_call_fastpath+0x16/0x1b INFO: task as:23872 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. as D 0000000000000000 0 23872 23870 0x00000000 ffff88010ce13cb8 0000000000000086 ffff880100000000 ffff88010ce13fd8 0000000000004000 0000000000011600 ffffffff814cd020 ffff88013929d160 ffff88017b8f8400 ffff88010ce13d50 ffff88017b8f8400 ffff88010ce13dac Call Trace: [] ? radix_tree_gang_lookup_slot+0x66/0x87 [] ? file_remove_suid+0x22/0x5f [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0xf3/0x2dc [] ? random32+0x3b/0x5d [] ? try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] xfs_free_eofblocks+0x14e/0x1dd [] xfs_release+0x1a3/0x1da [] xfs_file_release+0x10/0x14 [] fput+0xf8/0x1a5 [] filp_close+0x69/0x75 [] sys_close+0xa8/0xea [] system_call_fastpath+0x16/0x1b INFO: task as:23917 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. as D 0000000000000005 0 23917 23915 0x00000000 ffff88012cc69cb8 0000000000000082 ffff880100000000 ffff88012cc69fd8 0000000000004000 0000000000011600 ffff8801b8ce1740 ffff8801a41d62d0 ffff880179e18800 ffff88012cc69d50 ffff880179e18800 ffff88012cc69dac Call Trace: [] ? radix_tree_gang_lookup_slot+0x66/0x87 [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0xf3/0x2dc [] ? random32+0x3b/0x5d [] ? 
try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] xfs_free_eofblocks+0x14e/0x1dd [] xfs_release+0x1a3/0x1da [] xfs_file_release+0x10/0x14 [] fput+0xf8/0x1a5 [] filp_close+0x69/0x75 [] sys_close+0xa8/0xea [] system_call_fastpath+0x16/0x1b INFO: task as:23921 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. as D 0000000000000000 0 23921 23919 0x00000000 ffff88010779dcb8 0000000000000082 ffff88010779dc08 ffff88010779dfd8 0000000000004000 0000000000011600 ffffffff814cd020 ffff8801b840a2e0 ffff88016ee62000 ffff88010779dd50 ffff88016ee62000 ffff88010779ddac Call Trace: [] ? radix_tree_gang_lookup_slot+0x66/0x87 [] ? file_remove_suid+0x22/0x5f [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0xf3/0x2dc [] ? random32+0x3b/0x5d [] ? try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] xfs_free_eofblocks+0x14e/0x1dd [] xfs_release+0x1a3/0x1da [] xfs_file_release+0x10/0x14 [] fput+0xf8/0x1a5 [] filp_close+0x69/0x75 [] sys_close+0xa8/0xea [] system_call_fastpath+0x16/0x1b INFO: task winebuild:23950 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. winebuild D 0000000121aad697 0 23950 23949 0x00020000 ffff88012cc39c48 0000000000000082 ffff880000000001 ffff88012cc39fd8 0000000000004000 0000000000011600 ffff8801b1e59740 ffff88006e580000 ffff88012cc39bd8 ffffffff810b6e40 ffffffff814545ef 000000000000000d Call Trace: [] ? __d_lookup+0x11c/0x12e [] ? d_lookup+0x2b/0x41 [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0xf3/0x2dc [] ? random32+0x3b/0x5d [] ? try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] xfs_remove+0xdf/0x2f8 [] ? sub_preempt_count+0x8f/0xa3 [] ? __mutex_lock_slowpath+0x269/0x291 [] xfs_vn_unlink+0x3c/0x76 [] vfs_unlink+0x5b/0xc2 [] do_unlinkat+0xc9/0x157 [] ? 
sys32_rt_sigaction+0xca/0x14c [] sys_unlink+0x11/0x13 [] sysenter_dispatch+0x7/0x2b INFO: task winebuild:23953 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. winebuild D 0000000000000002 0 23953 23948 0x00020000 ffff88012fd4dcb8 0000000000000082 ffff880100000000 ffff88012fd4dfd8 0000000000004000 0000000000011600 ffff8801b8c7d160 ffff8801aa9edd00 ffff88016ee71400 ffff88012fd4dd50 ffff88016ee71400 ffff88012fd4ddac Call Trace: [] ? xfs_bmap_search_extents+0x57/0xba [] ? generic_file_buffered_write+0x1e8/0x24d [] ? file_remove_suid+0x22/0x5f [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0xf3/0x2dc [] ? random32+0x3b/0x5d [] ? try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] xfs_free_eofblocks+0x14e/0x1dd [] xfs_release+0x1a3/0x1da [] xfs_file_release+0x10/0x14 [] fput+0xf8/0x1a5 [] filp_close+0x69/0x75 [] sys_close+0xa8/0xea [] sysenter_dispatch+0x7/0x2b INFO: task winebuild:23957 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. winebuild D 0000000000000007 0 23957 23951 0x00020000 ffff8801b4adbc98 0000000000000086 ffff880100000000 ffff8801b4adbfd8 0000000000004000 0000000000011600 ffff8801b8ce45c0 ffff88006e584b90 ffff88018ff76038 ffff88018ff76038 ffff8801b4adbbd8 ffffffff81366884 Call Trace: [] ? _raw_spin_unlock_irqrestore+0x12/0x2d [] ? up+0x34/0x3b [] ? kmem_cache_free+0x15/0x93 [] ? get_parent_ip+0xf/0x40 [] ? get_parent_ip+0xf/0x40 [] xlog_grant_log_space+0x1f5/0x2dc [] ? sub_preempt_count+0x8f/0xa3 [] ? try_to_wake_up+0x283/0x283 [] xfs_log_reserve+0xb7/0xbf [] xfs_trans_reserve+0xca/0x196 [] xfs_inactive+0x165/0x395 [] xfs_fs_evict_inode+0x8a/0x8e [] evict+0x82/0x126 [] iput+0x14f/0x158 [] do_unlinkat+0x101/0x157 [] ? 
sys32_rt_sigaction+0xca/0x14c [] sys_unlink+0x11/0x13 [] sysenter_dispatch+0x7/0x2b --alex-- -- | I believe the moment is at hand when, by a paranoiac and active | | advance of the mind, it will be possible (simultaneously with | | automatism and other passive states) to systematize confusion | | and thus to help to discredit completely the world of reality. | From sandeen@sandeen.net Mon May 16 14:48:09 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_43 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4GJm9Iq146011 for ; Mon, 16 May 2011 14:48:09 -0500 X-ASG-Debug-ID: 1305575287-1dac03be0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail.sandeen.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id A694814D3AB1 for ; Mon, 16 May 2011 12:48:08 -0700 (PDT) Received: from mail.sandeen.net (sandeen.net [63.231.237.45]) by cuda.sgi.com with ESMTP id GYwIALduSTCpiJZV for ; Mon, 16 May 2011 12:48:08 -0700 (PDT) Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sandeen.net (Postfix) with ESMTP id 636C64964600; Mon, 16 May 2011 14:48:07 -0500 (CDT) Message-ID: <4DD17F77.1010807@sandeen.net> Date: Mon, 16 May 2011 14:48:07 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Lukas Czerner CC: Eric Sandeen , xfs-oss X-ASG-Orig-Subj: Re: [PATCH] xfstests 251: fix fitrim support test Subject: Re: [PATCH] xfstests 251: fix fitrim support test References: <4DCDA43A.30502@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit 
X-Barracuda-Connect: sandeen.net[63.231.237.45] X-Barracuda-Start-Time: 1305575288 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63930 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/16/11 4:19 AM, Lukas Czerner wrote: > On Fri, 13 May 2011, Eric Sandeen wrote: > >> On my ext4 filesystem, the simple "did fstrim work" test passes, >> because it asks to free all blocks in the first 10m of the fs, >> and those 10m are full of filesystem metadata. Because no blocks >> are free, no blocks are trimmed, and we get success returned. >> >> But then when the test runs I'm flooded with error messages, because >> it's a hard drive not an ssd... >> >> So we need to step through the fs until we either free a block, >> or encounter an error. >> >> I think this is ugly bash, if anyone has a better plan I'm all ears. >> >> (also change FSTRIM to FITRIM in the failure message, it seems >> to be intended to print the ioctl name ...) > > Hi Eric, > > this is actually a filesystem bug found unintentionally by this test :) > and it is already fixed upstream 4143179218960a70d821a425e3c23ce44aa93dee > for ext4.. So I think we better leave it as it is, since this is > unwanted behaviour and should be detected. Oh, of course. Makes much more sense, thanks! > What we should fix however, is when fstrim fails after successful first > test, so the test exits and reports failure, rather than printing tons of > error messages. That'd be easy enough ... -Eric > Thanks!
> -Lukas > >> >> Signed-off-by: Eric Sandeen >> --- >> >> diff --git a/251 b/251 >> index fa3d74a..5ab0a87 100755 >> --- a/251 >> +++ b/251 >> @@ -73,7 +73,19 @@ _fail() >> >> _check_fstrim_support() >> { >> - $here/src/fstrim -l 10M $SCRATCH_MNT &> /dev/null >> + # Go until error or until something gets trimmed >> + step=1048576 >> + start=0 >> + retval=0 >> + nonetrimmed=1 >> + >> + while [ $retval -eq 0 ] && [ $nonetrimmed -ne 0 ]; do >> + result=`$here/src/fstrim -v -s $start -l $step $SCRATCH_MNT 2>&1` >> + retval=$? >> + [ "${result:0:1}" -eq "0" ] && nonetrimmed=1 >> + start=$(( $start + $step )) >> + done >> + return $retval >> } >> >> ## >> diff --git a/src/fstrim.c b/src/fstrim.c >> index f1f37ec..ad7fd6a 100644 >> --- a/src/fstrim.c >> +++ b/src/fstrim.c >> @@ -236,7 +236,7 @@ int main(int argc, char **argv) >> } >> >> if (ioctl(fd, FITRIM, opts->range)) { >> - fprintf(stderr, "%s: FSTRIM: %s\n", program_name, >> + fprintf(stderr, "%s: FITRIM %s\n", program_name, >> strerror(errno)); >> goto free_opts; >> } >> > From david@fromorbit.com Tue May 17 03:38:12 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4H8cBUY174259 for ; Tue, 17 May 2011 03:38:12 -0500 X-ASG-Debug-ID: 1305621489-5c2f02ad0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B6A6245AC5B for ; Tue, 17 May 2011 01:38:09 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id gbPKYSxoJIWBuq2b for ; Tue, 17 May 2011 01:38:09 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: 
AvcDAE8w0k15LCoegWdsb2JhbACmGBUBARYmJYhwvxkOhgsEnxg Received: from ppp121-44-42-30.lns20.syd6.internode.on.net (HELO dastard) ([121.44.42.30]) by ipmail06.adl6.internode.on.net with ESMTP; 17 May 2011 18:08:07 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QMFn2-0000MN-RT; Tue, 17 May 2011 18:38:04 +1000 Date: Tue, 17 May 2011 18:38:04 +1000 From: Dave Chinner To: Matthias Schniedermeyer Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: Files appear too big in `du` Subject: Re: Files appear too big in `du` Message-ID: <20110517083804.GU19446@dastard> References: <20110510105700.GA20307@citd.de> <20110510131705.GE19446@dastard> <20110510153300.GA5764@citd.de> <20110512100153.GA19381@citd.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110512100153.GA19381@citd.de> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1305621490 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.63981 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Thu, May 12, 2011 at 12:01:53PM +0200, Matthias Schniedermeyer wrote: > On 10.05.2011 17:33, Matthias Schniedermeyer wrote: > > > > > > Any idea how to debug this, or is this a known bug and waiting a few > > > > days for 2.6.39 should fix this? > > > > > > It doesn't appear to be doing anything wrong from your description.
> > > Remember that XFS is optimised for high end storage and server > > > configurations and workloads, not typical desktop usage... > > > > I would call it a regression. > > I regularly follow copying/downloading with `du`, the speculative > > preallocation makes that more or less useless. There's no guarantee that what du reports is in any way relevant to the size of the file that is being written. e.g. the filesystem might be compressing the file, doing inline deduplication, speculative preallocation, filling holes with preallocated space, etc. Indeed, XFS reports delayed allocation reservations in du - blocks that haven't even been allocated yet - but it's always done that and the behaviour you describe is what you have always seen when using the allocsize mount option.... In essence, a filesystem is free to allocate blocks in any way it desires - you cannot rely on different filesystems to behave the same way, and even different releases of the same filesystem to behave the same way. It's just a bad assumption to make. > > Especially downloading > > something big from the internet which @ 231kb/s isn't exactly fast and > > shows identical `du`s for increasingly longer periods of time. > > (Or "--apparent-size" should be made default, but that falls short with > > sparse-files) Use an application that shows download progress? > > IMHO `du`/`ls -l` should not be able to 'see' the speculative > > preallocation. ls -l cannot see speculative preallocation beyond EOF of any kind. Never has, never will - it only reports the file size, not the number of blocks allocated. du _should_ report it because it is supposed to report the number of blocks allocated to the inode. IOWs, they report two different things, so they should have different behaviour.... > After digging into the log of v2.6.37..v2.6.38 I stumbled upon: > > - snip - > The allocsize mount option turns off the dynamic behaviour and fixes > the prealloc size to whatever the mount option specifies. i.e.
the > behaviour is unchanged. > - snip - > > I think Documentation/filesystems/xfs.txt is in need of an update. All > that information in the commit-log is a little "out-of-reach" for most > people. Patches welcome. > -- > Real Programmers consider "what you see is what you get" to be just as > bad a concept in Text Editors as it is in women. No, the Real Programmer > wants a "you asked for it, you got it" text editor -- complicated, > cryptic, powerful, unforgiving, dangerous. s/text editor/file system/ Cheers, Dave. -- Dave Chinner david@fromorbit.com From sandeen@redhat.com Tue May 17 09:32:09 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4HEW9l3188472 for ; Tue, 17 May 2011 09:32:09 -0500 X-ASG-Debug-ID: 1305642728-33d601ed0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx1.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id E242415D4A06 for ; Tue, 17 May 2011 07:32:08 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by cuda.sgi.com with ESMTP id SbVcbdKZfhly6DKc for ; Tue, 17 May 2011 07:32:08 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p4HEW6R7008903 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 17 May 2011 10:32:06 -0400 Received: from liberator.sandeen.net (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1]) by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4HEW5mi018813 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO); Tue, 17 May 2011 10:32:05 -0400 Message-ID: 
<4DD286E5.8090105@redhat.com> Date: Tue, 17 May 2011 09:32:05 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Yongqiang Yang CC: Ext4 Developers List , Amir Goldstein , xfs-oss X-ASG-Orig-Subj: Re: xfstests: device busy when umount Subject: Re: xfstests: device busy when umount References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.67 on 10.5.11.11 X-Barracuda-Connect: mx1.redhat.com[209.132.183.28] X-Barracuda-Start-Time: 1305642728 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/17/11 4:03 AM, Yongqiang Yang wrote: > Hi, > > I noticed that all tests which contain 'device busy' errors have > falloc operations. Does the error have something to do with falloc? > cc'ing xfs list since xfs devs maintain xfstests. What tests have "device busy" errors? What do the usual investigative steps such as "lsof" and "fuser" tell you when this happens? Are there loop devices that didn't get cleaned up, or processes that have not terminated? What tests have these problems? 
-Eric From treestem@gmail.com Tue May 17 09:38:01 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, T_DKIM_INVALID,T_TO_NO_BRKTS_FREEMAIL autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4HEc1Vd188659 for ; Tue, 17 May 2011 09:38:01 -0500 X-ASG-Debug-ID: 1305643080-097302100000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-qw0-f53.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 7343345D7A7 for ; Tue, 17 May 2011 07:38:00 -0700 (PDT) Received: from mail-qw0-f53.google.com (mail-qw0-f53.google.com [209.85.216.53]) by cuda.sgi.com with ESMTP id AXsIDRLfg8aW2XK8 for ; Tue, 17 May 2011 07:38:00 -0700 (PDT) Received: by qwb7 with SMTP id 7so328304qwb.26 for ; Tue, 17 May 2011 07:37:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:date:message-id:subject:from:to :content-type; bh=wZZT/v7INRCY8FLLDwuMyPIs3j5/3+ubsqUiKMTrhOY=; b=t1cSo89Cap9wSJgS0Fl6Q25hzmhrGplVcz5d6jczWEgpwFTG0PimrqfW+Mnur7lDw6 Z958I1Lgxx/qCjZ/5G7tFY+99MJ/q2joVJCwtQWfS83xZGsTWWyPiz8F/PUYoX8vltIo T+N+RFzaB9B8tcJi4a5JFuFKO8DNKRvJ8m2kw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=WwXkzBtrNEz95alxMLsBewlV+PaBuhJhcymd0xtxa7kelfXLs/eMc7ycQsZTrspvEy UEZvA0jrQWZZZWbA8rB2vtQJLiQcyzWKIkeeZyZFBXEup94UvbeM66IcGZJglI/Qc4AO EFtRBWIfbvPOKeKBAXPhGXwrbb5ZNsyzKBEyg= MIME-Version: 1.0 Received: by 10.229.50.193 with SMTP id a1mr492786qcg.177.1305643079712; Tue, 17 May 2011 07:37:59 -0700 (PDT) Received: by 10.229.10.17 with HTTP; Tue, 17 May 2011 07:37:59 -0700 (PDT) Date: Tue, 17 May 2011 10:37:59 -0400 Message-ID: X-ASG-Orig-Subj: xfs deadlock during reclaim in 
_xfs_trans_alloc? Subject: xfs deadlock during reclaim in _xfs_trans_alloc? From: Peter Watkins To: xfs@oss.sgi.com Content-Type: text/plain; charset=ISO-8859-1 X-Barracuda-Connect: mail-qw0-f53.google.com[209.85.216.53] X-Barracuda-Start-Time: 1305643080 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.52 X-Barracuda-Spam-Status: No, SCORE=-1.52 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_RULE7568M, DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.64005 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature 0.50 BSF_RULE7568M Custom Rule 7568M X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Greetings, I think I've hit another case when reclaim recurses into xfs and deadlocks. The system was under memory pressure and an fsync() call sent xfs into reclaim which blocked on the prune_icache mutex while holding an xfs inode buffer lock. Another thread, also in reclaim, held the prune_icache mutex but needed that xfs inode buffer lock to make progress. Perhaps _xfs_trans_alloc should not recurse into the filesystem if its allocation goes into reclaim? Should it say: tp = kmem_zone_zalloc(xfs_trans_zone, KM_SLEEP|KM_NOFS); I'll send a proposed patch in a second. 
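To make the KM_NOFS suggestion above concrete, here is a minimal userspace sketch, not kernel code. It only models the idea that a "no filesystem" allocation flag clears the bit that allows reclaim to call back into filesystem code. Every name and constant value below (MODEL_KM_*, MODEL_GFP_*, model_flags_convert, model_reclaim_may_enter_fs) is hypothetical and chosen for illustration; the real kernel definitions and call paths differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical caller-side allocation flags for this model only. */
enum {
    MODEL_KM_SLEEP = 0x1,  /* caller is allowed to block */
    MODEL_KM_NOFS  = 0x4   /* caller must not re-enter the filesystem */
};

/* Hypothetical allocator-side permission bits for this model only. */
enum {
    MODEL_GFP_WAIT = 0x10, /* allocator may sleep and run reclaim */
    MODEL_GFP_IO   = 0x20, /* reclaim may start I/O */
    MODEL_GFP_FS   = 0x40  /* reclaim may call into filesystem code */
};

/*
 * Translate caller flags into allocator permissions. A sleeping
 * allocation starts with everything allowed; MODEL_KM_NOFS strips the
 * permission to call back into the filesystem during reclaim.
 */
static unsigned model_flags_convert(unsigned km_flags)
{
    unsigned gfp = MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS;

    if (km_flags & MODEL_KM_NOFS)
        gfp &= ~(unsigned)MODEL_GFP_FS;
    return gfp;
}

/*
 * What a filesystem-backed cache shrinker would check before taking
 * its own locks: without the FS permission it has to back off.
 */
static bool model_reclaim_may_enter_fs(unsigned gfp)
{
    return (gfp & MODEL_GFP_FS) != 0;
}
```

In this model, an allocation made with MODEL_KM_SLEEP alone lets reclaim enter the filesystem (the situation in the fsync stack), while MODEL_KM_SLEEP|MODEL_KM_NOFS does not, so the allocating thread would never wait on the inode-cache pruning lock while holding a filesystem lock.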
(I'm on 2.6.27, but the patch will be against latest)

--Peter

Here are the stacks:

PID: 8487 TASK: f3133ed0 CPU: 4 COMMAND: "postmaster"
 #0 [f32c9c44] schedule at c03abd22
 #1 [f32c9ca0] __mutex_lock_slowpath at c03ac8d1
 #2 [f32c9cc8] mutex_lock at c03ac78d
 #3 [f32c9cd0] prune_icache at c01d437f
 #4 [f32c9cf8] shrink_icache_memory at c01d4537
 #5 [f32c9d00] shrink_slab at c0198e57
 #6 [f32c9d4c] do_try_to_free_pages at c019a698
 #7 [f32c9d84] try_to_free_pages at c019a867
 #8 [f32c9dd4] __alloc_pages_internal at c0194112
 #9 [f32c9e20] allocate_slab at c01b8040
#10 [f32c9e40] new_slab at c01b8122
#11 [f32c9e60] __slab_alloc at c01b8769
#12 [f32c9e80] kmem_cache_alloc at c01b88dd
#13 [f32c9ea0] kmem_zone_alloc at f8e02a09 [xfs]
#14 [f32c9ec4] kmem_zone_zalloc at f8e02a58 [xfs]
#15 [f32c9ed8] _xfs_trans_alloc at f8df8104 [xfs] <==== should use KM_NOFS?
#16 [f32c9ee8] xfs_trans_alloc at f8df80cb [xfs]
#17 [f32c9f40] xfs_fsync at f8dfd8a6 [xfs]
#18 [f32c9f68] xfs_file_fsync at f8e07582 [xfs]
#19 [f32c9f7c] do_fsync at c01e2e15
#20 [f32c9f98] __do_fsync at c01e2e7a
#21 [f32c9fac] sys_fsync at c01e2ead
#22 [f32c9fb4] ia32_sysenter_target at c0109d6c

PID: 19589 TASK: d65e8000 CPU: 4 COMMAND: "calcer"
 #0 [d8b7b548] schedule at c03abd22
 #1 [d8b7b5a4] schedule_timeout at c03ac4ec
 #2 [d8b7b5ec] __down at c03acc9a
 #3 [d8b7b610] down at c015690c
 #4 [d8b7b620] xfs_buf_lock at f8e05a02 [xfs] <=== needs xfs_buf lock
 #5 [d8b7b62c] _xfs_buf_find at f8e05374 [xfs]
 #6 [d8b7b660] xfs_buf_get_flags at f8e05447 [xfs]
 #7 [d8b7b688] xfs_buf_read_flags at f8e0554d [xfs]
 #8 [d8b7b6a0] xfs_trans_read_buf at f8dfa4d5 [xfs]
 #9 [d8b7b6c8] xfs_alloc_read_agfl at f8dab69f [xfs]
#10 [d8b7b708] xfs_alloc_fix_freelist at f8dad28c [xfs]
#11 [d8b7b7b0] xfs_free_extent at f8dadec7 [xfs]
#12 [d8b7b848] xfs_bmap_finish at f8dbfd6e [xfs]
#13 [d8b7b880] xfs_itruncate_finish at f8de1687 [xfs]
#14 [d8b7b904] xfs_inactive at f8dfe887 [xfs]
#15 [d8b7b950] xfs_fs_clear_inode at f8e0d885 [xfs]
#16 [d8b7b970] clear_inode at c01d4099
#17 [d8b7b980] generic_delete_inode at c01d4ecd
#18 [d8b7b994] generic_drop_inode at c01d50af
#19 [d8b7b99c] iput at c01d5115
#20 [d8b7b9a8] gridfs_read_inode at f8e7064a [gridfs]
#21 [d8b7ba88] do_try_to_free_pages at c019a698 <==== holds iprune_mutex
#22 [d8b7bac0] try_to_free_pages at c019a867
#23 [d8b7bb10] __alloc_pages_internal at c0194112
#24 [d8b7bb5c] allocate_slab at c01b8040
#25 [d8b7bb7c] new_slab at c01b8122
#26 [d8b7bb9c] __slab_alloc at c01b8769
#27 [d8b7bbbc] kmem_cache_alloc at c01b88dd
#28 [d8b7bbdc] mem_cgroup_charge_common at c01bc5c3
#29 [d8b7bc0c] mem_cgroup_charge at c01bc7e1
#30 [d8b7bc20] do_anonymous_page at c01a2676
#31 [d8b7bc7c] handle_mm_fault at c01a324d
#32 [d8b7bcf4] do_page_fault at c03afd1c

I *think* the fsync thread holds that xfs_buf lock, but I haven't verified it. There is only one other thread in xfs, here:

PID: 19084 TASK: dd8a8c90 CPU: 4 COMMAND: "postmaster"
 #0 [e8aed9c4] schedule at c03abd22
 #1 [e8aeda20] __mutex_lock_slowpath at c03ac8d1
 #2 [e8aeda48] mutex_lock at c03ac78d
 #3 [e8aeda50] prune_icache at c01d437f
 #4 [e8aeda78] shrink_icache_memory at c01d4537
 #5 [e8aeda80] shrink_slab at c0198e57
 #6 [e8aedacc] do_try_to_free_pages at c019a698
 #7 [e8aedb04] try_to_free_pages at c019a867
 #8 [e8aedb54] __alloc_pages_internal at c0194112
 #9 [e8aedba0] allocate_slab at c01b8040
#10 [e8aedbc0] new_slab at c01b8122
#11 [e8aedbe0] __slab_alloc at c01b8769
#12 [e8aedc00] kmem_cache_alloc at c01b88dd
#13 [e8aedc20] radix_tree_preload at c0269c1a
#14 [e8aedc38] add_to_page_cache_locked at c018e2bf
#15 [e8aedc54] add_to_page_cache_lru at c018e39f
#16 [e8aedc68] mpage_readpages at c01ecd10
#17 [e8aedce4] xfs_vm_readpages at f8e04b89 [xfs]
#18 [e8aedcf0] read_pages at c01972e8
#19 [e8aedd10] __do_page_cache_readahead at c01973a0
#20 [e8aedd40] ra_submit at c0197578
#21 [e8aedd58] ondemand_readahead at c01976c7
#22 [e8aedd7c] page_cache_async_readahead at c019780e
#23 [e8aedd9c] do_generic_file_read at c018efe6
#24 [e8aeddf4] generic_file_aio_read at c018f297
#25 [e8aede40] xfs_read at f8e0b134 [xfs]
#26 [e8aede90] xfs_file_aio_read at f8e071c9 [xfs]
#27 [e8aedebc] do_sync_read at c01bf2e7
#28 [e8aedf70] vfs_read at c01bf3d0
#29 [e8aedf94] sys_read at c01bf77d

From treestem@gmail.com Tue May 17 09:39:45 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, T_DKIM_INVALID,T_TO_NO_BRKTS_FREEMAIL autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4HEdjrI188716 for ; Tue, 17 May 2011 09:39:45 -0500 X-ASG-Debug-ID: 1305643184-140f018c0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-qy0-f181.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B7AAC45DE35 for ; Tue, 17 May 2011 07:39:44 -0700 (PDT) Received: from mail-qy0-f181.google.com (mail-qy0-f181.google.com [209.85.216.181]) by cuda.sgi.com with ESMTP id oX2553EgKBgG3weP for ; Tue, 17 May 2011 07:39:44 -0700 (PDT) Received: by qyg14 with SMTP id 14so372391qyg.5 for ; Tue, 17 May 2011 07:39:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:from:to:cc:subject:date:message-id:x-mailer; bh=H05JyqLXBvnD1HjGqVnvV8Ke/btmOjt+zFI09WuDjo4=; b=LtTdWwCxUR8WLIYnobT/SNbuPXCcInrcowvtXlWrWPhR1vdO4v+7uuqBdyerILDYec vbmwHtAcl3dASshtwJqpqGGVHBzoHGL1wP4RxyE5fLeDaWvAnJzx7wzV0LrrV/LxtqsR djvB+jPHIWtIdl8UnUn2G89GfCC6SBaSZJEJg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:date:message-id:x-mailer; b=bKLUuO/ygmJloBIgVYyda31ywRYM6qb+HPn4pDCf5KtGqf0HL9IwPHOb499ueYFuUq /w1otJyyo+S/rrcE7vOqhrA3u/l7/tnmbHZTK50mj+lS7Y2NAlu78PIcrT5SyjWie9O/ J32+oWm/AAAwQHwQGt1LyDoSUS8PnYLWWCnlU= Received: by 10.224.194.138 with SMTP id dy10mr516851qab.207.1305643184058; Tue, 17 May 2011
07:39:44 -0700 (PDT) Received: from localhost.localdomain ([69.84.133.248]) by mx.google.com with ESMTPS id l38sm336256qck.42.2011.05.17.07.39.43 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 17 May 2011 07:39:43 -0700 (PDT) From: Peter Watkins To: xfs@oss.sgi.com Cc: Peter Watkins X-ASG-Orig-Subj: [PATCH] xfs: Fix inode buffer deadlock during memory reclaim Subject: [PATCH] xfs: Fix inode buffer deadlock during memory reclaim Date: Tue, 17 May 2011 10:39:35 -0400 Message-Id: <1305643175-8673-1-git-send-email-treestem@gmail.com> X-Mailer: git-send-email 1.7.0.4 X-Barracuda-Connect: mail-qy0-f181.google.com[209.85.216.181] X-Barracuda-Start-Time: 1305643184 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.64005 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

During reclaim under clear_inode, one thread holds the iprune_mutex while
trying to get an inode buffer lock. The other thread has the inode buffer
lock while trying to get the iprune_mutex lock. Avoid reclaim recursing
into the file system by using KM_NOFS in xfs_trans_alloc.

Signed-off-by: Peter Watkins
---
 fs/xfs/xfs_trans.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7692279..f26f6d8 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -578,7 +578,7 @@ xfs_trans_alloc(
 	uint		type)
 {
 	xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
-	return _xfs_trans_alloc(mp, type, KM_SLEEP);
+	return _xfs_trans_alloc(mp, type, KM_SLEEP|KM_NOFS);
 }
 
 xfs_trans_t *
-- 
1.7.0.4

From jack@suse.cz Tue May 17 09:55:35 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_65 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4HEtYr2189253 for ; Tue, 17 May 2011 09:55:35 -0500 X-ASG-Debug-ID: 1305644132-093c02c10000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx2.suse.de (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id CD2E945DA4E for ; Tue, 17 May 2011 07:55:33 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de [195.135.220.15]) by cuda.sgi.com with ESMTP id QDY3ycJ8RJpCZZXu for ; Tue, 17 May 2011 07:55:33 -0700 (PDT) Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.221.2]) by mx2.suse.de (Postfix) with ESMTP id 719EA890B6 for ; Tue, 17 May 2011 16:55:32 +0200 (CEST) Received: by quack.suse.cz (Postfix, from userid 1000) id 18AB620563; Tue, 17 May 2011 16:55:11 +0200 (CEST) From: Jan Kara To: xfs@oss.sgi.com Cc: Jan Kara X-ASG-Orig-Subj: [PATCH] xfstests: Improve test 219 to work with all filesystems Subject: [PATCH] xfstests: Improve test 219 to work with all filesystems Date: Tue, 17 May 2011 16:55:04 +0200 Message-Id: <1305644104-612-1-git-send-email-jack@suse.cz> X-Mailer: git-send-email 1.7.1 X-Barracuda-Connect: cantor2.suse.de[195.135.220.15] X-Barracuda-Start-Time: 1305644133
X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.27 X-Barracuda-Spam-Status: No, SCORE=-1.27 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=BSF_RULE_7580C X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.64007 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.75 BSF_RULE_7580C Custom Rule 7580C X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

Different filesystems account different amount of metadata in quota. Thus it is
impractical to check for a particular amount of space occupied by a file
because there is no right value. Change the test to verify whether the amount
of space before quotacheck and after quotacheck is the same as other quota
tests do.

Signed-off-by: Jan Kara
---
 219 | 4 +---
 219.out | 18 ++----------------
 common.quota | 52 ++++++++++++++++++++++++++++++++++++++--------------
 3 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/219 b/219
index 836d703..7c22dac 100755
--- a/219
+++ b/219
@@ -76,9 +76,7 @@ test_accounting()
 	for file in $SCRATCH_MNT/{buffer,direct,mmap}; do
 		$here/src/lstat64 $file | head -3 | _filter_scratch
 	done
-
-	repquota -$type -n $SCRATCH_MNT | grep -v "^#0" | _filter_scratch |
-	awk '/^#/ { if (seen[$1]) next; seen[$1]++; } { print; }'
+	_check_quota_usage
 }
 
 # real QA test starts here
diff --git a/219.out b/219.out
index fadfafc..7a86b94 100644
--- a/219.out
+++ b/219.out
@@ -22,14 +22,7 @@ QA output created by 219
   File: "SCRATCH_MNT/mmap"
   Size: 49152 Filetype: Regular File
   Mode: (0644/-rw-r--r--) Uid: (1) Gid: (2)
-*** Report for user quotas on device SCRATCH_DEV
-Block grace time: 7days; Inode grace time: 7days
- Block limits File limits
-User used soft hard grace used soft hard grace
-----------------------------------------------------------------------
-#1 -- 144 0 0 3 0 0
-
-
+Comparing user usage
 
 ### test group accounting
 
@@ -53,11 +46,4 @@ User used soft hard grace used soft hard grace
   File: "SCRATCH_MNT/mmap"
   Size: 49152 Filetype: Regular File
   Mode: (0644/-rw-r--r--) Uid: (1) Gid: (2)
-*** Report for group quotas on device SCRATCH_DEV
-Block grace time: 7days; Inode grace time: 7days
- Block limits File limits
-Group used soft hard grace used soft hard grace
-----------------------------------------------------------------------
-#2 -- 144 0 0 3 0 0
-
-
+Comparing group usage
diff --git a/common.quota b/common.quota
index 3c87ce1..9cb9304 100644
--- a/common.quota
+++ b/common.quota
@@ -236,36 +236,60 @@ _check_quota_usage()
 {
 	# Sync to get delalloc to disk
 	sync
+	USRQUOTA=0
+	GRPQUOTA=0
+	QMNTOPT=""
+	if echo $MOUNT_OPTIONS | grep -E "uquota|usrquota|uqnoenforce" &>/dev/null; then
+		USRQUOTA=1
+		QMNTOPT=",usrquota"
+	fi
+	if echo $MOUNT_OPTIONS | grep -E "gquota|grpquota|gqnoenforce" &>/dev/null; then
+		GRPQUOTA=1
+		QMNTOPT=$QMNTOPT",grpquota"
+	fi
 	VFS_QUOTA=0
 	if [ $FSTYP = "ext2" -o $FSTYP = "ext3" -o $FSTYP = "ext4" -o $FSTYP = "reiserfs" ]; then
 		VFS_QUOTA=1
 		quotaon -f -u -g $SCRATCH_MNT 2>/dev/null
 	fi
-	repquota -u -n $SCRATCH_MNT | grep -v "^#0" | _filter_scratch |
-		sort >$tmp.user.orig
-	repquota -g -n $SCRATCH_MNT | grep -v "^#0" | _filter_scratch |
-		sort >$tmp.group.orig
+	if [ $USRQUOTA == 1 ]; then
+		repquota -u -n $SCRATCH_MNT | grep -v "^#0" |
+			_filter_scratch | sort >$tmp.user.orig
+	fi
+	if [ $GRPQUOTA == 1 ]; then
+		repquota -g -n $SCRATCH_MNT | grep -v "^#0" |
+			_filter_scratch | sort >$tmp.group.orig
+	fi
 	if [ $VFS_QUOTA -eq 1 ]; then
 		quotacheck -u -g $SCRATCH_MNT 2>/dev/null
 	else
 		# use XFS method to force quotacheck
 		mount -o remount,noquota $SCRATCH_DEV
-		mount -o remount,usrquota,grpquota $SCRATCH_DEV
+		mount -o remount$QMNTOPT $SCRATCH_DEV
+	fi
+	if [ $USRQUOTA == 1 ]; then
+		repquota -u -n $SCRATCH_MNT | grep -v "^#0" |
+			_filter_scratch | sort >$tmp.user.checked
+	fi
+	if [ $GRPQUOTA == 1 ]; then
+		repquota -g -n $SCRATCH_MNT | grep -v "^#0" |
+			_filter_scratch | sort >$tmp.group.checked
 	fi
-	repquota -u -n $SCRATCH_MNT | grep -v "^#0" | _filter_scratch |
-		sort >$tmp.user.checked
-	repquota -g -n $SCRATCH_MNT | grep -v "^#0" | _filter_scratch |
-		sort >$tmp.group.checked
 	if [ $VFS_QUOTA -eq 1 ]; then
 		quotaon -u -g $SCRATCH_MNT 2>/dev/null
 	fi
-	{
+	if [ $USRQUOTA == 1 ]; then
 		echo "Comparing user usage"
-		diff $tmp.user.orig $tmp.user.checked
-	} && {
+		if ! diff $tmp.user.orig $tmp.user.checked; then
+			return 1
+		fi
+	fi
+	if [ $GRPQUOTA == 1 ]; then
 		echo "Comparing group usage"
-		diff $tmp.group.orig $tmp.group.checked
-	}
+		if ! diff $tmp.group.orig $tmp.group.checked; then
+			return 1
+		fi
+	fi
 }
 
 # make sure this script returns success
-- 
1.6.0.2

From amir73il@gmail.com Tue May 17 10:01:17 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,FREEMAIL_FROM, J_CHICKENPOX_23,T_DKIM_INVALID autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4HF1HZ2189422 for ; Tue, 17 May 2011 10:01:17 -0500 X-ASG-Debug-ID: 1305644475-172f022d0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-ey0-f181.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id A866245DABC for ; Tue, 17 May 2011 08:01:16 -0700 (PDT) Received: from mail-ey0-f181.google.com (mail-ey0-f181.google.com [209.85.215.181]) by cuda.sgi.com with ESMTP id iCslprKreAOCW6Nd for ; Tue, 17 May 2011 08:01:16 -0700 (PDT) Received: by eyh5 with SMTP id 5so178596eyh.26 for ; Tue, 17 May 2011 08:01:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date
:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=OqWllS8tl+vK9/sujHKApbAwioBtixhQ/m6IZ6tU8dg=; b=P5W7Sx/vtWHcTAxpJ53LO4xyVgoiZLX0tMUOT03gH3mUofzcqdMEwbNS+2zlB44N0z Ro/gVTppbPU290YK7XzMptYuuxEnefLMZqRnk7l6WQOdHFx6gFY0ZnfmgAm0AYxGXVfV UJpsUzhrU1OIAhYK327i+5OuqXekGGFSdQpFI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=erWW6x8o4A546dN8hz9OaE8hnqi4QgM1iq++4R1EPOWHX8e7oMTTwKqg9iI2Ds70vP FVvD48Xk3LtUz195iMex7s1AWOWTtTZFD4P2yBAZUgc89YYfyxGCuLpBsiDzufK4dDmD mFYjiL5qoq0BVKSnwyYCy4KArpkIEHSB7AK+g= MIME-Version: 1.0 Received: by 10.14.16.14 with SMTP id g14mr251230eeg.67.1305644475147; Tue, 17 May 2011 08:01:15 -0700 (PDT) Received: by 10.14.45.3 with HTTP; Tue, 17 May 2011 08:01:14 -0700 (PDT) In-Reply-To: <4DD286E5.8090105@redhat.com> References: <4DD286E5.8090105@redhat.com> Date: Tue, 17 May 2011 18:01:14 +0300 Message-ID: X-ASG-Orig-Subj: Re: xfstests: device busy when umount Subject: Re: xfstests: device busy when umount From: Amir Goldstein To: Eric Sandeen Cc: Yongqiang Yang , Ext4 Developers List , xfs-oss Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: mail-ey0-f181.google.com[209.85.215.181] X-Barracuda-Start-Time: 1305644476 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.64007 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a 
signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Tue, May 17, 2011 at 5:32 PM, Eric Sandeen wrote:
> On 5/17/11 4:03 AM, Yongqiang Yang wrote:
>> Hi,
>>
>> I noticed that all tests which contain 'device busy' errors have
>> falloc operations. Does the error have something to do with falloc?
>>
>
> cc'ing xfs list since xfs devs maintain xfstests.
>
> What tests have "device busy" errors? What do the usual investigative
> steps such as "lsof" and "fuser" tell you when this happens?

I tried running lsof | grep $TEST_DIR before umount and I tried sleep 1
before umount and it didn't yield anything.

>
> Are there loop devices that didn't get cleaned up, or processes that
> have not terminated?
>
> What tests have these problems?

For me, 124 always fails to umount, and 198 and 213 sometimes fail to umount.

>
> -Eric
>

From BATV+c8a7b5ca24785ce3106c+2823+infradead.org+hch@bombadil.srs.infradead.org Tue May 17 10:49:09 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4HFn9jt190613 for ; Tue, 17 May 2011 10:49:09 -0500 X-ASG-Debug-ID: 1305647348-5bd203cf0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B30171EC9AD6 for ; Tue, 17 May 2011 08:49:08 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id 0VJqlHUZXwxlyFu3 for ; Tue, 17 May 2011 08:49:08 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.76 #1 (Red Hat Linux)) id 1QMMWC-0005Gm-2f; Tue, 17 May 2011
15:49:08 +0000 Date: Tue, 17 May 2011 11:49:08 -0400 From: Christoph Hellwig To: Peter Watkins Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: xfs deadlock during reclaim in _xfs_trans_alloc? Subject: Re: xfs deadlock during reclaim in _xfs_trans_alloc? Message-ID: <20110517154907.GA17735@infradead.org> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1305647348 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean

On Tue, May 17, 2011 at 10:37:59AM -0400, Peter Watkins wrote:
> Greetings,
>
> I think I've hit another case when reclaim recurses into xfs and deadlocks.
>
> The system was under memory pressure and an fsync() call sent xfs into
> reclaim which blocked on the prune_icache mutex while holding an xfs
> inode buffer lock. Another thread, also in reclaim, held the
> prune_icache mutex but needed that xfs inode buffer lock to make
> progress.
>
> Perhaps _xfs_trans_alloc should not recurse into the filesystem if its
> allocation goes into reclaim? Should it say:
>
> tp = kmem_zone_zalloc(xfs_trans_zone, KM_SLEEP|KM_NOFS);
>
> I'll send a proposed patch in a second. (I'm on 2.6.27, but the patch
> will be against latest)

My patch "prune back iprune_sem" which landed in Linux 2.6.39 as commit
bab1d9444d9a147f1dc3478dd06c16f490227f3e should fix that at the VFS
level. I'm not sure how 2.6.27 looks in that area, but a lot of things
have changed so a backport might not be trivial.

From BATV+c8a7b5ca24785ce3106c+2823+infradead.org+hch@bombadil.srs.infradead.org Tue May 17 10:50:22 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4HFoLMc190676 for ; Tue, 17 May 2011 10:50:22 -0500 X-ASG-Debug-ID: 1305647420-6842033d0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 855D0CC790F for ; Tue, 17 May 2011 08:50:20 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id XqSJlgPnpwGbtp4u for ; Tue, 17 May 2011 08:50:20 -0700 (PDT) X-ASG-Whitelist: Client X-ASG-Whitelist: Barracuda Reputation Received: from hch by bombadil.infradead.org with local (Exim 4.76 #1 (Red Hat Linux)) id 1QMMXL-000639-1B; Tue, 17 May 2011 15:50:19 +0000 Date: Tue, 17 May 2011 11:50:19 -0400 From: Christoph Hellwig To: Alex Romosan Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com X-ASG-Orig-Subj: Re: system hangs 2.6.39-rc7 xfs related Subject: Re: system hangs 2.6.39-rc7 xfs related Message-ID: <20110517155018.GB17735@infradead.org> References: <87ei3y7aei.fsf@sycorax.lbl.gov> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87ei3y7aei.fsf@sycorax.lbl.gov> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1305647421 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Mon, May 16, 2011 at 
11:44:37AM -0700, Alex Romosan wrote: > my system just sort of locked up and everything that's trying to write > to disk is stuck. dmesg gives me this: This looks quite similar to a few other XFS issues in the .39 cycle. Can you upgrade to latest Linus HEAD which has fixes for these similar issues and see if that fixes your issues?
From romosan@sycorax.lbl.gov Tue May 17 11:29:34 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4HGTYrP192092 for ; Tue, 17 May 2011 11:29:34 -0500 X-ASG-Debug-ID: 1305649772-66be02e60000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from sycorax.lbl.gov (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 5F6BD45E5B8 for ; Tue, 17 May 2011 09:29:32 -0700 (PDT) Received: from sycorax.lbl.gov (sycorax.lbl.gov [128.3.11.121]) by cuda.sgi.com with ESMTP id C45TQkaG9f228EPH for ; Tue, 17 May 2011 09:29:32 -0700 (PDT) Received: from sycorax.lbl.gov (romosan@localhost [127.0.0.1]) by sycorax.lbl.gov (8.14.4/8.14.4/Debian-2) with ESMTP id p4HGTMaV002919; Tue, 17 May 2011 09:29:22 -0700 Received: (from romosan@localhost) by sycorax.lbl.gov (8.14.4/8.14.4/Submit) id p4HGTM50002916; Tue, 17 May 2011 09:29:22 -0700 From: Alex Romosan To: Christoph Hellwig Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com X-ASG-Orig-Subj: Re: system hangs 2.6.39-rc7 xfs related Subject: Re: system hangs 2.6.39-rc7 xfs related References: <87ei3y7aei.fsf@sycorax.lbl.gov> <20110517155018.GB17735@infradead.org> Date: Tue, 17 May 2011 09:29:22 -0700 In-Reply-To: <20110517155018.GB17735@infradead.org> (message from Christoph Hellwig on Tue, 17 May 2011 11:50:19 -0400) Message-ID: <87r57xl28t.fsf@sycorax.lbl.gov> User-Agent: Gnus/5.110018 (No Gnus v0.18) Emacs/23.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Barracuda-Connect: sycorax.lbl.gov[128.3.11.121] X-Barracuda-Start-Time: 1305649773 X-Barracuda-Bayes:
INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.64013 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Christoph Hellwig writes: > On Mon, May 16, 2011 at 11:44:37AM -0700, Alex Romosan wrote: >> my system just sort of locked up and everything that's trying to write >> to disk is stuck. dmesg gives me this: > > This looks quite similar to a few other XFS issues in the .39 cycle. > Can you upgrade to latest Linus HEAD which has fixes for these similar > issues and see if that fixes your issues? i'll give it a try, but it might take some time before i can figure out if the fixes work as i still haven't figured out a way to reproduce this consistently (having a lot of processes write to disk at the same time increases the likelihood of it happening but doesn't guarantee it). --alex-- -- | I believe the moment is at hand when, by a paranoiac and active | | advance of the mind, it will be possible (simultaneously with | | automatism and other passive states) to systematize confusion | | and thus to help to discredit completely the world of reality. 
| From david@fromorbit.com Tue May 17 18:13:15 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4HNDF3T204898 for ; Tue, 17 May 2011 18:13:15 -0500 X-ASG-Debug-ID: 1305673984-7da800310000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id E7ECF45B4E5 for ; Tue, 17 May 2011 16:13:10 -0700 (PDT) Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id AQKMB5egb7huR62K for ; Tue, 17 May 2011 16:13:10 -0700 (PDT) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AmMEAKz/0k15LCoegWdsb2JhbACmFRUBARYmJcdpDoMKD4JyBJ8Y Received: from ppp121-44-42-30.lns20.syd6.internode.on.net (HELO dastard) ([121.44.42.30]) by ipmail06.adl6.internode.on.net with ESMTP; 18 May 2011 08:43:03 +0930 Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QMTRl-0001sM-GI; Wed, 18 May 2011 09:13:01 +1000 Date: Wed, 18 May 2011 09:13:01 +1000 From: Dave Chinner To: Jan Kara Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: [PATCH] xfstests: Improve test 219 to work with all filesystems Subject: Re: [PATCH] xfstests: Improve test 219 to work with all filesystems Message-ID: <20110517231301.GX19446@dastard> References: <1305644104-612-1-git-send-email-jack@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1305644104-612-1-git-send-email-jack@suse.cz> User-Agent: Mutt/1.5.20 (2009-06-14) X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145] X-Barracuda-Start-Time: 1305673993 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.64039 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On Tue, May 17, 2011 at 04:55:04PM +0200, Jan Kara wrote: > Different filesystems account different amount of metadata in quota. Thus it is > impractical to check for a particular amount of space occupied by a file > because there is no right value. Change the test to verify whether the amount > of space before quotacheck and after quotacheck is the same as other quota > tests do. Except that the purpose of the test is to check that the accounting correctly matches the blocks allocated via direct IO, buffered IO and mmap, not that quota is consistent over a remount. IOWs, the numbers do actually matter - for example the recent changes to speculative delayed allocation beyond EOF for buffered IO in XFS could be causing large numbers of blocks to be left after EOF incorrectly, but the exact block number check used in the test would catch that. The method you propose would not catch it at all, and we'd be oblivious to an undesirable change in behaviour. IMO, a better filter function would be the way to go - one that takes into account that there might be some metadata blocks allocated, but that no less than 3x48k should have been allocated to the quotas... Cheers, Dave. 
-- Dave Chinner david@fromorbit.com From xiaoqiangnk@gmail.com Tue May 17 22:29:25 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,FREEMAIL_FROM, J_CHICKENPOX_52,J_CHICKENPOX_84,T_DKIM_INVALID autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4I3TPpi213860 for ; Tue, 17 May 2011 22:29:25 -0500 X-ASG-Debug-ID: 1305689363-60bd03ad0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail-vx0-f181.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 0D2F71B8B800 for ; Tue, 17 May 2011 20:29:23 -0700 (PDT) Received: from mail-vx0-f181.google.com (mail-vx0-f181.google.com [209.85.220.181]) by cuda.sgi.com with ESMTP id lPegsOKTTOBWwEWd for ; Tue, 17 May 2011 20:29:23 -0700 (PDT) Received: by vxb39 with SMTP id 39so906061vxb.26 for ; Tue, 17 May 2011 20:29:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=31I8Y15/nT+CncUxnT3K8PQVgJLWQYeT3zKogNeAtmI=; b=Fsfi9B92YwCXVHzQN4tg1Ave54Wib3t/YSHphrEXYCnwVfTENZaqU6lony/sFhn/WU V98/qaio/eKJ77RMZi/UYcq+oe1doDP4rkq/z/OQZ0nC1NaeOZ51D+bTyhq7nK2kujmh 36OIok3VezYfhiRHb13aXDnzrZ1TkmfnCz42s= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=wba0nINyPi0FGQ6IJ0/NHqv7dX5NqsdEdFS1U7P2mxSywk0QsM7W2q7KRxR8AAv0XU sm/bnMqlo5DQHhd7secDSY4V1MVD6G7rQCVJ0KeC8M4o0jBeRAyQ7M9Wj+Lz2maroC6P 5Ex170dZ0yrOgvosG1Ee9wb2erw1nqNT0JfTw= MIME-Version: 1.0 Received: by 10.52.99.233 with SMTP id et9mr1300403vdb.312.1305689363250; Tue, 17 May 2011 20:29:23 -0700 (PDT) Received: by 10.220.126.204 
with HTTP; Tue, 17 May 2011 20:29:23 -0700 (PDT) In-Reply-To: References: Date: Wed, 18 May 2011 11:29:23 +0800 Message-ID: X-ASG-Orig-Subj: Re: [PATCH] xfstests:Make 225 compare map and fiemap at each block. Subject: Re: [PATCH] xfstests:Make 225 compare map and fiemap at each block. From: Yongqiang Yang To: Eric Sandeen Cc: Ext4 Developers List , xfs@oss.sgi.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Barracuda-Connect: mail-vx0-f181.google.com[209.85.220.181] X-Barracuda-Start-Time: 1305689364 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=DKIM_SIGNED, DKIM_VERIFIED X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.64057 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- -0.00 DKIM_VERIFIED Domain Keys Identified Mail: signature passes verification 0.00 DKIM_SIGNED Domain Keys Identified Mail: message has a signature X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean Hi Eric, Could you have a look at this patch? On Sat, May 14, 2011 at 11:47 AM, Yongqiang Yang wrote: > Hi All, > > Due to my carelessness, I induced a ugly patch to ext4's fiemap, but > 225 could not find it. So I looked into the 225 and could not figure out > logic in compare_map_and_fiemap(), which seemed to mixed extents with > blocks. Then I made 225 compare map and fiemap at each block, the new > 225 can find another bug in ext4's fiemap. > > The new 225 works well on ext3 and ext4 with both 1K and 4K block. However, > it report fiemap error on xfs with 4K block. My working tree is 2.6.39-rc3 > pulled from Ted's tree. The error message is as follows. 
> > QA output created by 225 > fiemap run without preallocation, with sync > +map is 'DDHDHHDHHDHDDHDDHDDHHDHDDHDDDDDDHHDDDHHHHDH > DDDDDDDDHDDHHHDDDHDDHHDDDDDDHHHHHHDDHHHHHDHDHDHDD > DHDDHD' > +logical: [ 0.. 15] phys: 12.. 27 flags: 0x000 tot: 16 > +logical: [ 17.. 31] phys: 29.. 43 flags: 0x000 tot: 15 > +logical: [ 34.. 63] phys: 46.. 75 flags: 0x000 tot: 30 > +logical: [ 65.. 95] phys: 77.. 107 flags: 0x001 tot: 31 > +Problem comparing fiemap and map > fiemap run without preallocation or sync > +map is 'DDHDHHDHHDHDDHDDHDDHHDHDDHDDDDDDHHDDDHHHHDH > DDDDDDDDHDDHHHDDDHDDHHDDDDDDHHHHHHDDHHHHHDHDHDHDD > DHDDHD' > +logical: [ 0.. 15] phys: 0.. 15 flags: 0x006 tot: 16 > +Problem comparing fiemap and map > Ran: 225 > Failures: 225 > Failed 1 of 1 tests > > I am not sure this is a bug in new 225 or xfs. > > Yongqiang. 
> > Signed-off-by: Yongqiang Yang > --- > src/fiemap-tester.c | 223 ++++++++++++++++++++++++++++---------------------- > 1 files changed, 125 insertions(+), 98 deletions(-) > > diff --git a/src/fiemap-tester.c b/src/fiemap-tester.c > index 1663f84..99bb5ce 100644 > --- a/src/fiemap-tester.c > +++ b/src/fiemap-tester.c > @@ -14,6 +14,9 @@ > * You should have received a copy of the GNU General Public License > * along with this program; if not, write the Free Software Foundation, > * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > + * > + * Compare map and fiemap at each block, > + * Yongqiang Yang , 2011 > */ > > #include > @@ -57,7 +60,7 @@ generate_file_mapping(int blocks, int prealloc) > int num_types = 2, cur_block = 0; > int i = 0; > > - map = malloc(sizeof(char) * blocks); > + map = malloc(sizeof(char) * (blocks + 1)); > if (!map) > return NULL; > > @@ -80,7 +83,8 @@ generate_file_mapping(int blocks, int prealloc) > } > cur_block++; > } > - > + > + map[blocks] = 0; > return map; > } > > @@ -247,55 +251,36 @@ check_flags(struct fiemap *fiemap, int blocksize) > } > > static int > -check_data(struct fiemap *fiemap, __u64 logical_offset, int blocksize, > +check_data(struct fiemap_extent * extent , __u64 logical_offset, int > blocksize, > int last, int prealloc) > { > - struct fiemap_extent *extent; > - __u64 orig_offset = logical_offset; > - int c, found = 0; > - > - for (c = 0; c < fiemap->fm_mapped_extents; c++) { > - __u64 start, end; > - extent = &fiemap->fm_extents[c]; > - > - start = extent->fe_logical; > - 
end = extent->fe_logical + extent->fe_length; > - > - if (logical_offset > end) > - continue; > - > - if (logical_offset + blocksize < start) > - break; > - > - if (logical_offset >= start && > - logical_offset < end) { > - if (prealloc && > - !(extent->fe_flags & FIEMAP_EXTENT_UNWRITTEN)) { > - printf("ERROR: preallocated extent is not " > - "marked with FIEMAP_EXTENT_UNWRITTEN: " > - "%llu\n", > - (unsigned long long) > - (start / blocksize)); > - return -1; > - } > - > - if (logical_offset + blocksize > end) { > - logical_offset = end+1; > - continue; > - } else { > - found = 1; > - break; > - } > + int found = 0; > + __u64 start, end; > + > + start = extent->fe_logical; > + end = extent->fe_logical + extent->fe_length; > + > + if 
(logical_offset >= start && > + logical_offset < end) { > + if (prealloc && > + !(extent->fe_flags & FIEMAP_EXTENT_UNWRITTEN)) { > + printf("ERROR: preallocated extent is not " > + "marked with FIEMAP_EXTENT_UNWRITTEN: " > + "%llu\n", > + (unsigned long long) > + (start / blocksize)); > + return -1; > } > + found = 1; > } > - > + > if (!found) { > printf("ERROR: couldn't find extent at %llu\n", > - (unsigned long long)(orig_offset / blocksize)); > + (unsigned long long)(logical_offset / blocksize)); > } else if (last && > - !(fiemap->fm_extents[c].fe_flags & FIEMAP_EXTENT_LAST)) { > + !(extent->fe_flags & FIEMAP_EXTENT_LAST)) { > printf("ERROR: last extent not marked as last: %llu\n", > - (unsigned long long)(orig_offset / blocksize)); > + (unsigned long long)(logical_offset / blocksize)); > found = 0; > } > > @@ -370,37 +355,26 @@ check_weird_fs_hole(int fd, __u64 > logical_offset, int blocksize) > } > > static int > -check_hole(struct fiemap *fiemap, int fd, __u64 logical_offset, int blocksize) > +check_hole(struct fiemap_extent *extent, int fd, __u64 > logical_offset, int 
blocksize) > { > - struct fiemap_extent *extent; > - int c; > + __u64 start, end; > > - for (c = 0; c < fiemap->fm_mapped_extents; c++) { > - __u64 start, end; > - extent = &fiemap->fm_extents[c]; > + start = extent->fe_logical; > + end = extent->fe_logical + extent->fe_length; > > - start = extent->fe_logical; > - end = extent->fe_logical + extent->fe_length; > + if (logical_offset >= start && > + logical_offset < end) { > > - if (logical_offset > end) > - continue; > - if (logical_offset + blocksize < start) > - break; > + if (check_weird_fs_hole(fd, logical_offset, > + blocksize) == 0) > + return 0; > > - if (logical_offset >= start && > - logical_offset < end) { > - > - if (check_weird_fs_hole(fd, logical_offset, > - blocksize) == 0) > - break; > - > - printf("ERROR: found an allocated extent where a hole " > - "should be: %llu\n", > - (unsigned long long)(start / blocksize)); > - return -1; > - } > + 
printf("ERROR: found an allocated extent where a hole " > + "should be: %llu\n", > + (unsigned long long)(start / blocksize)); > + return -1; > } > - > + > return 0; > } > > @@ -423,9 +397,11 @@ compare_fiemap_and_map(int fd, char *map, int > blocks, int blocksize, int syncfil > { > struct fiemap *fiemap; > char *fiebuf; > - int blocks_to_map, ret, cur_extent = 0, last_data; > + int blocks_to_map, ret, last_data = -1; > __u64 map_start, map_length; > int i, c; > + int cur_block = 0; > + int last_found = 0; > > if (query_fiemap_count(fd, blocks, blocksize) < 0) > return -1; > @@ -451,8 +427,11 @@ compare_fiemap_and_map(int fd, char *map, int > blocks, int blocksize, int syncfil > fiemap->fm_extent_count = blocks_to_map; > fiemap->fm_mapped_extents = 0; > > + /* check fiemap by looking at each block. 
*/ > do { > - fiemap->fm_start = map_start; > + int nr_extents; > + > + fiemap->fm_start = cur_block * blocksize; > fiemap->fm_length = map_length; > > ret = ioctl(fd, FS_IOC_FIEMAP, (unsigned long)fiemap); > @@ -465,45 +444,93 @@ compare_fiemap_and_map(int fd, char *map, int > blocks, int blocksize, int syncfil > if (check_flags(fiemap, blocksize)) > goto error; > > - for (i = cur_extent, c = 1; i < blocks; i++, c++) { > - __u64 logical_offset = i * blocksize; > + nr_extents = fiemap->fm_mapped_extents; > + if (nr_extents == 0) { > + int block = cur_block + (map_length - 1)/ blocksize; > + for (; cur_block <= block && cur_block < blocks; cur_block++) { > + /* check hole */ > + if (map[cur_block] != 'H') { > + printf("ERROR: map[%d] should not be " > + "a hole\n", cur_block); > + goto error; > + } > + } > + continue; > + } > > - if (c > fiemap->fm_mapped_extents) { > - 
i++; > - break; > + for (c = 0; c < nr_extents; c++) { > + __u64 offset; > + int block; > + struct fiemap_extent *extent; > + > + if (last_found) { > + printf("ERROR: there is extent after" > + "the last extent\n"); > + goto error; > } > > - switch (map[i]) { > - case 'D': > - if (check_data(fiemap, logical_offset, > - blocksize, last_data == i, 0)) > - goto error; > - break; > - case 'H': > - if (check_hole(fiemap, fd, logical_offset, > - blocksize)) > - goto error; > - break; > - case 'P': > - if (check_data(fiemap, logical_offset, > - 
blocksize, last_data == i, 1)) > + extent = &fiemap->fm_extents[c]; > + offset = extent->fe_logical; > + block = offset / blocksize; > + > + /* check hole. */ > + for (; cur_block < block; cur_block++) { > + if (map[cur_block] != 'H') { > + printf("ERROR: map[%d] should not be " > + "a hole\n", cur_block); > goto error; > - break; > - default: > - printf("ERROR: weird value in map: %c\n", > - map[i]); > + } > + } > + > + offset = extent->fe_logical + extent->fe_length; > + block = offset / blocksize; > + > + if (block > blocks) { > + printf("ERROR: there are extents beyond EOF\n"); > goto error; > } > + > + /* check data */ > + for (; cur_block < block; cur_block++) { > + 
offset = (__u64)cur_block * blocksize; > + last_found = (last_data == cur_block); > + switch (map[cur_block]) { > + case 'D': > + if (check_data(extent, offset, > + blocksize, last_found, 0)) > + goto error; > + break; > + case 'H': > + if (check_hole(extent, fd, offset, > + blocksize)); > + goto error; > + break; > + > + case 'P': > + if (check_data(extent, offset, > + blocksize, last_found, 1)) > + goto error; > + break; > + default: > + 
printf("ERROR: weird value in map: %c\n", > + map[i]); > + goto error; > + } > + } > } > - cur_extent = i; > - map_start = i * blocksize; > - } while (cur_extent < blocks); > + } while (cur_block < blocks); > > - ret = 0; > - return ret; > + if (!last_found && last_data != -1) { > + printf("ERROR: find no last extent\n"); > + goto error; > + } > + > + free(fiebuf); > + return 0; > error: > printf("map is '%s'\n", map); > show_extents(fiemap, blocksize); > + free(fiebuf); > return -1; > } > > -- > 1.7.5.1 > > -- > Best Wishes > Yongqiang Yang > -- Best Wishes Yongqiang Yang From sandeen@redhat.com Tue May 17 22:47:52 2011 X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,J_CHICKENPOX_52, J_CHICKENPOX_84 autolearn=no version=3.4.0-r929098 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4I3lqLR214423 for ; Tue, 17 May 2011 22:47:52 -0500 X-ASG-Debug-ID: 1305690471-4c40003e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mx1.redhat.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 3E24E464D92 for ; Tue, 17 May 2011 20:47:51 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by cuda.sgi.com with ESMTP id PLFcRtk7tIC58E0d for ; Tue, 
17 May 2011 20:47:51 -0700 (PDT) X-ASG-Whitelist: Barracuda Reputation Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p4I3lnkE029701 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 17 May 2011 23:47:49 -0400 Received: from liberator.sandeen.net (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1]) by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p4I3lmUa027322 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO); Tue, 17 May 2011 23:47:48 -0400 Message-ID: <4DD34163.8010306@redhat.com> Date: Tue, 17 May 2011 22:47:47 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Yongqiang Yang CC: Ext4 Developers List , xfs@oss.sgi.com, Josef Bacik X-ASG-Orig-Subj: Re: [PATCH] xfstests:Make 225 compare map and fiemap at each block. Subject: Re: [PATCH] xfstests:Make 225 compare map and fiemap at each block. References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.67 on 10.5.11.11 X-Barracuda-Connect: mx1.redhat.com[209.132.183.28] X-Barracuda-Start-Time: 1305690472 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Virus-Scanned: ClamAV version 0.94.2, clamav-milter version 0.94.2 on oss.sgi.com X-Virus-Status: Clean On 5/17/11 10:29 PM, Yongqiang Yang wrote: > Hi Eric, > > Could you have a look at this patch? I was kind of hoping Josef would since he wrote it in the first place ;) I can try to take a look... -Eric > On Sat, May 14, 2011 at 11:47 AM, Yongqiang Yang wrote: >> Hi All, >> >> Due to my carelessness, I induced a ugly patch to ext4's fiemap, but >> 225 could not find it. So I looked into the 225 and could not figure out >> logic in compare_map_and_fiemap(), which seemed to mixed extents with >> blocks. 
Then I made 225 compare map and fiemap at each block, the new >> 225 can find another bug in ext4's fiemap. >> >> The new 225 works well on ext3 and ext4 with both 1K and 4K block. However, >> it report fiemap error on xfs with 4K block. My working tree is 2.6.39-rc3 >> pulled from Ted's tree. The error message is as follows. >> >> QA output created by 225 >> fiemap run without preallocation, with sync >> +map is 'DDHDHHDHHDHDDHDDHDDHHDHDDHDDDDDDHHDDDHHHHDH >> DDDDDDDDHDDHHHDDDHDDHHDDDDDDHHHHHHDDHHHHHDHDHDHDD >> DHDDHD' >> +logical: [ 0.. 15] phys: 12.. 27 flags: 0x000 tot: 16 >> +logical: [ 17.. 31] phys: 29.. 43 flags: 0x000 tot: 15 >> +logical: [ 34.. 63] phys: 46.. 75 flags: 0x000 tot: 30 >> +logical: [ 65.. 95] phys: 77.. 107 flags: 0x001 tot: 31 >> +Problem comparing fiemap and map >> fiemap run without preallocation or sync >> +map is 'DDHDHHDHHDHDDHDDHDDHHDHDDHDDDDDDHHDDDHHHHDH >> DDDDDDDDHDDHHHDDDHDDHHDDDDDDHHHHHHDDHHHHHDHDHDHDD >> DHDDHD' >> +logical: [ 0.. 15] phys: 0.. 15 flags: 0x006 tot: 16 >> +Problem comparing fiemap and map >> Ran: 225 >> Failures: 225 >> Failed 1 of 1 tests >> >> I am not sure this is a bug in new 225 or xfs. >> >> Yongqiang. 
>>
>> Signed-off-by: Yongqiang Yang
>> ---
>>  src/fiemap-tester.c |  223 ++++++++++++++++++++++++++++----------------------
>>  1 files changed, 125 insertions(+), 98 deletions(-)
>>
>> diff --git a/src/fiemap-tester.c b/src/fiemap-tester.c
>> index 1663f84..99bb5ce 100644
>> --- a/src/fiemap-tester.c
>> +++ b/src/fiemap-tester.c
>> @@ -14,6 +14,9 @@
>>   * You should have received a copy of the GNU General Public License
>>   * along with this program; if not, write the Free Software Foundation,
>>   * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
>> + *
>> + * Compare map and fiemap at each block,
>> + * Yongqiang Yang , 2011
>>   */
>>
>>  #include
>> @@ -57,7 +60,7 @@ generate_file_mapping(int blocks, int prealloc)
>>  	int num_types = 2, cur_block = 0;
>>  	int i = 0;
>>
>> -	map = malloc(sizeof(char) * blocks);
>> +	map = malloc(sizeof(char) * (blocks + 1));
>>  	if (!map)
>>  		return NULL;
>>
>> @@ -80,7 +83,8 @@ generate_file_mapping(int blocks, int prealloc)
>>  		}
>>  		cur_block++;
>>  	}
>> -
>> +
>> +	map[blocks] = 0;
>>  	return map;
>>  }
>>
>> @@ -247,55 +251,36 @@ check_flags(struct fiemap *fiemap, int blocksize)
>>  }
>>
>>  static int
>> -check_data(struct fiemap *fiemap, __u64 logical_offset, int blocksize,
>> +check_data(struct fiemap_extent * extent , __u64 logical_offset, int
>> blocksize,
>>  	   int last, int prealloc)
>>  {
>> -	struct fiemap_extent *extent;
>> -	__u64 orig_offset = logical_offset;
>> -	int c, found = 0;
>> -
>> -	for (c = 0; c < fiemap->fm_mapped_extents; c++) {
>> -		__u64 start, end;
>> -		extent = &fiemap->fm_extents[c];
>> -
>> -		start = extent->fe_logical;
>> -		end = extent->fe_logical + extent->fe_length;
>> -
>> -		if (logical_offset > end)
>> -			continue;
>> -
>> -		if (logical_offset + blocksize < start)
>> -			break;
>> -
>> -		if (logical_offset >= start &&
>> -		    logical_offset < end) {
>> -			if (prealloc &&
>> -			    !(extent->fe_flags & FIEMAP_EXTENT_UNWRITTEN)) {
>> -				printf("ERROR: preallocated extent is not "
>> -				       "marked with FIEMAP_EXTENT_UNWRITTEN: "
>> -				       "%llu\n",
>> -				       (unsigned long long)
>> -				       (start / blocksize));
>> -				return -1;
>> -			}
>> -
>> -			if (logical_offset + blocksize > end) {
>> -				logical_offset = end+1;
>> -				continue;
>> -			} else {
>> -				found = 1;
>> -				break;
>> -			}
>> +	int found = 0;
>> +	__u64 start, end;
>> +
>> +	start = extent->fe_logical;
>> +	end = extent->fe_logical + extent->fe_length;
>> +
>> +	if (logical_offset >= start &&
>> +	    logical_offset < end) {
>> +		if (prealloc &&
>> +		    !(extent->fe_flags & FIEMAP_EXTENT_UNWRITTEN)) {
>> +			printf("ERROR: preallocated extent is not "
>> +			       "marked with FIEMAP_EXTENT_UNWRITTEN: "
>> +			       "%llu\n",
>> +			       (unsigned long long)
>> +			       (start / blocksize));
>> +			return -1;
>>  		}
>> +		found = 1;
>>  	}
>> -
>> +
>>  	if (!found) {
>>  		printf("ERROR: couldn't find extent at %llu\n",
>> -		       (unsigned long long)(orig_offset / blocksize));
>> +		       (unsigned long long)(logical_offset / blocksize));
>>  	} else if (last &&
>> -		   !(fiemap->fm_extents[c].fe_flags & FIEMAP_EXTENT_LAST)) {
>> +		   !(extent->fe_flags & FIEMAP_EXTENT_LAST)) {
>>  		printf("ERROR: last extent not marked as last: %llu\n",
>> -		       (unsigned long long)(orig_offset / blocksize));
>> +		       (unsigned long long)(logical_offset / blocksize));
>>  		found = 0;
>>  	}
>>
>> @@ -370,37 +355,26 @@ check_weird_fs_hole(int fd, __u64
>> logical_offset, int blocksize)
>>  }
>>
>>  static int
>> -check_hole(struct fiemap *fiemap, int fd, __u64 logical_offset, int blocksize)
>> +check_hole(struct fiemap_extent *extent, int fd, __u64
>> logical_offset, int blocksize)
>>  {
>> -	struct fiemap_extent *extent;
>> -	int c;
>> +	__u64 start, end;
>>
>> -	for (c = 0; c < fiemap->fm_mapped_extents; c++) {
>> -		__u64 start, end;
>> -		extent = &fiemap->fm_extents[c];
>> +	start = extent->fe_logical;
>> +	end = extent->fe_logical + extent->fe_length;
>>
>> -		start = extent->fe_logical;
>> -		end = extent->fe_logical + extent->fe_length;
>> +	if (logical_offset >= start &&
>> +	    logical_offset < end) {
>>
>> -		if (logical_offset > end)
>> -			continue;
>> -		if (logical_offset + blocksize < start)
>> -			break;
>> +		if (check_weird_fs_hole(fd, logical_offset,
>> +					blocksize) == 0)
>> +			return 0;
>>
>> -		if (logical_offset >= start &&
>> -		    logical_offset < end) {
>> -
>> -			if (check_weird_fs_hole(fd, logical_offset,
>> -						blocksize) == 0)
>> -				break;
>> -
>> -			printf("ERROR: found an allocated extent where a hole "
>> -			       "should be: %llu\n",
>> -			       (unsigned long long)(start / blocksize));
>> -			return -1;
>> -		}
>> +		printf("ERROR: found an allocated extent where a hole "
>> +		       "should be: %llu\n",
>> +		       (unsigned long long)(start / blocksize));
>> +		return -1;
>>  	}
>> -
>> +
>>  	return 0;
>>  }
>>
>> @@ -423,9 +397,11 @@ compare_fiemap_and_map(int fd, char *map, int
>> blocks, int blocksize, int syncfil
>>  {
>>  	struct fiemap *fiemap;
>>  	char *fiebuf;
>> -	int blocks_to_map, ret, cur_extent = 0, last_data;
>> +	int blocks_to_map, ret, last_data = -1;
>>  	__u64 map_start, map_length;
>>  	int i, c;
>> +	int cur_block = 0;
>> +	int last_found = 0;
>>
>>  	if (query_fiemap_count(fd, blocks, blocksize) < 0)
>>  		return -1;
>> @@ -451,8 +427,11 @@ compare_fiemap_and_map(int fd, char *map, int
>> blocks, int blocksize, int syncfil
>>  	fiemap->fm_extent_count = blocks_to_map;
>>  	fiemap->fm_mapped_extents = 0;
>>
>> +	/* check fiemap by looking at each block. */
>>  	do {
>> -		fiemap->fm_start = map_start;
>> +		int nr_extents;
>> +
>> +		fiemap->fm_start = cur_block * blocksize;
>>  		fiemap->fm_length = map_length;
>>
>>  		ret = ioctl(fd, FS_IOC_FIEMAP, (unsigned long)fiemap);
>> @@ -465,45 +444,93 @@ compare_fiemap_and_map(int fd, char *map, int
>> blocks, int blocksize, int syncfil
>>  		if (check_flags(fiemap, blocksize))
>>  			goto error;
>>
>> -		for (i = cur_extent, c = 1; i < blocks; i++, c++) {
>> -			__u64 logical_offset = i * blocksize;
>> +		nr_extents = fiemap->fm_mapped_extents;
>> +		if (nr_extents == 0) {
>> +			int block = cur_block + (map_length - 1)/ blocksize;
>> +			for (; cur_block <= block && cur_block < blocks; cur_block++) {
>> +				/* check hole */
>> +				if (map[cur_block] != 'H') {
>> +					printf("ERROR: map[%d] should not be "
>> +					       "a hole\n", cur_block);
>> +					goto error;
>> +				}
>> +			}
>> +			continue;
>> +		}
>>
>> -			if (c > fiemap->fm_mapped_extents) {
>> -				i++;
>> -				break;
>> +		for (c = 0; c < nr_extents; c++) {
>> +			__u64 offset;
>> +			int block;
>> +			struct fiemap_extent *extent;
>> +
>> +			if (last_found) {
>> +				printf("ERROR: there is extent after "
>> +				       "the last extent\n");
>> +				goto error;
>>  			}
>>
>> -			switch (map[i]) {
>> -			case 'D':
>> -				if (check_data(fiemap, logical_offset,
>> -					       blocksize, last_data == i, 0))
>> -					goto error;
>> -				break;
>> -			case 'H':
>> -				if (check_hole(fiemap, fd, logical_offset,
>> -					       blocksize))
>> -					goto error;
>> -				break;
>> -			case 'P':
>> -				if (check_data(fiemap, logical_offset,
>> -					       blocksize, last_data == i, 1))
>> +			extent = &fiemap->fm_extents[c];
>> +			offset = extent->fe_logical;
>> +			block = offset / blocksize;
>> +
>> +			/* check hole. */
>> +			for (; cur_block < block; cur_block++) {
>> +				if (map[cur_block] != 'H') {
>> +					printf("ERROR: map[%d] should not be "
>> +					       "a hole\n", cur_block);
>>  					goto error;
>> -				break;
>> -			default:
>> -				printf("ERROR: weird value in map: %c\n",
>> -				       map[i]);
>> +				}
>> +			}
>> +
>> +			offset = extent->fe_logical + extent->fe_length;
>> +			block = offset / blocksize;
>> +
>> +			if (block > blocks) {
>> +				printf("ERROR: there are extents beyond EOF\n");
>>  				goto error;
>>  			}
>> +
>> +			/* check data */
>> +			for (; cur_block < block; cur_block++) {
>> +				offset = (__u64)cur_block * blocksize;
>> +				last_found = (last_data == cur_block);
>> +				switch (map[cur_block]) {
>> +				case 'D':
>> +					if (check_data(extent, offset,
>> +						       blocksize, last_found, 0))
>> +						goto error;
>> +					break;
>> +				case 'H':
>> +					if (check_hole(extent, fd, offset,
>> +						       blocksize))
>> +						goto error;
>> +					break;
>> +
>> +				case 'P':
>> +					if (check_data(extent, offset,
>> +						       blocksize, last_found, 1))
>> +						goto error;
>> +					break;
>> +				default:
>> +					printf("ERROR: weird value in map: %c\n",
>> +					       map[cur_block]);
>> +					goto error;
>> +				}
>> +			}
>>  		}
>> -		cur_extent = i;
>> -		map_start = i * blocksize;
>> -	} while (cur_extent < blocks);
>> +	} while (cur_block < blocks);
>>
>> -	ret = 0;
>> -	return ret;
>> +	if (!last_found && last_data != -1) {
>> +		printf("ERROR: find no last extent\n");
>> +		goto error;
>> +	}
>> +
>> +	free(fiebuf);
>> +	return 0;
>>  error:
>>  	printf("map is '%s'\n", map);
>>  	show_extents(fiemap, blocksize);
>> +	free(fiebuf);
>>  	return -1;
>>  }
>>
>> --
>> 1.7.5.1
>>
>> --
>> Best Wishes
>> Yongqiang Yang
>>
>
>
>

From david@fromorbit.com Wed May 18 01:32:29 2011
X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,J_CHICKENPOX_23 autolearn=no version=3.4.0-r929098
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4I6WSx6223920 for ; Wed,
18 May 2011 01:32:28 -0500
X-ASG-Debug-ID: 1305700345-657f01af0000-NocioJ
X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi
Received: from ipmail06.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id F173B1341C2A for ; Tue, 17 May 2011 23:32:26 -0700 (PDT)
Received: from ipmail06.adl6.internode.on.net (ipmail06.adl6.internode.on.net [150.101.137.145]) by cuda.sgi.com with ESMTP id SNRoejwunhN9I7Ag for ; Tue, 17 May 2011 23:32:26 -0700 (PDT)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AmMEAKNl0015LCoegWdsb2JhbACmFhUBARYmJYhwvxwOhgsEnxg
Received: from ppp121-44-42-30.lns20.syd6.internode.on.net (HELO dastard) ([121.44.42.30]) by ipmail06.adl6.internode.on.net with ESMTP; 18 May 2011 16:01:57 +0930
Received: from dave by dastard with local (Exim 4.72) (envelope-from ) id 1QMaIT-0006ai-J6; Wed, 18 May 2011 16:31:53 +1000
Date: Wed, 18 May 2011 16:31:53 +1000
From: Dave Chinner
To: Amir Goldstein
Cc: Eric Sandeen , Yongqiang Yang , Ext4 Developers List , xfs-oss
X-ASG-Orig-Subj: Re: xfstests: device busy when umount
Subject: Re: xfstests: device busy when umount
Message-ID: <20110518063153.GZ19446@dastard>
References: <4DD286E5.8090105@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)
X-Barracuda-Connect: ipmail06.adl6.internode.on.net[150.101.137.145]
X-Barracuda-Start-Time: 1305700346
X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com
X-Barracuda-Spam-Score: -2.02
X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=
X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.64068 Rule breakdown below pts rule name description ---- ---------------------- --------------------------------------------------
X-Virus-Scanned: ClamAV version
0.94.2, clamav-milter version 0.94.2 on oss.sgi.com
X-Virus-Status: Clean

On Tue, May 17, 2011 at 06:01:14PM +0300, Amir Goldstein wrote:
> On Tue, May 17, 2011 at 5:32 PM, Eric Sandeen wrote:
> > On 5/17/11 4:03 AM, Yongqiang Yang wrote:
> >> Hi,
> >>
> >> I noticed that all tests which contain 'device busy' errors have
> >> falloc operations.  Does the error have something to do with falloc?

Perhaps a bit more detail about what you are testing, how you've set
up xfstests, etc, and some analysis of the problem is in order
first?

> > cc'ing xfs list since xfs devs maintain xfstests.
> >
> > What tests have "device busy" errors?  What do the usual investigative
> > steps such as "lsof" and "fuser" tell you when this happens?
>
> I tried running lsof | grep $TEST_DIR before umount
> and I tried sleep 1 before umount and it didn't yield anything.

Which usually indicates that you've got some kind of reference
counting problem preventing the filesystem from being unmounted.

> > Are there loop devices that didn't get cleaned up, or processes that
> > have not terminated?
> >
> > What tests have these problems?
>
> for me 124 always fails to umount, and 198 and 213 sometimes fails to umount.

What, exactly, are you testing on? test 124 uses XFS_IOC_RESVSP
directly, not fallocate(), so all it is doing on a non-XFS
filesystem is iterating a loop that writes a 1MB file, reads it back
then unlinks it....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

From amir73il@gmail.com Wed May 18 03:19:06 2011
X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM, J_CHICKENPOX_23,J_CHICKENPOX_56,J_CHICKENPOX_62,T_DKIM_INVALID autolearn=no version=3.4.0-r929098
Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p4I8J6ml227457 for ; Wed, 18 May 2011 03:19:06 -0500
X-ASG-Debug-ID: 1305706744-7311017b0000-NocioJ
X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi
Received: from mail-ey0-f181.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 402811E2DE82 for ; Wed, 18 May 2011 01:19:04 -0700 (PDT)
Received: from mail-ey0-f181.google.com (mail-ey0-f181.google.com [209.85.215.181]) by cuda.sgi.com with ESMTP id Ka8S5kY3a3zsjTFI for ; Wed, 18 May 2011 01:19:04 -0700 (PDT)
Received: by eyh5 with SMTP id 5so396454eyh.26