From: pg_xf2@xf2.for.sabi.co.UK (Peter Grandi)
Date: Sun, 14 Dec 2008 22:02:05 +0000
Subject: Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
Message-ID: <18757.33373.744917.457587@tree.ty.sabi.co.uk>
In-Reply-To: <200812141912.59649.Martin@lichtvoll.de>
References: <1229225480.16555.152.camel@localhost>
 <18757.4606.966139.10342@tree.ty.sabi.co.uk>
 <200812141912.59649.Martin@lichtvoll.de>
X-Disclaimer: This message contains only personal opinions

[ ... ]

> But - as far as I understood - the filesystem doesn't have to
> wait for barriers to complete, but could continue issuing IO
> requests happily. A barrier only means, any request prior to
> that have to land before and any after it after it.
>
> It doesn't mean that the barrier has to land immediately and
> the filesystem has to wait for this. At least that always was
> the whole point of barriers for me. If thats not the case I
> misunderstood the purpose of barriers to the maximum extent
> possible.

Unfortunately that seems to be the case. The purpose of barriers
is to guarantee that relevant data is known to be on persistent
storage (a kind of hardware 'fsync').

In effect a write barrier means "tell me when relevant data is on
persistent storage", or less precisely "flush/sync writes now
and tell me when it is done". Properties as to ordering are just
a side effect.

That is, when the application (file system in the case of
metadata, user process in the case of data) knows that a barrier
operation is complete, it knows that all data involved in the
barrier operation are on persistent storage. In the case of
serially dependent transactions, applications do wait until the
previous transaction is completed before starting the next one
(e.g. creating potentially many files in the same directory,
something that 'tar' does).

Here "all data involved" is usually all previous writes, but in
more sophisticated cases it can be just specific writes.
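As a user-level illustration of the "tell me when it is done"
semantics and of serially dependent transactions, here is a
minimal Python sketch. It uses os.fsync() as the wait-for-
completion step; the names durable_write and run_transactions
and the "txn-NNNN" file naming are hypothetical, invented for
this example. Whether fsync really reaches the medium depends
on the OS, the file system and the drive's write cache, so
treat this as a sketch under those assumptions, not as a
statement of what any particular stack guarantees.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload to path and return only after fsync() has returned.

    Hypothetical helper: fsync() is the user-level request to be told
    when the data is on persistent storage.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Block until the OS reports the file's data as flushed.
        os.fsync(fd)
    finally:
        os.close(fd)


def run_transactions(directory, payloads):
    """Run serially dependent transactions: each one starts only
    after the previous one has been reported complete."""
    completed = []
    for i, payload in enumerate(payloads):
        path = os.path.join(directory, "txn-%04d" % i)
        durable_write(path, payload)  # waits here, like 'tar' per file
        completed.append(path)
    return completed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for p in run_transactions(d, [b"a", b"bb", b"ccc"]):
            print(os.path.basename(p), os.path.getsize(p))
```

Each iteration blocks in os.fsync(), so the cost of one flush is
paid per file; that is the pattern that makes many small
dependent transactions slow when every flush has to go to disk.
A stricter version would also fsync the containing directory so
that the new directory entry itself is flushed.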
At transaction end points (for a file system, metadata updates)
an application issues a write barrier and then waits for its
completion.

If the host adapter/disk controllers don't have persistent
storage, then completion (should) only happen when the data
involved is actually on disk; if they do have it, then multiple
barriers can be outstanding, provided the host adapter/disk
controller supports multiple outstanding operations (e.g. thanks
to tagged queueing).

The best case is when the IO subsystem supports all of these:

* tagged queueing: multiple write barriers can be outstanding;

* fine-granularity barriers (specific writes, not all writes):
  just metadata writes need to be flushed to persistent storage,
  not any intervening data writes too;

* the host adapter and/or disk controller have persistent
  caches: as long as those caches have space, barriers can
  complete immediately, without waiting for a write to disk.

It just happens that typical contemporary PC IO subsystems (at
the hardware level, not the Linux level) have none of those
features, except sometimes for NCQ, which is a reduced form of
TCQ and apparently is not that useful.

Write barriers are also useful without persistent caches, if
there is proper tagged queueing and fine granularity.
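To make the effect of fine granularity and persistent caches
concrete, here is a deliberately crude toy model in Python. It
only counts how many writes have to reach the disk before one
barrier can complete. The function writes_to_flush, its
parameters and the sample write names are all hypothetical; it
does not model tagged queueing, timing, or any real controller.

```python
def writes_to_flush(pending, wanted, fine_grained, cache_free):
    """Count the writes that must reach disk before a barrier completes.

    Toy model, not a description of real hardware.

    pending      -- writes queued before the barrier, in issue order
    wanted       -- the writes the issuer cares about (e.g. metadata)
    fine_grained -- True if the barrier can name specific writes
    cache_free   -- free slots in a persistent cache (0 = no such cache)
    """
    if fine_grained:
        # Only the named writes are covered by the barrier.
        covered = [w for w in pending if w in wanted]
    else:
        # A coarse barrier covers every earlier write, data included.
        covered = list(pending)
    # Writes absorbed by the persistent cache count as already safe.
    return max(0, len(covered) - cache_free)


if __name__ == "__main__":
    pending = ["data1", "meta1", "data2", "data3", "meta2"]
    wanted = {"meta1", "meta2"}
    for fine in (False, True):
        for cache in (0, 4):
            n = writes_to_flush(pending, wanted, fine, cache)
            print("fine_grained=%-5s cache_free=%d -> %d" % (fine, cache, n))
```

With these sample values the coarse barrier without a cache has
to wait for all 5 writes, the fine-grained one for only the 2
metadata writes, and with 4 free cache slots the counts drop to
1 and 0 respectively.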