From: Michael Sinz
To: Andi Kleen
CC: linux-xfs@oss.sgi.com
Date: Tue, 27 May 2003 06:58:08 -0400
Subject: Re: Tomorrow

Andi Kleen wrote:
>>i wouldn't call them v3 dirs either, that implies it's an `upgrade' to
>>v2, when in fact it's a downgrade (non-broken -> broken). maybe call
>>them v0 (afaik xfs only has two dir formats v1 and v2).
>>or call it something entirely different, like broken_dirs ;-)
>
> I would not call them broken, but what is a bit worrying is that it can
> be quite complicated to convert letters to lower case. In the American
> ASCII subset it's easy, but for other languages it usually needs huge
> lookup tables, and worse, there are different character sets.

When we did this for the Amiga (oh so many years ago) it was a royal
PITA. We ended up punting for the most part on anything outside of the
ISO-Latin-1 code page, and even there we had problems due to some
"differences" of opinion among certain language groups about what was
supposed to happen.

This gets worse because a file, especially one accessed over the
network, may be accessed by machines with different locale settings,
and thus with slightly different rules for what the lowercase form of
an uppercase letter or word is.

While I can fully understand the need to do this somewhere closer to
the filesystem (as the performance impact can be massive otherwise),
there is no really good solution to this in the international space
once you start to network machines across locale settings. (A pair of
files whose names are correctly unique in one locale may not be unique
in another locale!)

> You either only support UTF-8 Unicode (shifting the burden of conversion
> to user space) or you need to store a "codepage" per filesystem. Linux
> seems to be heading down the UTF-8 route. The kernel already has some
> code for this (JFS does it), but it will not be pretty.

I have not looked at the JFS code at all, but it cannot be very pretty
if it supports locale preferences. (Unless, in the last 10 years, there
has been some new agreement under which case conversion is consistent
across all locales.)

--
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
the master's ability to explain them to others.
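The point about names being unique in one locale but not in another can be illustrated with a minimal Python sketch. It assumes three illustrative folding rules, none of which is claimed to be what XFS, JFS, or AmigaOS actually implemented: an ASCII-only fold, Python's locale-independent str.casefold(), and a hypothetical Turkish-style fold that maps I to dotless i. The helper names fold_ascii, fold_unicode, fold_turkish and collides are made up for this example.

```python
# Minimal sketch: whether two filenames "collide" depends on the case-folding
# rule in effect. All three fold rules here are illustrative assumptions,
# not the rules used by XFS, JFS, or AmigaOS.


def fold_ascii(name: str) -> str:
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def fold_unicode(name: str) -> str:
    """Locale-independent Unicode full case folding (maps U+00DF to "ss")."""
    return name.casefold()


def fold_turkish(name: str) -> str:
    """Hypothetical Turkish-style fold: I -> dotless i (U+0131) and
    capital dotted I (U+0130) -> i, then full case folding for the rest."""
    return name.replace("I", "\u0131").replace("\u0130", "i").casefold()


def collides(a: str, b: str, fold) -> bool:
    """True if two distinct names would map to the same folded key."""
    return a != b and fold(a) == fold(b)


PAIRS = [
    ("file.txt", "FILE.TXT"),
    ("r\u00e9sum\u00e9", "R\u00c9SUM\u00c9"),  # resume with acute accents
    ("stra\u00dfe", "STRASSE"),                # strasse with sharp s
]

if __name__ == "__main__":
    for a, b in PAIRS:
        # ascii() keeps the output printable on non-UTF-8 terminals.
        print(ascii(a), "vs", ascii(b),
              "| ascii:", collides(a, b, fold_ascii),
              "| unicode:", collides(a, b, fold_unicode),
              "| turkish:", collides(a, b, fold_turkish))
```

Under this sketch, file.txt and FILE.TXT collide under the ASCII and Unicode folds but not under the Turkish-style one, while the sharp-s spelling and STRASSE collide only under the folds that map U+00DF to "ss". Which rule is "right" is exactly the kind of locale disagreement described above.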