From owner-xfs@oss.sgi.com Tue Apr 1 03:54:15 2008
Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 03:54:24 -0700 (PDT)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,STOX_REPLY_TYPE autolearn=ham version=3.3.0-r574664
Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m31AsCZf019813 for ; Tue, 1 Apr 2008 03:54:15 -0700
X-ASG-Debug-ID: 1207047285-562601da0000-NocioJ
X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi
Received: from tyo201.gate.nec.co.jp (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 0868A71FB12 for ; Tue, 1 Apr 2008 03:54:45 -0700 (PDT)
Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [202.32.8.193]) by cuda.sgi.com with ESMTP id kkOsEnCzEHwgECVp for ; Tue, 01 Apr 2008 03:54:45 -0700 (PDT)
Received: from mailgate3.nec.co.jp (mailgate53F.nec.co.jp [10.7.69.162]) by tyo201.gate.nec.co.jp (8.13.8/8.13.4) with ESMTP id m31Asipx028227; Tue, 1 Apr 2008 19:54:44 +0900 (JST)
Received: (from root@localhost) by mailgate3.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id m31Ashm26568; Tue, 1 Apr 2008 19:54:43 +0900 (JST)
Received: from togyo.jp.nec.com (togyo.jp.nec.com [10.26.220.4]) by mailsv4.nec.co.jp (8.13.8/8.13.4) with ESMTP id m31Ash5A016406; Tue, 1 Apr 2008 19:54:43 +0900 (JST)
Received: from TNESB07336 ([10.64.168.65] [10.64.168.65]) by mail.jp.nec.com with ESMTP; Tue, 1 Apr 2008 19:54:43 +0900
Message-Id: <2530BB4B166747659C8F65C9C3DE7CFB@nsl.ad.nec.co.jp>
From: "Takashi Sato"
To: "David Chinner"
Cc: "David Chinner" , , , , ,
References: <20080328180736t-sato@mail.jp.nec.com> <20080331000057.GI108924158@sgi.com>
In-Reply-To: <20080331000057.GI108924158@sgi.com>
X-ASG-Orig-Subj: Re: [RFC PATCH 2/2] Add timeout feature
Subject: Re: [RFC PATCH 2/2] Add timeout feature
Date: Tue, 1 Apr 2008 19:54:42 +0900
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Windows Mail 6.0.6000.16480
X-MimeOLE: Produced By Microsoft MimeOLE V6.0.6000.16545
X-Barracuda-Connect: TYO201.gate.nec.co.jp[202.32.8.193]
X-Barracuda-Start-Time: 1207047289
X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com
X-Barracuda-Spam-Score: -2.02
X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=
X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46520
	Rule breakdown below
	 pts rule name              description
	---- ---------------------- --------------------------------------------------
X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 15116
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: t-sato@yk.jp.nec.com
Precedence: bulk
X-list: xfs

Hi,

David Chinner wrote:
> The timeout is not for the freeze operation - the timeout is
> only set up once the freeze is complete. i.e:
>
> $ time sudo ~/test_src/xfs_io -f -x -c 'gfreeze 10' /mnt/scratch/test
> freezing with level = 10
>
> real 0m23.204s
> user 0m0.008s
> sys 0m0.012s
>
> The freeze takes 23s, and then the 10s timeout is started. So
> this timeout does not protect against freeze_bdev() hangs at all.
> All it does is introduce silent unfreezing of the block device that
> can not be synchronised with the application that is operating
> on the frozen device.

Exactly. My timeout feature is only for an application, not for
freeze_bdev(). I think it is needed for situations where we can't
unfreeze from userspace. (e.g.
Freezing the root filesystem)

> FWIW, resetting this timeout from userspace is unreliable - there's
> no guarantee that under load your userspace process will get to run
> again inside the timeout to reset it, hence leaving you with an
> unfrozen filesystem when you really want it frozen...

The timeout period specified to the reset ioctl should be much larger
than the interval at which the reset ioctl is called repeatedly
(e.g. timeout period = 2 minutes, calling interval = 5 seconds).
The reset ioctl will work under such a setting.
If a timeout still occurs before a reset, it would imply that
an unexpected problem (e.g. a deadlock) has occurred in the application.

Cheers, Takashi

From owner-xfs@oss.sgi.com Tue Apr 1 05:00:09 2008
Date: Tue, 1 Apr 2008 14:00:35 +0200
From: Emmanuel Florac
To: David
 Chinner
Cc: xfs@oss.sgi.com
Subject: Re: Serious XFS crash
Message-ID: <20080401140035.46470306@galadriel.home>
In-Reply-To: <20080325233611.GW103491721@sgi.com>
References: <20080325185453.3a1957dd@galadriel.home> <20080325233611.GW103491721@sgi.com>
Organization: Intellique
X-Mailer: Claws Mail 2.9.1 (GTK+ 2.8.20; i686-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-archive-position: 15117
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: eflorac@intellique.com
Precedence: bulk
X-list: xfs

On Wed, 26 Mar 2008 10:36:11 +1100 you wrote:

> What sector size is being used for the XFS filesystem? If it's
> not the same as the filesystem block size, then XFS can't have done
> this itself because the offset that this garbage starts at would
> not be block aligned.....

I've gone through the logs. This machine had a serious XFS crash on March 6
due to bad blocks (a failed drive in the RAID-5). Is it possible that the
March 19 XFS crash is related to this, i.e. that after running xfs_repair
on March 6 some on-disk garbage remained and provoked a new crash a couple
of weeks later?
Here is the March 6 crash:

Mar 6 10:42:46 system3 kernel: [xfs_alloc_read_agf+244/432] xfs_alloc_read_agf+0xf4/0x1b0
Mar 6 10:42:46 system3 kernel: [xfs_alloc_fix_freelist+1000/1120] xfs_alloc_fix_freelist+0x3e8/0x460
Mar 6 10:42:46 system3 last message repeated 2 times
Mar 6 10:42:46 system3 kernel: [_xfs_trans_commit+489/928] _xfs_trans_commit+0x1e9/0x3a0
Mar 6 10:42:46 system3 kernel: [xfs_free_extent+152/224] xfs_free_extent+0x98/0xe0
Mar 6 10:42:46 system3 kernel: [xfs_bmap_finish+263/400] xfs_bmap_finish+0x107/0x190
Mar 6 10:42:46 system3 kernel: [xfs_itruncate_finish+544/976] xfs_itruncate_finish+0x220/0x3d0
Mar 6 10:42:46 system3 kernel: [xfs_trans_ijoin+43/128] xfs_trans_ijoin+0x2b/0x80
Mar 6 10:42:46 system3 kernel: [xfs_inactive+1195/1296] xfs_inactive+0x4ab/0x510
Mar 6 10:42:46 system3 kernel: [xfs_fs_clear_inode+156/192] xfs_fs_clear_inode+0x9c/0xc0
Mar 6 10:42:46 system3 kernel: [invalidate_inode_buffers+21/112] invalidate_inode_buffers+0x15/0x70
Mar 6 10:42:46 system3 kernel: [clear_inode+212/320] clear_inode+0xd4/0x140
Mar 6 10:42:46 system3 kernel: [truncate_inode_pages+23/32] truncate_inode_pages+0x17/0x20
Mar 6 10:42:46 system3 kernel: [generic_delete_inode+264/272] generic_delete_inode+0x108/0x110
Mar 6 10:42:46 system3 kernel: [iput+83/112] iput+0x53/0x70
Mar 6 10:42:46 system3 kernel: [do_unlinkat+186/272] do_unlinkat+0xba/0x110
Mar 6 10:42:46 system3 kernel: [sys_fcntl64+89/144] sys_fcntl64+0x59/0x90
Mar 6 10:42:46 system3 kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Mar 6 10:42:46 system3 kernel: xfs_force_shutdown(md0,0x8) called from line 4267 of file fs/xfs/xfs_bmap.c. Return address = 0xc0256b29
Mar 6 10:51:19 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6E00.
Mar 6 10:51:20 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6DCA.
--
--------------------------------------------------
Emmanuel Florac www.intellique.com
--------------------------------------------------

From owner-xfs@oss.sgi.com Tue Apr 1 05:15:47 2008
To: David Chinner
Cc: "linux-fsdevel@vger.kernel.org", "linux-ext4@vger.kernel.org", "xfs@oss.sgi.com", "dm-devel@redhat.com", "linux-kernel@vger.kernel.org"
Subject: [RFC PATCH 0/3] freeze feature ver 1.1
Message-Id:
 <20080401211614t-sato@mail.jp.nec.com>
Mime-Version: 1.0
X-Mailer: WeMail32[2.51] ID:1K0086
From: Takashi Sato
Date: Tue, 1 Apr 2008 21:16:14 +0900
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 15118
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: t-sato@yk.jp.nec.com
Precedence: bulk
X-list: xfs

Hi,

David Chinner wrote:
> Patch below to remove the XFS specific ioctl interfaces for this
> functionality.

Thank you very much for your patch to remove the XFS specific code.
I have merged your patch into 2.6.25-rc7 and included it in the freeze
patch-set as [PATCH 2/3] below.

[PATCH 1/3] Implement generic freeze feature
  The ioctls for the generic freeze feature are below.
  o Freeze the filesystem
    int ioctl(int fd, int FIFREEZE, arg)
      fd: The file descriptor of the mountpoint
      FIFREEZE: request code for the freeze
      arg: Ignored
      Return value: 0 if the operation succeeds. Otherwise, -1

  o Unfreeze the filesystem
    int ioctl(int fd, int FITHAW, arg)
      fd: The file descriptor of the mountpoint
      FITHAW: request code for unfreeze
      arg: Ignored
      Return value: 0 if the operation succeeds.
      Otherwise, -1

[PATCH 2/3] Remove XFS specific ioctl interfaces for freeze feature
  It removes the XFS specific ioctl interfaces and request codes
  for the freeze feature.
  This patch has been supplied by David Chinner.

[PATCH 3/3] Add timeout feature
  The timeout feature is added to the freeze ioctl, and a new ioctl
  to reset the timeout period is added.
  o Freeze the filesystem
    int ioctl(int fd, int FIFREEZE, long *timeval)
      fd: The file descriptor of the mountpoint
      FIFREEZE: request code for the freeze
      timeval: the timeout period in seconds
               If it's 0 or 1, the timeout isn't set.
               This special case of "1" is implemented to keep
               the compatibility with XFS applications.
      Return value: 0 if the operation succeeds. Otherwise, -1

  o Reset the timeout period
    This is useful for the application to set the timeval more accurately.
    For example, the freezer resets the timeval to 10 seconds every 5
    seconds. In this approach, even if the freezer causes a deadlock
    by accessing the frozen filesystem, it will be resolved by the timeout
    in 10 seconds, and the freezer can recognize that at the next reset
    of timeval.
    int ioctl(int fd, int FIFREEZE_RESET_TIMEOUT, long *timeval)
      fd: The file descriptor of the mountpoint
      FIFREEZE_RESET_TIMEOUT: request code for reset of timeout period
      timeval: new timeout period in seconds
      Return value: 0 if the operation succeeds. Otherwise, -1
      Error number: If the filesystem has already been unfrozen,
                    errno is set to EINVAL.

Any comments are very welcome.
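
For illustration, the freeze/thaw ioctls described in [PATCH 1/3] might be
driven from userspace roughly as sketched below. This is a minimal sketch and
not code from the patch-set: freeze_fs(), thaw_fs(), reset_freeze_timeout()
and with_frozen_fs() are hypothetical helper names. It assumes FIFREEZE and
FITHAW come from <linux/fs.h>, falling back to the request codes quoted in
[PATCH 1/3]. The request code of FIFREEZE_RESET_TIMEOUT is not listed in the
description above, so the caller has to pass it in.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#ifdef __linux__
#include <linux/fs.h>	/* provides FIFREEZE/FITHAW on kernels that have them */
#endif

/* Fallbacks: request codes as quoted in [PATCH 1/3]. */
#ifndef FIFREEZE
#define FIFREEZE	_IOWR('X', 119, int)
#endif
#ifndef FITHAW
#define FITHAW		_IOWR('X', 120, int)
#endif

/* Freeze the filesystem that fd (an open mountpoint) lives on. */
int freeze_fs(int fd)
{
	return ioctl(fd, FIFREEZE, NULL);	/* arg is ignored per [PATCH 1/3] */
}

/* Thaw it again. */
int thaw_fs(int fd)
{
	return ioctl(fd, FITHAW, NULL);
}

/*
 * Re-arm the timeout proposed in [PATCH 3/3]. reset_request stands in for
 * FIFREEZE_RESET_TIMEOUT, whose value is not listed above (assumption: the
 * caller takes it from the patched <linux/fs.h>).
 */
int reset_freeze_timeout(int fd, unsigned long reset_request, long timeout_sec)
{
	return ioctl(fd, reset_request, &timeout_sec);
}

/* Hypothetical wrapper: freeze, run work(ctx), and always try to thaw. */
int with_frozen_fs(const char *mountpoint, int (*work)(void *), void *ctx)
{
	int rc, saved_errno;
	int fd = open(mountpoint, O_RDONLY);

	if (fd < 0)
		return -1;
	if (freeze_fs(fd) < 0) {
		saved_errno = errno;
		close(fd);
		errno = saved_errno;
		return -1;
	}
	rc = work ? work(ctx) : 0;
	saved_errno = errno;
	if (thaw_fs(fd) < 0) {
		rc = -1;
		saved_errno = errno;
	}
	close(fd);
	errno = saved_errno;
	return rc;
}
```

A caller with CAP_SYS_ADMIN could pass a snapshot routine as work. If the
timeout from [PATCH 3/3] is in use, the timeout passed to
reset_freeze_timeout() should stay well above the reset interval, as in the
example above of 10 seconds reset every 5 seconds.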

Cheers, Takashi

From owner-xfs@oss.sgi.com Tue Apr 1 05:17:00 2008
To: David Chinner
Cc: "linux-ext4@vger.kernel.org", "xfs@oss.sgi.com", "linux-fsdevel@vger.kernel.org", "dm-devel@redhat.com", "linux-kernel@vger.kernel.org"
Subject: [RFC PATCH 1/3] Implement generic freeze feature
Message-Id: <20080401211729t-sato@mail.jp.nec.com>
Mime-Version: 1.0
X-Mailer: WeMail32[2.51] ID:1K0086
From: Takashi Sato
Date: Tue,
 1 Apr 2008 21:17:29 +0900
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 15119
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: t-sato@yk.jp.nec.com
Precedence: bulk
X-list: xfs

The ioctls for the generic freeze feature are below.
o Freeze the filesystem
  int ioctl(int fd, int FIFREEZE, arg)
    fd: The file descriptor of the mountpoint
    FIFREEZE: request code for the freeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1

o Unfreeze the filesystem
  int ioctl(int fd, int FITHAW, arg)
    fd: The file descriptor of the mountpoint
    FITHAW: request code for unfreeze
    arg: Ignored
    Return value: 0 if the operation succeeds.
    Otherwise, -1

Signed-off-by: Takashi Sato
Signed-off-by: Masayuki Hamaguchi
---
 fs/block_dev.c     |    3 +++
 fs/buffer.c        |   25 +++++++++++++++++++++++++
 fs/ioctl.c         |   35 +++++++++++++++++++++++++++++++++++
 fs/super.c         |   32 +++++++++++++++++++++++++++++++-
 include/linux/fs.h |    7 +++++++
 5 files changed, 101 insertions(+), 1 deletion(-)

diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7.org/fs/block_dev.c linux-2.6.25-rc7-freeze/fs/block_dev.c
--- linux-2.6.25-rc7.org/fs/block_dev.c	2008-03-26 10:38:14.000000000 +0900
+++ linux-2.6.25-rc7-freeze/fs/block_dev.c	2008-03-27 09:26:36.000000000 +0900
@@ -284,6 +284,9 @@ static void init_once(struct kmem_cache
 	INIT_LIST_HEAD(&bdev->bd_holder_list);
 #endif
 	inode_init_once(&ei->vfs_inode);
+
+	/* Initialize semaphore for freeze. */
+	sema_init(&bdev->bd_freeze_sem, 1);
 }
 
 static inline void __bd_forget(struct inode *inode)
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7.org/fs/buffer.c linux-2.6.25-rc7-freeze/fs/buffer.c
--- linux-2.6.25-rc7.org/fs/buffer.c	2008-03-26 10:38:14.000000000 +0900
+++ linux-2.6.25-rc7-freeze/fs/buffer.c	2008-03-26 20:32:23.000000000 +0900
@@ -201,6 +201,19 @@ struct super_block *freeze_bdev(struct b
 {
 	struct super_block *sb;
 
+	down(&bdev->bd_freeze_sem);
+	sb = get_super_without_lock(bdev);
+
+	/* If super_block has been already frozen, return. */
+	if (sb && sb->s_frozen != SB_UNFROZEN) {
+		put_super(sb);
+		up(&bdev->bd_freeze_sem);
+		return sb;
+	}
+
+	if (sb)
+		put_super(sb);
+
 	down(&bdev->bd_mount_sem);
 	sb = get_super(bdev);
 	if (sb && !(sb->s_flags & MS_RDONLY)) {
@@ -219,6 +232,9 @@ struct super_block *freeze_bdev(struct b
 	}
 
 	sync_blockdev(bdev);
+
+	up(&bdev->bd_freeze_sem);
+
 	return sb;	/* thaw_bdev releases s->s_umount and bd_mount_sem */
 }
 EXPORT_SYMBOL(freeze_bdev);
@@ -232,6 +248,13 @@ EXPORT_SYMBOL(freeze_bdev);
  */
 void thaw_bdev(struct block_device *bdev, struct super_block *sb)
 {
+	down(&bdev->bd_freeze_sem);
+
+	if (sb && sb->s_frozen == SB_UNFROZEN) {
+		up(&bdev->bd_freeze_sem);
+		return;
+	}
+
 	if (sb) {
 		BUG_ON(sb->s_bdev != bdev);
 
@@ -244,6 +267,8 @@ void thaw_bdev(struct block_device *bdev
 	}
 
 	up(&bdev->bd_mount_sem);
+
+	up(&bdev->bd_freeze_sem);
 }
 EXPORT_SYMBOL(thaw_bdev);
 
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7.org/fs/ioctl.c linux-2.6.25-rc7-freeze/fs/ioctl.c
--- linux-2.6.25-rc7.org/fs/ioctl.c	2008-03-26 10:38:14.000000000 +0900
+++ linux-2.6.25-rc7-freeze/fs/ioctl.c	2008-03-26 20:22:17.000000000 +0900
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -181,6 +182,40 @@ int do_vfs_ioctl(struct file *filp, unsi
 		} else
 			error = -ENOTTY;
 		break;
+
+	case FIFREEZE: {
+		struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+
+		if (!capable(CAP_SYS_ADMIN)) {
+			error = -EPERM;
+			break;
+		}
+
+		/* If filesystem doesn't support freeze feature, return. */
+		if (sb->s_op->write_super_lockfs == NULL) {
+			error = -EINVAL;
+			break;
+		}
+
+		/* Freeze. */
+		freeze_bdev(sb->s_bdev);
+
+		break;
+	}
+
+	case FITHAW: {
+		struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+
+		if (!capable(CAP_SYS_ADMIN)) {
+			error = -EPERM;
+			break;
+		}
+
+		/* Thaw. */
+		thaw_bdev(sb->s_bdev, sb);
+		break;
+	}
+
 	default:
 		if (S_ISREG(filp->f_path.dentry->d_inode->i_mode))
 			error = file_ioctl(filp, cmd, arg);
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7.org/fs/super.c linux-2.6.25-rc7-freeze/fs/super.c
--- linux-2.6.25-rc7.org/fs/super.c	2008-03-26 10:38:14.000000000 +0900
+++ linux-2.6.25-rc7-freeze/fs/super.c	2008-03-26 20:23:21.000000000 +0900
@@ -154,7 +154,7 @@ int __put_super_and_need_restart(struct
  * Drops a temporary reference, frees superblock if there's no
  * references left.
  */
-static void put_super(struct super_block *sb)
+void put_super(struct super_block *sb)
 {
 	spin_lock(&sb_lock);
 	__put_super(sb);
@@ -507,6 +507,36 @@ rescan:
 
 EXPORT_SYMBOL(get_super);
 
+/*
+ * get_super_without_lock - Get super_block from block_device without lock.
+ * @bdev:	block device struct
+ *
+ * Scan the superblock list and finds the superblock of the file system
+ * mounted on the block device given. This doesn't lock anyone.
+ * %NULL is returned if no match is found.
+ */
+struct super_block *get_super_without_lock(struct block_device *bdev)
+{
+	struct super_block *sb;
+
+	if (!bdev)
+		return NULL;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (sb->s_bdev == bdev) {
+			if (sb->s_root) {
+				sb->s_count++;
+				spin_unlock(&sb_lock);
+				return sb;
+			}
+		}
+	}
+	spin_unlock(&sb_lock);
+	return NULL;
+}
+EXPORT_SYMBOL(get_super_without_lock);
+
 struct super_block * user_get_super(dev_t dev)
 {
 	struct super_block *sb;
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7.org/include/linux/fs.h linux-2.6.25-rc7-freeze/include/linux/fs.h
--- linux-2.6.25-rc7.org/include/linux/fs.h	2008-03-26 10:38:14.000000000 +0900
+++ linux-2.6.25-rc7-freeze/include/linux/fs.h	2008-03-26 20:27:44.000000000 +0900
@@ -223,6 +223,8 @@ extern int dir_notify_enable;
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
 #define FIGETBSZ   _IO(0x00,2)	/* get the block size used for bmap */
+#define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
+#define FITHAW		_IOWR('X', 120, int)	/* Thaw */
 
 #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
 #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
@@ -548,6 +550,9 @@ struct block_device {
 	 * care to not mess up bd_private for that case.
 	 */
 	unsigned long		bd_private;
+
+	/* Semaphore for freeze */
+	struct semaphore	bd_freeze_sem;
 };
 
 /*
@@ -1926,7 +1931,9 @@ extern int do_vfs_ioctl(struct file *fil
 extern void get_filesystem(struct file_system_type *fs);
 extern void put_filesystem(struct file_system_type *fs);
 extern struct file_system_type *get_fs_type(const char *name);
+extern void put_super(struct super_block *sb);
 extern struct super_block *get_super(struct block_device *);
+extern struct super_block *get_super_without_lock(struct block_device *);
 extern struct super_block *user_get_super(dev_t);
 extern void drop_super(struct super_block *sb);
 

From owner-xfs@oss.sgi.com Tue Apr 1 05:17:36 2008
To: David Chinner
Cc: "linux-fsdevel@vger.kernel.org", "linux-ext4@vger.kernel.org", "xfs@oss.sgi.com", "dm-devel@redhat.com", "linux-kernel@vger.kernel.org"
Subject: [RFC PATCH 2/3] Remove XFS specific ioctl interfaces for freeze feature
Message-Id: <20080401211806t-sato@mail.jp.nec.com>
Mime-Version: 1.0
X-Mailer: WeMail32[2.51] ID:1K0086
From: Takashi Sato
Date: Tue, 1 Apr 2008 21:18:05 +0900
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 15120
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: t-sato@yk.jp.nec.com
Precedence: bulk
X-list: xfs

It removes XFS specific ioctl interfaces and request codes
for freeze feature.

This patch has been supplied by David Chinner.

Signed-off-by: Dave Chinner
Signed-off-by: Takashi Sato
---
 linux-2.6/xfs_ioctl.c   |   15 ---------------
 linux-2.6/xfs_ioctl32.c |    2 --
 xfs_fs.h                |    4 ++--
 3 files changed, 2 insertions(+), 19 deletions(-)

diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-freeze/fs/xfs/linux-2.6/xfs_ioctl.c linux-2.6.25-rc7-xfs/fs/xfs/linux-2.6/xfs_ioctl.c
--- linux-2.6.25-rc7-freeze/fs/xfs/linux-2.6/xfs_ioctl.c	2008-04-01 13:22:43.000000000 +0900
+++ linux-2.6.25-rc7-xfs/fs/xfs/linux-2.6/xfs_ioctl.c	2008-04-01 14:34:21.000000000 +0900
@@ -906,21 +906,6 @@ xfs_ioctl(
 		return -error;
 	}
 
-	case XFS_IOC_FREEZE:
-		if (!capable(CAP_SYS_ADMIN))
-			return -EPERM;
-
-		if (inode->i_sb->s_frozen == SB_UNFROZEN)
-			freeze_bdev(inode->i_sb->s_bdev);
-		return 0;
-
-	case XFS_IOC_THAW:
-		if (!capable(CAP_SYS_ADMIN))
-			return -EPERM;
-		if (inode->i_sb->s_frozen != SB_UNFROZEN)
-			thaw_bdev(inode->i_sb->s_bdev, inode->i_sb);
-		return 0;
-
 	case XFS_IOC_GOINGDOWN: {
 		__uint32_t in;
 
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-freeze/fs/xfs/linux-2.6/xfs_ioctl32.c linux-2.6.25-rc7-xfs/fs/xfs/linux-2.6/xfs_ioctl32.c
--- linux-2.6.25-rc7-freeze/fs/xfs/linux-2.6/xfs_ioctl32.c	2008-04-01 13:22:43.000000000 +0900
+++ linux-2.6.25-rc7-xfs/fs/xfs/linux-2.6/xfs_ioctl32.c	2008-04-01 14:31:59.000000000 +0900
@@ -398,8 +398,6 @@ xfs_compat_ioctl(
 	case XFS_IOC_FSGROWFSDATA:
 	case XFS_IOC_FSGROWFSLOG:
 	case XFS_IOC_FSGROWFSRT:
-	case XFS_IOC_FREEZE:
-	case XFS_IOC_THAW:
 	case XFS_IOC_GOINGDOWN:
 	case XFS_IOC_ERROR_INJECTION:
 	case XFS_IOC_ERROR_CLEARALL:
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-freeze/fs/xfs/xfs_fs.h linux-2.6.25-rc7-xfs/fs/xfs/xfs_fs.h
--- linux-2.6.25-rc7-freeze/fs/xfs/xfs_fs.h	2008-04-01 13:22:48.000000000 +0900
+++ linux-2.6.25-rc7-xfs/fs/xfs/xfs_fs.h	2008-04-01 14:31:59.000000000 +0900
@@ -473,8 +473,8 @@ typedef struct xfs_handle {
 #define XFS_IOC_ERROR_INJECTION	     _IOW ('X', 116, struct xfs_error_injection)
 #define XFS_IOC_ERROR_CLEARALL	     _IOW ('X', 117, struct xfs_error_injection)
 /*	XFS_IOC_ATTRCTL_BY_HANDLE -- deprecated 118	 */
-#define XFS_IOC_FREEZE		     _IOWR('X', 119, int)
-#define XFS_IOC_THAW		     _IOWR('X', 120, int)
+/*	XFS_IOC_FREEZE		  -- FIFREEZE 119	 */
+/*	XFS_IOC_THAW		  -- FITHAW   120	 */
 #define XFS_IOC_FSSETDM_BY_HANDLE    _IOW ('X', 121, struct xfs_fsop_setdm_handlereq)
 #define XFS_IOC_ATTRLIST_BY_HANDLE   _IOW ('X', 122, struct xfs_fsop_attrlist_handlereq)
 #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)

From owner-xfs@oss.sgi.com Tue Apr 1 05:21:31 2008
To: David Chinner
Cc: "linux-ext4@vger.kernel.org", "xfs@oss.sgi.com", "dm-devel@redhat.com", "linux-fsdevel@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: [RFC PATCH 3/3] Add timeout feature
Message-Id: <20080401212204t-sato@mail.jp.nec.com>
Mime-Version: 1.0
X-Mailer: WeMail32[2.51] ID:1K0086
From: Takashi Sato
Date: Tue, 1 Apr 2008 21:22:04 +0900
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 15121
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: t-sato@yk.jp.nec.com
Precedence: bulk
X-list: xfs

The timeout feature is added to the freeze ioctl, and a new ioctl
to reset the timeout period is added.
o Freeze the filesystem
  int ioctl(int fd, int FIFREEZE, long *timeval)
    fd: The file descriptor of the mountpoint
    FIFREEZE: request code for the freeze
    timeval: the timeout period in seconds
             If it's 0 or 1, the timeout isn't set.
             This special case of "1" is implemented to keep
             the compatibility with XFS applications.
    Return value: 0 if the operation succeeds.
Otherwise, -1 o Reset the timeout period int ioctl(int fd, int FIFREEZE_RESET_TIMEOUT, long *timeval) fd: file descriptor of mountpoint FIFREEZE_RESET_TIMEOUT: request code for reset of timeout period timeval: new timeout period in seconds Return value: 0 if the operation succeeds. Otherwise, -1 Error number: If the filesystem has already been unfrozen, errno is set to EINVAL. Signed-off-by: Takashi Sato Signed-off-by: Masayuki Hamaguchi --- drivers/md/dm.c | 2 - fs/block_dev.c | 2 + fs/buffer.c | 14 ++++++++- fs/ioctl.c | 64 +++++++++++++++++++++++++++++++++++++++++++- fs/super.c | 52 +++++++++++++++++++++++++++++++++++ fs/xfs/xfs_fsops.c | 2 - include/linux/buffer_head.h | 2 - include/linux/fs.h | 8 +++++ 8 files changed, 140 insertions(+), 6 deletions(-) diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-xfs/drivers/md/dm.c linux-2.6.25-rc7-timeout/drivers/md/dm.c --- linux-2.6.25-rc7-xfs/drivers/md/dm.c 2008-04-01 14:21:37.000000000 +0900 +++ linux-2.6.25-rc7-timeout/drivers/md/dm.c 2008-04-01 13:25:11.000000000 +0900 @@ -1407,7 +1407,7 @@ static int lock_fs(struct mapped_device WARN_ON(md->frozen_sb); - md->frozen_sb = freeze_bdev(md->suspended_bdev); + md->frozen_sb = freeze_bdev(md->suspended_bdev, 0); if (IS_ERR(md->frozen_sb)) { r = PTR_ERR(md->frozen_sb); md->frozen_sb = NULL; diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-xfs/fs/block_dev.c linux-2.6.25-rc7-timeout/fs/block_dev.c --- linux-2.6.25-rc7-xfs/fs/block_dev.c 2008-04-01 14:22:34.000000000 +0900 +++ linux-2.6.25-rc7-timeout/fs/block_dev.c 2008-04-01 13:27:38.000000000 +0900 @@ -287,6 +287,8 @@ static void init_once(struct kmem_cache /* Initialize semaphore for freeze. */ sema_init(&bdev->bd_freeze_sem, 1); + /* Setup freeze timeout function. 
*/ + INIT_DELAYED_WORK(&bdev->bd_freeze_timeout, freeze_timeout); } static inline void __bd_forget(struct inode *inode) diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-xfs/fs/buffer.c linux-2.6.25-rc7-timeout/fs/buffer.c --- linux-2.6.25-rc7-xfs/fs/buffer.c 2008-04-01 14:22:26.000000000 +0900 +++ linux-2.6.25-rc7-timeout/fs/buffer.c 2008-04-01 13:27:14.000000000 +0900 @@ -190,14 +190,17 @@ int fsync_bdev(struct block_device *bdev /** * freeze_bdev -- lock a filesystem and force it into a consistent state - * @bdev: blockdevice to lock + * @bdev: blockdevice to lock + * @timeout_msec: timeout period * * This takes the block device bd_mount_sem to make sure no new mounts * happen on bdev until thaw_bdev() is called. * If a superblock is found on this device, we take the s_umount semaphore * on it to make sure nobody unmounts until the snapshot creation is done. + * If timeout_msec is bigger than 0, this registers the delayed work for + * timeout of the freeze feature. */ -struct super_block *freeze_bdev(struct block_device *bdev) +struct super_block *freeze_bdev(struct block_device *bdev, long timeout_msec) { struct super_block *sb; @@ -233,6 +236,10 @@ struct super_block *freeze_bdev(struct b sync_blockdev(bdev); + /* Setup unfreeze timer. */ + if (timeout_msec > 0) + add_freeze_timeout(bdev, timeout_msec); + up(&bdev->bd_freeze_sem); return sb; /* thaw_bdev releases s->s_umount and bd_mount_sem */ @@ -255,6 +262,9 @@ void thaw_bdev(struct block_device *bdev return; } + /* Delete unfreeze timer. 
*/ + del_freeze_timeout(bdev); + if (sb) { BUG_ON(sb->s_bdev != bdev); diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-xfs/fs/ioctl.c linux-2.6.25-rc7-timeout/fs/ioctl.c --- linux-2.6.25-rc7-xfs/fs/ioctl.c 2008-04-01 14:22:38.000000000 +0900 +++ linux-2.6.25-rc7-timeout/fs/ioctl.c 2008-04-01 13:27:46.000000000 +0900 @@ -184,6 +184,8 @@ int do_vfs_ioctl(struct file *filp, unsi break; case FIFREEZE: { + long timeout_sec; + long timeout_msec; struct super_block *sb = filp->f_path.dentry->d_inode->i_sb; if (!capable(CAP_SYS_ADMIN)) { @@ -197,8 +199,31 @@ int do_vfs_ioctl(struct file *filp, unsi break; } + /* arg(sec) to tick value. */ + error = get_user(timeout_sec, (long __user *) arg); + if (error != 0) + break; + /* + * If 1 is specified as the timeout period, + * it will be changed into 0 to keep the compatibility + * of XFS application(xfs_freeze). + */ + if (timeout_sec < 0) { + error = -EINVAL; + break; + } else if (timeout_sec < 2) { + timeout_sec = 0; + } + + timeout_msec = timeout_sec * 1000; + /* overflow case */ + if (timeout_msec < 0) { + error = -EINVAL; + break; + } + /* Freeze. 
*/ - freeze_bdev(sb->s_bdev); + freeze_bdev(sb->s_bdev, timeout_msec); break; } @@ -216,6 +241,43 @@ int do_vfs_ioctl(struct file *filp, unsi break; } + case FIFREEZE_RESET_TIMEOUT: { + long timeout_sec; + long timeout_msec; + struct super_block *sb + = filp->f_path.dentry->d_inode->i_sb; + + if (!capable(CAP_SYS_ADMIN)) { + error = -EPERM; + break; + } + + /* arg(sec) to tick value */ + error = get_user(timeout_sec, (long __user *) arg); + if (error) + break; + timeout_msec = timeout_sec * 1000; + if (timeout_msec < 0) { + error = -EINVAL; + break; + } + + if (sb) { + down(&sb->s_bdev->bd_freeze_sem); + if (sb->s_frozen == SB_UNFROZEN) { + up(&sb->s_bdev->bd_freeze_sem); + error = -EINVAL; + break; + } + /* setup unfreeze timer */ + if (timeout_msec > 0) + add_freeze_timeout(sb->s_bdev, + timeout_msec); + up(&sb->s_bdev->bd_freeze_sem); + } + break; + } + default: if (S_ISREG(filp->f_path.dentry->d_inode->i_mode)) error = file_ioctl(filp, cmd, arg); diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-xfs/fs/super.c linux-2.6.25-rc7-timeout/fs/super.c --- linux-2.6.25-rc7-xfs/fs/super.c 2008-04-01 14:22:34.000000000 +0900 +++ linux-2.6.25-rc7-timeout/fs/super.c 2008-04-01 13:27:41.000000000 +0900 @@ -983,3 +983,55 @@ struct vfsmount *kern_mount_data(struct } EXPORT_SYMBOL_GPL(kern_mount_data); + +/* + * freeze_timeout - Thaw the filesystem. + * + * @work: work queue (delayed_work.work) + * + * Called by the delayed work when elapsing the timeout period. + * Thaw the filesystem. + */ +void freeze_timeout(struct work_struct *work) +{ + struct block_device *bd = container_of(work, + struct block_device, bd_freeze_timeout.work); + + struct super_block *sb = get_super_without_lock(bd); + + thaw_bdev(bd, sb); + + if (sb) + put_super(sb); +} +EXPORT_SYMBOL_GPL(freeze_timeout); + +/* + * add_freeze_timeout - Add timeout for freeze. 
+ * + * @bdev: block device struct + * @timeout_msec: timeout period + * + * Add the delayed work for freeze timeout to the delayed work queue. + */ +void add_freeze_timeout(struct block_device *bdev, long timeout_msec) +{ + s64 timeout_jiffies = msecs_to_jiffies(timeout_msec); + + /* Set delayed work queue */ + cancel_delayed_work(&bdev->bd_freeze_timeout); + schedule_delayed_work(&bdev->bd_freeze_timeout, timeout_jiffies); +} + +/* + * del_freeze_timeout - Delete timeout for freeze. + * + * @bdev: block device struct + * + * Delete the delayed work for freeze timeout from the delayed work queue. + */ +void del_freeze_timeout(struct block_device *bdev) +{ + if (delayed_work_pending(&bdev->bd_freeze_timeout)) + cancel_delayed_work(&bdev->bd_freeze_timeout); +} diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-xfs/fs/xfs/xfs_fsops.c linux-2.6.25-rc7-timeout/fs/xfs/xfs_fsops.c --- linux-2.6.25-rc7-xfs/fs/xfs/xfs_fsops.c 2008-04-01 14:22:33.000000000 +0900 +++ linux-2.6.25-rc7-timeout/fs/xfs/xfs_fsops.c 2008-04-01 13:27:35.000000000 +0900 @@ -623,7 +623,7 @@ xfs_fs_goingdown( { switch (inflags) { case XFS_FSOP_GOING_FLAGS_DEFAULT: { - struct super_block *sb = freeze_bdev(mp->m_super->s_bdev); + struct super_block *sb = freeze_bdev(mp->m_super->s_bdev, 0); if (sb && !IS_ERR(sb)) { xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT); diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-xfs/include/linux/buffer_head.h linux-2.6.25-rc7-timeout/include/linux/buffer_head.h --- linux-2.6.25-rc7-xfs/include/linux/buffer_head.h 2008-04-01 14:22:39.000000000 +0900 +++ linux-2.6.25-rc7-timeout/include/linux/buffer_head.h 2008-04-01 13:27:53.000000000 +0900 @@ -170,7 +170,7 @@ int sync_blockdev(struct block_device *b void __wait_on_buffer(struct buffer_head *); wait_queue_head_t *bh_waitq_head(struct buffer_head *bh); int fsync_bdev(struct block_device *); -struct super_block *freeze_bdev(struct block_device *); +struct super_block *freeze_bdev(struct 
block_device *, long timeout_msec); void thaw_bdev(struct block_device *, struct super_block *); int fsync_super(struct super_block *); int fsync_no_super(struct block_device *); diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc7-xfs/include/linux/fs.h linux-2.6.25-rc7-timeout/include/linux/fs.h --- linux-2.6.25-rc7-xfs/include/linux/fs.h 2008-04-01 14:22:39.000000000 +0900 +++ linux-2.6.25-rc7-timeout/include/linux/fs.h 2008-04-01 13:27:53.000000000 +0900 @@ -8,6 +8,7 @@ #include #include +#include /* * It's silly to have NR_OPEN bigger than NR_FILE, but you can change @@ -225,6 +226,7 @@ extern int dir_notify_enable; #define FIGETBSZ _IO(0x00,2) /* get the block size used for bmap */ #define FIFREEZE _IOWR('X', 119, int) /* Freeze */ #define FITHAW _IOWR('X', 120, int) /* Thaw */ +#define FIFREEZE_RESET_TIMEOUT _IO(0x00, 3) /* Reset freeze timeout */ #define FS_IOC_GETFLAGS _IOR('f', 1, long) #define FS_IOC_SETFLAGS _IOW('f', 2, long) @@ -551,6 +553,8 @@ struct block_device { */ unsigned long bd_private; + /* Delayed work for freeze */ + struct delayed_work bd_freeze_timeout; /* Semaphore for freeze */ struct semaphore bd_freeze_sem; }; @@ -2104,5 +2108,9 @@ int proc_nr_files(struct ctl_table *tabl int get_filesystem_list(char * buf); +extern void add_freeze_timeout(struct block_device *bdev, long timeout_msec); +extern void del_freeze_timeout(struct block_device *bdev); +extern void freeze_timeout(struct work_struct *work); + #endif /* __KERNEL__ */ #endif /* _LINUX_FS_H */ From owner-xfs@oss.sgi.com Tue Apr 1 18:11:03 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:11:24 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_64 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) 
with SMTP id m321Aogu014887 for ; Tue, 1 Apr 2008 18:11:01 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA10735; Wed, 2 Apr 2008 09:15:53 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m31NFqsT117416482; Wed, 2 Apr 2008 09:15:53 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m31NFqKd118679480; Wed, 2 Apr 2008 09:15:52 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 09:15:52 +1000 From: David Chinner To: xfs-dev Cc: xfs-oss Subject: [Patch] Cacheline align xlog_t Message-ID: <20080401231552.GV103491721@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15129 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Reorganise xlog_t for better cacheline isolation of contention To reduce contention on the log in large CPU count, separate out different parts of the xlog_t structure onto different cachelines. Move each lock onto a different cacheline along with all the members that are accessed/modified while that lock is held. Also, move the debugging code into debug code. 
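For readers unfamiliar with the technique, the layout idea described above can be sketched in plain userspace C11. This is only an illustration under stated assumptions, not the patch itself: `struct demo_log`, `demo_lock_t`, the field names and the 64-byte `DEMO_CACHELINE` are all hypothetical, and `alignas` stands in for the kernel's `____cacheline_aligned_in_smp` annotation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cacheline size for the sketch; real hardware varies. */
#define DEMO_CACHELINE 64

/* Hypothetical stand-in for a lock; not the kernel's spinlock_t. */
typedef struct { int locked; } demo_lock_t;

struct demo_log {
    /* Read-mostly fields that need no locking share the first line. */
    int iclog_size;
    int iclog_bufs;

    /* Fields changed under iclog_lock start on their own cacheline. */
    alignas(DEMO_CACHELINE) demo_lock_t iclog_lock;
    int curr_cycle;
    int curr_block;

    /* Fields changed under grant_lock start on another cacheline. */
    alignas(DEMO_CACHELINE) demo_lock_t grant_lock;
    int grant_reserve_bytes;
    int grant_write_bytes;
};

/* Index of the cacheline that holds the byte at the given struct offset. */
static size_t demo_line_of(size_t offset)
{
    return offset / DEMO_CACHELINE;
}

/* Returns 1 when the two lock groups start on different cachelines. */
static int demo_locks_isolated(void)
{
    return demo_line_of(offsetof(struct demo_log, iclog_lock)) !=
           demo_line_of(offsetof(struct demo_log, grant_lock));
}
```

With this layout a CPU spinning on one lock should not invalidate the cacheline holding the other lock's data, at the cost of some padding in the structure.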
Signed-off-by: Dave Chinner --- fs/xfs/xfs_log.c | 5 +--- fs/xfs/xfs_log_priv.h | 55 +++++++++++++++++++++++++++----------------------- 2 files changed, 32 insertions(+), 28 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2008-03-13 14:03:38.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2008-03-13 14:20:21.803846380 +1100 @@ -1237,9 +1237,9 @@ xlog_alloc_log(xfs_mount_t *mp, XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1); iclog->ic_bp = bp; iclog->hic_data = bp->b_addr; - +#ifdef DEBUG log->l_iclog_bak[i] = (xfs_caddr_t)&(iclog->ic_header); - +#endif head = &iclog->ic_header; memset(head, 0, sizeof(xlog_rec_header_t)); head->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM); @@ -1250,7 +1250,6 @@ xlog_alloc_log(xfs_mount_t *mp, head->h_fmt = cpu_to_be32(XLOG_FMT); memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t)); - iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize; iclog->ic_state = XLOG_STATE_ACTIVE; iclog->ic_log = log; Index: 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log_priv.h 2008-03-13 14:06:58.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h 2008-03-13 14:20:31.478596832 +1100 @@ -402,8 +402,29 @@ typedef struct xlog_in_core { * that round off problems won't occur when releasing partial reservations. 
*/ typedef struct log { + /* The following fields don't need locking */ + struct xfs_mount *l_mp; /* mount point */ + struct xfs_buf *l_xbuf; /* extra buffer for log + * wrapping */ + struct xfs_buftarg *l_targ; /* buftarg of log */ + uint l_flags; + uint l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */ + struct xfs_buf_cancel **l_buf_cancel_table; + int l_iclog_hsize; /* size of iclog header */ + int l_iclog_heads; /* # of iclog header sectors */ + uint l_sectbb_log; /* log2 of sector size in BBs */ + uint l_sectbb_mask; /* sector size (in BBs) + * alignment mask */ + int l_iclog_size; /* size of log in bytes */ + int l_iclog_size_log; /* log power size of log */ + int l_iclog_bufs; /* number of iclog buffers */ + xfs_daddr_t l_logBBstart; /* start block of log */ + int l_logsize; /* size of log in bytes */ + int l_logBBsize; /* size of log in BB chunks */ + /* The following block of fields are changed while holding icloglock */ - sema_t l_flushsema; /* iclog flushing semaphore */ + sema_t l_flushsema ____cacheline_aligned_in_smp; + /* iclog flushing semaphore */ int l_flushcnt; /* # of procs waiting on this * sema */ int l_covered_state;/* state of "covering disk @@ -413,27 +434,14 @@ typedef struct log { xfs_lsn_t l_tail_lsn; /* lsn of 1st LR with unflushed * buffers */ xfs_lsn_t l_last_sync_lsn;/* lsn of last LR on disk */ - struct xfs_mount *l_mp; /* mount point */ - struct xfs_buf *l_xbuf; /* extra buffer for log - * wrapping */ - struct xfs_buftarg *l_targ; /* buftarg of log */ - xfs_daddr_t l_logBBstart; /* start block of log */ - int l_logsize; /* size of log in bytes */ - int l_logBBsize; /* size of log in BB chunks */ int l_curr_cycle; /* Cycle number of log writes */ int l_prev_cycle; /* Cycle number before last * block increment */ int l_curr_block; /* current logical log block */ int l_prev_block; /* previous logical log block */ - int l_iclog_size; /* size of log in bytes */ - int l_iclog_size_log; /* log power size of log */ - int l_iclog_bufs; /* 
number of iclog buffers */ - - /* The following field are used for debugging; need to hold icloglock */ - char *l_iclog_bak[XLOG_MAX_ICLOGS]; /* The following block of fields are changed while holding grant_lock */ - spinlock_t l_grant_lock; + spinlock_t l_grant_lock ____cacheline_aligned_in_smp; xlog_ticket_t *l_reserve_headq; xlog_ticket_t *l_write_headq; int l_grant_reserve_cycle; @@ -441,20 +449,17 @@ typedef struct log { int l_grant_write_cycle; int l_grant_write_bytes; - /* The following fields don't need locking */ #ifdef XFS_LOG_TRACE struct ktrace *l_trace; struct ktrace *l_grant_trace; #endif - uint l_flags; - uint l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */ - struct xfs_buf_cancel **l_buf_cancel_table; - int l_iclog_hsize; /* size of iclog header */ - int l_iclog_heads; /* # of iclog header sectors */ - uint l_sectbb_log; /* log2 of sector size in BBs */ - uint l_sectbb_mask; /* sector size (in BBs) - * alignment mask */ -} xlog_t; + + /* The following field are used for debugging; need to hold icloglock */ +#ifdef DEBUG + char *l_iclog_bak[XLOG_MAX_ICLOGS]; +#endif + +} xlog_t ____cacheline_aligned_in_smp; #define XLOG_FORCED_SHUTDOWN(log) ((log)->l_flags & XLOG_IO_ERROR) From owner-xfs@oss.sgi.com Tue Apr 1 18:10:57 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:11:13 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m321Aogq014887 for ; Tue, 1 Apr 2008 18:10:55 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA12833; Wed, 2 Apr 2008 10:17:59 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by 
snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m320HwsT118671985; Wed, 2 Apr 2008 10:17:59 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m320HwEw118432079; Wed, 2 Apr 2008 10:17:58 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 10:17:58 +1000 From: David Chinner To: xfs-dev Cc: xfs-oss Subject: [Patch] fix lock inversion in forced unmount Message-ID: <20080402001758.GY103491721@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15127 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Fix lock inversion in forced shutdown. Recent changes to xlog_state_release_iclog() placed the grant_lock inside the icloglock. Forced unmount of the log takes the two locks the opposite way around, but does not depend on that order to work correctly. Fix the inversion by changing the order in which the locks are taken in xfs_log_force_umount(). Signed-off-by: Dave Chinner --- fs/xfs/xfs_log.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2008-04-01 21:00:13.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2008-04-02 08:35:20.282633878 +1000 @@ -3502,8 +3502,8 @@ xfs_log_force_umount( * before we mark the filesystem SHUTDOWN and wake * everybody up to tell the bad news. 
*/ - spin_lock(&log->l_grant_lock); spin_lock(&log->l_icloglock); + spin_lock(&log->l_grant_lock); mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN; XFS_BUF_DONE(mp->m_sb_bp); /* From owner-xfs@oss.sgi.com Tue Apr 1 18:11:10 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:11:41 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_52, SUBJ_TICKET autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m321Aoh0014887 for ; Tue, 1 Apr 2008 18:11:08 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA10686; Wed, 2 Apr 2008 09:14:40 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m31NEdsT118620777; Wed, 2 Apr 2008 09:14:39 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m31NEdLb118697076; Wed, 2 Apr 2008 09:14:39 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 09:14:39 +1000 From: David Chinner To: xfs-dev Cc: xfs-oss Subject: [Patch] Remove xlog_ticket allocator Message-ID: <20080401231439.GU103491721@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15132 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Remove the xlog_ticket allocator The ticket allocator is just a simple slab 
implementation internal to the log. It requires the icloglock to be held when manipulating it and this contributes to contention on that lock. Just kill the entire allocator and use a memory zone instead. While there, allow us to gracefully fail allocation with ENOMEM. Signed-off-by: Dave Chinner --- fs/xfs/xfs_log.c | 137 ++++---------------------------------------------- fs/xfs/xfs_log_priv.h | 9 +-- fs/xfs/xfs_vfsops.c | 12 ++-- fs/xfs/xfsidbg.c | 11 +--- 4 files changed, 25 insertions(+), 144 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2008-03-13 13:58:08.866070224 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2008-03-13 14:03:38.448138656 +1100 @@ -41,6 +41,7 @@ #include "xfs_inode.h" #include "xfs_rw.h" +kmem_zone_t *xfs_log_ticket_zone; #define xlog_write_adv_cnt(ptr, len, off, bytes) \ { (ptr) += (bytes); \ @@ -73,8 +74,6 @@ STATIC int xlog_state_get_iclog_space(x xlog_ticket_t *ticket, int *continued_write, int *logoffsetp); -STATIC void xlog_state_put_ticket(xlog_t *log, - xlog_ticket_t *tic); STATIC int xlog_state_release_iclog(xlog_t *log, xlog_in_core_t *iclog); STATIC void xlog_state_switch_iclogs(xlog_t *log, @@ -101,7 +100,6 @@ STATIC void xlog_ungrant_log_space(xlog_ /* local ticket functions */ -STATIC void xlog_state_ticket_alloc(xlog_t *log); STATIC xlog_ticket_t *xlog_ticket_get(xlog_t *log, int unit_bytes, int count, @@ -330,7 +328,7 @@ xfs_log_done(xfs_mount_t *mp, */ xlog_trace_loggrant(log, ticket, "xfs_log_done: (non-permanent)"); xlog_ungrant_log_space(log, ticket); - xlog_state_put_ticket(log, ticket); + xlog_ticket_put(log, ticket); } else { xlog_trace_loggrant(log, ticket, "xfs_log_done: (permanent)"); xlog_regrant_reserve_log_space(log, ticket); @@ -469,6 +467,8 @@ xfs_log_reserve(xfs_mount_t *mp, /* may sleep if need to allocate more tickets */ internal_ticket = xlog_ticket_get(log, unit_bytes, cnt, client, flags); + if 
(!internal_ticket) + return XFS_ERROR(ENOMEM); internal_ticket->t_trans_type = t_type; *ticket = internal_ticket; xlog_trace_loggrant(log, internal_ticket, @@ -693,7 +693,7 @@ xfs_log_unmount_write(xfs_mount_t *mp) if (tic) { xlog_trace_loggrant(log, tic, "unmount rec"); xlog_ungrant_log_space(log, tic); - xlog_state_put_ticket(log, tic); + xlog_ticket_put(log, tic); } } else { /* @@ -1208,7 +1208,6 @@ xlog_alloc_log(xfs_mount_t *mp, spin_lock_init(&log->l_icloglock); spin_lock_init(&log->l_grant_lock); initnsema(&log->l_flushsema, 0, "ic-flush"); - xlog_state_ticket_alloc(log); /* wait until after icloglock inited */ /* log record size must be multiple of BBSIZE; see xlog_rec_header_t */ ASSERT((XFS_BUF_SIZE(bp) & BBMASK) == 0); @@ -1541,7 +1540,6 @@ STATIC void xlog_dealloc_log(xlog_t *log) { xlog_in_core_t *iclog, *next_iclog; - xlog_ticket_t *tic, *next_tic; int i; iclog = log->l_iclog; @@ -1562,22 +1560,6 @@ xlog_dealloc_log(xlog_t *log) spinlock_destroy(&log->l_icloglock); spinlock_destroy(&log->l_grant_lock); - /* XXXsup take a look at this again. */ - if ((log->l_ticket_cnt != log->l_ticket_tcnt) && - !XLOG_FORCED_SHUTDOWN(log)) { - xfs_fs_cmn_err(CE_WARN, log->l_mp, - "xlog_dealloc_log: (cnt: %d, total: %d)", - log->l_ticket_cnt, log->l_ticket_tcnt); - /* ASSERT(log->l_ticket_cnt == log->l_ticket_tcnt); */ - - } else { - tic = log->l_unmount_free; - while (tic) { - next_tic = tic->t_next; - kmem_free(tic, PAGE_SIZE); - tic = next_tic; - } - } xfs_buf_free(log->l_xbuf); #ifdef XFS_LOG_TRACE if (log->l_trace != NULL) { @@ -2798,18 +2780,6 @@ xlog_ungrant_log_space(xlog_t *log, /* - * Atomically put back used ticket. - */ -STATIC void -xlog_state_put_ticket(xlog_t *log, - xlog_ticket_t *tic) -{ - spin_lock(&log->l_icloglock); - xlog_ticket_put(log, tic); - spin_unlock(&log->l_icloglock); -} /* xlog_state_put_ticket */ - -/* * Flush iclog to disk if this is the last reference to the given iclog and * the WANT_SYNC bit is set. 
* @@ -3179,92 +3149,19 @@ xlog_state_want_sync(xlog_t *log, xlog_i */ /* - * Algorithm doesn't take into account page size. ;-( - */ -STATIC void -xlog_state_ticket_alloc(xlog_t *log) -{ - xlog_ticket_t *t_list; - xlog_ticket_t *next; - xfs_caddr_t buf; - uint i = (PAGE_SIZE / sizeof(xlog_ticket_t)) - 2; - - /* - * The kmem_zalloc may sleep, so we shouldn't be holding the - * global lock. XXXmiken: may want to use zone allocator. - */ - buf = (xfs_caddr_t) kmem_zalloc(PAGE_SIZE, KM_SLEEP); - - spin_lock(&log->l_icloglock); - - /* Attach 1st ticket to Q, so we can keep track of allocated memory */ - t_list = (xlog_ticket_t *)buf; - t_list->t_next = log->l_unmount_free; - log->l_unmount_free = t_list++; - log->l_ticket_cnt++; - log->l_ticket_tcnt++; - - /* Next ticket becomes first ticket attached to ticket free list */ - if (log->l_freelist != NULL) { - ASSERT(log->l_tail != NULL); - log->l_tail->t_next = t_list; - } else { - log->l_freelist = t_list; - } - log->l_ticket_cnt++; - log->l_ticket_tcnt++; - - /* Cycle through rest of alloc'ed memory, building up free Q */ - for ( ; i > 0; i--) { - next = t_list + 1; - t_list->t_next = next; - t_list = next; - log->l_ticket_cnt++; - log->l_ticket_tcnt++; - } - t_list->t_next = NULL; - log->l_tail = t_list; - spin_unlock(&log->l_icloglock); -} /* xlog_state_ticket_alloc */ - - -/* - * Put ticket into free list - * - * Assumption: log lock is held around this call. + * Free a used ticket. */ STATIC void xlog_ticket_put(xlog_t *log, xlog_ticket_t *ticket) { sv_destroy(&ticket->t_sema); - - /* - * Don't think caching will make that much difference. It's - * more important to make debug easier. 
- */ -#if 0 - /* real code will want to use LIFO for caching */ - ticket->t_next = log->l_freelist; - log->l_freelist = ticket; - /* no need to clear fields */ -#else - /* When we debug, it is easier if tickets are cycled */ - ticket->t_next = NULL; - if (log->l_tail) { - log->l_tail->t_next = ticket; - } else { - ASSERT(log->l_freelist == NULL); - log->l_freelist = ticket; - } - log->l_tail = ticket; -#endif /* DEBUG */ - log->l_ticket_cnt++; + kmem_zone_free(xfs_log_ticket_zone, ticket); } /* xlog_ticket_put */ /* - * Grab ticket off freelist or allocation some more + * Allocate and initialise a new log ticket. */ STATIC xlog_ticket_t * xlog_ticket_get(xlog_t *log, @@ -3276,21 +3173,9 @@ xlog_ticket_get(xlog_t *log, xlog_ticket_t *tic; uint num_headers; - alloc: - if (log->l_freelist == NULL) - xlog_state_ticket_alloc(log); /* potentially sleep */ - - spin_lock(&log->l_icloglock); - if (log->l_freelist == NULL) { - spin_unlock(&log->l_icloglock); - goto alloc; - } - tic = log->l_freelist; - log->l_freelist = tic->t_next; - if (log->l_freelist == NULL) - log->l_tail = NULL; - log->l_ticket_cnt--; - spin_unlock(&log->l_icloglock); + tic = kmem_zone_zalloc(xfs_log_ticket_zone, KM_SLEEP|KM_MAYFAIL); + if (!tic) + return NULL; /* * Permanent reservations have up to 'cnt'-1 active log operations Index: 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log_priv.h 2008-03-13 13:59:10.806160556 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h 2008-03-13 14:06:58.110733971 +1100 @@ -242,7 +242,7 @@ typedef struct xlog_res { typedef struct xlog_ticket { sv_t t_sema; /* sleep on this semaphore : 20 */ - struct xlog_ticket *t_next; /* :4|8 */ + struct xlog_ticket *t_next; /* :4|8 */ struct xlog_ticket *t_prev; /* :4|8 */ xlog_tid_t t_tid; /* transaction identifier : 4 */ int t_curr_res; /* current reservation in bytes : 4 */ @@ -406,13 +406,8 @@ typedef struct log { sema_t l_flushsema; /* 
iclog flushing semaphore */ int l_flushcnt; /* # of procs waiting on this * sema */ - int l_ticket_cnt; /* free ticket count */ - int l_ticket_tcnt; /* total ticket count */ int l_covered_state;/* state of "covering disk * log entries" */ - xlog_ticket_t *l_freelist; /* free list of tickets */ - xlog_ticket_t *l_unmount_free;/* kmem_free these addresses */ - xlog_ticket_t *l_tail; /* free list of tickets */ xlog_in_core_t *l_iclog; /* head log queue */ spinlock_t l_icloglock; /* grab to change iclog state */ xfs_lsn_t l_tail_lsn; /* lsn of 1st LR with unflushed @@ -478,6 +473,8 @@ extern struct xfs_buf *xlog_get_bp(xlog_ extern void xlog_put_bp(struct xfs_buf *); extern int xlog_bread(xlog_t *, xfs_daddr_t, int, struct xfs_buf *); +extern kmem_zone_t *xfs_log_ticket_zone; + /* iclog tracing */ #define XLOG_TRACE_GRAB_FLUSH 1 #define XLOG_TRACE_REL_FLUSH 2 Index: 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vfsops.c 2008-03-13 13:58:08.866070224 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c 2008-03-13 13:59:59.208010688 +1100 @@ -68,15 +68,17 @@ xfs_init(void) /* * Initialize all of the zone allocators we use. 
*/ + xfs_log_ticket_zone = kmem_zone_init(sizeof(xlog_ticket_t), + "xfs_log_ticket"); xfs_bmap_free_item_zone = kmem_zone_init(sizeof(xfs_bmap_free_item_t), - "xfs_bmap_free_item"); + "xfs_bmap_free_item"); xfs_btree_cur_zone = kmem_zone_init(sizeof(xfs_btree_cur_t), - "xfs_btree_cur"); - xfs_trans_zone = kmem_zone_init(sizeof(xfs_trans_t), "xfs_trans"); - xfs_da_state_zone = - kmem_zone_init(sizeof(xfs_da_state_t), "xfs_da_state"); + "xfs_btree_cur"); + xfs_da_state_zone = kmem_zone_init(sizeof(xfs_da_state_t), + "xfs_da_state"); xfs_dabuf_zone = kmem_zone_init(sizeof(xfs_dabuf_t), "xfs_dabuf"); xfs_ifork_zone = kmem_zone_init(sizeof(xfs_ifork_t), "xfs_ifork"); + xfs_trans_zone = kmem_zone_init(sizeof(xfs_trans_t), "xfs_trans"); xfs_acl_zone_init(xfs_acl_zone, "xfs_acl"); xfs_mru_cache_init(); xfs_filestream_init(); Index: 2.6.x-xfs-new/fs/xfs/xfsidbg.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfsidbg.c 2008-03-13 13:07:25.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfsidbg.c 2008-03-13 14:10:13.489855395 +1100 @@ -5607,9 +5607,9 @@ xfsidbg_xiclog(xlog_in_core_t *iclog) be32_to_cpu(iclog->ic_header.h_magicno), be32_to_cpu(iclog->ic_header.h_cycle), be32_to_cpu(iclog->ic_header.h_version), - be64_to_cpu(iclog->ic_header.h_lsn)); + (unsigned long long)be64_to_cpu(iclog->ic_header.h_lsn)); kdb_printf("tail_lsn: 0x%Lx len: %d prev_block: %d num_ops: %d\n", - be64_to_cpu(iclog->ic_header.h_tail_lsn), + (unsigned long long)be64_to_cpu(iclog->ic_header.h_tail_lsn), be32_to_cpu(iclog->ic_header.h_len), be32_to_cpu(iclog->ic_header.h_prev_block), be32_to_cpu(iclog->ic_header.h_num_logops)); @@ -5829,11 +5829,8 @@ xfsidbg_xlog(xlog_t *log) }; kdb_printf("xlog at 0x%p\n", log); - kdb_printf("&flushsm: 0x%p flushcnt: %d tic_cnt: %d tic_tcnt: %d \n", - &log->l_flushsema, log->l_flushcnt, - log->l_ticket_cnt, log->l_ticket_tcnt); - kdb_printf("freelist: 0x%p tail: 0x%p ICLOG: 0x%p \n", - log->l_freelist, log->l_tail, 
log->l_iclog); + kdb_printf("&flushsm: 0x%p flushcnt: %d ICLOG: 0x%p \n", + &log->l_flushsema, log->l_flushcnt, log->l_iclog); kdb_printf("&icloglock: 0x%p tail_lsn: %s last_sync_lsn: %s \n", &log->l_icloglock, xfs_fmtlsn(&log->l_tail_lsn), xfs_fmtlsn(&log->l_last_sync_lsn)); From owner-xfs@oss.sgi.com Tue Apr 1 18:10:54 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:11:09 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m321Aogo014887 for ; Tue, 1 Apr 2008 18:10:52 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA13155; Wed, 2 Apr 2008 10:29:43 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m320TgsT118520072; Wed, 2 Apr 2008 10:29:42 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m320TeXR116352804; Wed, 2 Apr 2008 10:29:40 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 10:29:40 +1000 From: David Chinner To: Eric Sandeen Cc: xfs-oss Subject: Re: [PATCH] combined features2 fixup patches (updating/rewriting what was sent in other threads) Message-ID: <20080402002940.GZ103491721@sgi.com> References: <47F0546C.9070709@sandeen.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47F0546C.9070709@sandeen.net> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15126 
X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Sun, Mar 30, 2008 at 10:03:08PM -0500, Eric Sandeen wrote: > Ensure "both" features2 slots are consistent, and set mp attr2 flag. > > Since older kernels may look in the sb_bad_features2 slot for > flags, rather than zeroing it out on fixup, we should make it > equal to the sb_features2 value. > > Also, if the ATTR2 flag was not found prior to features2 > fixup, it was not set in the mount flags, so re-check after the > fixup so that the current session will use the feature. > > Also fix up the comments to reflect these changes. > > Signed-off-by: Eric Sandeen > --- > > Index: linux-2.6-xfs/fs/xfs/xfs_mount.c > =================================================================== > --- linux-2.6-xfs.orig/fs/xfs/xfs_mount.c > +++ linux-2.6-xfs/fs/xfs/xfs_mount.c > @@ -967,22 +967,26 @@ xfs_mountfs( > xfs_mount_common(mp, sbp); > > /* > - * Check for a bad features2 field alignment. This happened on > - * some platforms due to xfs_sb_t not being 64bit size aligned > - * when sb_features was added and hence the compiler put it in > - * the wrong place. > + * Check for a mismatched features2 values. Older kernels > + * read & wrote into the wrong sb offset for sb_features2 > + * on some platforms due to xfs_sb_t not being 64bit size aligned > + * when sb_features2 was added, which made older superblock > + * reading/writing routines swap it as a 64-bit value. > * > - * If we detect a bad field, we or the set bits into the existing > - * features2 field in case it has already been modified and we > - * don't want to lose any features. Zero the bad one and mark > - * the two fields as needing updates once the transaction subsystem > - * is online. > + * For backwards compatibility, we make both slots equal. 
> +	 *
> +	 * If we detect a mismatched field, we OR the set bits into the
> +	 * existing features2 field in case it has already been modified; we
> +	 * don't want to lose any features.  We then update the bad location
> +	 * with the ORed value so that older kernels will see any features2
> +	 * flags, and mark the two fields as needing updates once the
> +	 * transaction subsystem is online.
> 	 */
> -	if (xfs_sb_has_bad_features2(sbp)) {
> +	if (xfs_sb_has_mismatched_features2(sbp)) {
> 		cmn_err(CE_WARN,
> 			"XFS: correcting sb_features alignment problem");
> 		sbp->sb_features2 |= sbp->sb_bad_features2;
> -		sbp->sb_bad_features2 = 0;
> +		sbp->sb_bad_features2 = sbp->sb_features2;
> 		update_flags |= XFS_SB_FEATURES2 | XFS_SB_BAD_FEATURES2;

Probably should update XFS_MOUNT_ATTR2 here, not later, i.e. before
we mount the log and start recovery.

> @@ -1181,6 +1185,12 @@ xfs_mountfs(
> 	xfs_mount_log_sb(mp, update_flags);
>
> 	/*
> +	 * Re-check for ATTR2 in case it was found in bad_features2 slot.
> +	 */
> +	if (xfs_sb_version_hasattr2(&mp->m_sb))
> +		mp->m_flags |= XFS_MOUNT_ATTR2;
> +

Rather than here.

> /*
> - * Detect a bad features2 field
> + * Detect a mismatched features2 field.  Older kernels read/wrote
> + * this into the wrong slot, so to be safe we keep them in sync.
>  */
> -static inline int xfs_sb_has_bad_features2(xfs_sb_t *sbp)
> +static inline int xfs_sb_has_mismatched_features2(xfs_sb_t *sbp)
> {
> -	return (sbp->sb_bad_features2 != 0);
> +	return (sbp->sb_bad_features2 != sbp->sb_features2);
> }

Yep, makes sense.

Cheers,

Dave.
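
For reference, the fixup being reviewed can be sketched as a small
standalone C fragment. This is only an illustration of the logic
discussed above, not the kernel code: struct sb_slots, struct
mount_state, mount_fix_features2() and the two SKETCH_* constants are
hypothetical names invented for the sketch. It ORs the misplaced bits
into the correct slot, copies the result back so both slots agree, and
sets the in-memory flag at fixup time, as suggested in the review,
rather than later in the mount path.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ATTR2_BIT   0x8u  /* hypothetical on-disk feature bit */
#define SKETCH_MOUNT_ATTR2 0x1u  /* hypothetical in-memory mount flag */

/* Hypothetical stand-in for the two superblock slots; not xfs_sb_t. */
struct sb_slots {
	uint32_t features2;      /* the intended slot */
	uint32_t bad_features2;  /* the slot older kernels used by mistake */
};

/* Hypothetical stand-in for the mount structure. */
struct mount_state {
	struct sb_slots sb;
	uint32_t flags;
};

/* Same test as the patched helper: the slots are mismatched if they differ. */
static bool sb_has_mismatched_features2(const struct sb_slots *sb)
{
	return sb->bad_features2 != sb->features2;
}

/*
 * Bring both slots into sync and pick up the ATTR2 flag straight away.
 * Returns true if the superblock was changed and needs to be logged.
 */
static bool mount_fix_features2(struct mount_state *mp)
{
	struct sb_slots *sb = &mp->sb;
	bool dirty = false;

	if (sb_has_mismatched_features2(sb)) {
		sb->features2 |= sb->bad_features2;  /* keep every feature bit */
		sb->bad_features2 = sb->features2;   /* older kernels see them too */
		dirty = true;
	}

	/* Re-check after the fixup so the current session uses the feature. */
	if (sb->features2 & SKETCH_ATTR2_BIT)
		mp->flags |= SKETCH_MOUNT_ATTR2;

	return dirty;
}
```

Calling mount_fix_features2() a second time on the same structure
reports no change, which is the property that makes the "keep both
slots equal" rule safe to apply on every mount.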
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Apr 1 18:11:07 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:11:41 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m321Aogw014887 for ; Tue, 1 Apr 2008 18:11:04 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA10655; Wed, 2 Apr 2008 09:13:49 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m31NDnsT118704719; Wed, 2 Apr 2008 09:13:49 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m31NDmb3116282386; Wed, 2 Apr 2008 09:13:48 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 09:13:48 +1000 From: David Chinner To: xfs-dev Cc: xfs-oss Subject: [Patch] Per iclog callback chain lock Message-ID: <20080401231348.GT103491721@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15131 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Introduce an iclog callback chain lock. 
Rather than use the icloglock for protecting the iclog completion
callback chain, use a new per-iclog lock so that walking the
callback chain doesn't require holding a global lock.

This reduces contention on the icloglock during log buffer I/O
completion as the callback chain lock is taken for every callback
that is issued. On large log buffers, this can number in the
hundreds to thousands per iclog so isolating the lock to the
iclog makes a lot of sense.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_log.c      |   35 +++++++++++++++++++----------------
 fs/xfs/xfs_log_priv.h |   33 ++++++++++++++++++++++++++-------
 2 files changed, 45 insertions(+), 23 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2008-03-13 13:10:23.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2008-03-13 19:35:51.251913648 +1100
@@ -397,12 +397,10 @@ xfs_log_notify(xfs_mount_t *mp, /* mo
 	       void *iclog_hndl, /* iclog to hang callback off */
 	       xfs_log_callback_t *cb)
 {
-	xlog_t *log = mp->m_log;
 	xlog_in_core_t *iclog = (xlog_in_core_t *)iclog_hndl;
 	int abortflg;
 
-	cb->cb_next = NULL;
-	spin_lock(&log->l_icloglock);
+	spin_lock(&iclog->ic_callback_lock);
 	abortflg = (iclog->ic_state & XLOG_STATE_IOERROR);
 	if (!abortflg) {
 		ASSERT_ALWAYS((iclog->ic_state == XLOG_STATE_ACTIVE) ||
@@ -411,7 +409,7 @@ xfs_log_notify(xfs_mount_t *mp, /* mo
 		*(iclog->ic_callback_tail) = cb;
 		iclog->ic_callback_tail = &(cb->cb_next);
 	}
-	spin_unlock(&log->l_icloglock);
+	spin_unlock(&iclog->ic_callback_lock);
 	return abortflg;
 } /* xfs_log_notify */
 
@@ -1257,6 +1255,8 @@ xlog_alloc_log(xfs_mount_t *mp,
 		iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize;
 		iclog->ic_state = XLOG_STATE_ACTIVE;
 		iclog->ic_log = log;
+		atomic_set(&iclog->ic_refcnt, 0);
+		spin_lock_init(&iclog->ic_callback_lock);
 		iclog->ic_callback_tail = &(iclog->ic_callback);
 		iclog->ic_datap = (char *)iclog->hic_data + log->l_iclog_hsize;
 
@@ -1990,7 +1990,7 @@
xlog_state_clean_log(xlog_t *log) if (iclog->ic_state == XLOG_STATE_DIRTY) { iclog->ic_state = XLOG_STATE_ACTIVE; iclog->ic_offset = 0; - iclog->ic_callback = NULL; /* don't need to free */ + ASSERT(iclog->ic_callback == NULL); /* * If the number of ops in this iclog indicate it just * contains the dummy transaction, we can @@ -2193,37 +2193,40 @@ xlog_state_do_callback( be64_to_cpu(iclog->ic_header.h_lsn); spin_unlock(&log->l_grant_lock); - /* - * Keep processing entries in the callback list - * until we come around and it is empty. We - * need to atomically see that the list is - * empty and change the state to DIRTY so that - * we don't miss any more callbacks being added. - */ - spin_lock(&log->l_icloglock); } else { + spin_unlock(&log->l_icloglock); ioerrors++; } - cb = iclog->ic_callback; + /* + * Keep processing entries in the callback list until + * we come around and it is empty. We need to + * atomically see that the list is empty and change the + * state to DIRTY so that we don't miss any more + * callbacks being added. 
+ */ + spin_lock(&iclog->ic_callback_lock); + cb = iclog->ic_callback; while (cb) { iclog->ic_callback_tail = &(iclog->ic_callback); iclog->ic_callback = NULL; - spin_unlock(&log->l_icloglock); + spin_unlock(&iclog->ic_callback_lock); /* perform callbacks in the order given */ for (; cb; cb = cb_next) { cb_next = cb->cb_next; cb->cb_func(cb->cb_arg, aborted); } - spin_lock(&log->l_icloglock); + spin_lock(&iclog->ic_callback_lock); cb = iclog->ic_callback; } loopdidcallbacks++; funcdidcallbacks++; + spin_lock(&log->l_icloglock); ASSERT(iclog->ic_callback == NULL); + spin_unlock(&iclog->ic_callback_lock); if (!(iclog->ic_state & XLOG_STATE_IOERROR)) iclog->ic_state = XLOG_STATE_DIRTY; Index: 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log_priv.h 2008-02-22 13:48:25.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h 2008-03-13 19:34:57.430809151 +1100 @@ -324,6 +324,19 @@ typedef struct xlog_rec_ext_header { * - ic_offset is the current number of bytes written to in this iclog. * - ic_refcnt is bumped when someone is writing to the log. * - ic_state is the state of the iclog. + * + * Because of cacheline contention on large machines, we need to separate + * various resources onto different cachelines. To start with, make the + * structure cacheline aligned. The following fields can be contended on + * by independent processes: + * + * - ic_callback_* + * - ic_refcnt + * - fields protected by the global l_icloglock + * + * so we need to ensure that these fields are located in separate cachelines. + * We'll put all the read-only and l_icloglock fields in the first cacheline, + * and move everything else out to subsequent cachelines. 
*/ typedef struct xlog_iclog_fields { sv_t ic_forcesema; @@ -332,18 +345,23 @@ typedef struct xlog_iclog_fields { struct xlog_in_core *ic_prev; struct xfs_buf *ic_bp; struct log *ic_log; - xfs_log_callback_t *ic_callback; - xfs_log_callback_t **ic_callback_tail; -#ifdef XFS_LOG_TRACE - struct ktrace *ic_trace; -#endif int ic_size; int ic_offset; - atomic_t ic_refcnt; int ic_bwritecnt; ushort_t ic_state; char *ic_datap; /* pointer to iclog data */ -} xlog_iclog_fields_t; +#ifdef XFS_LOG_TRACE + struct ktrace *ic_trace; +#endif + + /* Callback structures need their own cacheline */ + spinlock_t ic_callback_lock ____cacheline_aligned_in_smp; + xfs_log_callback_t *ic_callback; + xfs_log_callback_t **ic_callback_tail; + + /* reference counts need their own cacheline */ + atomic_t ic_refcnt ____cacheline_aligned_in_smp; +} xlog_iclog_fields_t ____cacheline_aligned_in_smp; typedef union xlog_in_core2 { xlog_rec_header_t hic_header; @@ -366,6 +384,7 @@ typedef struct xlog_in_core { #define ic_bp hic_fields.ic_bp #define ic_log hic_fields.ic_log #define ic_callback hic_fields.ic_callback +#define ic_callback_lock hic_fields.ic_callback_lock #define ic_callback_tail hic_fields.ic_callback_tail #define ic_trace hic_fields.ic_trace #define ic_size hic_fields.ic_size From owner-xfs@oss.sgi.com Tue Apr 1 18:11:14 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:11:27 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m321Aoh2014887 for ; Tue, 1 Apr 2008 18:11:11 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA11016; Wed, 2 Apr 2008 09:24:08 
+1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m31NO7sT118601451; Wed, 2 Apr 2008 09:24:07 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m31NO6CX118689677; Wed, 2 Apr 2008 09:24:06 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 09:24:06 +1000 From: David Chinner To: David Chinner Cc: xfs-dev , xfs-oss Subject: Re: [Patch] Per iclog callback chain lock Message-ID: <20080401232406.GX103491721@sgi.com> References: <20080401231348.GT103491721@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080401231348.GT103491721@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15130 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 09:13:48AM +1000, David Chinner wrote: > Introduce an iclog callback chain lock. > > Rather than use the icloglock for protecting the iclog completion > callback chain, use a new per-iclog lock so that walking the > callback chain doesn't require holding a global lock. > > This reduces contention on the icloglock during log buffer I/O > completion as the callback chain lock is take for every callback > that is issued. This is not accurate - the callback chain is removed in bulk then walked without the lock, but will loop over the iclog chain in case callbacks were added while processing the chain (not sure if that can even happen, though). [mental note - don't write patch descriptions before first coffee completion occurs.] Cheers, Dave. 
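
The behaviour described in this correction (detach the whole chain
under the lock, walk it unlocked, then look again in case more
callbacks were queued) can be sketched in userspace C. This is a
rough illustration only, not the XFS code: struct cb_chain,
chain_add(), chain_run() and count_cb() are hypothetical names, and a
C11 atomic_flag spinlock stands in for the kernel spinlock.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical callback node, loosely modelled on a log callback. */
struct callback {
	struct callback *next;
	void (*func)(void *arg, int aborted);
	void *arg;
};

/* Hypothetical per-buffer chain with its own lock (not the real iclog). */
struct cb_chain {
	atomic_flag lock;
	struct callback *head;
	struct callback **tail;  /* address of the last next pointer */
};

static void chain_lock(struct cb_chain *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;  /* spin until the holder clears the flag */
}

static void chain_unlock(struct cb_chain *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

static void chain_init(struct cb_chain *c)
{
	atomic_flag_clear(&c->lock);
	c->head = NULL;
	c->tail = &c->head;
}

/* Append a callback; only the per-chain lock is taken. */
static void chain_add(struct cb_chain *c, struct callback *cb)
{
	cb->next = NULL;
	chain_lock(c);
	*c->tail = cb;
	c->tail = &cb->next;
	chain_unlock(c);
}

/*
 * Detach the chain in bulk under the lock, run it with the lock dropped,
 * then re-check for callbacks added meanwhile. Returns how many ran.
 */
static int chain_run(struct cb_chain *c, int aborted)
{
	struct callback *cb, *next;
	int ran = 0;

	chain_lock(c);
	while ((cb = c->head) != NULL) {
		c->head = NULL;
		c->tail = &c->head;
		chain_unlock(c);

		for (; cb; cb = next) {
			next = cb->next;  /* read first: the callback may free cb */
			cb->func(cb->arg, aborted);
			ran++;
		}
		chain_lock(c);
	}
	chain_unlock(c);
	return ran;
}

/* Example callback: bump the counter that arg points to. */
static void count_cb(void *arg, int aborted)
{
	(void)aborted;
	++*(int *)arg;
}
```

Because the lock is held only while the head pointer is swapped, the
number of lock round trips depends on how many times the chain is
refilled, not on how many callbacks it holds.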
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

From owner-xfs@oss.sgi.com Tue Apr 1 18:11:00 2008
Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:11:19 -0700 (PDT)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m321Aogs014887 for ; Tue, 1 Apr 2008 18:10:58 -0700
Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA10899; Wed, 2 Apr 2008 09:18:16 +1000
Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m31NIGsT118164335; Wed, 2 Apr 2008 09:18:16 +1000 (AEST)
Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m31NIFof118690659; Wed, 2 Apr 2008 09:18:15 +1000 (AEST)
X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f
Date: Wed, 2 Apr 2008 09:18:15 +1000
From: David Chinner
To: xfs-dev
Cc: xfs-oss
Subject: [Patch] unique per-AG inode generation number initialisation
Message-ID: <20080401231815.GW103491721@sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i
X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 15128
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: dgc@sgi.com
Precedence: bulk
X-list: xfs

Don't initialise new inode generation numbers to zero

When we allocate new inode chunks, we initialise the generation
numbers to zero.
This works fine until we delete a chunk and then reallocate it, resulting in the same inode numbers but with a reset generation count. This can result in inode/generation pairs of different inodes occurring relatively close together. Given that the inode/gen pair makes up the "unique" portion of an NFS filehandle on XFS, this can result in file handles cached on clients being seen on the wire from the server but refer to a different file. This causes .... issues for NFS clients. Hence we need a unique generation number initialisation for each inode to prevent reuse of a small portion of the generation number space. Make this initialiser per-allocation group so that it is not a single point of contention in the filesystem, and increment it on every allocation within an AG to reduce the chance that a generation number is reused for a given inode number if the inode chunk is deleted and reallocated immediately afterwards. It is safe to add the agi_newinogen field to the AGI without using a feature bit. If an older kernel is used, it simply will not update the field on allocation. If the kernel is updated and the field has garbage in it, then it's like having a random seed to the generation number.... Signed-off-by: Dave Chinner --- fs/xfs/xfs_ag.h | 4 +++- fs/xfs/xfs_ialloc.c | 30 ++++++++++++++++++++++-------- 2 files changed, 25 insertions(+), 9 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_ag.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ag.h 2008-01-18 18:30:06.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_ag.h 2008-03-26 13:03:41.122918236 +1100 @@ -121,6 +121,7 @@ typedef struct xfs_agi { * still being referenced. 
*/ __be32 agi_unlinked[XFS_AGI_UNLINKED_BUCKETS]; + __be32 agi_newinogen; /* inode cluster generation */ } xfs_agi_t; #define XFS_AGI_MAGICNUM 0x00000001 @@ -134,7 +135,8 @@ typedef struct xfs_agi { #define XFS_AGI_NEWINO 0x00000100 #define XFS_AGI_DIRINO 0x00000200 #define XFS_AGI_UNLINKED 0x00000400 -#define XFS_AGI_NUM_BITS 11 +#define XFS_AGI_NEWINOGEN 0x00000800 +#define XFS_AGI_NUM_BITS 12 #define XFS_AGI_ALL_BITS ((1 << XFS_AGI_NUM_BITS) - 1) /* disk block (xfs_daddr_t) in the AG */ Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-03-25 15:41:27.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-03-26 14:29:47.998554368 +1100 @@ -309,6 +309,8 @@ xfs_ialloc_ag_alloc( free = XFS_MAKE_IPTR(args.mp, fbuf, i); free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC); free->di_core.di_version = version; + free->di_core.di_gen = agi->agi_newinogen; + be32_add_cpu(&agi->agi_newinogen, 1); free->di_next_unlinked = cpu_to_be32(NULLAGINO); xfs_ialloc_log_di(tp, fbuf, i, XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED); @@ -347,7 +349,8 @@ xfs_ialloc_ag_alloc( * Log allocation group header fields */ xfs_ialloc_log_agi(tp, agbp, - XFS_AGI_COUNT | XFS_AGI_FREECOUNT | XFS_AGI_NEWINO); + XFS_AGI_COUNT | XFS_AGI_FREECOUNT | + XFS_AGI_NEWINO | XFS_AGI_NEWINOGEN); /* * Modify/log superblock values for inode count and inode free count. 
*/ @@ -896,11 +899,12 @@ nextag: ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino + offset); XFS_INOBT_CLR_FREE(&rec, offset); rec.ir_freecount--; + be32_add_cpu(&agi->agi_newinogen, 1); if ((error = xfs_inobt_update(cur, rec.ir_startino, rec.ir_freecount, rec.ir_free))) goto error0; be32_add(&agi->agi_freecount, -1); - xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT); + xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT | XFS_AGI_NEWINOGEN); down_read(&mp->m_peraglock); mp->m_perag[tagno].pagi_freecount--; up_read(&mp->m_peraglock); @@ -1320,6 +1324,11 @@ xfs_ialloc_compute_maxlevels( /* * Log specified fields for the ag hdr (inode section) + * + * We don't log the unlinked inode fields through here; they + * get logged directly to the buffer. Hence we have a discontinuity + * in the fields we are logging and we need two calls to map all + * the dirtied parts of the agi.... */ void xfs_ialloc_log_agi( @@ -1342,22 +1351,27 @@ xfs_ialloc_log_agi( offsetof(xfs_agi_t, agi_newino), offsetof(xfs_agi_t, agi_dirino), offsetof(xfs_agi_t, agi_unlinked), + offsetof(xfs_agi_t, agi_newinogen), sizeof(xfs_agi_t) }; + int log_newino = fields & XFS_AGI_NEWINOGEN; + #ifdef DEBUG xfs_agi_t *agi; /* allocation group header */ agi = XFS_BUF_TO_AGI(bp); ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC); #endif - /* - * Compute byte offsets for the first and last fields. - */ + fields &= ~XFS_AGI_NEWINOGEN; + + /* Compute byte offsets for the first and last fields. */ xfs_btree_offsets(fields, offsets, XFS_AGI_NUM_BITS, &first, &last); - /* - * Log the allocation group inode header buffer. 
- */ xfs_trans_log_buf(tp, bp, first, last); + if (log_newino) { + xfs_btree_offsets(XFS_AGI_NEWINOGEN, offsets, XFS_AGI_NUM_BITS, + &first, &last); + xfs_trans_log_buf(tp, bp, first, last); + } } /* From owner-xfs@oss.sgi.com Tue Apr 1 18:11:17 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:11:42 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m321Aoh4014887 for ; Tue, 1 Apr 2008 18:11:15 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA10354; Wed, 2 Apr 2008 09:00:46 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m31N0jsT118685679; Wed, 2 Apr 2008 09:00:46 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m31N0jA4117633674; Wed, 2 Apr 2008 09:00:45 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 09:00:44 +1000 From: David Chinner To: David Chinner Cc: xfs-dev , xfs@oss.sgi.com Subject: Re: [Review] Improve XFS error checking and propagation Message-ID: <20080401230044.GS103491721@sgi.com> References: <20080311010420.GD155407@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080311010420.GD155407@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15133 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com 
X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs ping? On Tue, Mar 11, 2008 at 12:04:21PM +1100, David Chinner wrote: > A recent paper at the FAST08 conference highlighted a large number > of unchecked error paths in Linux filesystems and I/O layers. As a > subsystem, XFS had the highest aggregate numbers of bad error > propagation. A tarball which contains a quilt patch series of 32 > patches aimed at improving this situation can be found here: > > http://oss.sgi.com/~dgc/xfs/error-check/xfs-error-checking.tar.gz > > The paper "EIO: Error Handling is Occasionally Correct" can be found > here: > > http://www.cs.wisc.edu/adsl/Publications/eio-fast08.html > > And the in depth results here: > > http://www.cs.wisc.edu/adsl/Publications/eio-fast08/readme.html > http://www.cs.wisc.edu/adsl/Publications/eio-fast08/ > > The XFS results I've been working from are here: > > http://www.cs.wisc.edu/adsl/Publications/eio-fast08/fullfs-xfs-without-false-positives.txt > > and included below is an annotated version of this file as I've > worked through it. The graph of the XFS error paths is a good > visual representation of how the bad error paths tend to cluster > together: > > http://www.cs.wisc.edu/adsl/Publications/eio-fast08/singlefs-xfs.pdf > > (you'll need at least 800% zoom to be able to read it at all) > > The paper analysed a 2.6.15 kernel, but I've been working against an > xfs-dev tree (~2.6.24). Of the 101 reported problems for the 2.6.15 > kernel that was analysed: > > - 7 did not exist anymore (bhv layer, dirv1, write path changes) > - 11 were false positives that were not modified > - 24 were false positives that have been patched to remove > (e.g. int xfs_foo() to void xfs_foo()) > - 37 real problems where an error needed to be returned and are > fixed in the patch series. 
> - 3 where there is no error path to return an error and no > point in even warning about it (ENOSPC flushing) > - 10 where there is no error path to return an error, but > patched to warn to the syslog about potential data loss > or metadata I/O errors > - 4 were already fixed in the xfs-dev tree > - 2 where the error is ignored because we must continue anyway > (patched to warn to syslog) > - 4 that I haven't yet fixed (xfs_buf_iostrategy and > xfs_buf_iostart) because I need to think about them more. > > Cheers, > > Dave. > -- > Dave Chinner > Principal Engineer > SGI Australian Software Group > > > ---------------------------------------- fs/xfs/ ---- > > d 1 xfs_write -> _xfs_log_force fs/xfs/linux-2.6/xfs_lrw.c 881 > d 2 xfs_write -> _xfs_log_force fs/xfs/linux-2.6/xfs_lrw.c 884 > F 3 xfs_flush_device -> _xfs_log_force fs/xfs/linux-2.6/xfs_super.c 547 > F 4 xfs_qm_dqflush -> _xfs_log_force fs/xfs/quota/xfs_dquot.c 1294 > F 5 xfs_qm_dqflock_pushbuf_wait -> _xfs_log_force fs/xfs/quota/xfs_dquot.c 1591 > F 6 xfs_qm_dqunpin_wait -> _xfs_log_force fs/xfs/quota/xfs_dquot_item.c 204 > F 7 xfs_qm_dquot_logitem_pushbuf -> _xfs_log_force fs/xfs/quota/xfs_dquot_item.c 267 > F 8 xfs_alloc_search_busy -> _xfs_log_force fs/xfs/xfs_alloc.c 2593 > F 9 xfs_iunpin_wait -> _xfs_log_force fs/xfs/xfs_inode.c 2847 > F 10 xfs_iflush -> _xfs_log_force fs/xfs/xfs_inode.c 3243 > F 11 xfs_inode_item_pushbuf -> _xfs_log_force fs/xfs/xfs_inode_item.c 819 > P 12 xfs_log_unmount_write -> _xfs_log_force fs/xfs/xfs_log.c 529 > F 13 xlog_recover_finish -> _xfs_log_force fs/xfs/xfs_log_recover.c 3961 > F 14 xfs_unmountfs -> _xfs_log_force fs/xfs/xfs_mount.c 1088 > F 15 xfs_trans_push_ail -> _xfs_log_force fs/xfs/xfs_trans_ail.c 198 > F 16 xfs_syncsub -> _xfs_log_force fs/xfs/xfs_vfsops.c 1440 > F 17 xfs_syncsub -> _xfs_log_force fs/xfs/xfs_vfsops.c 1455 > F 18 xfs_syncsub -> _xfs_log_force fs/xfs/xfs_vfsops.c 1491 > F 19 xfs_syncsub -> _xfs_log_force fs/xfs/xfs_vfsops.c 1543 > P 20 xfs_fsync 
-> _xfs_log_force fs/xfs/xfs_vnodeops.c 1129 > P 21 xfs_qm_write_sb_changes -> _xfs_trans_commit fs/xfs/quota/xfs_qm.c 2414 > P 22 xfs_qm_scall_setqlim -> _xfs_trans_commit fs/xfs/quota/xfs_qm_syscalls.c 739 > P 23 xfs_itruncate_finish -> _xfs_trans_commit fs/xfs/xfs_inode.c 1718 > P 24 xlog_recover_process_efi -> _xfs_trans_commit fs/xfs/xfs_log_recover.c 3047 > P 25 xlog_recover_clear_agi_bucket -> _xfs_trans_commit fs/xfs/xfs_log_recover.c 3174 > PB 26 xfs_mount_log_sbunit -> _xfs_trans_commit fs/xfs/xfs_mount.c 1579 > P 27 xfs_growfs_rt_alloc -> _xfs_trans_commit fs/xfs/xfs_rtalloc.c 154 > P 28 xfs_growfs_rt_alloc -> _xfs_trans_commit fs/xfs/xfs_rtalloc.c 191 > P 29 xfs_growfs_rt -> _xfs_trans_commit fs/xfs/xfs_rtalloc.c 2103 > P 30 xfs_inactive_attrs -> _xfs_trans_commit fs/xfs/xfs_vnodeops.c 1505 > C 31 xfs_inactive -> _xfs_trans_commit fs/xfs/xfs_vnodeops.c 1790 > d 32 xfs_initialize_vnode -> bhv_insert fs/xfs/linux-2.6/xfs_super.c 220 > d 33 vfs_insertops -> bhv_insert fs/xfs/linux-2.6/xfs_vfs.c 259 > N 34 linvfs_truncate -> block_truncate_page fs/xfs/linux-2.6/xfs_iops.c 651 > G 35 fs_flushinval_pages -> filemap_fdatawait fs/xfs/linux-2.6/xfs_fs_subr.c 83 > G 36 fs_flush_pages -> filemap_fdatawait fs/xfs/linux-2.6/xfs_fs_subr.c 108 > G 37 fs_flushinval_pages -> filemap_fdatawrite fs/xfs/linux-2.6/xfs_fs_subr.c 82 > G 38 fs_flush_pages -> filemap_fdatawrite fs/xfs/linux-2.6/xfs_fs_subr.c 105 > n 39 xfs_flush_inode_work -> filemap_flush fs/xfs/linux-2.6/xfs_super.c 508 > PM 40 xlog_sync -> pagebuf_associate_memory fs/xfs/xfs_log.c 1358 > PM 41 xlog_sync -> pagebuf_associate_memory fs/xfs/xfs_log.c 1395 > PM 42 xlog_write_log_records -> pagebuf_associate_memory fs/xfs/xfs_log_recover.c 1156 > PM 43 xlog_write_log_records -> pagebuf_associate_memory fs/xfs/xfs_log_recover.c 1159 > PM 44 xlog_do_recovery_pass -> pagebuf_associate_memory fs/xfs/xfs_log_recover.c 3646 > PM 45 xlog_do_recovery_pass -> pagebuf_associate_memory fs/xfs/xfs_log_recover.c 3653 > PM 46 
xlog_do_recovery_pass -> pagebuf_associate_memory fs/xfs/xfs_log_recover.c 3705 > PM 47 xlog_do_recovery_pass -> pagebuf_associate_memory fs/xfs/xfs_log_recover.c 3711 > M 48 xfs_buf_read_flags -> pagebuf_iostart fs/xfs/linux-2.6/xfs_buf.c 636 > M 49 xfsbufd -> pagebuf_iostrategy fs/xfs/linux-2.6/xfs_buf.c 1755 > M 50 xfs_flush_buftarg -> pagebuf_iostrategy fs/xfs/linux-2.6/xfs_buf.c 1816 > M 51 XFS_bwrite -> pagebuf_iostrategy fs/xfs/linux-2.6/xfs_buf.h 503 > n 52 xfs_flush_device_work -> sync_blockdev fs/xfs/linux-2.6/xfs_super.c 533 > f 53 exit_xfs_fs -> unregister_filesystem fs/xfs/linux-2.6/xfs_super.c 999 > PM 54 xfs_acl_vset -> xfs_acl_vremove fs/xfs/xfs_acl.c 326 > f 55 xfs_ialloc_ag_select -> xfs_alloc_pagf_init fs/xfs/xfs_ialloc.c 411 > P 56 xfs_qm_dqflush -> xfs_bawrite fs/xfs/quota/xfs_dquot.c 1300 > N 57 xfs_qm_dqflock_pushbuf_wait -> xfs_bawrite fs/xfs/quota/xfs_dquot.c 1595 > N 58 xfs_qm_dquot_logitem_pushbuf -> xfs_bawrite fs/xfs/quota/xfs_dquot_item.c 275 > N 59 xfs_buf_item_push -> xfs_bawrite fs/xfs/xfs_buf_item.c 669 > P 60 xfs_iflush -> xfs_bawrite fs/xfs/xfs_inode.c 3249 > N 61 xfs_inode_item_pushbuf -> xfs_bawrite fs/xfs/xfs_inode_item.c 823 > F 62 xfs_qm_dqflush -> xfs_bdwrite fs/xfs/quota/xfs_dquot.c 1298 > F 63 xfs_qm_dqiter_bufs -> xfs_bdwrite fs/xfs/quota/xfs_qm.c 1551 > F 64 xfs_iflush -> xfs_bdwrite fs/xfs/xfs_inode.c 3247 > F 65 xlog_recover_do_buffer_trans -> xfs_bdwrite fs/xfs/xfs_log_recover.c 2271 > F 66 xlog_recover_do_inode_trans -> xfs_bdwrite fs/xfs/xfs_log_recover.c 2535 > F 67 xlog_recover_do_dquot_trans -> xfs_bdwrite fs/xfs/xfs_log_recover.c 2664 > C 68 xfs_inactive -> xfs_bmap_finish fs/xfs/xfs_vnodeops.c 1788 > P 69 xfs_iomap_write_allocate -> xfs_bmap_last_offset fs/xfs/xfs_iomap.c 787 > d 70 xfs_dir_leaf_rebalance -> xfs_dir_leaf_compact fs/xfs/xfs_dir_leaf.c 1146 > d 71 xfs_dir_leaf_rebalance -> xfs_dir_leaf_compact fs/xfs/xfs_dir_leaf.c 1176 > d 72 xfs_dir_leaf_to_shortform -> xfs_dir_shortform_addname 
fs/xfs/xfs_dir_leaf.c 693
> P   73 xlog_recover_process_efi -> xfs_free_extent fs/xfs/xfs_log_recover.c 3041
> n   74 xfs_inode_item_push -> xfs_iflush fs/xfs/xfs_inode_item.c 879
> P   75 xlog_recover_do_inode_trans -> xfs_imap fs/xfs/xfs_log_recover.c 2320
> f   76 xfs_bmap_add_extent -> xfs_mod_incore_sb fs/xfs/xfs_bmap.c 689
> f   77 xfs_bmap_add_extent_hole_delay -> xfs_mod_incore_sb fs/xfs/xfs_bmap.c 1918
> f   78 xfs_bmap_del_extent -> xfs_mod_incore_sb fs/xfs/xfs_bmap.c 3117
> f   79 xfs_bmapi -> xfs_mod_incore_sb fs/xfs/xfs_bmap.c 4801
> f   80 xfs_bmapi -> xfs_mod_incore_sb fs/xfs/xfs_bmap.c 4805
> f   81 xfs_bunmapi -> xfs_mod_incore_sb fs/xfs/xfs_bmap.c 5452
> f   82 xfs_bunmapi -> xfs_mod_incore_sb fs/xfs/xfs_bmap.c 5458
> f   83 xfs_trans_reserve -> xfs_mod_incore_sb fs/xfs/xfs_trans.c 305
> P   84 xfs_qm_quotacheck -> xfs_mount_reset_sbqflags fs/xfs/quota/xfs_qm.c 1962
> N   85 xfs_qm_dqpurge -> xfs_qm_dqflush fs/xfs/quota/xfs_dquot.c 1505
> NB  86 xfs_qm_dquot_logitem_push -> xfs_qm_dqflush fs/xfs/quota/xfs_dquot_item.c 168
> N   87 xfs_qm_shake_freelist -> xfs_qm_dqflush fs/xfs/quota/xfs_qm.c 2134
> N   88 xfs_qm_dqreclaim_one -> xfs_qm_dqflush fs/xfs/quota/xfs_qm.c 2306
> PM  89 xfs_qm_quotacheck -> xfs_qm_dqflush_all fs/xfs/quota/xfs_qm.c 1930
> P   90 xfs_qm_scall_quotaoff -> xfs_qm_log_quotaoff fs/xfs/quota/xfs_qm_syscalls.c 291
> P   91 xfs_qm_scall_quotaoff -> xfs_qm_log_quotaoff_end fs/xfs/quota/xfs_qm_syscalls.c 347
> F   92 xfs_qm_newmount -> xfs_qm_mount_quotas fs/xfs/quota/xfs_qm_bhv.c 273
> F   93 xfs_qm_endmount -> xfs_qm_mount_quotas fs/xfs/quota/xfs_qm_bhv.c 301
> PM  94 xfs_quiesce_fs -> xfs_syncsub fs/xfs/xfs_vfsops.c 632
> P   95 xlog_recover_process_efi -> xfs_trans_reserve fs/xfs/xfs_log_recover.c 3036
> P   96 xlog_recover_clear_agi_bucket -> xfs_trans_reserve fs/xfs/xfs_log_recover.c 3152
> P   97 xfs_qm_scall_trunc_qfiles -> xfs_truncate_file fs/xfs/quota/xfs_qm_syscalls.c 395
> P   98 xfs_qm_scall_trunc_qfiles -> xfs_truncate_file fs/xfs/quota/xfs_qm_syscalls.c 404
> P   99 xfs_log_unmount_write -> xlog_state_release_iclog fs/xfs/xfs_log.c 570
> P  100 xfs_log_unmount_write -> xlog_state_release_iclog fs/xfs/xfs_log.c 606
> f  101 xfs_log_force_umount -> xlog_state_sync_all fs/xfs/xfs_log.c 3586
>
> f = false positive
> F = false positive + patch to remove condition
> G = patch in mainline git tree already
> M = __must_check annotations found this as well
> P = real, patch to fix
> n = no error path to return error
> N = no error path to return error, patch to warn about error added
> d = does not exist anymore.
> B = some other bug found and fixed at same time
> C = error ignored, must continue anyway. If silent, made noisy
>
> Notes:
>
> - all the xfs_mod_incore_sb() are false positive because they are freeing
>   blocks or extents which means there can never be an error returned. The only
>   error that can be returned is ENOSPC when trying to allocate blocks....
>
> - none of the callers of xfs_mount_log_sb() check the return value.
>
> - new function xfs_log_sbcount failed to check return of xfs_trans_commit.
>   Callers are failing to check return value.
>
> - most of the callers to xfs_log_force() are not interested in errors - they'll
>   get them through other means (i.e. log error implies filesystem shutdown).
>   Only a handful of callers really should return errors, such as fsync(),
>   sync writes or synchronous transaction commits.
> -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Apr 1 18:24:33 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:24:45 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m321OUSP019932 for ; Tue, 1 Apr 2008 18:24:31 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14823; Wed, 2 Apr 2008 11:24:58 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m321OvsT62904127; Wed, 2 Apr 2008 11:24:57 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m321Oqp6116799419; Wed, 2 Apr 2008 11:24:52 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 11:24:52 +1000 From: David Chinner To: Michael Nishimoto Cc: David Chinner , XFS Mailing List Subject: Re: Definition of XFS_DQUOT_LOGRES() Message-ID: <20080402012452.GA103491721@sgi.com> References: <47F16988.2080406@agami.com> <20080401012856.GL103491721@sgi.com> <47F2A321.60907@agami.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47F2A321.60907@agami.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15134 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Apr 01, 2008 at 02:03:29PM 
-0700, Michael Nishimoto wrote: > David Chinner wrote: > >On Mon, Mar 31, 2008 at 03:45:28PM -0700, Michael Nishimoto wrote: > >>The comment for XFS_DQUOT_LOGRES states that we need to reserve space > >>for 3 dquots. I can't figure out why we need to add this amount to *all* > >>operations and why this amount wasn't added after doing a runtime > >>quotaon check. > > > >It probably could be done that way. But given that: > > > >>/* > >> * In the worst case, when both user and group quotas are on, > >> * we can have a max of three dquots changing in a single transaction. > >> */ > >>#define XFS_DQUOT_LOGRES(mp) (sizeof(xfs_disk_dquot_t) * 3) > > > >sizeof(xfs_disk_dquot_t) = 104 bytes, > > > >the overall addition to the reservations is minor considering: > > > >[0]kdb> xtrres 0xe0000038055ac6c0 > >write: 109752 truncate: 223672 rename: 305976 > >link: 153144 remove: 153144 symlink: 158520 > >create: 158392 mkdir: 158392 ifree: 58936 > >ichange: 2104 growdata: 45696 swrite: 384 > >addafork: 70584 writeid: 384 attrinval: 179328 > >attrset: 22968 attrrm: 90552 clearagi: 1152 > >growrtalloc: 66048 growrtzero: 4224 growrtfree: 6272 > >[0]kdb> > > > >on a 14GB filesystem most of the transactions this is added to > >are on the far side of 150k and that means we're talking about less > >than 0.2% of the entire reservation comes from the dquot. With > >larger block sizes and/or larger filesystems, these get much > >larger. e.g. same 14GB device, 64k block size instead of 4k: > > > >[0]kdb> xtrres 0xe00000b8027d39f8 > >write: 987576 truncate: 1977272 rename: 2891064 > >link: 1445688 remove: 1445688 symlink: 1512504 > >create: 1511864 mkdir: 1511864 ifree: 470584 > >ichange: 1592 growdata: 395904 swrite: 384 > >addafork: 658616 writeid: 384 attrinval: 1581696 > >attrset: 329656 attrrm: 791480 clearagi: 640 > >growrtalloc: 592640 growrtzero: 65664 growrtfree: 67200 > > > >The rename reservation is *2.8MB* (up from 300k). 
IOWs, 300 bytes is > >really noise when it comes to reservation space. (OT: See why I want to > >increase the log size now? :) > > > >Is it worth the complexity of adding this dquot reservation at > >runtime for a best case reduction of 0.2% in log space reservation > >usage? Probably not, but patches can be convincing ;) > > > >Cheers, > > > >Dave. > > Here is a patch to fix a sign problem when growing the log to 2G. > Michael, can you repost this patch with a Signed-off-by tag and..... > --- xfs_log.2.c 2008-04-01 11:55:45.000000000 -0700 > +++ xfs_log.3.c 2008-04-01 11:56:53.000000000 -0700 > @@ -230,20 +230,24 @@ > static void > xlog_grant_add_space_write(struct log *log, int bytes) > { > - log->l_grant_write_bytes += bytes; > - if (log->l_grant_write_bytes > log->l_logsize) { > - log->l_grant_write_bytes -= log->l_logsize; > - log->l_grant_write_cycle++; > + int __tmp = (log)->l_logsize - (log)->l_grant_write_bytes; No need for "__" in the tmp var, nor the () around log... Cheers, Dave. 
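
For readers following the thread, here is a minimal sketch of the sign problem the quoted patch is aimed at, written with the style comments above applied (no "__" prefix, no parentheses around the pointer). It assumes the grant fields are plain 32-bit ints, as the quoted diff suggests; struct grant_head and grant_add_space() are hypothetical stand-ins for illustration, not names from the XFS source.

```c
#include <assert.h>

/* Hypothetical stand-in for the grant-head fields kept in struct log. */
struct grant_head {
	int logsize;	/* total log size in bytes */
	int bytes;	/* current byte offset within this cycle */
	int cycle;	/* how many times the head has wrapped */
};

/*
 * Advance the head by 'bytes', wrapping at logsize.
 *
 * The older form did "head->bytes += bytes" first and compared afterwards.
 * With a log close to 2GB that sum can exceed INT_MAX, which is signed
 * overflow. Comparing against the space left in the current cycle never
 * forms a value larger than logsize.
 */
static void grant_add_space(struct grant_head *head, int bytes)
{
	int space_left;

	/* Assumed preconditions for this sketch. */
	assert(bytes >= 0 && bytes <= head->logsize);
	assert(head->bytes >= 0 && head->bytes < head->logsize);

	space_left = head->logsize - head->bytes;
	if (space_left > bytes) {
		head->bytes += bytes;
	} else {
		head->cycle++;
		head->bytes = bytes - space_left;
	}
}
```

With logsize = 2147479551 and a head at byte 2147475455, adding 1048576 bytes wraps into the next cycle at offset 1044480. The "+=" form would first have had to compute 2147475455 + 1048576, which does not fit in a signed 32-bit int.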
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Apr 1 18:50:03 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 18:50:12 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m321o0oJ026399 for ; Tue, 1 Apr 2008 18:50:02 -0700 X-ASG-Debug-ID: 1207101033-62f700850000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from ext.agami.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D07B67268CC for ; Tue, 1 Apr 2008 18:50:33 -0700 (PDT) Received: from ext.agami.com (64.221.212.177.ptr.us.xo.net [64.221.212.177]) by cuda.sgi.com with ESMTP id oowllHUU0NRSfBDv for ; Tue, 01 Apr 2008 18:50:33 -0700 (PDT) Received: from agami.com (mail [192.168.168.5]) by ext.agami.com (8.12.5/8.12.5) with ESMTP id m31L37eL029945 for ; Tue, 1 Apr 2008 14:03:07 -0700 Received: from mx1.agami.com (mx1.agami.com [10.123.10.30]) by agami.com (8.12.11/8.12.11) with ESMTP id m31L37ec015683 for ; Tue, 1 Apr 2008 14:03:07 -0700 Received: from [10.123.4.142] ([10.123.4.142]) by mx1.agami.com with Microsoft SMTPSVC(6.0.3790.1830); Tue, 1 Apr 2008 14:03:30 -0700 Message-ID: <47F2A321.60907@agami.com> Date: Tue, 01 Apr 2008 14:03:29 -0700 From: Michael Nishimoto User-Agent: Mail/News 1.5.0.4 (X11/20060629) MIME-Version: 1.0 To: David Chinner CC: XFS Mailing List X-ASG-Orig-Subj: Re: Definition of XFS_DQUOT_LOGRES() Subject: Re: Definition of XFS_DQUOT_LOGRES() References: <47F16988.2080406@agami.com> <20080401012856.GL103491721@sgi.com> In-Reply-To: <20080401012856.GL103491721@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 01 Apr 2008 
21:03:30.0160 (UTC) FILETIME=[D66E6700:01C8943B] X-Barracuda-Connect: 64.221.212.177.ptr.us.xo.net[64.221.212.177] X-Barracuda-Start-Time: 1207101036 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46580 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15135 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: miken@agami.com Precedence: bulk X-list: xfs David Chinner wrote: > On Mon, Mar 31, 2008 at 03:45:28PM -0700, Michael Nishimoto wrote: >> The comment for XFS_DQUOT_LOGRES states that we need to reserve space >> for 3 dquots. I can't figure out why we need to add this amount to *all* >> operations and why this amount wasn't added after doing a runtime >> quotaon check. > > It probably could be done that way. But given that: > >> /* >> * In the worst case, when both user and group quotas are on, >> * we can have a max of three dquots changing in a single transaction. 
>> */ >> #define XFS_DQUOT_LOGRES(mp) (sizeof(xfs_disk_dquot_t) * 3) > > sizeof(xfs_disk_dquot_t) = 104 bytes, > > the overall addition to the reservations is minor considering: > > [0]kdb> xtrres 0xe0000038055ac6c0 > write: 109752 truncate: 223672 rename: 305976 > link: 153144 remove: 153144 symlink: 158520 > create: 158392 mkdir: 158392 ifree: 58936 > ichange: 2104 growdata: 45696 swrite: 384 > addafork: 70584 writeid: 384 attrinval: 179328 > attrset: 22968 attrrm: 90552 clearagi: 1152 > growrtalloc: 66048 growrtzero: 4224 growrtfree: 6272 > [0]kdb> > > on a 14GB filesystem most of the transactions this is added to > are on the far side of 150k and that means we're talking about less > than 0.2% of the entire reservation comes from the dquot. With > larger block sizes and/or larger filesystems, these get much > larger. e.g. same 14GB device, 64k block size instead of 4k: > > [0]kdb> xtrres 0xe00000b8027d39f8 > write: 987576 truncate: 1977272 rename: 2891064 > link: 1445688 remove: 1445688 symlink: 1512504 > create: 1511864 mkdir: 1511864 ifree: 470584 > ichange: 1592 growdata: 395904 swrite: 384 > addafork: 658616 writeid: 384 attrinval: 1581696 > attrset: 329656 attrrm: 791480 clearagi: 640 > growrtalloc: 592640 growrtzero: 65664 growrtfree: 67200 > > The rename reservation is *2.8MB* (up from 300k). IOWs, 300 bytes is > really noise when it comes to reservation space. (OT: See why I want to > increase the log size now? :) > > Is it worth the complexity of adding this dquot reservation at > runtime for a best case reduction of 0.2% in log space reservation > usage? Probably not, but patches can be convincing ;) > > Cheers, > > Dave. Here is a patch to fix a sign problem when growing the log to 2G. 
--- xfs_log.2.c 2008-04-01 11:55:45.000000000 -0700 +++ xfs_log.3.c 2008-04-01 11:56:53.000000000 -0700 @@ -230,20 +230,24 @@ static void xlog_grant_add_space_write(struct log *log, int bytes) { - log->l_grant_write_bytes += bytes; - if (log->l_grant_write_bytes > log->l_logsize) { - log->l_grant_write_bytes -= log->l_logsize; - log->l_grant_write_cycle++; + int __tmp = (log)->l_logsize - (log)->l_grant_write_bytes; + if (__tmp > bytes) + (log)->l_grant_write_bytes += bytes; + else { + (log)->l_grant_write_cycle++; + (log)->l_grant_write_bytes = bytes - __tmp; } } static void xlog_grant_add_space_reserve(struct log *log, int bytes) { - log->l_grant_reserve_bytes += bytes; - if (log->l_grant_reserve_bytes > log->l_logsize) { - log->l_grant_reserve_bytes -= log->l_logsize; - log->l_grant_reserve_cycle++; + int __tmp = (log)->l_logsize - (log)->l_grant_reserve_bytes; + if (__tmp > bytes) + (log)->l_grant_reserve_bytes += bytes; + else { + (log)->l_grant_reserve_cycle++; + (log)->l_grant_reserve_bytes = bytes - __tmp; } } From owner-xfs@oss.sgi.com Tue Apr 1 19:57:42 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 19:57:52 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from relay.sgi.com (relay2.corp.sgi.com [192.26.58.22]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m322vfsn003159 for ; Tue, 1 Apr 2008 19:57:42 -0700 Received: from outhouse.melbourne.sgi.com (outhouse.melbourne.sgi.com [134.14.52.145]) by relay2.corp.sgi.com (Postfix) with ESMTP id 7456B30409A; Tue, 1 Apr 2008 19:58:13 -0700 (PDT) Received: from itchy (xaiki@itchy.melbourne.sgi.com [134.14.55.96]) by outhouse.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m322w9jm083470; Wed, 2 Apr 2008 13:58:11 +1100 (AEDT) From: Niv Sardi To: David Chinner Cc: xfs-dev , xfs@oss.sgi.com Subject: Re: [Review] 
Improve XFS error checking and propagation
References: <20080311010420.GD155407@sgi.com> <20080401230044.GS103491721@sgi.com>
Date: Wed, 02 Apr 2008 13:58:09 +1100
In-Reply-To: <20080401230044.GS103491721@sgi.com> (David Chinner's message of "Wed, 2 Apr 2008 09:00:44 +1000")
Message-ID:
User-Agent: Gnus/5.110007 (No Gnus v0.7) Emacs/23.0.60 (i486-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 15136
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: xaiki@cxhome.ath.cx
Precedence: bulk
X-list: xfs

David Chinner writes:
> On Tue, Mar 11, 2008 at 12:04:21PM +1100, David Chinner wrote:
>> A recent paper at the FAST08 conference highlighted a large number
>> of unchecked error paths in Linux filesystems and I/O layers. As a
>> subsystem, XFS had the highest aggregate numbers of bad error
>> propagation. A tarball which contains a quilt patch series of 32
>> patches aimed at improving this situation can be found here:
>>
>> http://oss.sgi.com/~dgc/xfs/error-check/xfs-error-checking.tar.gz

All looks good except some minor typo-editing,

and

NOK     xfs-mustcheck-quotamount.patch # need to check if can happen when forcing quotas

I'm not sure what happens if we really DO want quotas (specified on
mount line and such).

OK      xfs-mustcheck-reset-dqcounts.patch
OK      xfs-mustcheck-dqflushall.patch
OK      xfs-mustcheck-acl-setmode.patch
OK      xfs-mustcheck-search-busy.patch
EDITED  xfs-mustcheck-compute-diff.patch # xfs_fs_cmn_err alignment
OK      xfs-mustcheck-bmap-adjacent.patch
OK      xfs-mustcheck-iflush-fork.patch # less error handling !!
OK      xfs-mustcheck-bulkstat-dinode.patch
OK      xfs-mustcheck-quiesce-fs.patch
OK      xfs-mustcheck-bdstrat.patch
OK      xfs-fix-error-prototypes.patch # not error handling related
OK      xfs-mustcheck-acl-vremove.patch
OK      xfs-mustcheck-icsb-disable.patch
OK      xfs-mustcheck-ioend-unwritten.patch
OK      xfs-mustcheck-buf-associate.patch
OK      xfs-mustcheck-reserve-blocks.patch
EDITED  xfs-mustcheck-bawrite.patch # xfs_fs_cmn_err alignment
OK      xfs-mustcheck-bdwrite.patch
OK      xfs-mustcheck-truncate-page.patch # might be incomplete
EDITED  xfs-mustcheck-dqflush.patch # slight style change/typo
OK      xfs-mustcheck-reset-sbqflags.patch
OK      xfs-mustcheck-quotaoff.patch
EDITED  xfs-mustcheck-inactive.patch # slight style change/typo
OK      xfs-mustcheck-trans-reserve.patch
OK      xfs-mustcheck-quota-trunc.patch
OK      xfs-mustcheck-log-unmount.patch
OK      xfs-mustcheck-free-extent.patch
OK      xfs-mustcheck-xfs-imap.patch
OK      xfs-mustcheck-bmap-last-offset.patch
OK      xfs-mustcheck-trans-commit.patch
OK      xfs-mustcheck-log-force.patch

Cheers,
--
Niv Sardi

From owner-xfs@oss.sgi.com Tue Apr 1 21:02:18 2008
Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 21:02:37 -0700 (PDT)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664
Received: from relay.sgi.com (netops-testserver-3.corp.sgi.com [192.26.57.72]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m3242GBC017636 for ; Tue, 1 Apr 2008 21:02:17 -0700
Received: from outhouse.melbourne.sgi.com (outhouse.melbourne.sgi.com [134.14.52.145]) by netops-testserver-3.corp.sgi.com (Postfix) with ESMTP id 8D1DC9088D; Tue, 1 Apr 2008 21:02:46 -0700 (PDT)
Received: from itchy (xaiki@itchy.melbourne.sgi.com [134.14.55.96]) by outhouse.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m3242fjm085594; Wed, 2 Apr 2008 15:02:44 +1100 (AEDT)
From: Niv Sardi
To: David Chinner
Cc: xfs-dev, xfs-oss
Subject: Re: [Patch] unique per-AG
inode generation number initialisation References: <20080401231815.GW103491721@sgi.com> Date: Wed, 02 Apr 2008 15:02:42 +1100 In-Reply-To: <20080401231815.GW103491721@sgi.com> (David Chinner's message of "Wed, 2 Apr 2008 09:18:15 +1000") Message-ID: User-Agent: Gnus/5.110007 (No Gnus v0.7) Emacs/23.0.60 (i486-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15137 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xaiki@cxhome.ath.cx Precedence: bulk X-list: xfs David Chinner writes: > Don't initialise new inode generation numbers to zero > > When we allocation new inode chunks, we initialise the generation > numbers to zero. This works fine until we delete a chunk and then > reallocate it, resulting in the same inode numbers but with a > reset generation count. This can result in inode/generation > pairs of different inodes occurring relatively close together. > > Given that the inode/gen pair makes up the "unique" portion of > an NFS filehandle on XFS, this can result in file handles cached > on clients being seen on the wire from the server but refer to > a different file. This causes .... issues for NFS clients. > > Hence we need a unique generation number initialisation for > each inode to prevent reuse of a small portion of the generation > number space. Make this initialiser per-allocation group so > that it is not a single point of contention in the filesystem, > and increment it on every allocation within an AG to reduce the > chance that a generation number is reused for a given inode number > if the inode chunk is deleted and reallocated immediately > afterwards. > > It is safe to add the agi_newinogen field to the AGI without > using a feature bit. If an older kernel is used, it simply > will not update the field on allocation. 
If the kernel is
> updated and the field has garbage in it, then it's like having a
> random seed to the generation number....
>
> Signed-off-by: Dave Chinner
> ---
> fs/xfs/xfs_ag.h | 4 +++-
> fs/xfs/xfs_ialloc.c | 30 ++++++++++++++++++++++--------
> 2 files changed, 25 insertions(+), 9 deletions(-)

Apart from the bit of overhead, all seems good.

> Index: 2.6.x-xfs-new/fs/xfs/xfs_ag.h
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ag.h 2008-01-18 18:30:06.000000000 +1100
> +++ 2.6.x-xfs-new/fs/xfs/xfs_ag.h 2008-03-26 13:03:41.122918236 +1100
> @@ -121,6 +121,7 @@ typedef struct xfs_agi {
> * still being referenced.
> */
> __be32 agi_unlinked[XFS_AGI_UNLINKED_BUCKETS];
> + __be32 agi_newinogen; /* inode cluster generation */
> } xfs_agi_t;
>
> #define XFS_AGI_MAGICNUM 0x00000001
> @@ -134,7 +135,8 @@ typedef struct xfs_agi {
> #define XFS_AGI_NEWINO 0x00000100
> #define XFS_AGI_DIRINO 0x00000200
> #define XFS_AGI_UNLINKED 0x00000400
> -#define XFS_AGI_NUM_BITS 11
> +#define XFS_AGI_NEWINOGEN 0x00000800
> +#define XFS_AGI_NUM_BITS 12
> #define XFS_AGI_ALL_BITS ((1 << XFS_AGI_NUM_BITS) - 1)
>
> /* disk block (xfs_daddr_t) in the AG */
> Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-03-25 15:41:27.000000000 +1100
> +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-03-26 14:29:47.998554368 +1100
> @@ -309,6 +309,8 @@ xfs_ialloc_ag_alloc(
> free = XFS_MAKE_IPTR(args.mp, fbuf, i);
> free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
> free->di_core.di_version = version;
> + free->di_core.di_gen = agi->agi_newinogen;
> + be32_add_cpu(&agi->agi_newinogen, 1);
> free->di_next_unlinked = cpu_to_be32(NULLAGINO);
> xfs_ialloc_log_di(tp, fbuf, i,
> XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
> @@ -347,7 +349,8 @@ xfs_ialloc_ag_alloc(
> * Log allocation group header fields
> */
> xfs_ialloc_log_agi(tp, agbp,
> -
XFS_AGI_COUNT | XFS_AGI_FREECOUNT | XFS_AGI_NEWINO); > + XFS_AGI_COUNT | XFS_AGI_FREECOUNT | > + XFS_AGI_NEWINO | XFS_AGI_NEWINOGEN); > /* > * Modify/log superblock values for inode count and inode free count. > */ > @@ -896,11 +899,12 @@ nextag: > ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino + offset); > XFS_INOBT_CLR_FREE(&rec, offset); > rec.ir_freecount--; > + be32_add_cpu(&agi->agi_newinogen, 1); > if ((error = xfs_inobt_update(cur, rec.ir_startino, rec.ir_freecount, > rec.ir_free))) > goto error0; > be32_add(&agi->agi_freecount, -1); > - xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT); > + xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT | XFS_AGI_NEWINOGEN); > down_read(&mp->m_peraglock); > mp->m_perag[tagno].pagi_freecount--; > up_read(&mp->m_peraglock); > @@ -1320,6 +1324,11 @@ xfs_ialloc_compute_maxlevels( > > /* > * Log specified fields for the ag hdr (inode section) > + * > + * We don't log the unlinked inode fields through here; they > + * get logged directly to the buffer. Hence we have a discontinuity > + * in the fields we are logging and we need two calls to map all > + * the dirtied parts of the agi.... > */ > void > xfs_ialloc_log_agi( > @@ -1342,22 +1351,27 @@ xfs_ialloc_log_agi( > offsetof(xfs_agi_t, agi_newino), > offsetof(xfs_agi_t, agi_dirino), > offsetof(xfs_agi_t, agi_unlinked), > + offsetof(xfs_agi_t, agi_newinogen), > sizeof(xfs_agi_t) > }; > + int log_newino = fields & XFS_AGI_NEWINOGEN; > + > #ifdef DEBUG > xfs_agi_t *agi; /* allocation group header */ > > agi = XFS_BUF_TO_AGI(bp); > ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC); > #endif > - /* > - * Compute byte offsets for the first and last fields. > - */ > + fields &= ~XFS_AGI_NEWINOGEN; > + > + /* Compute byte offsets for the first and last fields. */ > xfs_btree_offsets(fields, offsets, XFS_AGI_NUM_BITS, &first, &last); > - /* > - * Log the allocation group inode header buffer. 
> - */ > xfs_trans_log_buf(tp, bp, first, last); > + if (log_newino) { > + xfs_btree_offsets(XFS_AGI_NEWINOGEN, offsets, XFS_AGI_NUM_BITS, > + &first, &last); > + xfs_trans_log_buf(tp, bp, first, last); > + } > } > > /* -- Niv Sardi From owner-xfs@oss.sgi.com Tue Apr 1 21:25:53 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 21:26:10 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m324PnuU021284 for ; Tue, 1 Apr 2008 21:25:51 -0700 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA19804; Wed, 2 Apr 2008 14:11:43 +1000 Message-ID: <47F3150B.6000106@sgi.com> Date: Wed, 02 Apr 2008 15:09:31 +1000 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.12 (X11/20080213) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [Patch] Per iclog callback chain lock References: <20080401231348.GT103491721@sgi.com> In-Reply-To: <20080401231348.GT103491721@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15139 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Looks fine to me - just one small comment. David Chinner wrote: > Introduce an iclog callback chain lock. 
> > Rather than use the icloglock for protecting the iclog completion > callback chain, use a new per-iclog lock so that walking the > callback chain doesn't require holding a global lock. > > This reduces contention on the icloglock during log buffer I/O > completion as the callback chain lock is take for every callback > that is issued. On large log buffers, this can number in the > hundreds to thousands per iclog so isolating the lock to the > iclog makes a lot of sense. > > Signed-off-by: Dave Chinner > --- > fs/xfs/xfs_log.c | 35 +++++++++++++++++++---------------- > fs/xfs/xfs_log_priv.h | 33 ++++++++++++++++++++++++++------- > 2 files changed, 45 insertions(+), 23 deletions(-) > > Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2008-03-13 13:10:23.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2008-03-13 19:35:51.251913648 +1100 > @@ -397,12 +397,10 @@ xfs_log_notify(xfs_mount_t *mp, /* mo > void *iclog_hndl, /* iclog to hang callback off */ > xfs_log_callback_t *cb) > { > - xlog_t *log = mp->m_log; > xlog_in_core_t *iclog = (xlog_in_core_t *)iclog_hndl; > int abortflg; > > - cb->cb_next = NULL; > - spin_lock(&log->l_icloglock); > + spin_lock(&iclog->ic_callback_lock); > abortflg = (iclog->ic_state & XLOG_STATE_IOERROR); > if (!abortflg) { > ASSERT_ALWAYS((iclog->ic_state == XLOG_STATE_ACTIVE) || > @@ -411,7 +409,7 @@ xfs_log_notify(xfs_mount_t *mp, /* mo > *(iclog->ic_callback_tail) = cb; > iclog->ic_callback_tail = &(cb->cb_next); > } > - spin_unlock(&log->l_icloglock); > + spin_unlock(&iclog->ic_callback_lock); > return abortflg; > } /* xfs_log_notify */ > > @@ -1257,6 +1255,8 @@ xlog_alloc_log(xfs_mount_t *mp, > iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize; > iclog->ic_state = XLOG_STATE_ACTIVE; > iclog->ic_log = log; > + atomic_set(&iclog->ic_refcnt, 0); Not related to this change but looks like we need it anyway. 
Did you mean to include it in this change? > + spin_lock_init(&iclog->ic_callback_lock); > iclog->ic_callback_tail = &(iclog->ic_callback); > iclog->ic_datap = (char *)iclog->hic_data + log->l_iclog_hsize; > > @@ -1990,7 +1990,7 @@ xlog_state_clean_log(xlog_t *log) > if (iclog->ic_state == XLOG_STATE_DIRTY) { > iclog->ic_state = XLOG_STATE_ACTIVE; > iclog->ic_offset = 0; > - iclog->ic_callback = NULL; /* don't need to free */ > + ASSERT(iclog->ic_callback == NULL); > /* > * If the number of ops in this iclog indicate it just > * contains the dummy transaction, we can > @@ -2193,37 +2193,40 @@ xlog_state_do_callback( > be64_to_cpu(iclog->ic_header.h_lsn); > spin_unlock(&log->l_grant_lock); > > - /* > - * Keep processing entries in the callback list > - * until we come around and it is empty. We > - * need to atomically see that the list is > - * empty and change the state to DIRTY so that > - * we don't miss any more callbacks being added. > - */ > - spin_lock(&log->l_icloglock); > } else { > + spin_unlock(&log->l_icloglock); > ioerrors++; > } > - cb = iclog->ic_callback; > > + /* > + * Keep processing entries in the callback list until > + * we come around and it is empty. We need to > + * atomically see that the list is empty and change the > + * state to DIRTY so that we don't miss any more > + * callbacks being added. 
> + */ > + spin_lock(&iclog->ic_callback_lock); > + cb = iclog->ic_callback; > while (cb) { > iclog->ic_callback_tail = &(iclog->ic_callback); > iclog->ic_callback = NULL; > - spin_unlock(&log->l_icloglock); > + spin_unlock(&iclog->ic_callback_lock); > > /* perform callbacks in the order given */ > for (; cb; cb = cb_next) { > cb_next = cb->cb_next; > cb->cb_func(cb->cb_arg, aborted); > } > - spin_lock(&log->l_icloglock); > + spin_lock(&iclog->ic_callback_lock); > cb = iclog->ic_callback; > } > > loopdidcallbacks++; > funcdidcallbacks++; > > + spin_lock(&log->l_icloglock); > ASSERT(iclog->ic_callback == NULL); > + spin_unlock(&iclog->ic_callback_lock); Okay so we can acquire the l_icloglock while holding ic_callback_lock. > if (!(iclog->ic_state & XLOG_STATE_IOERROR)) > iclog->ic_state = XLOG_STATE_DIRTY; > > Index: 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log_priv.h 2008-02-22 13:48:25.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h 2008-03-13 19:34:57.430809151 +1100 > @@ -324,6 +324,19 @@ typedef struct xlog_rec_ext_header { > * - ic_offset is the current number of bytes written to in this iclog. > * - ic_refcnt is bumped when someone is writing to the log. > * - ic_state is the state of the iclog. > + * > + * Because of cacheline contention on large machines, we need to separate > + * various resources onto different cachelines. To start with, make the > + * structure cacheline aligned. The following fields can be contended on > + * by independent processes: > + * > + * - ic_callback_* > + * - ic_refcnt > + * - fields protected by the global l_icloglock > + * > + * so we need to ensure that these fields are located in separate cachelines. > + * We'll put all the read-only and l_icloglock fields in the first cacheline, > + * and move everything else out to subsequent cachelines. 
> */ > typedef struct xlog_iclog_fields { > sv_t ic_forcesema; > @@ -332,18 +345,23 @@ typedef struct xlog_iclog_fields { > struct xlog_in_core *ic_prev; > struct xfs_buf *ic_bp; > struct log *ic_log; > - xfs_log_callback_t *ic_callback; > - xfs_log_callback_t **ic_callback_tail; > -#ifdef XFS_LOG_TRACE > - struct ktrace *ic_trace; > -#endif > int ic_size; > int ic_offset; > - atomic_t ic_refcnt; > int ic_bwritecnt; > ushort_t ic_state; > char *ic_datap; /* pointer to iclog data */ > -} xlog_iclog_fields_t; > +#ifdef XFS_LOG_TRACE > + struct ktrace *ic_trace; > +#endif > + > + /* Callback structures need their own cacheline */ > + spinlock_t ic_callback_lock ____cacheline_aligned_in_smp; > + xfs_log_callback_t *ic_callback; > + xfs_log_callback_t **ic_callback_tail; > + > + /* reference counts need their own cacheline */ > + atomic_t ic_refcnt ____cacheline_aligned_in_smp; > +} xlog_iclog_fields_t ____cacheline_aligned_in_smp; > > typedef union xlog_in_core2 { > xlog_rec_header_t hic_header; > @@ -366,6 +384,7 @@ typedef struct xlog_in_core { > #define ic_bp hic_fields.ic_bp > #define ic_log hic_fields.ic_log > #define ic_callback hic_fields.ic_callback > +#define ic_callback_lock hic_fields.ic_callback_lock > #define ic_callback_tail hic_fields.ic_callback_tail > #define ic_trace hic_fields.ic_trace > #define ic_size hic_fields.ic_size > From owner-xfs@oss.sgi.com Tue Apr 1 21:31:10 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 21:31:18 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from relay.sgi.com (relay1.corp.sgi.com [192.26.58.214]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m324VA63022405 for ; Tue, 1 Apr 2008 21:31:10 -0700 Received: from outhouse.melbourne.sgi.com (outhouse.melbourne.sgi.com [134.14.52.145]) by relay1.corp.sgi.com (Postfix) with 
ESMTP id DD9548F8092; Tue, 1 Apr 2008 21:31:42 -0700 (PDT) Received: from itchy (xaiki@itchy.melbourne.sgi.com [134.14.55.96]) by outhouse.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m324Vajm084801; Wed, 2 Apr 2008 15:31:38 +1100 (AEDT) From: Niv Sardi To: David Chinner Cc: xfs-dev , xfs@oss.sgi.com Subject: Re: [Review] Improve XFS error checking and propagation References: <20080311010420.GD155407@sgi.com> <20080401230044.GS103491721@sgi.com> <20080402040708.GB103491721@sgi.com> Date: Wed, 02 Apr 2008 15:31:37 +1100 In-Reply-To: <20080402040708.GB103491721@sgi.com> (David Chinner's message of "Wed, 2 Apr 2008 14:07:08 +1000") Message-ID: User-Agent: Gnus/5.110007 (No Gnus v0.7) Emacs/23.0.60 (i486-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15140 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xaiki@cxhome.ath.cx Precedence: bulk X-list: xfs David Chinner writes: > On Wed, Apr 02, 2008 at 01:58:09PM +1100, Niv Sardi wrote: >> David Chinner writes: >> > On Tue, Mar 11, 2008 at 12:04:21PM +1100, David Chinner wrote: >> >> A recent paper at the FAST08 conference highlighted a large number >> >> of unchecked error paths in Linux filesystems and I/O layers. As a >> >> subsystem, XFS had the highest aggregate numbers of bad error >> >> propagation. A tarball which contains a quilt patch series of 32 >> >> patches aimed at improving this situation can be found here: >> >> >> >> http://oss.sgi.com/~dgc/xfs/error-check/xfs-error-checking.tar.gz >> >> All looks good except some minor typo-editing, >> >> and >> >> NOK xfs-mustcheck-quotamount.patch # need to check if can happen when forcing quotas >> >> I'm not sure what happens if we really DO want quotas (specified on >> mount line and such). 
>
> The behaviour will be exactly the same as previously, because the
> error returned by xfs_qm_mount_quotas() is ignored. i.e. if we try
> to mount with quotas and the quota mount fails, we continue (after
> issuing a warning to syslog) that quotas were not turned on.
>
> This is especially important for root filesystems with quota
> enabled....

OK, I wasn't sure. All the rest are minor aesthetics/typos from my
messed-up notes, and can be ignored…

>> OK xfs-mustcheck-reset-dqcounts.patch
>> OK xfs-mustcheck-dqflushall.patch
>> OK xfs-mustcheck-acl-setmode.patch
>> OK xfs-mustcheck-search-busy.patch
>> EDITED xfs-mustcheck-compute-diff.patch # xfs_fs_cmn_err alignment
>
> That patch doesn't have any calls to xfs_fs_cmn_err() in it. Can you
> clarify, please?

Oops, the edit was for:
-+STATIC void /* success (>= minlen) */
++STATIC void

as it didn't really make sense anymore.

>> OK xfs-mustcheck-bmap-adjacent.patch
>> OK xfs-mustcheck-iflush-fork.patch # less error handling !!
>
> You can't have less error handling than intentionally ignoring
> the return from a function that can't return an error. You can
> have simpler code, though, by declaring the function void....

Hmm, I can't remember why I wrote that anymore, oh well… looks good now.

>> OK xfs-mustcheck-bulkstat-dinode.patch
>> OK xfs-mustcheck-quiesce-fs.patch
>> OK xfs-mustcheck-bdstrat.patch
>> OK xfs-fix-error-prototypes.patch # not error handling related
>> OK xfs-mustcheck-acl-vremove.patch
>> OK xfs-mustcheck-icsb-disable.patch
>> OK xfs-mustcheck-ioend-unwritten.patch
>> OK xfs-mustcheck-buf-associate.patch
>> OK xfs-mustcheck-reserve-blocks.patch
>> EDITED xfs-mustcheck-bawrite.patch # xfs_fs_cmn_err alignment
>
> Which means?

That's purely aesthetic, sometimes we split the string and keep it
aligned, and sometimes we pad it left so that it fits, I prefer
splitting.
>> OK xfs-mustcheck-bdwrite.patch >> OK xfs-mustcheck-truncate-page.patch # might be incomplete Note to self: re-read one's notes before sending them out, I wanted to look at why we couldn't propagate error better, but now it's all understood =) >> EDITED xfs-mustcheck-dqflush.patch # slight style change/typo > Details? -hence we nevre know if we've failed to flush a dquot to disk. +hence we never know if we've failed to flush a dquot to disk. and xfs_fs_cmn_err stuff. >> OK xfs-mustcheck-reset-sbqflags.patch >> OK xfs-mustcheck-quotaoff.patch >> EDITED xfs-mustcheck-inactive.patch # slight style change/typo > > Details? -correctly. if we fail to write the final quota off trnasaction, +correctly. if we fail to write the final quota off transaction, and xfs_fs_cmn_err stuff. Cheers, -- Niv Sardi From owner-xfs@oss.sgi.com Tue Apr 1 21:39:13 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 21:39:20 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m324d70W023550 for ; Tue, 1 Apr 2008 21:39:11 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA20458; Wed, 2 Apr 2008 14:39:43 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m324dgsT118759982; Wed, 2 Apr 2008 14:39:42 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m324df1O118778255; Wed, 2 Apr 2008 14:39:41 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 14:39:41 +1000 From: David 
Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [Patch] Per iclog callback chain lock Message-ID: <20080402043941.GC103491721@sgi.com> References: <20080401231348.GT103491721@sgi.com> <47F3150B.6000106@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47F3150B.6000106@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15141 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 03:09:31PM +1000, Lachlan McIlroy wrote: > Looks fine to me - just one small comment. ..... > >@@ -1257,6 +1255,8 @@ xlog_alloc_log(xfs_mount_t *mp, > > iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize; > > iclog->ic_state = XLOG_STATE_ACTIVE; > > iclog->ic_log = log; > >+ atomic_set(&iclog->ic_refcnt, 0); > Not related to this change but looks like we need it anyway. > Did you mean to include it in this change? I added it just to be explicit. It's not actually needed as we kmem_zalloc() the log structure.... > >+ spin_lock(&iclog->ic_callback_lock); > >+ cb = iclog->ic_callback; .... > >+ spin_lock(&log->l_icloglock); > > ASSERT(iclog->ic_callback == NULL); > >+ spin_unlock(&iclog->ic_callback_lock); > Okay so we can acquire the l_icloglock while holding ic_callback_lock. Yup - it's a new lock, so we can make whatever rules we like ;) Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Apr 1 22:11:59 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 22:12:06 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m325BtWl027241 for ; Tue, 1 Apr 2008 22:11:57 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA21138; Wed, 2 Apr 2008 15:12:29 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m325CSsT118809405; Wed, 2 Apr 2008 15:12:29 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m325CNf0118789674; Wed, 2 Apr 2008 15:12:23 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 15:12:23 +1000 From: David Chinner To: Niv Sardi Cc: David Chinner , xfs-dev , xfs@oss.sgi.com Subject: Re: [Review] Improve XFS error checking and propagation Message-ID: <20080402051223.GD103491721@sgi.com> References: <20080311010420.GD155407@sgi.com> <20080401230044.GS103491721@sgi.com> <20080402040708.GB103491721@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15142 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 
03:31:37PM +1100, Niv Sardi wrote:
> David Chinner writes:
>
> > On Wed, Apr 02, 2008 at 01:58:09PM +1100, Niv Sardi wrote:
> >> OK xfs-mustcheck-reset-dqcounts.patch
> >> OK xfs-mustcheck-dqflushall.patch
> >> OK xfs-mustcheck-acl-setmode.patch
> >> OK xfs-mustcheck-search-busy.patch
> >> EDITED xfs-mustcheck-compute-diff.patch # xfs_fs_cmn_err alignment
> >
> > That patch doesn't have any calls to xfs_fs_cmn_err() in it. Can you
> > clarify, please?
>
> Oops, the edit was for:
> -+STATIC void /* success (>= minlen) */
> ++STATIC void
>
> as it didn't really make sense anymore.

Ok, Fixed.

> >> OK xfs-mustcheck-bulkstat-dinode.patch
> >> OK xfs-mustcheck-quiesce-fs.patch
> >> OK xfs-mustcheck-bdstrat.patch
> >> OK xfs-fix-error-prototypes.patch # not error handling related
> >> OK xfs-mustcheck-acl-vremove.patch
> >> OK xfs-mustcheck-icsb-disable.patch
> >> OK xfs-mustcheck-ioend-unwritten.patch
> >> OK xfs-mustcheck-buf-associate.patch
> >> OK xfs-mustcheck-reserve-blocks.patch
> >> EDITED xfs-mustcheck-bawrite.patch # xfs_fs_cmn_err alignment
> >
> > Which means?
>
> That's purely aesthetic, sometimes we split the string and keep it
> aligned, and sometimes we pad it left so that it fits, I prefer
> splitting.

Ok. If it involves splitting a string into 3 lines because of
indentation, then I tend to do one long line. i.e. this:

	if (error)
		xfs_fs_cmn_err(CE_WARN, mp,
	"xfs_qm_dquot_logitem_pushbuf: pushbuf error %d on qip %p, bp %p",
				error, qip, bp);

Instead of:

	if (error)
		xfs_fs_cmn_err(CE_WARN, mp,
				"xfs_qm_dquot_logitem_pushbuf:"
				" pushbuf error %d on qip %p, "
				" bp %p", error, qip, bp);

The first is much easier to read and grep for....

> >> EDITED xfs-mustcheck-dqflush.patch # slight style change/typo
> > Details?
>
> -hence we nevre know if we've failed to flush a dquot to disk.
> +hence we never know if we've failed to flush a dquot to disk.

Oh, in the patch description....

> and xfs_fs_cmn_err stuff.

Same as above...
> >> OK xfs-mustcheck-reset-sbqflags.patch > >> OK xfs-mustcheck-quotaoff.patch > >> EDITED xfs-mustcheck-inactive.patch # slight style change/typo > > > > Details? > > -correctly. if we fail to write the final quota off trnasaction, > +correctly. if we fail to write the final quota off transaction, Ah, that's from the quotaoff patch. No wonder I couldn't find the typo ;) But I also split the xfs_fs_cmn_err() in xfs-mustcheck-inactive (much nicer). All fixed up. Thanks Niv. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Apr 1 22:20:47 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 22:20:54 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m325KgLQ028745 for ; Tue, 1 Apr 2008 22:20:45 -0700 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA21438; Wed, 2 Apr 2008 15:21:15 +1000 Message-ID: <47F32557.2030300@sgi.com> Date: Wed, 02 Apr 2008 16:19:03 +1000 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.12 (X11/20080213) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [Patch] Per iclog callback chain lock References: <20080401231348.GT103491721@sgi.com> <47F3150B.6000106@sgi.com> <20080402043941.GC103491721@sgi.com> In-Reply-To: <20080402043941.GC103491721@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15143 X-ecartis-version: Ecartis v1.0.0 Sender: 
xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs

David Chinner wrote:
> On Wed, Apr 02, 2008 at 03:09:31PM +1000, Lachlan McIlroy wrote:
>> Looks fine to me - just one small comment.
> .....
>>> @@ -1257,6 +1255,8 @@ xlog_alloc_log(xfs_mount_t *mp,
>>> iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize;
>>> iclog->ic_state = XLOG_STATE_ACTIVE;
>>> iclog->ic_log = log;
>>> + atomic_set(&iclog->ic_refcnt, 0);
>> Not related to this change but looks like we need it anyway.
>> Did you mean to include it in this change?
>
> I added it just to be explicit. It's not actually needed as we
> kmem_zalloc() the log structure....

Ahh, it is zeroed. I was wondering why it hadn't caused any problems.

>
>>> + spin_lock(&iclog->ic_callback_lock);
>>> + cb = iclog->ic_callback;
> ....
>>> + spin_lock(&log->l_icloglock);
>>> ASSERT(iclog->ic_callback == NULL);
>>> + spin_unlock(&iclog->ic_callback_lock);
>> Okay so we can acquire the l_icloglock while holding ic_callback_lock.
>
> Yup - it's a new lock, so we can make whatever rules we like ;)
>
> Cheers,
>
> Dave.
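
For readers following the callback-lock discussion above, here is a rough
userspace sketch of the pattern the patch appears to use: detach the whole
callback chain while holding the per-iclog lock, run the callbacks with the
lock dropped, then retake the lock and check whether more were queued in the
meantime. It is not the kernel code. struct iclog_sketch, struct log_cb,
iclog_init(), iclog_add_cb(), iclog_run_cbs() and count_cb() are hypothetical
names, a pthread mutex stands in for the spinlock, and the 64-byte CACHELINE
value is an assumption about the target machine.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed cacheline size, not taken from the patch */

/* Hypothetical stand-in for xfs_log_callback_t. */
struct log_cb {
	struct log_cb	*cb_next;
	void		(*cb_func)(void *arg, int aborted);
	void		*cb_arg;
};

/* Hypothetical stand-in for an iclog; only the callback chain is modelled. */
struct iclog_sketch {
	int		state;		/* placeholder for the l_icloglock fields */

	/* callback fields start on their own cacheline */
	_Alignas(CACHELINE) pthread_mutex_t cb_lock;	/* ic_callback_lock */
	struct log_cb	*cb_head;			/* ic_callback */
	struct log_cb	**cb_tail;			/* ic_callback_tail */
};

void iclog_init(struct iclog_sketch *ic)
{
	ic->state = 0;
	pthread_mutex_init(&ic->cb_lock, NULL);
	ic->cb_head = NULL;
	ic->cb_tail = &ic->cb_head;
}

/* Append a callback to the chain under the per-iclog lock. */
void iclog_add_cb(struct iclog_sketch *ic, struct log_cb *cb)
{
	assert(cb->cb_func != NULL);
	cb->cb_next = NULL;
	pthread_mutex_lock(&ic->cb_lock);
	*ic->cb_tail = cb;
	ic->cb_tail = &cb->cb_next;
	pthread_mutex_unlock(&ic->cb_lock);
}

/* Run every queued callback in order; returns how many were run. */
int iclog_run_cbs(struct iclog_sketch *ic, int aborted)
{
	struct log_cb	*cb, *cb_next;
	int		ran = 0;

	pthread_mutex_lock(&ic->cb_lock);
	cb = ic->cb_head;
	while (cb) {
		/* detach the chain so new callbacks start a fresh list */
		ic->cb_tail = &ic->cb_head;
		ic->cb_head = NULL;
		pthread_mutex_unlock(&ic->cb_lock);

		/* perform callbacks in the order given, lock not held */
		for (; cb; cb = cb_next) {
			cb_next = cb->cb_next;
			cb->cb_func(cb->cb_arg, aborted);
			ran++;
		}

		/* pick up anything queued while we were running */
		pthread_mutex_lock(&ic->cb_lock);
		cb = ic->cb_head;
	}
	pthread_mutex_unlock(&ic->cb_lock);
	return ran;
}

/* Sample callback: bumps the int counter passed as its argument. */
void count_cb(void *arg, int aborted)
{
	(void)aborted;
	(*(int *)arg)++;
}
```

A caller would set up a struct log_cb with cb_func = count_cb and cb_arg
pointing at an int, queue it with iclog_add_cb(), and later call
iclog_run_cbs(). The sketch leaves out the l_icloglock nesting that the patch
introduces after the loop; that ordering rule (ic_callback_lock held, then
l_icloglock taken) would have to be respected by every other path that takes
both locks.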
From owner-xfs@oss.sgi.com Tue Apr 1 22:24:38 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 22:24:45 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_52, SUBJ_TICKET autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m325OU0J029569 for ; Tue, 1 Apr 2008 22:24:34 -0700 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA21517; Wed, 2 Apr 2008 15:25:02 +1000 Message-ID: <47F3263A.7030401@sgi.com> Date: Wed, 02 Apr 2008 16:22:50 +1000 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.12 (X11/20080213) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [Patch] Remove xlog_ticket allocator References: <20080401231439.GU103491721@sgi.com> In-Reply-To: <20080401231439.GU103491721@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15144 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Nice one - cleans up a lot of code. Looks good to me. David Chinner wrote: > Remove the xlog_ticket allocator > > The ticket allocator is just a simple slab implementation > internal to the log. It requires the icloglock to be held > when manipulating it and this contributes to contention > on that lock. > > Just kill the entire allocator and use a memory zone instead. > While there, allow us to gracefully fail allocation with ENOMEM. 
> > Signed-off-by: Dave Chinner > --- > fs/xfs/xfs_log.c | 137 ++++---------------------------------------------- > fs/xfs/xfs_log_priv.h | 9 +-- > fs/xfs/xfs_vfsops.c | 12 ++-- > fs/xfs/xfsidbg.c | 11 +--- > 4 files changed, 25 insertions(+), 144 deletions(-) > > Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2008-03-13 13:58:08.866070224 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2008-03-13 14:03:38.448138656 +1100 > @@ -41,6 +41,7 @@ > #include "xfs_inode.h" > #include "xfs_rw.h" > > +kmem_zone_t *xfs_log_ticket_zone; > > #define xlog_write_adv_cnt(ptr, len, off, bytes) \ > { (ptr) += (bytes); \ > @@ -73,8 +74,6 @@ STATIC int xlog_state_get_iclog_space(x > xlog_ticket_t *ticket, > int *continued_write, > int *logoffsetp); > -STATIC void xlog_state_put_ticket(xlog_t *log, > - xlog_ticket_t *tic); > STATIC int xlog_state_release_iclog(xlog_t *log, > xlog_in_core_t *iclog); > STATIC void xlog_state_switch_iclogs(xlog_t *log, > @@ -101,7 +100,6 @@ STATIC void xlog_ungrant_log_space(xlog_ > > > /* local ticket functions */ > -STATIC void xlog_state_ticket_alloc(xlog_t *log); > STATIC xlog_ticket_t *xlog_ticket_get(xlog_t *log, > int unit_bytes, > int count, > @@ -330,7 +328,7 @@ xfs_log_done(xfs_mount_t *mp, > */ > xlog_trace_loggrant(log, ticket, "xfs_log_done: (non-permanent)"); > xlog_ungrant_log_space(log, ticket); > - xlog_state_put_ticket(log, ticket); > + xlog_ticket_put(log, ticket); > } else { > xlog_trace_loggrant(log, ticket, "xfs_log_done: (permanent)"); > xlog_regrant_reserve_log_space(log, ticket); > @@ -469,6 +467,8 @@ xfs_log_reserve(xfs_mount_t *mp, > /* may sleep if need to allocate more tickets */ > internal_ticket = xlog_ticket_get(log, unit_bytes, cnt, > client, flags); > + if (!internal_ticket) > + return XFS_ERROR(ENOMEM); > internal_ticket->t_trans_type = t_type; > *ticket = internal_ticket; > xlog_trace_loggrant(log, internal_ticket, > @@ 
-693,7 +693,7 @@ xfs_log_unmount_write(xfs_mount_t *mp) > if (tic) { > xlog_trace_loggrant(log, tic, "unmount rec"); > xlog_ungrant_log_space(log, tic); > - xlog_state_put_ticket(log, tic); > + xlog_ticket_put(log, tic); > } > } else { > /* > @@ -1208,7 +1208,6 @@ xlog_alloc_log(xfs_mount_t *mp, > spin_lock_init(&log->l_icloglock); > spin_lock_init(&log->l_grant_lock); > initnsema(&log->l_flushsema, 0, "ic-flush"); > - xlog_state_ticket_alloc(log); /* wait until after icloglock inited */ > > /* log record size must be multiple of BBSIZE; see xlog_rec_header_t */ > ASSERT((XFS_BUF_SIZE(bp) & BBMASK) == 0); > @@ -1541,7 +1540,6 @@ STATIC void > xlog_dealloc_log(xlog_t *log) > { > xlog_in_core_t *iclog, *next_iclog; > - xlog_ticket_t *tic, *next_tic; > int i; > > iclog = log->l_iclog; > @@ -1562,22 +1560,6 @@ xlog_dealloc_log(xlog_t *log) > spinlock_destroy(&log->l_icloglock); > spinlock_destroy(&log->l_grant_lock); > > - /* XXXsup take a look at this again. */ > - if ((log->l_ticket_cnt != log->l_ticket_tcnt) && > - !XLOG_FORCED_SHUTDOWN(log)) { > - xfs_fs_cmn_err(CE_WARN, log->l_mp, > - "xlog_dealloc_log: (cnt: %d, total: %d)", > - log->l_ticket_cnt, log->l_ticket_tcnt); > - /* ASSERT(log->l_ticket_cnt == log->l_ticket_tcnt); */ > - > - } else { > - tic = log->l_unmount_free; > - while (tic) { > - next_tic = tic->t_next; > - kmem_free(tic, PAGE_SIZE); > - tic = next_tic; > - } > - } > xfs_buf_free(log->l_xbuf); > #ifdef XFS_LOG_TRACE > if (log->l_trace != NULL) { > @@ -2798,18 +2780,6 @@ xlog_ungrant_log_space(xlog_t *log, > > > /* > - * Atomically put back used ticket. > - */ > -STATIC void > -xlog_state_put_ticket(xlog_t *log, > - xlog_ticket_t *tic) > -{ > - spin_lock(&log->l_icloglock); > - xlog_ticket_put(log, tic); > - spin_unlock(&log->l_icloglock); > -} /* xlog_state_put_ticket */ > - > -/* > * Flush iclog to disk if this is the last reference to the given iclog and > * the WANT_SYNC bit is set. 
> * > @@ -3179,92 +3149,19 @@ xlog_state_want_sync(xlog_t *log, xlog_i > */ > > /* > - * Algorithm doesn't take into account page size. ;-( > - */ > -STATIC void > -xlog_state_ticket_alloc(xlog_t *log) > -{ > - xlog_ticket_t *t_list; > - xlog_ticket_t *next; > - xfs_caddr_t buf; > - uint i = (PAGE_SIZE / sizeof(xlog_ticket_t)) - 2; > - > - /* > - * The kmem_zalloc may sleep, so we shouldn't be holding the > - * global lock. XXXmiken: may want to use zone allocator. > - */ > - buf = (xfs_caddr_t) kmem_zalloc(PAGE_SIZE, KM_SLEEP); > - > - spin_lock(&log->l_icloglock); > - > - /* Attach 1st ticket to Q, so we can keep track of allocated memory */ > - t_list = (xlog_ticket_t *)buf; > - t_list->t_next = log->l_unmount_free; > - log->l_unmount_free = t_list++; > - log->l_ticket_cnt++; > - log->l_ticket_tcnt++; > - > - /* Next ticket becomes first ticket attached to ticket free list */ > - if (log->l_freelist != NULL) { > - ASSERT(log->l_tail != NULL); > - log->l_tail->t_next = t_list; > - } else { > - log->l_freelist = t_list; > - } > - log->l_ticket_cnt++; > - log->l_ticket_tcnt++; > - > - /* Cycle through rest of alloc'ed memory, building up free Q */ > - for ( ; i > 0; i--) { > - next = t_list + 1; > - t_list->t_next = next; > - t_list = next; > - log->l_ticket_cnt++; > - log->l_ticket_tcnt++; > - } > - t_list->t_next = NULL; > - log->l_tail = t_list; > - spin_unlock(&log->l_icloglock); > -} /* xlog_state_ticket_alloc */ > - > - > -/* > - * Put ticket into free list > - * > - * Assumption: log lock is held around this call. > + * Free a used ticket. > */ > STATIC void > xlog_ticket_put(xlog_t *log, > xlog_ticket_t *ticket) > { > sv_destroy(&ticket->t_sema); > - > - /* > - * Don't think caching will make that much difference. It's > - * more important to make debug easier. 
> - */ > -#if 0 > - /* real code will want to use LIFO for caching */ > - ticket->t_next = log->l_freelist; > - log->l_freelist = ticket; > - /* no need to clear fields */ > -#else > - /* When we debug, it is easier if tickets are cycled */ > - ticket->t_next = NULL; > - if (log->l_tail) { > - log->l_tail->t_next = ticket; > - } else { > - ASSERT(log->l_freelist == NULL); > - log->l_freelist = ticket; > - } > - log->l_tail = ticket; > -#endif /* DEBUG */ > - log->l_ticket_cnt++; > + kmem_zone_free(xfs_log_ticket_zone, ticket); > } /* xlog_ticket_put */ > > > /* > - * Grab ticket off freelist or allocation some more > + * Allocate and initialise a new log ticket. > */ > STATIC xlog_ticket_t * > xlog_ticket_get(xlog_t *log, > @@ -3276,21 +3173,9 @@ xlog_ticket_get(xlog_t *log, > xlog_ticket_t *tic; > uint num_headers; > > - alloc: > - if (log->l_freelist == NULL) > - xlog_state_ticket_alloc(log); /* potentially sleep */ > - > - spin_lock(&log->l_icloglock); > - if (log->l_freelist == NULL) { > - spin_unlock(&log->l_icloglock); > - goto alloc; > - } > - tic = log->l_freelist; > - log->l_freelist = tic->t_next; > - if (log->l_freelist == NULL) > - log->l_tail = NULL; > - log->l_ticket_cnt--; > - spin_unlock(&log->l_icloglock); > + tic = kmem_zone_zalloc(xfs_log_ticket_zone, KM_SLEEP|KM_MAYFAIL); > + if (!tic) > + return NULL; > > /* > * Permanent reservations have up to 'cnt'-1 active log operations > Index: 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log_priv.h 2008-03-13 13:59:10.806160556 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h 2008-03-13 14:06:58.110733971 +1100 > @@ -242,7 +242,7 @@ typedef struct xlog_res { > > typedef struct xlog_ticket { > sv_t t_sema; /* sleep on this semaphore : 20 */ > - struct xlog_ticket *t_next; /* :4|8 */ > + struct xlog_ticket *t_next; /* :4|8 */ > struct xlog_ticket *t_prev; /* :4|8 */ > xlog_tid_t t_tid; /* transaction identifier 
: 4 */ > int t_curr_res; /* current reservation in bytes : 4 */ > @@ -406,13 +406,8 @@ typedef struct log { > sema_t l_flushsema; /* iclog flushing semaphore */ > int l_flushcnt; /* # of procs waiting on this > * sema */ > - int l_ticket_cnt; /* free ticket count */ > - int l_ticket_tcnt; /* total ticket count */ > int l_covered_state;/* state of "covering disk > * log entries" */ > - xlog_ticket_t *l_freelist; /* free list of tickets */ > - xlog_ticket_t *l_unmount_free;/* kmem_free these addresses */ > - xlog_ticket_t *l_tail; /* free list of tickets */ > xlog_in_core_t *l_iclog; /* head log queue */ > spinlock_t l_icloglock; /* grab to change iclog state */ > xfs_lsn_t l_tail_lsn; /* lsn of 1st LR with unflushed > @@ -478,6 +473,8 @@ extern struct xfs_buf *xlog_get_bp(xlog_ > extern void xlog_put_bp(struct xfs_buf *); > extern int xlog_bread(xlog_t *, xfs_daddr_t, int, struct xfs_buf *); > > +extern kmem_zone_t *xfs_log_ticket_zone; > + > /* iclog tracing */ > #define XLOG_TRACE_GRAB_FLUSH 1 > #define XLOG_TRACE_REL_FLUSH 2 > Index: 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vfsops.c 2008-03-13 13:58:08.866070224 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c 2008-03-13 13:59:59.208010688 +1100 > @@ -68,15 +68,17 @@ xfs_init(void) > /* > * Initialize all of the zone allocators we use. 
> */ > + xfs_log_ticket_zone = kmem_zone_init(sizeof(xlog_ticket_t), > + "xfs_log_ticket"); > xfs_bmap_free_item_zone = kmem_zone_init(sizeof(xfs_bmap_free_item_t), > - "xfs_bmap_free_item"); > + "xfs_bmap_free_item"); > xfs_btree_cur_zone = kmem_zone_init(sizeof(xfs_btree_cur_t), > - "xfs_btree_cur"); > - xfs_trans_zone = kmem_zone_init(sizeof(xfs_trans_t), "xfs_trans"); > - xfs_da_state_zone = > - kmem_zone_init(sizeof(xfs_da_state_t), "xfs_da_state"); > + "xfs_btree_cur"); > + xfs_da_state_zone = kmem_zone_init(sizeof(xfs_da_state_t), > + "xfs_da_state"); > xfs_dabuf_zone = kmem_zone_init(sizeof(xfs_dabuf_t), "xfs_dabuf"); > xfs_ifork_zone = kmem_zone_init(sizeof(xfs_ifork_t), "xfs_ifork"); > + xfs_trans_zone = kmem_zone_init(sizeof(xfs_trans_t), "xfs_trans"); > xfs_acl_zone_init(xfs_acl_zone, "xfs_acl"); > xfs_mru_cache_init(); > xfs_filestream_init(); > Index: 2.6.x-xfs-new/fs/xfs/xfsidbg.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfsidbg.c 2008-03-13 13:07:25.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfsidbg.c 2008-03-13 14:10:13.489855395 +1100 > @@ -5607,9 +5607,9 @@ xfsidbg_xiclog(xlog_in_core_t *iclog) > be32_to_cpu(iclog->ic_header.h_magicno), > be32_to_cpu(iclog->ic_header.h_cycle), > be32_to_cpu(iclog->ic_header.h_version), > - be64_to_cpu(iclog->ic_header.h_lsn)); > + (unsigned long long)be64_to_cpu(iclog->ic_header.h_lsn)); > kdb_printf("tail_lsn: 0x%Lx len: %d prev_block: %d num_ops: %d\n", > - be64_to_cpu(iclog->ic_header.h_tail_lsn), > + (unsigned long long)be64_to_cpu(iclog->ic_header.h_tail_lsn), > be32_to_cpu(iclog->ic_header.h_len), > be32_to_cpu(iclog->ic_header.h_prev_block), > be32_to_cpu(iclog->ic_header.h_num_logops)); > @@ -5829,11 +5829,8 @@ xfsidbg_xlog(xlog_t *log) > }; > > kdb_printf("xlog at 0x%p\n", log); > - kdb_printf("&flushsm: 0x%p flushcnt: %d tic_cnt: %d tic_tcnt: %d \n", > - &log->l_flushsema, log->l_flushcnt, > - log->l_ticket_cnt, log->l_ticket_tcnt); > - 
kdb_printf("freelist: 0x%p tail: 0x%p ICLOG: 0x%p \n", > - log->l_freelist, log->l_tail, log->l_iclog); > + kdb_printf("&flushsm: 0x%p flushcnt: %d ICLOG: 0x%p \n", > + &log->l_flushsema, log->l_flushcnt, log->l_iclog); > kdb_printf("&icloglock: 0x%p tail_lsn: %s last_sync_lsn: %s \n", > &log->l_icloglock, xfs_fmtlsn(&log->l_tail_lsn), > xfs_fmtlsn(&log->l_last_sync_lsn)); > From owner-xfs@oss.sgi.com Tue Apr 1 22:34:27 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 22:34:35 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m325YMgp030991 for ; Tue, 1 Apr 2008 22:34:26 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA21758; Wed, 2 Apr 2008 15:34:54 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m325YrsT118437010; Wed, 2 Apr 2008 15:34:54 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m325Ypgk118820456; Wed, 2 Apr 2008 15:34:51 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 15:34:51 +1000 From: David Chinner To: xfs-dev Cc: xfs-oss Subject: [Patch] xfsqa: 091 needs to support sector size != 512 bytes Message-ID: <20080402053451.GE103491721@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15145 X-ecartis-version: Ecartis v1.0.0 
Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs

Test 091 assumes a direct I/O alignment of 512 bytes, a holdover
from 2.4 kernels. On 2.6 kernels, direct I/O needs to be aligned
to the sector size the filesystem was mkfs'd with. Teach 091 about
2.6 kernels and grab the sector size from the xfs_info output.

Signed-off-by: Dave Chinner
---
 xfstests/091 | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Index: xfs-cmds/xfstests/091
===================================================================
--- xfs-cmds.orig/xfstests/091 2007-03-19 08:49:37.000000000 +1100
+++ xfs-cmds/xfstests/091 2008-04-02 15:27:39.266824430 +1000
@@ -43,10 +43,20 @@ run_fsx()
 psize=`$here/src/feature -s`
 bsize=512
 
-# 2.4 Linux kernels support bsize aligned direct I/O only
 kernel=`uname -r | sed -e 's/\(2\..\).*/\1/'`
+
+# 2.4 Linux kernels support bsize aligned direct I/O only
 [ "$HOSTOS" = "Linux" -a "$kernel" = "2.4" ] && bsize=$psize
 
+# 2.6 Linux kernels support sector aligned direct I/O only
+if [ "$HOSTOS" = "Linux" -a "$kernel" = "2.6" ]; then
+	xfs_info $TEST_DIR | _filter_mkfs 2> $tmp.info
+	if [ $?
-eq 0 ]; then + source $tmp.info + bsize=$sectsz + fi +fi + # fsx usage: # # -N numops: total # operations to do From owner-xfs@oss.sgi.com Tue Apr 1 22:37:28 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 22:37:35 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_64 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m325bNsj031692 for ; Tue, 1 Apr 2008 22:37:26 -0700 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA21840; Wed, 2 Apr 2008 15:37:52 +1000 Message-ID: <47F3293C.6090708@sgi.com> Date: Wed, 02 Apr 2008 16:35:40 +1000 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.12 (X11/20080213) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [Patch] Cacheline align xlog_t References: <20080401231552.GV103491721@sgi.com> In-Reply-To: <20080401231552.GV103491721@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15146 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Looks good - just one comment. David Chinner wrote: > Reorganise xlog_t for better cacheline isolation of contention > > To reduce contention on the log in large CPU count, separate > out different parts of the xlog_t structure onto different cachelines. > Move each lock onto a different cacheline along with all the members > that are accessed/modified while that lock is held. 
> > Also, move the debugging code into debug code. > > Signed-off-by: Dave Chinner > --- > fs/xfs/xfs_log.c | 5 +--- > fs/xfs/xfs_log_priv.h | 55 +++++++++++++++++++++++++++----------------------- > 2 files changed, 32 insertions(+), 28 deletions(-) > > Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2008-03-13 14:03:38.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2008-03-13 14:20:21.803846380 +1100 > @@ -1237,9 +1237,9 @@ xlog_alloc_log(xfs_mount_t *mp, > XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1); > iclog->ic_bp = bp; > iclog->hic_data = bp->b_addr; > - > +#ifdef DEBUG > log->l_iclog_bak[i] = (xfs_caddr_t)&(iclog->ic_header); > - > +#endif > head = &iclog->ic_header; > memset(head, 0, sizeof(xlog_rec_header_t)); > head->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM); > @@ -1250,7 +1250,6 @@ xlog_alloc_log(xfs_mount_t *mp, > head->h_fmt = cpu_to_be32(XLOG_FMT); > memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t)); > > - > iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize; > iclog->ic_state = XLOG_STATE_ACTIVE; > iclog->ic_log = log; > Index: 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log_priv.h 2008-03-13 14:06:58.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log_priv.h 2008-03-13 14:20:31.478596832 +1100 > @@ -402,8 +402,29 @@ typedef struct xlog_in_core { > * that round off problems won't occur when releasing partial reservations. 
> */ > typedef struct log { > + /* The following fields don't need locking */ > + struct xfs_mount *l_mp; /* mount point */ > + struct xfs_buf *l_xbuf; /* extra buffer for log > + * wrapping */ > + struct xfs_buftarg *l_targ; /* buftarg of log */ > + uint l_flags; > + uint l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */ > + struct xfs_buf_cancel **l_buf_cancel_table; > + int l_iclog_hsize; /* size of iclog header */ > + int l_iclog_heads; /* # of iclog header sectors */ > + uint l_sectbb_log; /* log2 of sector size in BBs */ > + uint l_sectbb_mask; /* sector size (in BBs) > + * alignment mask */ > + int l_iclog_size; /* size of log in bytes */ > + int l_iclog_size_log; /* log power size of log */ > + int l_iclog_bufs; /* number of iclog buffers */ > + xfs_daddr_t l_logBBstart; /* start block of log */ > + int l_logsize; /* size of log in bytes */ > + int l_logBBsize; /* size of log in BB chunks */ > + > /* The following block of fields are changed while holding icloglock */ > - sema_t l_flushsema; /* iclog flushing semaphore */ > + sema_t l_flushsema ____cacheline_aligned_in_smp; > + /* iclog flushing semaphore */ > int l_flushcnt; /* # of procs waiting on this > * sema */ > int l_covered_state;/* state of "covering disk > @@ -413,27 +434,14 @@ typedef struct log { > xfs_lsn_t l_tail_lsn; /* lsn of 1st LR with unflushed > * buffers */ > xfs_lsn_t l_last_sync_lsn;/* lsn of last LR on disk */ > - struct xfs_mount *l_mp; /* mount point */ > - struct xfs_buf *l_xbuf; /* extra buffer for log > - * wrapping */ > - struct xfs_buftarg *l_targ; /* buftarg of log */ > - xfs_daddr_t l_logBBstart; /* start block of log */ > - int l_logsize; /* size of log in bytes */ > - int l_logBBsize; /* size of log in BB chunks */ > int l_curr_cycle; /* Cycle number of log writes */ > int l_prev_cycle; /* Cycle number before last > * block increment */ > int l_curr_block; /* current logical log block */ > int l_prev_block; /* previous logical log block */ > - int l_iclog_size; /* size of 
log in bytes */ > - int l_iclog_size_log; /* log power size of log */ > - int l_iclog_bufs; /* number of iclog buffers */ > - > - /* The following field are used for debugging; need to hold icloglock */ > - char *l_iclog_bak[XLOG_MAX_ICLOGS]; > > /* The following block of fields are changed while holding grant_lock */ > - spinlock_t l_grant_lock; > + spinlock_t l_grant_lock ____cacheline_aligned_in_smp; > xlog_ticket_t *l_reserve_headq; > xlog_ticket_t *l_write_headq; > int l_grant_reserve_cycle; > @@ -441,20 +449,17 @@ typedef struct log { > int l_grant_write_cycle; > int l_grant_write_bytes; > > - /* The following fields don't need locking */ > #ifdef XFS_LOG_TRACE > struct ktrace *l_trace; > struct ktrace *l_grant_trace; > #endif > - uint l_flags; > - uint l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */ > - struct xfs_buf_cancel **l_buf_cancel_table; > - int l_iclog_hsize; /* size of iclog header */ > - int l_iclog_heads; /* # of iclog header sectors */ > - uint l_sectbb_log; /* log2 of sector size in BBs */ > - uint l_sectbb_mask; /* sector size (in BBs) > - * alignment mask */ > -} xlog_t; > + > + /* The following field are used for debugging; need to hold icloglock */ > +#ifdef DEBUG > + char *l_iclog_bak[XLOG_MAX_ICLOGS]; > +#endif > + > +} xlog_t ____cacheline_aligned_in_smp; Is it necessary to add a ____cacheline_aligned_in_smp tag here? The important sections of the xlog_t structure are already tagged. 
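A minimal user-space sketch for checking this question, assuming GCC or Clang and a 64-byte cache line. struct toy_log and its members are hypothetical stand-ins for xlog_t, and the aligned attribute stands in for ____cacheline_aligned_in_smp. With these compilers, tagging a member raises the alignment of the enclosing type as well; neither form of tagging, by itself, says where an allocator will place a heap-allocated instance, which is why the sketch also checks a concrete instance at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64    /* assumed line size for this sketch */

/* Hypothetical stand-in for xlog_t: only the lock-led groups are tagged. */
struct toy_log {
    int read_mostly;    /* fields that need no locking */
    int flush_lock __attribute__((aligned(CACHELINE)));  /* icloglock-style group */
    int flush_count;
    int grant_lock __attribute__((aligned(CACHELINE)));  /* grant_lock-style group */
    int grant_bytes;
};

/* Each tagged member starts a new line, and the type inherits the alignment. */
_Static_assert(offsetof(struct toy_log, flush_lock) == CACHELINE, "flush group");
_Static_assert(offsetof(struct toy_log, grant_lock) == 2 * CACHELINE, "grant group");
_Static_assert(_Alignof(struct toy_log) == CACHELINE, "type alignment");
_Static_assert(sizeof(struct toy_log) == 3 * CACHELINE, "padded size");

/* Runtime check: does this particular instance start on a line boundary? */
int toy_log_is_aligned(const struct toy_log *lp)
{
    assert(lp != NULL);
    return ((uintptr_t)lp % CACHELINE) == 0;
}
```

The compile-time checks cover the layout of the type; toy_log_is_aligned() is the check to apply to an instance that came from an allocator rather than from a compiler-placed variable.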
> > #define XLOG_FORCED_SHUTDOWN(log) ((log)->l_flags & XLOG_IO_ERROR) > > From owner-xfs@oss.sgi.com Tue Apr 1 22:43:33 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 22:43:40 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m325hToY032646 for ; Tue, 1 Apr 2008 22:43:32 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA21975; Wed, 2 Apr 2008 15:44:05 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m325i4sT118800444; Wed, 2 Apr 2008 15:44:04 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m325i3vU118576416; Wed, 2 Apr 2008 15:44:03 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 15:44:03 +1000 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [Patch] Cacheline align xlog_t Message-ID: <20080402054403.GF103491721@sgi.com> References: <20080401231552.GV103491721@sgi.com> <47F3293C.6090708@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47F3293C.6090708@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15147 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 04:35:40PM +1000, Lachlan McIlroy 
wrote: > >*/ > >- uint l_sectbb_log; /* log2 of sector size in > >BBs */ > >- uint l_sectbb_mask; /* sector size (in BBs) > >- * alignment mask */ > >-} xlog_t; > >+ > >+ /* The following field are used for debugging; need to hold > >icloglock */ > >+#ifdef DEBUG > >+ char *l_iclog_bak[XLOG_MAX_ICLOGS]; > >+#endif > >+ > >+} xlog_t ____cacheline_aligned_in_smp; > Is it necessary to add a ____cacheline_aligned_in_smp tag here? The > important sections of the xlog_t structure are already tagged. This just means that the start of the structure is cacheline aligned. I don't think the internal alignment commands force the entire structure to be cacheline aligned; they merely pad the structure internally. In that case, even though the specific internal parts of the structure are on separate cache lines, there's no guarantee that all the related members are on the same cacheline. Hence I'm explicitly stating the exact alignment I want for the structure.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Apr 1 22:58:09 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 22:58:17 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m325w6RR002075 for ; Tue, 1 Apr 2008 22:58:08 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA22376; Wed, 2 Apr 2008 15:58:35 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m325wXsT118666214; Wed, 2 Apr 2008 15:58:34 +1000 (AEST) Received: (from dgc@localhost) by 
snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m325wVpS118673323; Wed, 2 Apr 2008 15:58:31 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 15:58:31 +1000 From: David Chinner To: Emmanuel Florac Cc: David Chinner , xfs@oss.sgi.com Subject: Re: Serious XFS crash Message-ID: <20080402055831.GG103491721@sgi.com> References: <20080325185453.3a1957dd@galadriel.home> <20080325233611.GW103491721@sgi.com> <20080401140035.46470306@galadriel.home> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20080401140035.46470306@galadriel.home> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15148 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Apr 01, 2008 at 02:00:35PM +0200, Emmanuel Florac wrote: > On Wed, 26 Mar 2008 10:36:11 +1100 you wrote: > > > What sector size is being used for the XFS filesystem? If it's > > not the same as the filesystem block size, then XFS can't have done > > this itself because the offset that this garbage starts at would > > not be block aligned..... > > I've gone through the logs. This machine had a serious XFS crash on March > 6 due to bad blocks (failed drive in the RAID-5). Is it possible that > the March 19 XFS crash is related to this, i.e. that after running > xfs_repair on March 6 some on-disk garbage remained that provoked a > new crash a couple of weeks later? 
> > Here is the march 6 crash : > > Mar 6 10:42:46 system3 kernel: [xfs_alloc_read_agf+244/432] > xfs_alloc_read_agf+0xf4/0x1b0 Mar 6 10:42:46 system3 kernel: > [xfs_alloc_fix_freelist+1000/1120] xfs_alloc_fix_freelist+0x3e8/0x460 > Mar 6 10:42:46 system3 last message repeated 2 times Mar 6 10:42:46 > system3 kernel: [_xfs_trans_commit+489/928] .... The log is rather garbled - can you repost? Also, XFS usually outputs an error message before the stack trace; can you make sure you paste that as well (if it exists)? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Apr 1 23:21:34 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 23:21:42 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326LUIS005231 for ; Tue, 1 Apr 2008 23:21:32 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA22959; Wed, 2 Apr 2008 16:21:55 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m326LqsT118778853; Wed, 2 Apr 2008 16:21:54 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m326LlJD118466688; Wed, 2 Apr 2008 16:21:47 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 2 Apr 2008 16:21:47 +1000 From: David Chinner To: Takashi Sato Cc: David Chinner , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com, dm-devel@redhat.com, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 2/2] Add 
timeout feature Message-ID: <20080402062147.GH103491721@sgi.com> References: <20080328180736t-sato@mail.jp.nec.com> <20080331000057.GI108924158@sgi.com> <2530BB4B166747659C8F65C9C3DE7CFB@nsl.ad.nec.co.jp> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2530BB4B166747659C8F65C9C3DE7CFB@nsl.ad.nec.co.jp> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15149 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Apr 01, 2008 at 07:54:42PM +0900, Takashi Sato wrote: > Hi, > > David Chinner wrote: > >The timeout is not for the freeze operation - the timeout is > >only set up once the freeze is complete. i.e: > > > >$ time sudo ~/test_src/xfs_io -f -x -c 'gfreeze 10' /mnt/scratch/test > >freezing with level = 10 > > > >real 0m23.204s > >user 0m0.008s > >sys 0m0.012s > > > >The freeze takes 23s, and then the 10s timeout is started. So > >this timeout does not protect against freeze_bdev() hangs at all. > >All it does is introduce silent unfreezing of the block device that > >can not be synchronised with the application that is operating > >on the frozen device. > > Exactly my timeout feature is only for an application, not for > freeze_bdev(). > I think it is needed for the situation we can't unfreeze from userspace. > (e.g. Freezing the root filesystem) Ummm - why can't you unfreeze the root fs from userspace? freezing only prevents modification to the filesystem. A frozen filesystem is effectively a read-only filesystem... On XFS: # xfs_freeze -f / # echo $? 0 # xfs_freeze -u / # echo $? 0 The underlying filesystem is broken w.r.t. freezing if you can't read from it successfully once it's been frozen.... 
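For reference, the same freeze/unfreeze round trip can be driven from C. This is a minimal sketch that assumes a later mainline kernel where the freeze interface is exposed as the FIFREEZE and FITHAW ioctls in <linux/fs.h>; it is not the gfreeze interface from the RFC under discussion, and toy_fs_ioctl(), toy_fs_freeze() and toy_fs_thaw() are hypothetical helper names. The point it illustrates is that thawing only needs a readable descriptor on the mount point, which a frozen filesystem can still provide.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW (assumed available) */

/* Issue one freeze-related ioctl; return 0 on success or -errno on failure. */
static int toy_fs_ioctl(int fd, unsigned long cmd)
{
    if (ioctl(fd, cmd, 0) == 0)
        return 0;
    assert(errno > 0);  /* a failed ioctl always sets errno */
    return -errno;
}

/* Freeze the filesystem that fd lives on. */
int toy_fs_freeze(int fd)
{
    return toy_fs_ioctl(fd, FIFREEZE);
}

/* Thaw the filesystem that fd lives on. */
int toy_fs_thaw(int fd)
{
    return toy_fs_ioctl(fd, FITHAW);
}
```

Typical use would be to open the mount point with O_RDONLY, call toy_fs_freeze(fd), take the snapshot, and then call toy_fs_thaw(fd) on the same descriptor. Both calls need administrative privileges on a real filesystem.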
> >FWIW, resetting this timeout from userspace is unreliable - there's > >no guarantee that under load your userspace process will get to run > >again inside the timeout to reset it, hence leaving you with a > >unfrozen filesystem when you really want it frozen... > > The timeout period specified to the reset ioctl should be much larger than > the interval for calling the reset ioctl repeatedly. > (e.g timeout period = 2 minutes, calling interval = 5 seconds) What application developer will ever use this? > The reset ioctl will work under such setting. > If a timeout still occurs before a reset, it would imply that an unexpected > problem (e.g. deadlock) occur in an application. Right - the application is broken and needs fixing. We don't need to supply a crutch in a "new" API to support hypothetically broken applications that don't actually exist yet. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Apr 1 23:26:39 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 23:26:51 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_65 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326QXgu006083 for ; Tue, 1 Apr 2008 23:26:36 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23147; Wed, 2 Apr 2008 16:27:08 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 1161) id 9B57A58C4C16; Wed, 2 Apr 2008 16:27:08 +1000 (EST) Message-Id: <20080402062708.380299192@chook.melbourne.sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 02 Apr 2008 16:25:11 +1000 From: 
Barry Naujok To: xfs@oss.sgi.com Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 3/7] XFS: Refactor node format directory lookup/addname Content-Disposition: inline; filename=refactor_leafn_lookup.patch X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15150 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs The next step for case-insensitive support is to avoid pollution of the dentry cache with entries pointing to the same inode, but with names that only differ in case. To do this, we will need to pass the actual filename that matched back up to the XFS/VFS interface and make sure the dentry cache only contains entries with the actual case-sensitive name. But, before we can do this, it was found that the directory lookup code with multiple leaves was shared with code adding a name to that directory. Most of xfs_dir2_leafn_lookup_int() could be broken into two functions determined by if (args->addname) { } else { }. For the following patch, only the lookup case needs to handle the various xfs_nameops, with case-insensitive match handling in addition to returning the actual name. So, this patch separates xfs_dir2_leafn_lookup_int() into xfs_dir2_leafn_lookup_for_addname() and xfs_dir2_leafn_lookup_for_entry(). xfs_dir2_leafn_lookup_for_addname() iterates through the data blocks looking for a suitable empty space to insert the name while xfs_dir2_leafn_lookup_for_entry() uses the xfs_nameops to find the entry. The xfs_dir2_leafn_lookup_for_entry() path also retains the data block where the first case-insensitive match occurred, because the next patch, which will return the name, obtains the name from that block. 
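The retention rule for the lookup path can be sketched in user space. This is a simplified illustration, not the patch's code: toy_cmp, toy_compname() and toy_lookup() are hypothetical names, the candidate list stands in for leaf entries sharing one hash value, and strcasecmp() stands in for whatever xfs_nameops compare function is in use. An exact match ends the scan at once; otherwise the first case-insensitive match is remembered and returned at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>    /* strcasecmp (POSIX) */

/* Hypothetical three-way compare result, modelled on the patch's cmpresult. */
enum toy_cmp { TOY_CMP_DIFFERENT, TOY_CMP_EXACT, TOY_CMP_CASE };

/* Classify how a stored name relates to the name being looked up. */
static enum toy_cmp toy_compname(const char *stored, const char *want)
{
    if (strcmp(stored, want) == 0)
        return TOY_CMP_EXACT;
    if (strcasecmp(stored, want) == 0)
        return TOY_CMP_CASE;
    return TOY_CMP_DIFFERENT;
}

/*
 * Scan n candidate names. Return the index of an exact match if there is
 * one, else the index of the FIRST case-insensitive match, else -1.
 * *result reports which kind of match the returned index refers to.
 */
int toy_lookup(const char *const *names, size_t n, const char *want,
               enum toy_cmp *result)
{
    int ci_index = -1;
    size_t i;

    assert(names != NULL && want != NULL && result != NULL);
    *result = TOY_CMP_DIFFERENT;
    for (i = 0; i < n; i++) {
        enum toy_cmp cmp = toy_compname(names[i], want);

        if (cmp == TOY_CMP_EXACT) {
            *result = cmp;
            return (int)i;          /* an exact match always wins */
        }
        if (cmp == TOY_CMP_CASE && ci_index < 0) {
            ci_index = (int)i;      /* keep only the first CI match */
            *result = cmp;
        }
    }
    return ci_index;
}
```

Keeping the first case-insensitive hit while still scanning for an exact one mirrors why the lookup path holds on to a second buffer instead of releasing it as soon as the scan moves to another data block.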
Signed-off-by: Barry Naujok --- fs/xfs/xfs_dir2_node.c | 373 +++++++++++++++++++++++++++++-------------------- 1 file changed, 225 insertions(+), 148 deletions(-) Index: kern_ci/fs/xfs/xfs_dir2_node.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_node.c +++ kern_ci/fs/xfs/xfs_dir2_node.c @@ -387,12 +387,11 @@ xfs_dir2_leafn_lasthash( } /* - * Look up a leaf entry in a node-format leaf block. - * If this is an addname then the extrablk in state is a freespace block, - * otherwise it's a data block. + * Look up a leaf entry for space to add a name in a node-format leaf block. + * The extrablk in state is a freespace block. */ -int -xfs_dir2_leafn_lookup_int( +static int +xfs_dir2_leafn_lookup_for_addname( xfs_dabuf_t *bp, /* leaf buffer */ xfs_da_args_t *args, /* operation arguments */ int *indexp, /* out: leaf entry index */ @@ -401,7 +400,6 @@ xfs_dir2_leafn_lookup_int( xfs_dabuf_t *curbp; /* current data/free buffer */ xfs_dir2_db_t curdb; /* current data block number */ xfs_dir2_db_t curfdb; /* current free block number */ - xfs_dir2_data_entry_t *dep; /* data block entry */ xfs_inode_t *dp; /* incore directory inode */ int error; /* error return value */ int fi; /* free entry index */ @@ -414,7 +412,6 @@ xfs_dir2_leafn_lookup_int( xfs_dir2_db_t newdb; /* new data block number */ xfs_dir2_db_t newfdb; /* new free block number */ xfs_trans_t *tp; /* transaction pointer */ - xfs_dacmp_t cmp; /* comparison result */ dp = args->dp; tp = args->trans; @@ -432,27 +429,15 @@ xfs_dir2_leafn_lookup_int( /* * Do we have a buffer coming in? */ - if (state->extravalid) - curbp = state->extrablk.bp; - else - curbp = NULL; + curbp = state->extravalid ? state->extrablk.bp : NULL; /* * For addname, it's a free block buffer, get the block number. */ - if (args->addname) { - curfdb = curbp ? state->extrablk.blkno : -1; - curdb = -1; - length = xfs_dir2_data_entsize(args->namelen); - if ((free = (curbp ? 
curbp->data : NULL))) - ASSERT(be32_to_cpu(free->hdr.magic) == XFS_DIR2_FREE_MAGIC); - } - /* - * For others, it's a data block buffer, get the block number. - */ - else { - curfdb = -1; - curdb = curbp ? state->extrablk.blkno : -1; - } + curfdb = curbp ? state->extrablk.blkno : -1; + free = curbp ? curbp->data : NULL; + curdb = -1; + length = xfs_dir2_data_entsize(args->namelen); + ASSERT(!free || be32_to_cpu(free->hdr.magic) == XFS_DIR2_FREE_MAGIC); /* * Loop over leaf entries with the right hash value. */ @@ -472,134 +457,69 @@ xfs_dir2_leafn_lookup_int( * For addname, we're looking for a place to put the new entry. * We want to use a data block with an entry of equal * hash value to ours if there is one with room. + * + * If this block isn't the data block we already have + * in hand, take a look at it. */ - if (args->addname) { + if (newdb != curdb) { + curdb = newdb; /* - * If this block isn't the data block we already have - * in hand, take a look at it. + * Convert the data block to the free block + * holding its freespace information. */ - if (newdb != curdb) { - curdb = newdb; - /* - * Convert the data block to the free block - * holding its freespace information. - */ - newfdb = xfs_dir2_db_to_fdb(mp, newdb); - /* - * If it's not the one we have in hand, - * read it in. - */ - if (newfdb != curfdb) { - /* - * If we had one before, drop it. - */ - if (curbp) - xfs_da_brelse(tp, curbp); - /* - * Read the free block. - */ - if ((error = xfs_da_read_buf(tp, dp, - xfs_dir2_db_to_da(mp, - newfdb), - -1, &curbp, - XFS_DATA_FORK))) { - return error; - } - free = curbp->data; - ASSERT(be32_to_cpu(free->hdr.magic) == - XFS_DIR2_FREE_MAGIC); - ASSERT((be32_to_cpu(free->hdr.firstdb) % - XFS_DIR2_MAX_FREE_BESTS(mp)) == - 0); - ASSERT(be32_to_cpu(free->hdr.firstdb) <= curdb); - ASSERT(curdb < - be32_to_cpu(free->hdr.firstdb) + - be32_to_cpu(free->hdr.nvalid)); - } - /* - * Get the index for our entry. 
- */ - fi = xfs_dir2_db_to_fdindex(mp, curdb); - /* - * If it has room, return it. - */ - if (unlikely(be16_to_cpu(free->bests[fi]) == NULLDATAOFF)) { - XFS_ERROR_REPORT("xfs_dir2_leafn_lookup_int", - XFS_ERRLEVEL_LOW, mp); - if (curfdb != newfdb) - xfs_da_brelse(tp, curbp); - return XFS_ERROR(EFSCORRUPTED); - } - curfdb = newfdb; - if (be16_to_cpu(free->bests[fi]) >= length) { - *indexp = index; - state->extravalid = 1; - state->extrablk.bp = curbp; - state->extrablk.blkno = curfdb; - state->extrablk.index = fi; - state->extrablk.magic = - XFS_DIR2_FREE_MAGIC; - ASSERT(args->oknoent); - return XFS_ERROR(ENOENT); - } - } - } - /* - * Not adding a new entry, so we really want to find - * the name given to us. - */ - else { + newfdb = xfs_dir2_db_to_fdb(mp, newdb); /* - * If it's a different data block, go get it. + * If it's not the one we have in hand, + * read it in. */ - if (newdb != curdb) { + if (newfdb != curfdb) { /* - * If we had a block before, drop it. + * If we had one before, drop it. */ if (curbp) xfs_da_brelse(tp, curbp); /* - * Read the data block. + * Read the free block. */ - if ((error = - xfs_da_read_buf(tp, dp, - xfs_dir2_db_to_da(mp, newdb), -1, - &curbp, XFS_DATA_FORK))) { + error = xfs_da_read_buf(tp, dp, + xfs_dir2_db_to_da(mp, newfdb), + -1, &curbp, XFS_DATA_FORK); + if (error) return error; - } - xfs_dir2_data_check(dp, curbp); - curdb = newdb; + + free = curbp->data; + ASSERT(be32_to_cpu(free->hdr.magic) == + XFS_DIR2_FREE_MAGIC); + ASSERT((be32_to_cpu(free->hdr.firstdb) % + XFS_DIR2_MAX_FREE_BESTS(mp)) == 0); + ASSERT(be32_to_cpu(free->hdr.firstdb) <= curdb); + ASSERT(curdb < be32_to_cpu(free->hdr.firstdb) + + be32_to_cpu(free->hdr.nvalid)); } /* - * Point to the data entry. + * Get the index for our entry. + */ + fi = xfs_dir2_db_to_fdindex(mp, curdb); + /* + * If it has room, return it. 
*/ - dep = (xfs_dir2_data_entry_t *) - ((char *)curbp->data + - xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address))); - /* - * Compare the entry, return it if it matches. - */ - cmp = args->oknoent ? - xfs_dir_compname(dp, dep->name, dep->namelen, - args->name, args->namelen): - xfs_da_compname(dep->name, dep->namelen, - args->name, args->namelen); - if (cmp != XFS_CMP_DIFFERENT && - cmp != args->cmpresult) { - args->cmpresult = cmp; - args->inumber = be64_to_cpu(dep->inumber); + if (unlikely(be16_to_cpu(free->bests[fi]) == NULLDATAOFF)) { + XFS_ERROR_REPORT("xfs_dir2_leafn_lookup_int", + XFS_ERRLEVEL_LOW, mp); + if (curfdb != newfdb) + xfs_da_brelse(tp, curbp); + return XFS_ERROR(EFSCORRUPTED); + } + curfdb = newfdb; + if (be16_to_cpu(free->bests[fi]) >= length) { *indexp = index; - if (cmp == XFS_CMP_EXACT) { - state->extravalid = 1; - state->extrablk.blkno = curdb; - state->extrablk.index = - (int)((char *)dep - - (char *)curbp->data); - state->extrablk.magic = - XFS_DIR2_DATA_MAGIC; - state->extrablk.bp = curbp; - return XFS_ERROR(EEXIST); - } + state->extravalid = 1; + state->extrablk.bp = curbp; + state->extrablk.blkno = curfdb; + state->extrablk.index = fi; + state->extrablk.magic = XFS_DIR2_FREE_MAGIC; + ASSERT(args->oknoent); + return XFS_ERROR(ENOENT); } } } @@ -608,31 +528,166 @@ xfs_dir2_leafn_lookup_int( * If we are holding a buffer, give it back in case our caller * finds it useful. */ - if ((state->extravalid = (curbp != NULL))) { + if (curbp != NULL) { + state->extravalid = 1; state->extrablk.bp = curbp; state->extrablk.index = -1; /* * For addname, giving back a free block. */ - if (args->addname) { - state->extrablk.blkno = curfdb; - state->extrablk.magic = XFS_DIR2_FREE_MAGIC; + state->extrablk.blkno = curfdb; + state->extrablk.magic = XFS_DIR2_FREE_MAGIC; + } + /* + * Return the final index, that will be the insertion point. 
+ */ + *indexp = index; + ASSERT(index == be16_to_cpu(leaf->hdr.count) || args->oknoent); + return XFS_ERROR(ENOENT); +} + +/* + * Look up a leaf entry in a node-format leaf block. + * The extrablk in state a data block. + */ +static int +xfs_dir2_leafn_lookup_for_entry( + xfs_dabuf_t *bp, /* leaf buffer */ + xfs_da_args_t *args, /* operation arguments */ + int *indexp, /* out: leaf entry index */ + xfs_da_state_t *state) /* state to fill in */ +{ + xfs_dabuf_t *curbp; /* current data/free buffer */ + xfs_dir2_db_t curdb; /* current data block number */ + xfs_dir2_data_entry_t *dep; /* data block entry */ + xfs_inode_t *dp; /* incore directory inode */ + int error; /* error return value */ + int index; /* leaf entry index */ + xfs_dir2_leaf_t *leaf; /* leaf structure */ + xfs_dir2_leaf_entry_t *lep; /* leaf entry */ + xfs_mount_t *mp; /* filesystem mount point */ + xfs_dir2_db_t newdb; /* new data block number */ + xfs_trans_t *tp; /* transaction pointer */ + xfs_dacmp_t cmp; /* comparison result */ + xfs_dabuf_t *ci_bp = NULL; /* buffer with CI match */ + + dp = args->dp; + tp = args->trans; + mp = dp->i_mount; + leaf = bp->data; + ASSERT(be16_to_cpu(leaf->hdr.info.magic) == XFS_DIR2_LEAFN_MAGIC); +#ifdef __KERNEL__ + ASSERT(be16_to_cpu(leaf->hdr.count) > 0); +#endif + xfs_dir2_leafn_check(dp, bp); + /* + * Look up the hash value in the leaf entries. + */ + index = xfs_dir2_leaf_search_hash(args, bp); + /* + * Do we have a buffer coming in? + */ + if (state->extravalid) { + curbp = state->extrablk.bp; + curdb = state->extrablk.blkno; + if (args->cmpresult == XFS_CMP_CASE) + ci_bp = curbp; + } else { + curbp = NULL; + curdb = -1; + } + /* + * Loop over leaf entries with the right hash value. + */ + for (lep = &leaf->ents[index]; + index < be16_to_cpu(leaf->hdr.count) && + be32_to_cpu(lep->hashval) == args->hashval; + lep++, index++) { + /* + * Skip stale leaf entries. 
+ */ + if (be32_to_cpu(lep->address) == XFS_DIR2_NULL_DATAPTR) + continue; + /* + * Pull the data block number from the entry. + */ + newdb = xfs_dir2_dataptr_to_db(mp, be32_to_cpu(lep->address)); + /* + * Not adding a new entry, so we really want to find + * the name given to us. + * + * If it's a different data block, go get it. + */ + if (newdb != curdb) { + /* + * If we had a block before, drop it (unless it + * contains a case-insensitive match). + */ + if (curbp && curbp != ci_bp) + xfs_da_brelse(tp, curbp); + /* + * Read the data block. + */ + error = xfs_da_read_buf(tp, dp, + xfs_dir2_db_to_da(mp, newdb), -1, + &curbp, XFS_DATA_FORK); + if (error) + return error; + xfs_dir2_data_check(dp, curbp); + curdb = newdb; } /* - * For other callers, giving back a data block. + * Point to the data entry. */ - else { + dep = (xfs_dir2_data_entry_t *)((char *)curbp->data + + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address))); + /* + * Compare the entry, return it if it matches. + */ + cmp = args->oknoent ? + xfs_dir_compname(dp, dep->name, dep->namelen, + args->name, args->namelen): + xfs_da_compname(dep->name, dep->namelen, + args->name, args->namelen); + if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) { + args->cmpresult = cmp; + args->inumber = be64_to_cpu(dep->inumber); + *indexp = index; + if (ci_bp && ci_bp != curbp) + xfs_da_brelse(tp, ci_bp); + state->extravalid = 1; state->extrablk.blkno = curdb; + state->extrablk.index = (int)((char *)dep - + (char *)curbp->data); state->extrablk.magic = XFS_DIR2_DATA_MAGIC; + state->extrablk.bp = curbp; + if (cmp == XFS_CMP_EXACT) + return XFS_ERROR(EEXIST); } } /* - * For lookup (where args->oknoent is set, and args->addname is not - * set, the state->extrablk info is not used, just freed. + * if we have a case-insensitive match, we have to return ENOENT + * so xfs_da_node_lookup_int() can try the next leaf if one exists + * for the hash that may have an exact match. 
+ * xfs_dir2_node_lookup() below handles the ENOENT and args->cmpresult + * to find the case-insensitive match and returns EEXIST. */ - if (args->cmpresult == XFS_CMP_CASE) { - ASSERT(!args->addname); - return XFS_ERROR(EEXIST); + if (args->cmpresult == XFS_CMP_CASE) + return XFS_ERROR(ENOENT); + /* + * Didn't find a match. + * If we are holding a buffer, give it back in case our caller + * finds it useful. + */ + if (curbp != NULL) { + state->extravalid = 1; + state->extrablk.bp = curbp; + state->extrablk.index = -1; + /* + * Giving back a data block. + */ + state->extrablk.blkno = curdb; + state->extrablk.magic = XFS_DIR2_DATA_MAGIC; } /* * Return the final index, that will be the insertion point. @@ -643,6 +698,23 @@ xfs_dir2_leafn_lookup_int( } /* + * Look up a leaf entry in a node-format leaf block. + * If this is an addname then the extrablk in state is a freespace block, + * otherwise it's a data block. + */ +int +xfs_dir2_leafn_lookup_int( + xfs_dabuf_t *bp, /* leaf buffer */ + xfs_da_args_t *args, /* operation arguments */ + int *indexp, /* out: leaf entry index */ + xfs_da_state_t *state) /* state to fill in */ +{ + return args->addname ? + xfs_dir2_leafn_lookup_for_addname(bp, args, indexp, state) : + xfs_dir2_leafn_lookup_for_entry(bp, args, indexp, state); +} + +/* * Move count leaf entries from source to destination leaf. * Log entries and headers. Stale entries are preserved. */ @@ -1785,6 +1857,11 @@ xfs_dir2_node_lookup( if (error) rval = error; /* + * If case-insensitive match was found in a leaf, return EEXIST. + */ + else if (rval == ENOENT && args->cmpresult == XFS_CMP_CASE) + rval = EEXIST; + /* * Release the btree blocks and leaf block. 
*/ for (i = 0; i < state->path.active; i++) { -- From owner-xfs@oss.sgi.com Tue Apr 1 23:26:37 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 23:26:55 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326QWtO006081 for ; Tue, 1 Apr 2008 23:26:36 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23141; Wed, 2 Apr 2008 16:27:07 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 1161) id BFC4758C4C0F; Wed, 2 Apr 2008 16:27:07 +1000 (EST) Message-Id: <20080402062508.017738664@chook.melbourne.sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 02 Apr 2008 16:25:08 +1000 From: Barry Naujok To: xfs@oss.sgi.com Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 0/7] XFS: case-insensitive lookup and Unicode support X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15151 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs The following sequence of patches (must be applied in order) implements case-insensitive support in XFS in two ways: 1. ASCII-only case-insensitive 2. Unicode case-insensitive mount ASCII-only case-insensitive support is a mkfs option and is primarily implemented to support existing IRIX filesystems migrating to Linux. Unicode support is also a mkfs option, but the case-insensitive mode is a mount-time option. 
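As an illustration of what ASCII-only matching implies, here is a minimal sketch, assuming names are compared as length-counted byte strings. toy_ascii_fold() and toy_ascii_ci_equal() are hypothetical helpers, not functions from this series. Only 'A' to 'Z' are folded, so the bytes of multi-byte UTF-8 sequences never compare equal across case.

```c
#include <assert.h>
#include <stddef.h>

/* Fold 'A'..'Z' to lower case; every other byte value is left untouched. */
static unsigned char toy_ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Return 1 if two length-counted names match under ASCII-only folding. */
int toy_ascii_ci_equal(const unsigned char *a, size_t alen,
                       const unsigned char *b, size_t blen)
{
    size_t i;

    assert(a != NULL && b != NULL);
    if (alen != blen)
        return 0;
    for (i = 0; i < alen; i++) {
        if (toy_ascii_fold(a[i]) != toy_ascii_fold(b[i]))
            return 0;
    }
    return 1;
}
```

Under this rule "Makefile" and "MAKEFILE" match, while the UTF-8 encodings of an upper-case and a lower-case accented letter do not, which is the gap the Unicode mode is meant to cover.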
The user space patches were posted back in January: http://oss.sgi.com/archives/xfs/2008-01/msg00102.html -- From owner-xfs@oss.sgi.com Tue Apr 1 23:26:42 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 23:27:21 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_35, J_CHICKENPOX_41,J_CHICKENPOX_43,J_CHICKENPOX_53,J_CHICKENPOX_71 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326QYsn006089 for ; Tue, 1 Apr 2008 23:26:36 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23154; Wed, 2 Apr 2008 16:27:09 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 1161) id 88CD858C4C0F; Wed, 2 Apr 2008 16:27:09 +1000 (EST) Message-Id: <20080402062709.286398420@chook.melbourne.sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 02 Apr 2008 16:25:14 +1000 From: Barry Naujok To: xfs@oss.sgi.com Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 6/7] XFS: Native Language Support for Unicode in XFS Content-Disposition: inline; filename=nls_support.patch X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15154 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs Implement the "-o nls=" mount option and required conversion of older style character sets to/from UTF-8 to support non-UTF8 locales. This option is compatible with other Linux filesystems supporting the "nls" mount option.
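As an illustration only (this is not code from the patch), the kind of conversion described above can be sketched in user-space C for one older character set, ISO-8859-1. The helper name latin1_to_utf8 and its return convention are hypothetical; the patch itself goes through the kernel NLS tables (char2uni/uni2char) rather than a hard-coded mapping.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: convert an ISO-8859-1
 * (Latin-1) name to UTF-8.  Returns the number of bytes written to
 * out, or -1 if the result does not fit in outsize bytes.
 */
static int
latin1_to_utf8(const unsigned char *in, size_t inlen,
	       unsigned char *out, size_t outsize)
{
	size_t i, o = 0;

	for (i = 0; i < inlen; i++) {
		unsigned char c = in[i];

		if (c < 0x80) {
			/* ASCII range is identical in both encodings */
			if (o + 1 > outsize)
				return -1;
			out[o++] = c;
		} else {
			/* U+0080..U+00FF becomes a two-byte UTF-8 sequence */
			if (o + 2 > outsize)
				return -1;
			out[o++] = 0xC0 | (c >> 6);
			out[o++] = 0x80 | (c & 0x3F);
		}
	}
	return (int)o;
}
```

The point of the sketch is that a converted name can be longer than the name the user supplied, which is why a conversion of this kind needs a separate output buffer and a length check.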
NLS conversion is performed on filename operations including readdir and also user domain extended attribute names. The name zone defined in the "return name" patch is used for temporarily holding the converted name. If CONFIG_XFS_UNICODE is not set to Y, then the NLS support is virtually a no-op. Signed-off-by: Barry Naujok --- fs/xfs/linux-2.6/xfs_linux.h | 5 + fs/xfs/linux-2.6/xfs_super.c | 21 ++++++ fs/xfs/xfs_attr.c | 78 +++++++++++++++--------- fs/xfs/xfs_attr.h | 6 - fs/xfs/xfs_attr_leaf.c | 74 ++++++++++++++++------- fs/xfs/xfs_clnt.h | 1 fs/xfs/xfs_dir2_block.c | 14 +++- fs/xfs/xfs_dir2_leaf.c | 15 ++++ fs/xfs/xfs_dir2_sf.c | 12 +++ fs/xfs/xfs_mount.h | 2 fs/xfs/xfs_rename.c | 12 +++ fs/xfs/xfs_unicode.c | 137 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_unicode.h | 16 +++++ fs/xfs/xfs_vfsops.c | 21 ++++++ fs/xfs/xfs_vnodeops.c | 117 +++++++++++++++++++++++++----------- 15 files changed, 429 insertions(+), 102 deletions(-) Index: kern_ci/fs/xfs/linux-2.6/xfs_linux.h =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_linux.h +++ kern_ci/fs/xfs/linux-2.6/xfs_linux.h @@ -181,6 +181,11 @@ #define howmany(x, y) (((x)+((y)-1))/(y)) /* + * NLS UTF-8 (unicode) character set + */ +#define XFS_NLS_UTF8 "utf8" + +/* * Various platform dependent calls that don't fit anywhere else */ #define xfs_sort(a,n,s,fn) sort(a,n,s,fn,NULL) Index: kern_ci/fs/xfs/linux-2.6/xfs_super.c =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_super.c +++ kern_ci/fs/xfs/linux-2.6/xfs_super.c @@ -126,6 +126,7 @@ xfs_args_allocate( #define MNTOPT_NOATTR2 "noattr2" /* do not use attr2 attribute format */ #define MNTOPT_FILESTREAM "filestreams" /* use filestreams allocator */ #define MNTOPT_CILOOKUP "ci" /* case-insensitive dir lookup */ +#define MNTOPT_NLS "nls" /* NLS code page to use */ #define MNTOPT_QUOTA "quota" /* disk quotas (user) */ #define MNTOPT_NOQUOTA "noquota" /* no
quotas */ #define MNTOPT_USRQUOTA "usrquota" /* user quota enabled */ @@ -320,9 +321,20 @@ xfs_parseargs( args->flags &= ~XFSMNT_ATTR2; } else if (!strcmp(this_char, MNTOPT_FILESTREAM)) { args->flags2 |= XFSMNT2_FILESTREAMS; +#ifdef CONFIG_XFS_UNICODE } else if (!strcmp(this_char, MNTOPT_CILOOKUP)) { args->flags2 |= XFSMNT2_CILOOKUP; -#ifndef CONFIG_XFS_UNICODE + } else if (!strcmp(this_char, MNTOPT_NLS)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + strncpy(args->nls, value, MAXNAMELEN); +#else + } else if (!strcmp(this_char, MNTOPT_CILOOKUP) || + !strcmp(this_char, MNTOPT_NLS)) { cmn_err(CE_WARN, "XFS: %s option requires Unicode support", this_char); @@ -530,6 +542,13 @@ xfs_showargs( if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT)) seq_puts(m, "," MNTOPT_NOQUOTA); + if (xfs_sb_version_hasunicode(&mp->m_sb)) { + if (mp->m_nls) + seq_printf(m, "," MNTOPT_NLS "=%s", mp->m_nls->charset); + else + seq_puts(m, "," MNTOPT_NLS "=" XFS_NLS_UTF8); + } + return 0; } __uint64_t Index: kern_ci/fs/xfs/xfs_attr.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_attr.c +++ kern_ci/fs/xfs/xfs_attr.c @@ -108,7 +108,7 @@ ktrace_t *xfs_attr_trace_buf; *========================================================================*/ int -xfs_attr_fetch(xfs_inode_t *ip, const char *name, int namelen, +xfs_attr_fetch(xfs_inode_t *ip, const uchar_t *name, int namelen, char *value, int *valuelenp, int flags, struct cred *cred) { xfs_da_args_t args; @@ -167,6 +167,7 @@ xfs_attr_get( cred_t *cred) { int error, namelen; + const uchar_t *uni_name; XFS_STATS_INC(xs_attr_get); @@ -176,24 +177,29 @@ xfs_attr_get( if (namelen >= MAXNAMELEN) return(EFAULT); /* match IRIX behaviour */ + if (XFS_FORCED_SHUTDOWN(ip->i_mount)) + return(EIO); + /* Enforce UTF-8 only for user attr names */ if (xfs_sb_version_hasunicode(&ip->i_mount->m_sb) && (flags & (ATTR_ROOT | ATTR_SECURE)) == 0) { - 
error = xfs_unicode_validate(name, namelen); + error = xfs_nls_to_unicode(ip->i_mount, name, namelen, + &uni_name, &namelen); if (error) return error; - } - if (XFS_FORCED_SHUTDOWN(ip->i_mount)) - return(EIO); + } else + uni_name = name; xfs_ilock(ip, XFS_ILOCK_SHARED); - error = xfs_attr_fetch(ip, name, namelen, value, valuelenp, flags, cred); + error = xfs_attr_fetch(ip, uni_name, namelen, value, valuelenp, + flags, cred); xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_unicode_nls_free(name, uni_name); return(error); } int -xfs_attr_set_int(xfs_inode_t *dp, const char *name, int namelen, +xfs_attr_set_int(xfs_inode_t *dp, const uchar_t *name, int namelen, char *value, int valuelen, int flags) { xfs_da_args_t args; @@ -437,26 +443,31 @@ xfs_attr_set( int valuelen, int flags) { - int namelen; + int error, namelen; + const uchar_t *uni_name; namelen = strlen(name); if (namelen >= MAXNAMELEN) return EFAULT; /* match IRIX behaviour */ + XFS_STATS_INC(xs_attr_set); + + if (XFS_FORCED_SHUTDOWN(dp->i_mount)) + return (EIO); + /* Enforce UTF-8 only for user attr names */ if (xfs_sb_version_hasunicode(&dp->i_mount->m_sb) && (flags & (ATTR_ROOT | ATTR_SECURE)) == 0) { - int error = xfs_unicode_validate(name, namelen); + error = xfs_nls_to_unicode(dp->i_mount, name, namelen, + &uni_name, &namelen); if (error) return error; - } - - XFS_STATS_INC(xs_attr_set); - - if (XFS_FORCED_SHUTDOWN(dp->i_mount)) - return (EIO); + } else + uni_name = name; - return xfs_attr_set_int(dp, name, namelen, value, valuelen, flags); + error = xfs_attr_set_int(dp, uni_name, namelen, value, valuelen, flags); + xfs_unicode_nls_free(name, uni_name); + return error; } /* @@ -464,7 +475,8 @@ xfs_attr_set( * Transitions attribute list from Btree to shortform as necessary. 
*/ int -xfs_attr_remove_int(xfs_inode_t *dp, const char *name, int namelen, int flags) +xfs_attr_remove_int(xfs_inode_t *dp, const uchar_t *name, int namelen, + int flags) { xfs_da_args_t args; xfs_fsblock_t firstblock; @@ -591,35 +603,41 @@ xfs_attr_remove( const char *name, int flags) { - int namelen; + int error, namelen; + const uchar_t *uni_name; namelen = strlen(name); if (namelen >= MAXNAMELEN) return EFAULT; /* match IRIX behaviour */ + XFS_STATS_INC(xs_attr_remove); + + if (XFS_FORCED_SHUTDOWN(dp->i_mount)) + return (EIO); + /* Enforce UTF-8 only for user attr names */ if (xfs_sb_version_hasunicode(&dp->i_mount->m_sb) && (flags & (ATTR_ROOT | ATTR_SECURE)) == 0) { - int error = xfs_unicode_validate(name, namelen); + error = xfs_nls_to_unicode(dp->i_mount, name, namelen, + &uni_name, &namelen); if (error) return error; - } - - XFS_STATS_INC(xs_attr_remove); - - if (XFS_FORCED_SHUTDOWN(dp->i_mount)) - return (EIO); + } else + uni_name = name; xfs_ilock(dp, XFS_ILOCK_SHARED); if (XFS_IFORK_Q(dp) == 0 || (dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS && dp->i_d.di_anextents == 0)) { xfs_iunlock(dp, XFS_ILOCK_SHARED); + xfs_unicode_nls_free(name, uni_name); return(XFS_ERROR(ENOATTR)); } xfs_iunlock(dp, XFS_ILOCK_SHARED); - return xfs_attr_remove_int(dp, name, namelen, flags); + error = xfs_attr_remove_int(dp, uni_name, namelen, flags); + xfs_unicode_nls_free(name, uni_name); + return error; } int /* error */ @@ -658,9 +676,9 @@ xfs_attr_list_int(xfs_attr_list_context_ */ /*ARGSUSED*/ STATIC int -xfs_attr_put_listent(xfs_attr_list_context_t *context, attrnames_t *namesp, - char *name, int namelen, - int valuelen, char *value) +xfs_attr_user_list(xfs_attr_list_context_t *context, attrnames_t *namesp, + char *name, int namelen, + int valuelen, char *value) { attrlist_ent_t *aep; int arraytop; @@ -789,7 +807,7 @@ xfs_attr_list( context.alist->al_count = 0; context.alist->al_more = 0; context.alist->al_offset[0] = context.bufsize; - context.put_listent = 
xfs_attr_put_listent; + context.put_listent = xfs_attr_user_list; } if (XFS_FORCED_SHUTDOWN(dp->i_mount)) Index: kern_ci/fs/xfs/xfs_attr.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_attr.h +++ kern_ci/fs/xfs/xfs_attr.h @@ -158,13 +158,13 @@ struct xfs_da_args; /* * Overall external interface routines. */ -int xfs_attr_set_int(struct xfs_inode *, const char *, int, char *, int, int); -int xfs_attr_remove_int(struct xfs_inode *, const char *, int, int); +int xfs_attr_set_int(struct xfs_inode *, const uchar_t *, int, char *, int, int); +int xfs_attr_remove_int(struct xfs_inode *, const uchar_t *, int, int); int xfs_attr_list_int(struct xfs_attr_list_context *); int xfs_attr_inactive(struct xfs_inode *dp); int xfs_attr_shortform_getvalue(struct xfs_da_args *); -int xfs_attr_fetch(struct xfs_inode *, const char *, int, +int xfs_attr_fetch(struct xfs_inode *, const uchar_t *, int, char *, int *, int, struct cred *); int xfs_attr_rmtval_get(struct xfs_da_args *args); Index: kern_ci/fs/xfs/xfs_attr_leaf.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_attr_leaf.c +++ kern_ci/fs/xfs/xfs_attr_leaf.c @@ -42,6 +42,7 @@ #include "xfs_attr.h" #include "xfs_attr_leaf.h" #include "xfs_error.h" +#include "xfs_unicode.h" /* * xfs_attr_leaf.c @@ -89,6 +90,9 @@ STATIC void xfs_attr_leaf_moveents(xfs_a int dst_start, int move_count, xfs_mount_t *mp); STATIC int xfs_attr_leaf_entsize(xfs_attr_leafblock_t *leaf, int index); +STATIC int xfs_attr_put_listent(xfs_attr_list_context_t *context, + attrnames_t *namesp, char *name, int namelen, + int valuelen, char *value); /*======================================================================== * Namespace helper routines @@ -150,7 +154,7 @@ xfs_attr_shortform_bytesfit(xfs_inode_t int offset; int minforkoff; /* lower limit on valid forkoff locations */ int maxforkoff; /* upper limit on valid forkoff locations */ - int dsize; + int dsize; 
xfs_mount_t *mp = dp->i_mount; offset = (XFS_LITINO(mp) - bytes) >> 3; /* rounded down */ @@ -171,39 +175,39 @@ xfs_attr_shortform_bytesfit(xfs_inode_t } dsize = dp->i_df.if_bytes; - + switch (dp->i_d.di_format) { case XFS_DINODE_FMT_EXTENTS: - /* - * If there is no attr fork and the data fork is extents, - * determine if creating the default attr fork will result - * in the extents form migrating to btree. If so, the - * minimum offset only needs to be the space required for + /* + * If there is no attr fork and the data fork is extents, + * determine if creating the default attr fork will result + * in the extents form migrating to btree. If so, the + * minimum offset only needs to be the space required for * the btree root. - */ + */ if (!dp->i_d.di_forkoff && dp->i_df.if_bytes > mp->m_attroffset) dsize = XFS_BMDR_SPACE_CALC(MINDBTPTRS); break; - + case XFS_DINODE_FMT_BTREE: /* * If have data btree then keep forkoff if we have one, - * otherwise we are adding a new attr, so then we set - * minforkoff to where the btree root can finish so we have + * otherwise we are adding a new attr, so then we set + * minforkoff to where the btree root can finish so we have * plenty of room for attrs */ if (dp->i_d.di_forkoff) { - if (offset < dp->i_d.di_forkoff) + if (offset < dp->i_d.di_forkoff) return 0; - else + else return dp->i_d.di_forkoff; } else dsize = XFS_BMAP_BROOT_SPACE(dp->i_df.if_broot); break; } - - /* - * A data fork btree root must have space for at least + + /* + * A data fork btree root must have space for at least * MINDBTPTRS key/ptr pairs if the data fork is small or empty. 
*/ minforkoff = MAX(dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS)); @@ -370,7 +374,7 @@ xfs_attr_shortform_remove(xfs_da_args_t */ totsize -= size; if (totsize == sizeof(xfs_attr_sf_hdr_t) && !args->addname && - (mp->m_flags & XFS_MOUNT_ATTR2) && + (mp->m_flags & XFS_MOUNT_ATTR2) && (dp->i_d.di_format != XFS_DINODE_FMT_BTREE)) { /* * Last attribute now removed, revert to original @@ -631,7 +635,7 @@ xfs_attr_shortform_list(xfs_attr_list_co continue; } namesp = xfs_attr_flags_namesp(sfe->flags); - error = context->put_listent(context, + error = xfs_attr_put_listent(context, namesp, (char *)sfe->nameval, (int)sfe->namelen, @@ -734,7 +738,7 @@ xfs_attr_shortform_list(xfs_attr_list_co cursor->hashval = sbp->hash; cursor->offset = 0; } - error = context->put_listent(context, + error = xfs_attr_put_listent(context, namesp, sbp->name, sbp->namelen, @@ -2418,7 +2422,7 @@ xfs_attr_leaf_list_int(xfs_dabuf_t *bp, xfs_attr_leaf_name_local_t *name_loc = XFS_ATTR_LEAF_NAME_LOCAL(leaf, i); - retval = context->put_listent(context, + retval = xfs_attr_put_listent(context, namesp, (char *)name_loc->nameval, (int)name_loc->namelen, @@ -2445,7 +2449,7 @@ xfs_attr_leaf_list_int(xfs_dabuf_t *bp, retval = xfs_attr_rmtval_get(&args); if (retval) return retval; - retval = context->put_listent(context, + retval = xfs_attr_put_listent(context, namesp, (char *)name_rmt->name, (int)name_rmt->namelen, @@ -2454,7 +2458,7 @@ xfs_attr_leaf_list_int(xfs_dabuf_t *bp, kmem_free(args.value, valuelen); } else { - retval = context->put_listent(context, + retval = xfs_attr_put_listent(context, namesp, (char *)name_rmt->name, (int)name_rmt->namelen, @@ -2472,6 +2476,32 @@ xfs_attr_leaf_list_int(xfs_dabuf_t *bp, return(retval); } +/* + * Do NLS name conversion if required for user attribute names and call + * context's put_listent routine + */ + +STATIC int +xfs_attr_put_listent(xfs_attr_list_context_t *context, attrnames_t *namesp, + char *name, int namelen, int valuelen, char *value) +{ + char *nls_name; + int nls_namelen; + int
error; + + if (xfs_is_using_nls(context->dp->i_mount) && namesp == attr_user) { + error = xfs_unicode_to_nls(context->dp->i_mount, name, namelen, + &nls_name, &nls_namelen); + if (error) + return error; + error = context->put_listent(context, namesp, nls_name, + nls_namelen, valuelen, value); + xfs_unicode_nls_free(name, nls_name); + return error; + } else + return context->put_listent(context, namesp, name, namelen, + valuelen, value); +} /*======================================================================== * Manage the INCOMPLETE flag in a leaf entry Index: kern_ci/fs/xfs/xfs_clnt.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_clnt.h +++ kern_ci/fs/xfs/xfs_clnt.h @@ -48,6 +48,7 @@ struct xfs_mount_args { char rtname[MAXNAMELEN+1]; /* realtime device filename */ char logname[MAXNAMELEN+1]; /* journal device filename */ char mtpt[MAXNAMELEN+1]; /* filesystem mount point */ + char nls[MAXNAMELEN+1]; /* NLS character set to use */ int sunit; /* stripe unit (BBs) */ int swidth; /* stripe width (BBs), multiple of sunit */ uchar_t iosizelog; /* log2 of the preferred I/O size */ Index: kern_ci/fs/xfs/xfs_dir2_block.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_block.c +++ kern_ci/fs/xfs/xfs_dir2_block.c @@ -38,6 +38,7 @@ #include "xfs_dir2_block.h" #include "xfs_dir2_trace.h" #include "xfs_error.h" +#include "xfs_unicode.h" /* * Local function prototypes. @@ -450,6 +451,8 @@ xfs_dir2_block_getdents( int wantoff; /* starting block offset */ xfs_ino_t ino; xfs_off_t cook; + const uchar_t *nls_name; + int nls_namelen; mp = dp->i_mount; /* @@ -513,16 +516,21 @@ xfs_dir2_block_getdents( #if XFS_BIG_INUMS ino += mp->m_inoadd; #endif - + error = xfs_unicode_to_nls(mp, dep->name, dep->namelen, + &nls_name, &nls_namelen); + if (error) + break; /* * If it didn't fit, set the final offset to here & return. 
*/ - if (filldir(dirent, dep->name, dep->namelen, cook, + if (filldir(dirent, nls_name, nls_namelen, cook, ino, DT_UNKNOWN)) { *offset = cook; + xfs_unicode_nls_free(dep->name, nls_name); xfs_da_brelse(NULL, bp); return 0; } + xfs_unicode_nls_free(dep->name, nls_name); } /* @@ -531,7 +539,7 @@ xfs_dir2_block_getdents( */ *offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk + 1, 0); xfs_da_brelse(NULL, bp); - return 0; + return error; } /* Index: kern_ci/fs/xfs/xfs_dir2_leaf.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_leaf.c +++ kern_ci/fs/xfs/xfs_dir2_leaf.c @@ -40,6 +40,7 @@ #include "xfs_dir2_node.h" #include "xfs_dir2_trace.h" #include "xfs_error.h" +#include "xfs_unicode.h" /* * Local function declarations. @@ -780,6 +781,8 @@ xfs_dir2_leaf_getdents( int ra_offset; /* map entry offset for ra */ int ra_want; /* readahead count wanted */ xfs_ino_t ino; + const uchar_t *nls_name; /* NLS name buffer */ + int nls_namelen; /* * If the offset is at or past the largest allowed value, @@ -1087,13 +1090,21 @@ xfs_dir2_leaf_getdents( ino += mp->m_inoadd; #endif + error = xfs_unicode_to_nls(mp, dep->name, dep->namelen, + &nls_name, &nls_namelen); + if (error) + break; + /* * Won't fit. Return to caller. */ - if (filldir(dirent, dep->name, dep->namelen, + if (filldir(dirent, nls_name, nls_namelen, xfs_dir2_byte_to_dataptr(mp, curoff), - ino, DT_UNKNOWN)) + ino, DT_UNKNOWN)) { + xfs_unicode_nls_free(dep->name, nls_name); break; + } + xfs_unicode_nls_free(dep->name, nls_name); /* * Advance to next entry in the block. Index: kern_ci/fs/xfs/xfs_dir2_sf.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_sf.c +++ kern_ci/fs/xfs/xfs_dir2_sf.c @@ -38,6 +38,7 @@ #include "xfs_dir2_leaf.h" #include "xfs_dir2_block.h" #include "xfs_dir2_trace.h" +#include "xfs_unicode.h" /* * Prototypes for internal functions. 
@@ -700,6 +701,7 @@ xfs_dir2_sf_getdents( xfs_off_t *offset, filldir_t filldir) { + int error; int i; /* shortform entry number */ xfs_mount_t *mp; /* filesystem mount point */ xfs_dir2_dataptr_t off; /* current entry's offset */ @@ -708,6 +710,8 @@ xfs_dir2_sf_getdents( xfs_dir2_dataptr_t dot_offset; xfs_dir2_dataptr_t dotdot_offset; xfs_ino_t ino; + const uchar_t *nls_name; /* NLS name buffer */ + int nls_namelen; mp = dp->i_mount; @@ -789,12 +793,18 @@ xfs_dir2_sf_getdents( #if XFS_BIG_INUMS ino += mp->m_inoadd; #endif + error = xfs_unicode_to_nls(mp, sfep->name, sfep->namelen, + &nls_name, &nls_namelen); + if (error) + return error; - if (filldir(dirent, sfep->name, sfep->namelen, + if (filldir(dirent, nls_name, nls_namelen, off, ino, DT_UNKNOWN)) { *offset = off; + xfs_unicode_nls_free(sfep->name, nls_name); return 0; } + xfs_unicode_nls_free(sfep->name, nls_name); sfep = xfs_dir2_sf_nextentry(sfp, sfep); } Index: kern_ci/fs/xfs/xfs_mount.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_mount.h +++ kern_ci/fs/xfs/xfs_mount.h @@ -54,6 +54,7 @@ typedef struct xfs_trans_reservations { #else struct cred; struct log; +struct nls_table; struct xfs_mount_args; struct xfs_inode; struct xfs_bmbt_irec; @@ -316,6 +317,7 @@ typedef struct xfs_mount { __uint8_t m_sectbb_log; /* sectlog - BBSHIFT */ struct xfs_nameops *m_dirnameops; /* vector of dir name ops */ struct xfs_cft *m_cft; /* unicode case fold table */ + struct nls_table *m_nls; /* active NLS table */ int m_dirblksize; /* directory block sz--bytes */ int m_dirblkfsbs; /* directory block sz--fsbs */ xfs_dablk_t m_dirdatablk; /* blockno of dir data v2 */ Index: kern_ci/fs/xfs/xfs_rename.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_rename.c +++ kern_ci/fs/xfs/xfs_rename.c @@ -250,10 +250,14 @@ xfs_rename( xfs_itrace_entry(target_dp); if (xfs_sb_version_hasunicode(&mp->m_sb)) { - error = 
xfs_unicode_validate(src_name, src_namelen); + error = xfs_nls_to_unicode(mp, + VNAME(src_vname), VNAMELEN(src_vname), + (const uchar_t **)&src_name, &src_namelen); if (error) return error; - error = xfs_unicode_validate(target_name, target_namelen); + error = xfs_nls_to_unicode(mp, + VNAME(target_vname), VNAMELEN(target_vname), + (const uchar_t **)&target_name, &target_namelen); if (error) return error; } @@ -265,6 +269,8 @@ xfs_rename( src_name, target_name, 0, 0, 0); if (error) { + xfs_unicode_nls_free(VNAME(src_vname), src_name); + xfs_unicode_nls_free(VNAME(target_vname), target_name); return error; } } @@ -598,6 +604,8 @@ std_return: src_name, target_name, 0, error, 0); } + xfs_unicode_nls_free(VNAME(src_vname), src_name); + xfs_unicode_nls_free(VNAME(target_vname), target_name); return error; abort_return: Index: kern_ci/fs/xfs/xfs_unicode.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_unicode.c +++ kern_ci/fs/xfs/xfs_unicode.c @@ -497,3 +497,140 @@ xfs_unicode_uninit(void) mutex_unlock(&cft_lock); mutex_destroy(&cft_lock); } + +/* + * Convert UTF-8 (Unicode) string into the specified character set in "nls". + * If no NLS conversion is required (mp->m_nls = NULL), the pointers are + * returned as is. Otherwise, a new buffer is allocated and returned. + * xfs_unicode_nls_free() must be called with the source uni_name and returned + * nls_name so it can free the buffer if required.
+ */ +int +xfs_unicode_to_nls( + xfs_mount_t *mp, + const uchar_t *uni_name, + int uni_namelen, + const uchar_t **nls_name, + int *nls_namelen) +{ + char *n; + int i, o; + wchar_t uc; + int nlen; + int u8len; + int error; + + if (!xfs_is_using_nls(mp)) { + *nls_name = uni_name; + *nls_namelen = uni_namelen; + return 0; + } + + n = xfs_da_name_alloc(); + if (!n) + return ENOMEM; + + error = 0; + for (i = 0, o = 0; i < uni_namelen && o < MAXNAMELEN; + i += u8len, o += nlen) { + u8len = utf8_mbtowc(&uc, uni_name + i, uni_namelen - i); + if (u8len < 0) { + error = EINVAL; + goto err_out; + } + nlen = mp->m_nls->uni2char(uc, n + o, MAXNAMELEN - o); + if (nlen == -EINVAL) { + n[o] = '?'; + nlen = 1; + } else if (nlen < 0) { + error = -nlen; + goto err_out; + } + } + if (i == uni_namelen) { + *nls_name = n; + *nls_namelen = o; + return 0; + } + error = ENAMETOOLONG; +err_out: + xfs_da_name_free(n); + return error; +} + +/* + * Convert the "nls" specified charset string into UTF-8 (Unicode). + * If no NLS conversion is required (mp->m_nls = NULL), the pointers are + * returned as is. Otherwise, a new buffer is allocated and returned. + * xfs_unicode_nls_free() must be called with the source nls_name and returned + * uni_name so it can free the buffer if required. + * + * As this is used for all strings coming in from outside XFS, if NLS + * conversion is not used, validate the string as properly formed UTF-8.
+ */ +int +xfs_nls_to_unicode( + xfs_mount_t *mp, + const uchar_t *nls_name, + int nls_namelen, + const uchar_t **uni_name, + int *uni_namelen) +{ + char *n; + int i, o; + wchar_t uc; + int nlen; + int u8len; + int error; + + if (!xfs_is_using_nls(mp)) { + error = xfs_unicode_validate(nls_name, nls_namelen); + if (error) + return error; + *uni_name = nls_name; + *uni_namelen = nls_namelen; + return 0; + } + + n = xfs_da_name_alloc(); + if (!n) + return ENOMEM; + + error = 0; + for (i = 0, o = 0; i < nls_namelen; i += nlen, o += u8len) { + nlen = mp->m_nls->char2uni(nls_name + i, nls_namelen - i, &uc); + if (nlen < 0) { + error = -nlen; + goto err_out; + } + if (uc >= 0xfffe || (uc >= 0xd800 && uc <= 0xdfff)) { + error = EINVAL; /* don't support chars outside BMP */ + goto err_out; + } + u8len = utf8_wctomb(n + o, uc, MAXNAMELEN - o); + if (u8len <= 0) { + error = (MAXNAMELEN - o < 3) ? ENAMETOOLONG : EINVAL; + goto err_out; + } + } + *uni_name = n; + *uni_namelen = o; + return 0; +err_out: + xfs_da_name_free(n); + return error; + +} + +/* + * free the buffer that MAY have been allocated by xfs_unicode_to_nls() + * or xfs_nls_to_unicode(). 
+ */ +void +xfs_unicode_nls_free( + const uchar_t *src_name, + const uchar_t *conv_name) +{ + if (src_name != conv_name) + xfs_da_name_free((uchar_t *)conv_name); +} Index: kern_ci/fs/xfs/xfs_unicode.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_unicode.h +++ kern_ci/fs/xfs/xfs_unicode.h @@ -65,6 +65,14 @@ int xfs_unicode_validate(const uchar_t * int xfs_unicode_read_cft(struct xfs_mount *mp); void xfs_unicode_free_cft(const xfs_cft_t *cft); +#define xfs_is_using_nls(mp) ((mp)->m_nls != NULL) + +int xfs_unicode_to_nls(struct xfs_mount *mp, const uchar_t *uni_name, + int uni_namelen, const uchar_t **nls_name, int *nls_namelen); +int xfs_nls_to_unicode(struct xfs_mount *mp, const uchar_t *nls_name, + int nls_namelen, const uchar_t **uni_name, int *uni_namelen); +void xfs_unicode_nls_free(const uchar_t *src_name, const uchar_t *conv_name); + #else #define xfs_unicode_nameops xfs_default_nameops @@ -76,6 +84,14 @@ void xfs_unicode_free_cft(const xfs_cft_ #define xfs_unicode_read_cft(mp) (EOPNOTSUPP) #define xfs_unicode_free_cft(cft) +#define xfs_is_using_nls(mp) 0 + +#define xfs_unicode_to_nls(mp, uname, ulen, pnname, pnlen) \ + ((*(pnname)) = (uname), (*(pnlen)) = (ulen), 0) +#define xfs_nls_to_unicode(mp, nname, nlen, puname, pulen) \ + ((*(puname)) = (nname), (*(pulen)) = (nlen), 0) +#define xfs_unicode_nls_free(sname, cname) + #endif /* CONFIG_XFS_UNICODE */ #endif /* __XFS_UNICODE_H__ */ Index: kern_ci/fs/xfs/xfs_vfsops.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_vfsops.c +++ kern_ci/fs/xfs/xfs_vfsops.c @@ -405,13 +405,30 @@ xfs_finish_flags( if (xfs_sb_version_hasunicode(&mp->m_sb)) { if (ap->flags2 & XFSMNT2_CILOOKUP) mp->m_flags |= XFS_MOUNT_CILOOKUP; + + mp->m_nls = ap->nls[0] ? 
load_nls(ap->nls) : load_nls_default(); + if (!mp->m_nls) { + cmn_err(CE_WARN, + "XFS: unable to load nls mapping \"%s\"\n", ap->nls); + return XFS_ERROR(EINVAL); + } + if (strcmp(mp->m_nls->charset, XFS_NLS_UTF8) == 0) { + /* special case utf8 - no translation required */ + unload_nls(mp->m_nls); + mp->m_nls = NULL; + } } else { /* * Check for mount options which require a Unicode FS */ if (ap->flags2 & XFSMNT2_CILOOKUP) { cmn_err(CE_WARN, - "XFS: can't do case-insensitive mount on non-utf8 filesystem"); + "XFS: can't do case-insensitive mount on non-Unicode filesystem"); + return XFS_ERROR(EINVAL); + } + if (ap->nls[0]) { + cmn_err(CE_WARN, + "XFS: can't use nls mount option on non-Unicode filesystem"); return XFS_ERROR(EINVAL); } } @@ -647,6 +664,8 @@ out: xfs_unmountfs(mp, credp); xfs_qmops_put(mp); xfs_dmops_put(mp); + if (xfs_is_using_nls(mp)) + unload_nls(mp->m_nls); kmem_free(mp, sizeof(xfs_mount_t)); } Index: kern_ci/fs/xfs/xfs_vnodeops.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_vnodeops.c +++ kern_ci/fs/xfs/xfs_vnodeops.c @@ -1779,13 +1779,14 @@ xfs_lookup( return XFS_ERROR(EIO); if (xfs_sb_version_hasunicode(&dp->i_mount->m_sb)) { - error = xfs_unicode_validate(d_name->name, d_name->len); + error = xfs_nls_to_unicode(dp->i_mount, d_name->name, + d_name->len, &name.name, &name.len); if (error) return error; + } else { + name.name = d_name->name; + name.len = d_name->len; } - - name.name = (uchar_t *)d_name->name; - name.len = d_name->len; rname.name = NULL; lock_mode = xfs_ilock_map_shared(dp); error = xfs_dir_lookup_int(dp, lock_mode, &name, &e_inum, &ip, &rname); @@ -1793,11 +1794,15 @@ xfs_lookup( *ipp = ip; xfs_itrace_ref(ip); if (rname.name) { - ci_name->name = rname.name; - ci_name->len = rname.len; + error = xfs_unicode_to_nls(dp->i_mount, + rname.name, rname.len, + &ci_name->name, &ci_name->len); + /* free rname.name if conversion occurred or error */ + xfs_unicode_nls_free(ci_name->name, 
rname.name); } } xfs_iunlock_map_shared(dp, lock_mode); + xfs_unicode_nls_free(d_name->name, name.name); return error; } @@ -1810,7 +1815,7 @@ xfs_create( xfs_inode_t **ipp, cred_t *credp) { - char *name = VNAME(dentry); + char *name; xfs_mount_t *mp = dp->i_mount; xfs_inode_t *ip; xfs_trans_t *tp; @@ -1832,12 +1837,14 @@ xfs_create( if (XFS_FORCED_SHUTDOWN(mp)) return XFS_ERROR(EIO); - namelen = VNAMELEN(dentry); - if (xfs_sb_version_hasunicode(&mp->m_sb)) { - error = xfs_unicode_validate(name, namelen); + error = xfs_nls_to_unicode(mp, VNAME(dentry), VNAMELEN(dentry), + (const uchar_t **)&name, &namelen); if (error) return error; + } else { + name = VNAME(dentry); + namelen = VNAMELEN(dentry); } if (DM_EVENT_ENABLED(dp, DM_EVENT_CREATE)) { @@ -1846,8 +1853,10 @@ xfs_create( DM_RIGHT_NULL, name, NULL, mode, 0, 0); - if (error) + if (error) { + xfs_unicode_nls_free(VNAME(dentry), name); return error; + } dm_event_sent = 1; } @@ -1999,6 +2008,7 @@ std_return: DM_RIGHT_NULL, name, NULL, mode, error, 0); } + xfs_unicode_nls_free(VNAME(dentry), name); return error; abort_return: @@ -2290,10 +2300,10 @@ xfs_remove( xfs_inode_t *dp, bhv_vname_t *dentry) { - char *name = VNAME(dentry); + char *name; xfs_mount_t *mp = dp->i_mount; xfs_inode_t *ip = VNAME_TO_INODE(dentry); - int namelen = VNAMELEN(dentry); + int namelen; xfs_trans_t *tp = NULL; int error; xfs_bmap_free_t free_list; @@ -2309,17 +2319,23 @@ xfs_remove( return XFS_ERROR(EIO); if (xfs_sb_version_hasunicode(&mp->m_sb)) { - error = xfs_unicode_validate(name, namelen); + error = xfs_nls_to_unicode(mp, VNAME(dentry), VNAMELEN(dentry), + (const uchar_t **)&name, &namelen); if (error) return error; + } else { + name = VNAME(dentry); + namelen = VNAMELEN(dentry); } if (DM_EVENT_ENABLED(dp, DM_EVENT_REMOVE)) { error = XFS_SEND_NAMESP(mp, DM_EVENT_REMOVE, dp, DM_RIGHT_NULL, NULL, DM_RIGHT_NULL, name, NULL, ip->i_d.di_mode, 0, 0); - if (error) + if (error) { + xfs_unicode_nls_free(VNAME(dentry), name); return error; + } 
} /* @@ -2472,6 +2488,7 @@ xfs_remove( NULL, DM_RIGHT_NULL, name, NULL, ip->i_d.di_mode, error, 0); } + xfs_unicode_nls_free(VNAME(dentry), name); return error; error1: @@ -2511,22 +2528,26 @@ xfs_link( int cancel_flags; int committed; int resblks; - char *target_name = VNAME(dentry); + char *target_name; int target_namelen; xfs_itrace_entry(tdp); xfs_itrace_entry(sip); - target_namelen = VNAMELEN(dentry); ASSERT(!S_ISDIR(sip->i_d.di_mode)); if (XFS_FORCED_SHUTDOWN(mp)) return XFS_ERROR(EIO); if (xfs_sb_version_hasunicode(&mp->m_sb)) { - error = xfs_unicode_validate(target_name, target_namelen); + error = xfs_nls_to_unicode(mp, VNAME(dentry), VNAMELEN(dentry), + (const uchar_t **)&target_name, + &target_namelen); if (error) return error; + } else { + target_name = VNAME(dentry); + target_namelen = VNAMELEN(dentry); } if (DM_EVENT_ENABLED(tdp, DM_EVENT_LINK)) { @@ -2534,8 +2555,10 @@ xfs_link( tdp, DM_RIGHT_NULL, sip, DM_RIGHT_NULL, target_name, NULL, 0, 0, 0); - if (error) + if (error) { + xfs_unicode_nls_free(VNAME(dentry), target_name); return error; + } } /* Return through std_return after this point. 
*/ @@ -2646,6 +2669,7 @@ std_return: sip, DM_RIGHT_NULL, target_name, NULL, 0, error, 0); } + xfs_unicode_nls_free(VNAME(dentry), target_name); return error; abort_return: @@ -2666,8 +2690,8 @@ xfs_mkdir( xfs_inode_t **ipp, cred_t *credp) { - char *dir_name = VNAME(dentry); - int dir_namelen = VNAMELEN(dentry); + char *dir_name; + int dir_namelen; xfs_mount_t *mp = dp->i_mount; xfs_inode_t *cdp; /* inode of created dir */ xfs_trans_t *tp; @@ -2687,9 +2711,13 @@ xfs_mkdir( return XFS_ERROR(EIO); if (xfs_sb_version_hasunicode(&mp->m_sb)) { - error = xfs_unicode_validate(dir_name, dir_namelen); + error = xfs_nls_to_unicode(mp, VNAME(dentry), VNAMELEN(dentry), + (const uchar_t **)&dir_name, &dir_namelen); if (error) return error; + } else { + dir_name = VNAME(dentry); + dir_namelen = VNAMELEN(dentry); } tp = NULL; @@ -2699,8 +2727,10 @@ xfs_mkdir( dp, DM_RIGHT_NULL, NULL, DM_RIGHT_NULL, dir_name, NULL, mode, 0, 0); - if (error) + if (error) { + xfs_unicode_nls_free(VNAME(dentry), dir_name); return error; + } dm_event_sent = 1; } @@ -2858,6 +2888,7 @@ std_return: dir_name, NULL, mode, error, 0); } + xfs_unicode_nls_free(VNAME(dentry), dir_name); return error; error2: @@ -2882,8 +2913,8 @@ xfs_rmdir( bhv_vname_t *dentry) { bhv_vnode_t *dir_vp = XFS_ITOV(dp); - char *name = VNAME(dentry); - int namelen = VNAMELEN(dentry); + char *name; + int namelen; xfs_mount_t *mp = dp->i_mount; xfs_inode_t *cdp = VNAME_TO_INODE(dentry); xfs_trans_t *tp; @@ -2901,9 +2932,13 @@ xfs_rmdir( return XFS_ERROR(EIO); if (xfs_sb_version_hasunicode(&mp->m_sb)) { - error = xfs_unicode_validate(name, namelen); + error = xfs_nls_to_unicode(mp, VNAME(dentry), VNAMELEN(dentry), + (const uchar_t **)&name, &namelen); if (error) return error; + } else { + name = VNAME(dentry); + namelen = VNAMELEN(dentry); } if (DM_EVENT_ENABLED(dp, DM_EVENT_REMOVE)) { @@ -2911,8 +2946,10 @@ xfs_rmdir( dp, DM_RIGHT_NULL, NULL, DM_RIGHT_NULL, name, NULL, cdp->i_d.di_mode, 0, 0); - if (error) + if (error) { + 
xfs_unicode_nls_free(VNAME(dentry), name); return XFS_ERROR(error); + } } /* @@ -3087,6 +3124,7 @@ xfs_rmdir( name, NULL, cdp->i_d.di_mode, error, 0); } + xfs_unicode_nls_free(VNAME(dentry), name); return error; error1: @@ -3130,7 +3168,7 @@ xfs_symlink( xfs_prid_t prid; struct xfs_dquot *udqp, *gdqp; uint resblks; - char *link_name = VNAME(dentry); + char *link_name; int link_namelen; *ipp = NULL; @@ -3142,14 +3180,6 @@ xfs_symlink( if (XFS_FORCED_SHUTDOWN(mp)) return XFS_ERROR(EIO); - link_namelen = VNAMELEN(dentry); - - if (xfs_sb_version_hasunicode(&mp->m_sb)) { - error = xfs_unicode_validate(link_name, link_namelen); - if (error) - return error; - } - /* * Check component lengths of the target path name. */ @@ -3182,12 +3212,24 @@ xfs_symlink( } } + if (xfs_sb_version_hasunicode(&mp->m_sb)) { + error = xfs_nls_to_unicode(mp, VNAME(dentry), VNAMELEN(dentry), + (const uchar_t **)&link_name, &link_namelen); + if (error) + return error; + } else { + link_name = VNAME(dentry); + link_namelen = VNAMELEN(dentry); + } + if (DM_EVENT_ENABLED(dp, DM_EVENT_SYMLINK)) { error = XFS_SEND_NAMESP(mp, DM_EVENT_SYMLINK, dp, DM_RIGHT_NULL, NULL, DM_RIGHT_NULL, link_name, target_path, 0, 0, 0); - if (error) + if (error) { + xfs_unicode_nls_free(VNAME(dentry), link_name); return error; + } } /* Return through std_return after this point. 
*/ @@ -3395,6 +3437,7 @@ std_return: if (!error) *ipp = ip; + xfs_unicode_nls_free(VNAME(dentry), link_name); return error; error2: -- From owner-xfs@oss.sgi.com Tue Apr 1 23:26:39 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 23:27:04 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_43, J_CHICKENPOX_72 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326QXnX006082 for ; Tue, 1 Apr 2008 23:26:37 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23145; Wed, 2 Apr 2008 16:27:08 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 1161) id 57FE658C4C15; Wed, 2 Apr 2008 16:27:08 +1000 (EST) Message-Id: <20080402062708.071715758@chook.melbourne.sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 02 Apr 2008 16:25:10 +1000 From: Barry Naujok To: xfs@oss.sgi.com Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 2/7] XFS: ASCII case-insensitive support Content-Disposition: inline; filename=ascii_ci.patch X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15153 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs Implement ASCII case-insensitive support. Its primary purpose is to support existing filesystems migrated from IRIX that already use this case-insensitive mode. However, if you only need ASCII case-insensitive support (i.e. English only) and will never use another language, this mode is perfectly adequate.
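For readers who want to try the ASCII-only folding outside the kernel, here is a minimal user-space sketch of a case-insensitive hash and a three-way compare (different, exact match, match differing only in case). All names in it (ascii_fold, rol32_sketch, ascii_ci_hash, ascii_ci_compare, enum cmp_result) are made up for illustration and are not the kernel symbols. It folds A-Z by hand rather than calling tolower() so the result does not depend on the C locale.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative result codes; the names are hypothetical, not kernel ones. */
enum cmp_result { CMP_DIFFERENT, CMP_EXACT, CMP_CASE };

/* Fold ASCII 'A'..'Z' to 'a'..'z'; every other byte is returned unchanged. */
static unsigned char ascii_fold(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Rotate a 32-bit value left by n bits (0 < n < 32). */
static uint32_t rol32_sketch(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* Hash a counted (not NUL-terminated) name over its folded bytes. */
static uint32_t ascii_ci_hash(const unsigned char *name, size_t len)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < len; i++)
		hash = ascii_fold(name[i]) ^ rol32_sketch(hash, 7);
	return hash;
}

/*
 * Compare two counted names. Returns CMP_EXACT when the bytes are
 * identical, CMP_CASE when they differ only in ASCII case, and
 * CMP_DIFFERENT otherwise (including when the lengths differ).
 */
static enum cmp_result ascii_ci_compare(const unsigned char *a, size_t alen,
					const unsigned char *b, size_t blen)
{
	enum cmp_result result = CMP_EXACT;
	size_t i;

	if (alen != blen)
		return CMP_DIFFERENT;
	for (i = 0; i < alen; i++) {
		if (a[i] == b[i])
			continue;
		if (ascii_fold(a[i]) != ascii_fold(b[i]))
			return CMP_DIFFERENT;
		result = CMP_CASE;
	}
	return result;
}
```

Because the hash only ever sees folded bytes, two names that differ only in ASCII case always hash to the same value, which is what lets a hashed directory lookup find either spelling.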
ASCII-CI is implemented by generating hashes based on lower-case letters and doing lower-case compares. It implements a new xfs_nameops vector for doing the hashes and comparisons for all filename operations. It also overrides the Linux dentry cache operations with its own hash and compare functions (the same as used in the xfs_nameops vector). To create a filesystem with this CI mode, use: # mkfs.xfs -n version=ci Signed-off-by: Barry Naujok --- fs/xfs/linux-2.6/xfs_iops.c | 46 +++++++++++++++++++++++++++++++++++++- fs/xfs/linux-2.6/xfs_linux.h | 1 fs/xfs/linux-2.6/xfs_super.c | 4 +++ fs/xfs/xfs_dir2.c | 52 ++++++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_fs.h | 1 fs/xfs/xfs_fsops.c | 4 ++- fs/xfs/xfs_sb.h | 10 +++++++- 7 files changed, 114 insertions(+), 4 deletions(-) Index: kern_ci/fs/xfs/linux-2.6/xfs_iops.c =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_iops.c +++ kern_ci/fs/xfs/linux-2.6/xfs_iops.c @@ -47,6 +47,7 @@ #include "xfs_buf_item.h" #include "xfs_utils.h" #include "xfs_vnodeops.h" +#include "xfs_da_btree.h" #include #include @@ -54,6 +55,8 @@ #include #include +struct dentry_operations xfs_ci_dentry_operations; + /* * Bring the atime in the XFS inode uptodate. * Used before logging the inode to disk or when the Linux inode goes away.
@@ -372,10 +375,15 @@ xfs_vn_lookup( { struct xfs_inode *cip; int error; + struct xfs_mount *mp = XFS_I(dir)->i_mount; + struct dentry *result; if (dentry->d_name.len >= MAXNAMELEN) return ERR_PTR(-ENAMETOOLONG); + if (xfs_sb_version_hasoldci(&mp->m_sb)) + dentry->d_op = &xfs_ci_dentry_operations; + error = xfs_lookup(XFS_I(dir), dentry, &cip); if (unlikely(error)) { if (unlikely(error != ENOENT)) @@ -384,7 +392,10 @@ xfs_vn_lookup( return NULL; } - return d_splice_alias(cip->i_vnode, dentry); + result = d_splice_alias(cip->i_vnode, dentry); + if (result) + result->d_op = dentry->d_op; + return result; } STATIC int @@ -887,3 +898,36 @@ const struct inode_operations xfs_symlin .listxattr = xfs_vn_listxattr, .removexattr = xfs_vn_removexattr, }; + +STATIC int +xfs_ci_dentry_hash( + struct dentry *dir, + struct qstr *this) +{ + this->hash = xfs_dir_hashname(XFS_I(dir->d_inode), + this->name, this->len); + return 0; +} + +STATIC int +xfs_ci_dentry_compare( + struct dentry *dir, + struct qstr *a, + struct qstr *b) +{ + int result = xfs_dir_compname(XFS_I(dir->d_inode), a->name, a->len, + b->name, b->len) == XFS_CMP_DIFFERENT; + /* + * result == 0 if a match is found, and if so, copy the name in "b" + * to "a" to cope with negative dentries getting the correct name. 
+ */ + if (result == 0) + memcpy((unsigned char *)a->name, b->name, a->len); + return result; +} + +struct dentry_operations xfs_ci_dentry_operations = +{ + .d_hash = xfs_ci_dentry_hash, + .d_compare = xfs_ci_dentry_compare, +}; Index: kern_ci/fs/xfs/linux-2.6/xfs_linux.h =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_linux.h +++ kern_ci/fs/xfs/linux-2.6/xfs_linux.h @@ -75,6 +75,7 @@ #include #include #include +#include #include #include Index: kern_ci/fs/xfs/linux-2.6/xfs_super.c =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_super.c +++ kern_ci/fs/xfs/linux-2.6/xfs_super.c @@ -67,6 +67,8 @@ static kmem_zone_t *xfs_vnode_zone; static kmem_zone_t *xfs_ioend_zone; mempool_t *xfs_ioend_pool; +extern struct dentry_operations xfs_ci_dentry_operations; + STATIC struct xfs_mount_args * xfs_args_allocate( struct super_block *sb, @@ -1359,6 +1361,8 @@ xfs_fs_fill_super( error = ENOMEM; goto fail_vnrele; } + if (xfs_sb_version_hasoldci(&mp->m_sb)) + sb->s_root->d_op = &xfs_ci_dentry_operations; mp->m_sync_work.w_syncer = xfs_sync_worker; mp->m_sync_work.w_mount = mp; Index: kern_ci/fs/xfs/xfs_dir2.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2.c +++ kern_ci/fs/xfs/xfs_dir2.c @@ -45,6 +45,55 @@ #include "xfs_vnodeops.h" +/* + * V1/OLDCI case-insensitive support for directories + * + * This is ASCII only case support, ie. A-Z. 
+ */ +static xfs_dahash_t +xfs_ascii_ci_hashname( + const uchar_t *name, + int namelen) +{ + xfs_dahash_t hash; + int i; + + for (i = 0, hash = 0; i < namelen; i++) + hash = tolower(name[i]) ^ rol32(hash, 7); + + return hash; +} + +static xfs_dacmp_t +xfs_ascii_ci_compname( + const uchar_t *name1, + int len1, + const uchar_t *name2, + int len2) +{ + xfs_dacmp_t result; + int i; + + if (len1 != len2) + return XFS_CMP_DIFFERENT; + + result = XFS_CMP_EXACT; + for (i = 0; i < len1; i++) { + if (name1[i] == name2[i]) + continue; + if (tolower(name1[i]) != tolower(name2[i])) + return XFS_CMP_DIFFERENT; + result = XFS_CMP_CASE; + } + + return result; +} + +static struct xfs_nameops xfs_ascii_ci_nameops = { + .hashname = xfs_ascii_ci_hashname, + .compname = xfs_ascii_ci_compname, +}; + void xfs_dir_mount( xfs_mount_t *mp) @@ -64,7 +113,8 @@ xfs_dir_mount( (mp->m_dirblksize - (uint)sizeof(xfs_da_node_hdr_t)) / (uint)sizeof(xfs_da_node_entry_t); mp->m_dir_magicpct = (mp->m_dirblksize * 37) / 100; - mp->m_dirnameops = &xfs_default_nameops; + mp->m_dirnameops = xfs_sb_version_hasoldci(&mp->m_sb) ? + &xfs_ascii_ci_nameops : &xfs_default_nameops; } /* Index: kern_ci/fs/xfs/xfs_fs.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_fs.h +++ kern_ci/fs/xfs/xfs_fs.h @@ -239,6 +239,7 @@ typedef struct xfs_fsop_resblks { #define XFS_FSOP_GEOM_FLAGS_LOGV2 0x0100 /* log format version 2 */ #define XFS_FSOP_GEOM_FLAGS_SECTOR 0x0200 /* sector sizes >1BB */ #define XFS_FSOP_GEOM_FLAGS_ATTR2 0x0400 /* inline attributes rework */ +#define XFS_FSOP_GEOM_FLAGS_DIRV2CI 0x1000 /* ASCII only CI names */ #define XFS_FSOP_GEOM_FLAGS_LAZYSB 0x4000 /* lazy superblock counters */ Index: kern_ci/fs/xfs/xfs_fsops.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_fsops.c +++ kern_ci/fs/xfs/xfs_fsops.c @@ -95,6 +95,8 @@ xfs_fs_geometry( XFS_FSOP_GEOM_FLAGS_DIRV2 : 0) | (xfs_sb_version_hassector(&mp->m_sb) ? 
XFS_FSOP_GEOM_FLAGS_SECTOR : 0) | + (xfs_sb_version_hasoldci(&mp->m_sb) ? + XFS_FSOP_GEOM_FLAGS_DIRV2CI : 0) | (xfs_sb_version_haslazysbcount(&mp->m_sb) ? XFS_FSOP_GEOM_FLAGS_LAZYSB : 0) | (xfs_sb_version_hasattr2(&mp->m_sb) ? @@ -629,7 +631,7 @@ xfs_fs_goingdown( xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT); thaw_bdev(sb->s_bdev, sb); } - + break; } case XFS_FSOP_GOING_FLAGS_LOGFLUSH: Index: kern_ci/fs/xfs/xfs_sb.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_sb.h +++ kern_ci/fs/xfs/xfs_sb.h @@ -46,10 +46,12 @@ struct xfs_mount; #define XFS_SB_VERSION_SECTORBIT 0x0800 #define XFS_SB_VERSION_EXTFLGBIT 0x1000 #define XFS_SB_VERSION_DIRV2BIT 0x2000 +#define XFS_SB_VERSION_OLDCIBIT 0x4000 /* ASCII only case-insens. */ #define XFS_SB_VERSION_MOREBITSBIT 0x8000 #define XFS_SB_VERSION_OKSASHFBITS \ (XFS_SB_VERSION_EXTFLGBIT | \ - XFS_SB_VERSION_DIRV2BIT) + XFS_SB_VERSION_DIRV2BIT | \ + XFS_SB_VERSION_OLDCIBIT) #define XFS_SB_VERSION_OKREALFBITS \ (XFS_SB_VERSION_ATTRBIT | \ XFS_SB_VERSION_NLINKBIT | \ @@ -436,6 +438,12 @@ static inline int xfs_sb_version_hassect ((sbp)->sb_versionnum & XFS_SB_VERSION_SECTORBIT); } +static inline int xfs_sb_version_hasoldci(xfs_sb_t *sbp) +{ + return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_4) && \ + ((sbp)->sb_versionnum & XFS_SB_VERSION_OLDCIBIT); +} + static inline int xfs_sb_version_hasmorebits(xfs_sb_t *sbp) { return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_4) && \ -- From owner-xfs@oss.sgi.com Tue Apr 1 23:26:41 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 23:26:59 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326QYRN006090 for ; Tue, 1 Apr 2008 23:26:39 
-0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23157; Wed, 2 Apr 2008 16:27:10 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 1161) id D3A4058C4C19; Wed, 2 Apr 2008 16:27:09 +1000 (EST) Message-Id: <20080402062709.577869936@chook.melbourne.sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 02 Apr 2008 16:25:15 +1000 From: Barry Naujok To: xfs@oss.sgi.com Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 7/7] XFS: NLS config option Content-Disposition: inline; filename=config_nls.patch X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15152 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs This optional patch implements the NLS support as a CONFIG option. Signed-off-by: Barry Naujok --- fs/xfs/Kconfig | 10 ++++++++++ fs/xfs/xfs_unicode.c | 4 ++++ fs/xfs/xfs_unicode.h | 17 ++++++++++++++++- 3 files changed, 30 insertions(+), 1 deletion(-) Index: kern_ci/fs/xfs/Kconfig =================================================================== --- kern_ci.orig/fs/xfs/Kconfig +++ kern_ci/fs/xfs/Kconfig @@ -87,6 +87,16 @@ config XFS_UNICODE If you don't require UTF-8 enforcement, say N. +config XFS_UNICODE_NLS + bool "XFS NLS Unicode support" + depends on XFS_UNICODE + help + NLS (Native Language Support) allows non-UTF8 locales to + interact with XFS Unicode support. To specify the character + set being used, use the "-n nls=" mount option. + + If you don't require NLS conversion in XFS, say N.
+ config XFS_RT bool "XFS Realtime subvolume support" depends on XFS_FS Index: kern_ci/fs/xfs/xfs_unicode.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_unicode.c +++ kern_ci/fs/xfs/xfs_unicode.c @@ -498,6 +498,8 @@ xfs_unicode_uninit(void) mutex_destroy(&cft_lock); } +#ifdef CONFIG_XFS_UNICODE_NLS + /* * Convert UTF-8 (Unicode) string into the specified character set in "nls". * If no NLS conversion is required (mp->m_nls = NULL), the pointers are @@ -634,3 +636,5 @@ xfs_unicode_nls_free( if (src_name != conv_name) xfs_da_name_free((uchar_t *)conv_name); } + +#endif /* CONFIG_XFS_UNICODE_NLS */ Index: kern_ci/fs/xfs/xfs_unicode.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_unicode.h +++ kern_ci/fs/xfs/xfs_unicode.h @@ -65,6 +65,8 @@ int xfs_unicode_validate(const uchar_t * int xfs_unicode_read_cft(struct xfs_mount *mp); void xfs_unicode_free_cft(const xfs_cft_t *cft); +#ifdef CONFIG_XFS_UNICODE_NLS + #define xfs_is_using_nls(mp) ((mp)->m_nls != NULL) int xfs_unicode_to_nls(struct xfs_mount *mp, const uchar_t *uni_name, @@ -73,7 +75,20 @@ int xfs_nls_to_unicode(struct xfs_mount int nls_namelen, const uchar_t **uni_name, int *uni_namelen); void xfs_unicode_nls_free(const uchar_t *src_name, const uchar_t *conv_name); -#else +#else /* CONFIG_XFS_UNICODE_NLS */ + +#define xfs_is_using_nls(mp) 0 + +#define xfs_unicode_to_nls(mp, uname, ulen, pnname, pnlen) \ + ((*(pnname)) = (uname), (*(pnlen)) = (ulen), 0) +#define xfs_nls_to_unicode(mp, nname, nlen, puname, pulen) \ + ((*(puname)) = (nname), (*(pulen)) = (nlen), \ + xfs_unicode_validate(nname, nlen)) +#define xfs_unicode_nls_free(sname, cname) + +#endif /* CONFIG_XFS_UNICODE_NLS */ + +#else /* CONFIG_XFS_UNICODE */ #define xfs_unicode_nameops xfs_default_nameops #define xfs_unicode_ci_nameops xfs_default_nameops -- From owner-xfs@oss.sgi.com Tue Apr 1 23:26:46 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 
Apr 2008 23:27:24 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42, J_CHICKENPOX_43,J_CHICKENPOX_45,J_CHICKENPOX_47,J_CHICKENPOX_48 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326QYao006084 for ; Tue, 1 Apr 2008 23:26:36 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23149; Wed, 2 Apr 2008 16:27:09 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 1161) id F1D1458C4C17; Wed, 2 Apr 2008 16:27:08 +1000 (EST) Message-Id: <20080402062708.654277049@chook.melbourne.sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 02 Apr 2008 16:25:12 +1000 From: Barry Naujok To: xfs@oss.sgi.com Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 4/7] XFS: Return case-insensitive match for dentry cache Content-Disposition: inline; filename=return_name.patch X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15156 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs This implements the code to store the actual filename found during a lookup in the dentry cache and to avoid multiple entries in the dcache pointing to the same inode. It also introduces a new type, xfs_name, which is similar to the dentry cache's qstr type. It contains a pointer to a zone allocated string (MAXNAMELEN sized) and the length of the actual name. This string does not need to be NULL terminated (a counted string). 
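To make the "counted string" idea concrete, here is a small user-space sketch of a pointer-plus-length name type and two helpers. It assumes nothing about the kernel headers: struct counted_name, counted_name_equal() and counted_name_to_cstr() are hypothetical names for illustration only, and the zone allocation of the buffer is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted name: a pointer plus a length, no terminating NUL. */
struct counted_name {
	const unsigned char *name;
	int len;
};

/* Two counted names are equal when the lengths and the bytes both match. */
static int counted_name_equal(const struct counted_name *a,
			      const struct counted_name *b)
{
	return a->len == b->len &&
	       memcmp(a->name, b->name, (size_t)a->len) == 0;
}

/*
 * Copy a counted name into a caller-supplied buffer and NUL-terminate it,
 * for interfaces that need a C string. Returns 0 on success, or -1 when
 * the name plus the terminator does not fit in buflen bytes.
 */
static int counted_name_to_cstr(const struct counted_name *n,
				char *buf, size_t buflen)
{
	if (n->len < 0 || (size_t)n->len >= buflen)
		return -1;
	memcpy(buf, n->name, (size_t)n->len);
	buf[n->len] = '\0';
	return 0;
}
```

Since the length travels with the pointer, a name can point straight into a larger buffer (such as a directory block) without copying or terminating it first; a copy is only needed when something insists on a C string.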
xfs_name_t is only used in the lookup path in this patch, but may be used in other locations too if desired. It may be desirable not to use xfs_name_t at all in the lookup functions and instead stick to separate parameters (which will mean 7 instead of 5 arguments). To avoid polluting the dcache, we implement a new set of directory inode operations for lookup. xfs_vn_ci_lookup() interacts directly with the dcache and the code was derived from ntfs_lookup() in fs/ntfs/namei.c. The dentry hash and compare overrides introduced in the ASCII-CI patch have been removed. The "actual name" is only allocated and returned for a case-insensitive match and not for an exact match. Signed-off-by: Barry Naujok --- fs/xfs/linux-2.6/xfs_export.c | 2 fs/xfs/linux-2.6/xfs_iops.c | 165 +++++++++++++++++++++++++++++++----------- fs/xfs/linux-2.6/xfs_iops.h | 1 fs/xfs/linux-2.6/xfs_super.c | 5 + fs/xfs/linux-2.6/xfs_vnode.h | 1 fs/xfs/xfs_da_btree.c | 16 ++++ fs/xfs/xfs_da_btree.h | 13 +++ fs/xfs/xfs_dir2.c | 28 +++++-- fs/xfs/xfs_dir2.h | 4 - fs/xfs/xfs_dir2_block.c | 9 ++ fs/xfs/xfs_dir2_leaf.c | 9 ++ fs/xfs/xfs_dir2_node.c | 20 ++++- fs/xfs/xfs_dir2_sf.c | 13 +++ fs/xfs/xfs_rename.c | 5 + fs/xfs/xfs_utils.c | 12 ++- fs/xfs/xfs_utils.h | 6 + fs/xfs/xfs_vfsops.c | 2 fs/xfs/xfs_vnodeops.c | 15 +++ fs/xfs/xfs_vnodeops.h | 4 - 19 files changed, 264 insertions(+), 66 deletions(-) Index: kern_ci/fs/xfs/linux-2.6/xfs_export.c =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_export.c +++ kern_ci/fs/xfs/linux-2.6/xfs_export.c @@ -216,7 +216,7 @@ xfs_fs_get_parent( struct xfs_inode *cip; struct dentry *parent; - error = xfs_lookup(XFS_I(child->d_inode), &dotdot, &cip); + error = xfs_lookup(XFS_I(child->d_inode), &dotdot.d_name, &cip, NULL); if (unlikely(error)) return ERR_PTR(-error); Index: kern_ci/fs/xfs/linux-2.6/xfs_iops.c =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_iops.c +++
kern_ci/fs/xfs/linux-2.6/xfs_iops.c @@ -375,27 +375,125 @@ xfs_vn_lookup( { struct xfs_inode *cip; int error; - struct xfs_mount *mp = XFS_I(dir)->i_mount; + + if (dentry->d_name.len >= MAXNAMELEN) + return ERR_PTR(-ENAMETOOLONG); + + error = xfs_lookup(XFS_I(dir), &dentry->d_name, &cip, NULL); + if (unlikely(error)) { + if (unlikely(error != ENOENT)) + return ERR_PTR(-error); + d_add(dentry, NULL); + return NULL; + } + + return d_splice_alias(cip->i_vnode, dentry); +} + +STATIC struct dentry * +xfs_vn_ci_lookup( + struct inode *dir, + struct dentry *dentry, + struct nameidata *nd) +{ + struct xfs_inode *cip; + int error; struct dentry *result; + struct qstr ci_name = {0, 0, NULL}; + struct inode *inode; if (dentry->d_name.len >= MAXNAMELEN) return ERR_PTR(-ENAMETOOLONG); - if (xfs_sb_version_hasoldci(&mp->m_sb)) - dentry->d_op = &xfs_ci_dentry_operations; + error = xfs_lookup(XFS_I(dir), &dentry->d_name, &cip, &ci_name); - error = xfs_lookup(XFS_I(dir), dentry, &cip); if (unlikely(error)) { if (unlikely(error != ENOENT)) return ERR_PTR(-error); d_add(dentry, NULL); return NULL; } + inode = cip->i_vnode; + + /* if exact match, just splice and exit */ + if (!ci_name.name) { + result = d_splice_alias(inode, dentry); + return result; + } - result = d_splice_alias(cip->i_vnode, dentry); - if (result) - result->d_op = dentry->d_op; - return result; + /* + * case-insensitive match, create a dentry to return and fill it + * in with the correctly cased name. Parameter "dentry" is not + * used anymore and the caller will free it. + * Derived from fs/ntfs/namei.c + */ + + ci_name.hash = full_name_hash(ci_name.name, ci_name.len); + + /* Does an existing dentry match? 
*/ + result = d_lookup(dentry->d_parent, &ci_name); + if (!result) { + /* if not, create one */ + result = d_alloc(dentry->d_parent, &ci_name); + xfs_da_name_free((char *)ci_name.name); + if (!result) + return ERR_PTR(-ENOMEM); + dentry = d_splice_alias(inode, result); + if (dentry) { + dput(result); + return dentry; + } + return result; + } + xfs_da_name_free((char *)ci_name.name); + + /* an existing dentry matches, use it */ + + if (result->d_inode) { + /* + * already an inode attached, deref the inode that was + * refcounted with xfs_lookup and return the dentry. + */ + if (unlikely(result->d_inode != inode)) { + /* This can happen because bad inodes are unhashed. */ + BUG_ON(!is_bad_inode(inode)); + BUG_ON(!is_bad_inode(result->d_inode)); + } + iput(inode); + return result; + } + + if (!S_ISDIR(inode->i_mode)) { + /* not a directory, easy to handle */ + d_instantiate(result, inode); + return result; + } + + spin_lock(&dcache_lock); + if (list_empty(&inode->i_dentry)) { + /* + * Directory without a 'disconnected' dentry; we need to do + * d_instantiate() by hand because it takes dcache_lock which + * we already hold. + */ + list_add(&result->d_alias, &inode->i_dentry); + result->d_inode = inode; + spin_unlock(&dcache_lock); + security_d_instantiate(result, inode); + return result; + } + /* + * Directory with a 'disconnected' dentry; get a reference to the + * 'disconnected' dentry. 
+ */ + dentry = list_entry(inode->i_dentry.next, struct dentry, d_alias); + dget_locked(dentry); + spin_unlock(&dcache_lock); + security_d_instantiate(result, inode); + d_move(dentry, result); + iput(inode); + dput(result); + return dentry; } STATIC int @@ -886,6 +984,25 @@ const struct inode_operations xfs_dir_in .removexattr = xfs_vn_removexattr, }; +const struct inode_operations xfs_dir_ci_inode_operations = { + .create = xfs_vn_create, + .lookup = xfs_vn_ci_lookup, + .link = xfs_vn_link, + .unlink = xfs_vn_unlink, + .symlink = xfs_vn_symlink, + .mkdir = xfs_vn_mkdir, + .rmdir = xfs_vn_rmdir, + .mknod = xfs_vn_mknod, + .rename = xfs_vn_rename, + .permission = xfs_vn_permission, + .getattr = xfs_vn_getattr, + .setattr = xfs_vn_setattr, + .setxattr = xfs_vn_setxattr, + .getxattr = xfs_vn_getxattr, + .listxattr = xfs_vn_listxattr, + .removexattr = xfs_vn_removexattr, +}; + const struct inode_operations xfs_symlink_inode_operations = { .readlink = generic_readlink, .follow_link = xfs_vn_follow_link, @@ -899,35 +1016,3 @@ const struct inode_operations xfs_symlin .removexattr = xfs_vn_removexattr, }; -STATIC int -xfs_ci_dentry_hash( - struct dentry *dir, - struct qstr *this) -{ - this->hash = xfs_dir_hashname(XFS_I(dir->d_inode), - this->name, this->len); - return 0; -} - -STATIC int -xfs_ci_dentry_compare( - struct dentry *dir, - struct qstr *a, - struct qstr *b) -{ - int result = xfs_dir_compname(XFS_I(dir->d_inode), a->name, a->len, - b->name, b->len) == XFS_CMP_DIFFERENT; - /* - * result == 0 if a match is found, and if so, copy the name in "b" - * to "a" to cope with negative dentries getting the correct name. 
- */ - if (result == 0) - memcpy((unsigned char *)a->name, b->name, a->len); - return result; -} - -struct dentry_operations xfs_ci_dentry_operations = -{ - .d_hash = xfs_ci_dentry_hash, - .d_compare = xfs_ci_dentry_compare, -}; Index: kern_ci/fs/xfs/linux-2.6/xfs_iops.h =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_iops.h +++ kern_ci/fs/xfs/linux-2.6/xfs_iops.h @@ -20,6 +20,7 @@ extern const struct inode_operations xfs_inode_operations; extern const struct inode_operations xfs_dir_inode_operations; +extern const struct inode_operations xfs_dir_ci_inode_operations; extern const struct inode_operations xfs_symlink_inode_operations; extern const struct file_operations xfs_file_operations; Index: kern_ci/fs/xfs/linux-2.6/xfs_super.c =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_super.c +++ kern_ci/fs/xfs/linux-2.6/xfs_super.c @@ -566,7 +566,10 @@ xfs_set_inodeops( inode->i_mapping->a_ops = &xfs_address_space_operations; break; case S_IFDIR: - inode->i_op = &xfs_dir_inode_operations; + inode->i_op = + xfs_sb_version_hasoldci(&XFS_I(inode)->i_mount->m_sb) ? 
+ &xfs_dir_ci_inode_operations : + &xfs_dir_inode_operations; inode->i_fop = &xfs_dir_file_operations; break; case S_IFLNK: Index: kern_ci/fs/xfs/linux-2.6/xfs_vnode.h =================================================================== --- kern_ci.orig/fs/xfs/linux-2.6/xfs_vnode.h +++ kern_ci/fs/xfs/linux-2.6/xfs_vnode.h @@ -26,6 +26,7 @@ struct attrlist_cursor_kern; typedef struct dentry bhv_vname_t; typedef __u64 bhv_vnumber_t; typedef struct inode bhv_vnode_t; +typedef struct qstr bhv_vstr_t; #define VN_ISLNK(vp) S_ISLNK((vp)->i_mode) #define VN_ISREG(vp) S_ISREG((vp)->i_mode) Index: kern_ci/fs/xfs/xfs_da_btree.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_da_btree.c +++ kern_ci/fs/xfs/xfs_da_btree.c @@ -2176,6 +2176,22 @@ xfs_da_reada_buf( return rval; } + +kmem_zone_t *xfs_da_name_zone; + +uchar_t * +xfs_da_name_alloc(void) +{ + return kmem_zone_zalloc(xfs_da_name_zone, KM_SLEEP); +} + +void +xfs_da_name_free(const uchar_t *name) +{ + kmem_zone_free(xfs_da_name_zone, (void *)name); +} + + kmem_zone_t *xfs_da_state_zone; /* anchor for state struct zone */ kmem_zone_t *xfs_dabuf_zone; /* dabuf zone */ Index: kern_ci/fs/xfs/xfs_da_btree.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_da_btree.h +++ kern_ci/fs/xfs/xfs_da_btree.h @@ -224,6 +224,14 @@ typedef struct xfs_nameops { xfs_compname_t compname; } xfs_nameops_t; +/* + * Counted string for names, *name should be allocated and freed with + * xfs_da_name_alloc and xfs_da_name_free. len must not exceed MAXNAMELEN. 
+ */ +typedef struct xfs_name { + const uchar_t *name; + int len; +} xfs_name_t; #ifdef __KERNEL__ /*======================================================================== @@ -277,6 +285,11 @@ uint xfs_da_hashname(const uchar_t *name xfs_dacmp_t xfs_da_compname(const uchar_t *name1, int len1, const uchar_t *name2, int len2); +/* returns/frees a MAXNAMELEN buffer from a zone */ +extern struct kmem_zone *xfs_da_name_zone; +uchar_t *xfs_da_name_alloc(void); +void xfs_da_name_free(const uchar_t *name); + xfs_da_state_t *xfs_da_state_alloc(void); void xfs_da_state_free(xfs_da_state_t *state); Index: kern_ci/fs/xfs/xfs_dir2.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2.c +++ kern_ci/fs/xfs/xfs_dir2.c @@ -242,15 +242,16 @@ xfs_dir_createname( } /* - * Lookup a name in a directory, give back the inode number. + * Lookup a name in a directory, give back the inode number and also + * the actual name if a case-insensitive match. */ int xfs_dir_lookup( xfs_trans_t *tp, xfs_inode_t *dp, - char *name, - int namelen, - xfs_ino_t *inum) /* out: inode number */ + xfs_name_t *name, + xfs_ino_t *inum, /* out: inode number */ + xfs_name_t *ci_name) /* out: actual name if different */ { xfs_da_args_t args; int rval; @@ -259,9 +260,9 @@ xfs_dir_lookup( ASSERT((dp->i_d.di_mode & S_IFMT) == S_IFDIR); XFS_STATS_INC(xs_dir_lookup); - args.name = name; - args.namelen = namelen; - args.hashval = xfs_dir_hashname(dp, name, namelen); + args.name = name->name; + args.namelen = name->len; + args.hashval = xfs_dir_hashname(dp, name->name, name->len); args.inumber = 0; args.dp = dp; args.firstblock = NULL; @@ -272,6 +273,8 @@ xfs_dir_lookup( args.justcheck = args.addname = 0; args.oknoent = 1; args.cmpresult = XFS_CMP_DIFFERENT; + args.value = NULL; + args.valuelen = 0; if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL) rval = xfs_dir2_sf_lookup(&args); @@ -287,8 +290,17 @@ xfs_dir_lookup( rval = xfs_dir2_node_lookup(&args); if (rval == EEXIST) 
rval = 0; - if (rval == 0) + if (rval == 0) { *inum = args.inumber; + if (args.value) { + ASSERT(args.cmpresult == XFS_CMP_CASE); + if (ci_name) { + ci_name->name = args.value; + ci_name->len = args.valuelen; + } else + xfs_da_name_free(args.value); + } + } return rval; } Index: kern_ci/fs/xfs/xfs_dir2.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2.h +++ kern_ci/fs/xfs/xfs_dir2.h @@ -26,6 +26,7 @@ struct xfs_bmap_free; struct xfs_inode; struct xfs_mount; struct xfs_trans; +struct xfs_name; /* * Directory version 2. @@ -72,7 +73,8 @@ extern int xfs_dir_createname(struct xfs xfs_fsblock_t *first, struct xfs_bmap_free *flist, xfs_extlen_t tot); extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp, - char *name, int namelen, xfs_ino_t *inum); + struct xfs_name *name, xfs_ino_t *inum, + struct xfs_name *ci_name); extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp, char *name, int namelen, xfs_ino_t ino, xfs_fsblock_t *first, Index: kern_ci/fs/xfs/xfs_dir2_block.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_block.c +++ kern_ci/fs/xfs/xfs_dir2_block.c @@ -616,6 +616,15 @@ xfs_dir2_block_lookup( * Fill in inode number, release the block. */ args->inumber = be64_to_cpu(dep->inumber); + /* + * If a case-insensitive match, allocate a buffer and copy the actual + * name into the buffer. Return it via args->value. + */ + if (args->cmpresult == XFS_CMP_CASE) { + args->value = xfs_da_name_alloc(); + memcpy(args->value, dep->name, dep->namelen); + args->valuelen = dep->namelen; + } xfs_da_brelse(args->trans, bp); return XFS_ERROR(EEXIST); } Index: kern_ci/fs/xfs/xfs_dir2_leaf.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_leaf.c +++ kern_ci/fs/xfs/xfs_dir2_leaf.c @@ -1301,6 +1301,15 @@ xfs_dir2_leaf_lookup( * Return the found inode number.
*/ args->inumber = be64_to_cpu(dep->inumber); + /* + * If a case-insensitive match, allocate a buffer and copy the actual + * name into the buffer. Return it via args->value. + */ + if (args->cmpresult == XFS_CMP_CASE) { + args->value = xfs_da_name_alloc(); + memcpy(args->value, dep->name, dep->namelen); + args->valuelen = dep->namelen; + } xfs_da_brelse(tp, dbp); xfs_da_brelse(tp, lbp); return XFS_ERROR(EEXIST); Index: kern_ci/fs/xfs/xfs_dir2_node.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_node.c +++ kern_ci/fs/xfs/xfs_dir2_node.c @@ -643,6 +643,8 @@ xfs_dir2_leafn_lookup_for_entry( xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address))); /* * Compare the entry, return it if it matches. + * "oknoent" is set for lookup and clear for + * remove and replace. */ cmp = args->oknoent ? xfs_dir_compname(dp, dep->name, dep->namelen, @@ -1857,10 +1859,22 @@ xfs_dir2_node_lookup( if (error) rval = error; /* - * If case-insensitive match was found in a leaf, return EEXIST. - */ - else if (rval == ENOENT && args->cmpresult == XFS_CMP_CASE) + * If case-insensitive match was found (xfs_dir2_leafn_lookup_int + * returns ENOENT for a case-insensitive match, but sets + * args->cmpresult to XFS_CMP_CASE): + * - Allocate a buffer and copy the actual name into the buffer and + * return it via args->value. + * - set rval to EEXIST + */ + else if (rval == ENOENT && args->cmpresult == XFS_CMP_CASE) { + xfs_dir2_data_entry_t *dep = (xfs_dir2_data_entry_t *) + ((char *)state->extrablk.bp->data + + state->extrablk.index); + args->value = xfs_da_name_alloc(); + memcpy(args->value, dep->name, dep->namelen); + args->valuelen = dep->namelen; rval = EEXIST; + } /* * Release the btree blocks and leaf block. 
*/ Index: kern_ci/fs/xfs/xfs_dir2_sf.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_sf.c +++ kern_ci/fs/xfs/xfs_dir2_sf.c @@ -815,6 +815,7 @@ xfs_dir2_sf_lookup( xfs_dir2_sf_entry_t *sfep; /* shortform directory entry */ xfs_dir2_sf_t *sfp; /* shortform structure */ xfs_dacmp_t cmp; /* comparison result */ + xfs_dir2_sf_entry_t *ci_sfep; /* case-insens. entry */ xfs_dir2_trace_args("sf_lookup", args); xfs_dir2_sf_check(args); @@ -852,6 +853,7 @@ xfs_dir2_sf_lookup( /* * Loop over all the entries trying to match ours. */ + ci_sfep = NULL; for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); i < sfp->hdr.count; i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) { @@ -864,10 +866,19 @@ xfs_dir2_sf_lookup( xfs_dir2_sf_inumberp(sfep)); if (cmp == XFS_CMP_EXACT) return XFS_ERROR(EEXIST); + ci_sfep = sfep; } } - if (args->cmpresult == XFS_CMP_CASE) + if (args->cmpresult == XFS_CMP_CASE) { + /* + * If a case-insensitive match, allocate a buffer and copy the + * actual name into the buffer and return it via args->value. + */ + args->value = xfs_da_name_alloc(); + memcpy(args->value, ci_sfep->name, ci_sfep->namelen); + args->valuelen = ci_sfep->namelen; return XFS_ERROR(EEXIST); + } /* * Didn't find it. */ Index: kern_ci/fs/xfs/xfs_rename.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_rename.c +++ kern_ci/fs/xfs/xfs_rename.c @@ -100,6 +100,7 @@ xfs_lock_for_rename( int i, j; uint lock_mode; int diff_dirs = (dp1 != dp2); + xfs_name_t name2; ip2 = NULL; @@ -125,7 +126,9 @@ xfs_lock_for_rename( lock_mode = xfs_ilock_map_shared(dp2); } - error = xfs_dir_lookup_int(dp2, lock_mode, vname2, &inum2, &ip2); + name2.name = VNAME(vname2); + name2.len = VNAMELEN(vname2); + error = xfs_dir_lookup_int(dp2, lock_mode, &name2, &inum2, &ip2, NULL); if (error == ENOENT) { /* target does not need to exist. 
*/ inum2 = 0; } else if (error) { Index: kern_ci/fs/xfs/xfs_utils.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_utils.c +++ kern_ci/fs/xfs/xfs_utils.c @@ -24,6 +24,7 @@ #include "xfs_trans.h" #include "xfs_sb.h" #include "xfs_ag.h" +#include "xfs_da_btree.h" #include "xfs_dir2.h" #include "xfs_dmapi.h" #include "xfs_mount.h" @@ -45,15 +46,16 @@ int xfs_dir_lookup_int( xfs_inode_t *dp, uint lock_mode, - bhv_vname_t *dentry, + xfs_name_t *name, xfs_ino_t *inum, - xfs_inode_t **ipp) + xfs_inode_t **ipp, + xfs_name_t *ci_name) { int error; xfs_itrace_entry(dp); - error = xfs_dir_lookup(NULL, dp, VNAME(dentry), VNAMELEN(dentry), inum); + error = xfs_dir_lookup(NULL, dp, name, inum, ci_name); if (!error) { /* * Unlock the directory. We do this because we can't @@ -80,6 +82,10 @@ xfs_dir_lookup_int( xfs_ilock(dp, lock_mode); error = XFS_ERROR(ENOENT); } + if (error && ci_name && ci_name->name) { + xfs_da_name_free(ci_name->name); + ci_name->name = NULL; + } } return error; } Index: kern_ci/fs/xfs/xfs_utils.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_utils.h +++ kern_ci/fs/xfs/xfs_utils.h @@ -18,11 +18,13 @@ #ifndef __XFS_UTILS_H__ #define __XFS_UTILS_H__ +struct xfs_name; + #define IRELE(ip) VN_RELE(XFS_ITOV(ip)) #define IHOLD(ip) VN_HOLD(XFS_ITOV(ip)) -extern int xfs_dir_lookup_int (xfs_inode_t *, uint, bhv_vname_t *, xfs_ino_t *, - xfs_inode_t **); +extern int xfs_dir_lookup_int (xfs_inode_t *, uint, struct xfs_name *, + xfs_ino_t *, xfs_inode_t **, struct xfs_name *); extern int xfs_truncate_file (xfs_mount_t *, xfs_inode_t *); extern int xfs_dir_ialloc (xfs_trans_t **, xfs_inode_t *, mode_t, xfs_nlink_t, xfs_dev_t, cred_t *, prid_t, int, Index: kern_ci/fs/xfs/xfs_vfsops.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_vfsops.c +++ kern_ci/fs/xfs/xfs_vfsops.c @@ -74,6 +74,7 @@ xfs_init(void) xfs_btree_cur_zone = 
kmem_zone_init(sizeof(xfs_btree_cur_t), "xfs_btree_cur"); xfs_trans_zone = kmem_zone_init(sizeof(xfs_trans_t), "xfs_trans"); + xfs_da_name_zone = kmem_zone_init(MAXNAMELEN, "xfs_da_name"); xfs_da_state_zone = kmem_zone_init(sizeof(xfs_da_state_t), "xfs_da_state"); xfs_dabuf_zone = kmem_zone_init(sizeof(xfs_dabuf_t), "xfs_dabuf"); @@ -177,6 +178,7 @@ xfs_cleanup(void) kmem_zone_destroy(xfs_btree_cur_zone); kmem_zone_destroy(xfs_inode_zone); kmem_zone_destroy(xfs_trans_zone); + kmem_zone_destroy(xfs_da_name_zone); kmem_zone_destroy(xfs_da_state_zone); kmem_zone_destroy(xfs_dabuf_zone); kmem_zone_destroy(xfs_buf_item_zone); Index: kern_ci/fs/xfs/xfs_vnodeops.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_vnodeops.c +++ kern_ci/fs/xfs/xfs_vnodeops.c @@ -1762,24 +1762,33 @@ xfs_inactive( int xfs_lookup( xfs_inode_t *dp, - bhv_vname_t *dentry, - xfs_inode_t **ipp) + bhv_vstr_t *d_name, + xfs_inode_t **ipp, + bhv_vstr_t *ci_name) { xfs_inode_t *ip; xfs_ino_t e_inum; int error; uint lock_mode; + xfs_name_t name, rname; xfs_itrace_entry(dp); if (XFS_FORCED_SHUTDOWN(dp->i_mount)) return XFS_ERROR(EIO); + name.name = (uchar_t *)d_name->name; + name.len = d_name->len; + rname.name = NULL; lock_mode = xfs_ilock_map_shared(dp); - error = xfs_dir_lookup_int(dp, lock_mode, dentry, &e_inum, &ip); + error = xfs_dir_lookup_int(dp, lock_mode, &name, &e_inum, &ip, &rname); if (!error) { *ipp = ip; xfs_itrace_ref(ip); + if (rname.name) { + ci_name->name = rname.name; + ci_name->len = rname.len; + } } xfs_iunlock_map_shared(dp, lock_mode); return error; Index: kern_ci/fs/xfs/xfs_vnodeops.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_vnodeops.h +++ kern_ci/fs/xfs/xfs_vnodeops.h @@ -23,8 +23,8 @@ int xfs_fsync(struct xfs_inode *ip, int xfs_off_t stop); int xfs_release(struct xfs_inode *ip); int xfs_inactive(struct xfs_inode *ip); -int xfs_lookup(struct xfs_inode *dp, bhv_vname_t *dentry, 
- struct xfs_inode **ipp); +int xfs_lookup(struct xfs_inode *dp, bhv_vstr_t *d_name, + struct xfs_inode **ipp, bhv_vstr_t *ci_name); int xfs_create(struct xfs_inode *dp, bhv_vname_t *dentry, mode_t mode, xfs_dev_t rdev, struct xfs_inode **ipp, struct cred *credp); int xfs_remove(struct xfs_inode *dp, bhv_vname_t *dentry); -- From owner-xfs@oss.sgi.com Tue Apr 1 23:26:53 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 23:27:24 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42, J_CHICKENPOX_45,J_CHICKENPOX_47,J_CHICKENPOX_63 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326QYnT006085 for ; Tue, 1 Apr 2008 23:26:36 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23143; Wed, 2 Apr 2008 16:27:08 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 1161) id 0D41058C4C11; Wed, 2 Apr 2008 16:27:08 +1000 (EST) Message-Id: <20080402062707.797672682@chook.melbourne.sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 02 Apr 2008 16:25:09 +1000 From: Barry Naujok To: xfs@oss.sgi.com Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 1/7] XFS: Name operation vector for hash and compare Content-Disposition: inline; filename=nameops.patch X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15157 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs Adds two pieces of functionality for the basis of case-insensitive support in XFS: 1. 
A comparison result enumerated type: xfs_dacmp_t. It represents an exact match, a case-insensitive match or no match at all. This patch only implements the different and exact results. 2. An xfs_nameops vector for specifying how to perform the hash generation and comparison of filenames. In this patch the hash vector points to the existing xfs_da_hashname function, and the comparison method does a length compare and, if the lengths are the same, does a memcmp and returns the xfs_dacmp_t result. All filename functions that use the hash (create, lookup, remove, rename, etc) now use the xfs_nameops.hashname function and all directory lookup functions also use the xfs_nameops.compname function. The lookup functions also handle case-insensitive results even though the default comparison function cannot return that. An important aspect of the lookup functions is that an exact match always has precedence over a case-insensitive one. So when a case-insensitive match is found, we have to keep looking just in case there is an exact match. In the meantime, the info for the first case-insensitive match is retained in case no exact match is found. 
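For illustration only, the precedence rule described above can be sketched in stand-alone C. This is not code from the patch: the names toy_compname and toy_lookup, the enum constants, and the ASCII-only case folding are all assumptions made for the example (the default compname in this patch never returns a case-insensitive result). The loop uses the same "cmp != DIFFERENT && cmp != current result" test as the lookup functions, so the first case-insensitive match is kept and only an exact match ends the scan early.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the three xfs_dacmp_t outcomes; these names are illustrative. */
enum cmp_result { CMP_DIFFERENT, CMP_EXACT, CMP_CASE };

/* Hypothetical ASCII-only comparison, NOT the patch's compname. */
static enum cmp_result
toy_compname(const char *n1, size_t l1, const char *n2, size_t l2)
{
	enum cmp_result res = CMP_EXACT;
	size_t i;

	if (l1 != l2)
		return CMP_DIFFERENT;
	for (i = 0; i < l1; i++) {
		if (n1[i] == n2[i])
			continue;
		if (tolower((unsigned char)n1[i]) !=
		    tolower((unsigned char)n2[i]))
			return CMP_DIFFERENT;
		res = CMP_CASE;		/* same letters, different case */
	}
	return res;
}

/*
 * Scan every entry. Returns the index of an exact match if one exists,
 * otherwise the index of the first case-insensitive match, otherwise -1.
 * *result records which kind of match was found.
 */
static int
toy_lookup(const char *const *names, int count, const char *name,
	   enum cmp_result *result)
{
	int found = -1;
	int i;

	assert(names != NULL && name != NULL && result != NULL);
	*result = CMP_DIFFERENT;
	for (i = 0; i < count; i++) {
		enum cmp_result cmp = toy_compname(names[i], strlen(names[i]),
						   name, strlen(name));
		/* Skip misses and repeats of the best result so far. */
		if (cmp != CMP_DIFFERENT && cmp != *result) {
			*result = cmp;
			found = i;
			if (cmp == CMP_EXACT)
				return found;	/* exact match wins */
		}
	}
	return found;
}
```

In this sketch a later exact match replaces an earlier case-insensitive one, while a second case-insensitive match is ignored because it equals the recorded result.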
Signed-off-by: Barry Naujok --- fs/xfs/xfs_da_btree.c | 12 ++++++++++++ fs/xfs/xfs_da_btree.h | 28 ++++++++++++++++++++++++++++ fs/xfs/xfs_dir2.c | 12 +++++++----- fs/xfs/xfs_dir2.h | 6 ++++++ fs/xfs/xfs_dir2_block.c | 29 ++++++++++++++++++++++------- fs/xfs/xfs_dir2_data.c | 3 ++- fs/xfs/xfs_dir2_leaf.c | 47 ++++++++++++++++++++++++++++++++++++++++------- fs/xfs/xfs_dir2_node.c | 45 +++++++++++++++++++++++++++++++-------------- fs/xfs/xfs_dir2_sf.c | 26 ++++++++++++++++---------- fs/xfs/xfs_mount.h | 2 ++ 10 files changed, 166 insertions(+), 44 deletions(-) Index: kern_ci/fs/xfs/xfs_da_btree.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_da_btree.c +++ kern_ci/fs/xfs/xfs_da_btree.c @@ -1530,6 +1530,18 @@ xfs_da_hashname(const uchar_t *name, int } } +xfs_dacmp_t +xfs_da_compname(const uchar_t *name1, int len1, const uchar_t *name2, int len2) +{ + return (len1 == len2 && memcmp(name1, name2, len1) == 0) ? + XFS_CMP_EXACT : XFS_CMP_DIFFERENT; +} + +struct xfs_nameops xfs_default_nameops = { + .hashname = xfs_da_hashname, + .compname = xfs_da_compname +}; + /* * Add a block to the btree ahead of the file. * Return the new block number to the caller. Index: kern_ci/fs/xfs/xfs_da_btree.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_da_btree.h +++ kern_ci/fs/xfs/xfs_da_btree.h @@ -99,6 +99,15 @@ typedef struct xfs_da_node_entry xfs_da_ *========================================================================*/ /* + * Search comparison results + */ +typedef enum { + XFS_CMP_DIFFERENT, /* names are completely different */ + XFS_CMP_EXACT, /* names are exactly the same */ + XFS_CMP_CASE /* names are same but differ in case */ +} xfs_dacmp_t; + +/* * Structure to ease passing around component names. 
*/ typedef struct xfs_da_args { @@ -127,6 +136,7 @@ typedef struct xfs_da_args { unsigned char rename; /* T/F: this is an atomic rename op */ unsigned char addname; /* T/F: this is an add operation */ unsigned char oknoent; /* T/F: ok to return ENOENT, else die */ + xfs_dacmp_t cmpresult; /* name compare result for lookups */ } xfs_da_args_t; /* @@ -201,6 +211,19 @@ typedef struct xfs_da_state { (uint)(XFS_DA_LOGOFF(BASE, ADDR)), \ (uint)(XFS_DA_LOGOFF(BASE, ADDR)+(SIZE)-1) +/* + * Name ops for directory and/or attr name operations + */ + +typedef xfs_dahash_t (*xfs_hashname_t)(const uchar_t *, int); +typedef xfs_dacmp_t (*xfs_compname_t)(const uchar_t *, int, + const uchar_t *, int); + +typedef struct xfs_nameops { + xfs_hashname_t hashname; + xfs_compname_t compname; +} xfs_nameops_t; + #ifdef __KERNEL__ /*======================================================================== @@ -248,7 +271,12 @@ xfs_daddr_t xfs_da_reada_buf(struct xfs_ int xfs_da_shrink_inode(xfs_da_args_t *args, xfs_dablk_t dead_blkno, xfs_dabuf_t *dead_buf); +extern struct xfs_nameops xfs_default_nameops; + uint xfs_da_hashname(const uchar_t *name_string, int name_length); +xfs_dacmp_t xfs_da_compname(const uchar_t *name1, int len1, + const uchar_t *name2, int len2); + xfs_da_state_t *xfs_da_state_alloc(void); void xfs_da_state_free(xfs_da_state_t *state); Index: kern_ci/fs/xfs/xfs_dir2.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2.c +++ kern_ci/fs/xfs/xfs_dir2.c @@ -64,6 +64,7 @@ xfs_dir_mount( (mp->m_dirblksize - (uint)sizeof(xfs_da_node_hdr_t)) / (uint)sizeof(xfs_da_node_entry_t); mp->m_dir_magicpct = (mp->m_dirblksize * 37) / 100; + mp->m_dirnameops = &xfs_default_nameops; } /* @@ -164,7 +165,7 @@ xfs_dir_createname( args.name = name; args.namelen = namelen; - args.hashval = xfs_da_hashname(name, namelen); + args.hashval = xfs_dir_hashname(dp, name, namelen); args.inumber = inum; args.dp = dp; args.firstblock = first; @@ -210,7 
+211,7 @@ xfs_dir_lookup( args.name = name; args.namelen = namelen; - args.hashval = xfs_da_hashname(name, namelen); + args.hashval = xfs_dir_hashname(dp, name, namelen); args.inumber = 0; args.dp = dp; args.firstblock = NULL; @@ -220,6 +221,7 @@ xfs_dir_lookup( args.trans = tp; args.justcheck = args.addname = 0; args.oknoent = 1; + args.cmpresult = XFS_CMP_DIFFERENT; if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL) rval = xfs_dir2_sf_lookup(&args); @@ -263,7 +265,7 @@ xfs_dir_removename( args.name = name; args.namelen = namelen; - args.hashval = xfs_da_hashname(name, namelen); + args.hashval = xfs_dir_hashname(dp, name, namelen); args.inumber = ino; args.dp = dp; args.firstblock = first; @@ -347,7 +349,7 @@ xfs_dir_replace( args.name = name; args.namelen = namelen; - args.hashval = xfs_da_hashname(name, namelen); + args.hashval = xfs_dir_hashname(dp, name, namelen); args.inumber = inum; args.dp = dp; args.firstblock = first; @@ -390,7 +392,7 @@ xfs_dir_canenter( args.name = name; args.namelen = namelen; - args.hashval = xfs_da_hashname(name, namelen); + args.hashval = xfs_dir_hashname(dp, name, namelen); args.inumber = 0; args.dp = dp; args.firstblock = NULL; Index: kern_ci/fs/xfs/xfs_dir2.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2.h +++ kern_ci/fs/xfs/xfs_dir2.h @@ -85,6 +85,12 @@ extern int xfs_dir_canenter(struct xfs_t char *name, int namelen); extern int xfs_dir_ino_validate(struct xfs_mount *mp, xfs_ino_t ino); +#define xfs_dir_hashname(dp, n, l) \ + ((dp)->i_mount->m_dirnameops->hashname((n), (l))) + +#define xfs_dir_compname(dp, n1, l1, n2, l2) \ + ((dp)->i_mount->m_dirnameops->compname((n1), (l1), (n2), (l2))) + /* * Utility routines for v2 directories. 
*/ Index: kern_ci/fs/xfs/xfs_dir2_block.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_block.c +++ kern_ci/fs/xfs/xfs_dir2_block.c @@ -643,6 +643,7 @@ xfs_dir2_block_lookup_int( int mid; /* binary search current idx */ xfs_mount_t *mp; /* filesystem mount point */ xfs_trans_t *tp; /* transaction pointer */ + xfs_dacmp_t cmp; /* comparison result */ dp = args->dp; tp = args->trans; @@ -698,19 +699,33 @@ xfs_dir2_block_lookup_int( ((char *)block + xfs_dir2_dataptr_to_off(mp, addr)); /* * Compare, if it's right give back buffer & entry number. + * + * lookup case - use nameops; + * + * replace/remove case - as lookup has been already been + * performed, look for an exact match using the fast method */ - if (dep->namelen == args->namelen && - dep->name[0] == args->name[0] && - memcmp(dep->name, args->name, args->namelen) == 0) { + cmp = args->oknoent ? + xfs_dir_compname(dp, dep->name, dep->namelen, + args->name, args->namelen) : + xfs_da_compname(dep->name, dep->namelen, + args->name, args->namelen); + if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) { + args->cmpresult = cmp; *bpp = bp; *entno = mid; - return 0; + if (cmp == XFS_CMP_EXACT) + return 0; } - } while (++mid < be32_to_cpu(btp->count) && be32_to_cpu(blp[mid].hashval) == hash); + } while (++mid < be32_to_cpu(btp->count) && + be32_to_cpu(blp[mid].hashval) == hash); + + ASSERT(args->oknoent); + if (args->cmpresult == XFS_CMP_CASE) + return 0; /* * No match, release the buffer and return ENOENT. 
*/ - ASSERT(args->oknoent); xfs_da_brelse(tp, bp); return XFS_ERROR(ENOENT); } @@ -1187,7 +1202,7 @@ xfs_dir2_sf_to_block( tagp = xfs_dir2_data_entry_tag_p(dep); *tagp = cpu_to_be16((char *)dep - (char *)block); xfs_dir2_data_log_entry(tp, bp, dep); - blp[2 + i].hashval = cpu_to_be32(xfs_da_hashname( + blp[2 + i].hashval = cpu_to_be32(xfs_dir_hashname(dp, (char *)sfep->name, sfep->namelen)); blp[2 + i].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp, (char *)dep - (char *)block)); Index: kern_ci/fs/xfs/xfs_dir2_data.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_data.c +++ kern_ci/fs/xfs/xfs_dir2_data.c @@ -140,7 +140,8 @@ xfs_dir2_data_check( addr = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk, (xfs_dir2_data_aoff_t) ((char *)dep - (char *)d)); - hash = xfs_da_hashname((char *)dep->name, dep->namelen); + hash = xfs_dir_hashname(dp, (char *)dep->name, + dep->namelen); for (i = 0; i < be32_to_cpu(btp->count); i++) { if (be32_to_cpu(lep[i].address) == addr && be32_to_cpu(lep[i].hashval) == hash) Index: kern_ci/fs/xfs/xfs_dir2_leaf.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_leaf.c +++ kern_ci/fs/xfs/xfs_dir2_leaf.c @@ -1331,6 +1331,8 @@ xfs_dir2_leaf_lookup_int( xfs_mount_t *mp; /* filesystem mount point */ xfs_dir2_db_t newdb; /* new data block number */ xfs_trans_t *tp; /* transaction pointer */ + xfs_dabuf_t *cbp; /* case match data buffer */ + xfs_dacmp_t cmp; /* name compare result */ dp = args->dp; tp = args->trans; @@ -1354,6 +1356,7 @@ xfs_dir2_leaf_lookup_int( * Loop over all the entries with the right hash value * looking to match the name. */ + cbp = NULL; for (lep = &leaf->ents[index], dbp = NULL, curdb = -1; index < be16_to_cpu(leaf->hdr.count) && be32_to_cpu(lep->hashval) == args->hashval; lep++, index++) { @@ -1371,7 +1374,7 @@ xfs_dir2_leaf_lookup_int( * need to pitch the old one and read the new one. 
*/ if (newdb != curdb) { - if (dbp) + if (dbp != cbp) xfs_da_brelse(tp, dbp); if ((error = xfs_da_read_buf(tp, dp, @@ -1391,19 +1394,49 @@ xfs_dir2_leaf_lookup_int( xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address))); /* * If it matches then return it. + * + * lookup case - use nameops; + * + * replace/remove case - as lookup has been already been + * performed, look for an exact match using the fast method */ - if (dep->namelen == args->namelen && - dep->name[0] == args->name[0] && - memcmp(dep->name, args->name, args->namelen) == 0) { - *dbpp = dbp; + cmp = args->oknoent ? + xfs_dir_compname(dp, dep->name, dep->namelen, + args->name, args->namelen) : + xfs_da_compname(dep->name, dep->namelen, + args->name, args->namelen); + if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) { + args->cmpresult = cmp; *indexp = index; - return 0; + if (cmp == XFS_CMP_EXACT) { + /* + * case exact match: release the case-insens. + * match buffer if it exists and return the + * current data buffer. + */ + if (cbp && cbp != dbp) + xfs_da_brelse(tp, cbp); + *dbpp = dbp; + return 0; + } + cbp = dbp; } } + ASSERT(args->oknoent); + if (args->cmpresult == XFS_CMP_CASE) { + /* + * case-insensitive match: release current buffer and + * return the buffer with the case-insensitive match. + */ + if (cbp != dbp) + xfs_da_brelse(tp, dbp); + *dbpp = cbp; + return 0; + } /* * No match found, return ENOENT. 
*/ - ASSERT(args->oknoent); + ASSERT(cbp == NULL); if (dbp) xfs_da_brelse(tp, dbp); xfs_da_brelse(tp, lbp); Index: kern_ci/fs/xfs/xfs_dir2_node.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_node.c +++ kern_ci/fs/xfs/xfs_dir2_node.c @@ -414,6 +414,7 @@ xfs_dir2_leafn_lookup_int( xfs_dir2_db_t newdb; /* new data block number */ xfs_dir2_db_t newfdb; /* new free block number */ xfs_trans_t *tp; /* transaction pointer */ + xfs_dacmp_t cmp; /* comparison result */ dp = args->dp; tp = args->trans; @@ -578,19 +579,27 @@ xfs_dir2_leafn_lookup_int( /* * Compare the entry, return it if it matches. */ - if (dep->namelen == args->namelen && - dep->name[0] == args->name[0] && - memcmp(dep->name, args->name, args->namelen) == 0) { + cmp = args->oknoent ? + xfs_dir_compname(dp, dep->name, dep->namelen, + args->name, args->namelen): + xfs_da_compname(dep->name, dep->namelen, + args->name, args->namelen); + if (cmp != XFS_CMP_DIFFERENT && + cmp != args->cmpresult) { + args->cmpresult = cmp; args->inumber = be64_to_cpu(dep->inumber); *indexp = index; - state->extravalid = 1; - state->extrablk.bp = curbp; - state->extrablk.blkno = curdb; - state->extrablk.index = - (int)((char *)dep - - (char *)curbp->data); - state->extrablk.magic = XFS_DIR2_DATA_MAGIC; - return XFS_ERROR(EEXIST); + if (cmp == XFS_CMP_EXACT) { + state->extravalid = 1; + state->extrablk.blkno = curdb; + state->extrablk.index = + (int)((char *)dep - + (char *)curbp->data); + state->extrablk.magic = + XFS_DIR2_DATA_MAGIC; + state->extrablk.bp = curbp; + return XFS_ERROR(EEXIST); + } } } } @@ -618,6 +627,14 @@ xfs_dir2_leafn_lookup_int( } } /* + * For lookup (where args->oknoent is set, and args->addname is not + * set, the state->extrablk info is not used, just freed. + */ + if (args->cmpresult == XFS_CMP_CASE) { + ASSERT(!args->addname); + return XFS_ERROR(EEXIST); + } + /* * Return the final index, that will be the insertion point. 
*/ *indexp = index; @@ -823,9 +840,9 @@ xfs_dir2_leafn_rebalance( */ if (!state->inleaf) blk2->index = blk1->index - be16_to_cpu(leaf1->hdr.count); - - /* - * Finally sanity check just to make sure we are not returning a negative index + + /* + * Finally sanity check just to make sure we are not returning a negative index */ if(blk2->index < 0) { state->inleaf = 1; Index: kern_ci/fs/xfs/xfs_dir2_sf.c =================================================================== --- kern_ci.orig/fs/xfs/xfs_dir2_sf.c +++ kern_ci/fs/xfs/xfs_dir2_sf.c @@ -814,6 +814,7 @@ xfs_dir2_sf_lookup( int i; /* entry index */ xfs_dir2_sf_entry_t *sfep; /* shortform directory entry */ xfs_dir2_sf_t *sfp; /* shortform structure */ + xfs_dacmp_t cmp; /* comparison result */ xfs_dir2_trace_args("sf_lookup", args); xfs_dir2_sf_check(args); @@ -836,6 +837,7 @@ xfs_dir2_sf_lookup( */ if (args->namelen == 1 && args->name[0] == '.') { args->inumber = dp->i_ino; + args->cmpresult = XFS_CMP_EXACT; return XFS_ERROR(EEXIST); } /* @@ -844,6 +846,7 @@ xfs_dir2_sf_lookup( if (args->namelen == 2 && args->name[0] == '.' && args->name[1] == '.') { args->inumber = xfs_dir2_sf_get_inumber(sfp, &sfp->hdr.parent); + args->cmpresult = XFS_CMP_EXACT; return XFS_ERROR(EEXIST); } /* @@ -852,15 +855,19 @@ xfs_dir2_sf_lookup( for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); i < sfp->hdr.count; i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) { - if (sfep->namelen == args->namelen && - sfep->name[0] == args->name[0] && - memcmp(args->name, sfep->name, args->namelen) == 0) { + cmp = xfs_dir_compname(dp, sfep->name, sfep->namelen, + args->name, args->namelen); + if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) { + args->cmpresult = cmp; args->inumber = xfs_dir2_sf_get_inumber(sfp, xfs_dir2_sf_inumberp(sfep)); - return XFS_ERROR(EEXIST); + if (cmp == XFS_CMP_EXACT) + return XFS_ERROR(EEXIST); } } + if (args->cmpresult == XFS_CMP_CASE) + return XFS_ERROR(EEXIST); /* * Didn't find it. 
*/ @@ -907,9 +914,8 @@ xfs_dir2_sf_removename( for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); i < sfp->hdr.count; i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) { - if (sfep->namelen == args->namelen && - sfep->name[0] == args->name[0] && - memcmp(sfep->name, args->name, args->namelen) == 0) { + if (xfs_da_compname(sfep->name, sfep->namelen, + args->name, args->namelen) == XFS_CMP_EXACT) { ASSERT(xfs_dir2_sf_get_inumber(sfp, xfs_dir2_sf_inumberp(sfep)) == args->inumber); @@ -1044,9 +1050,9 @@ xfs_dir2_sf_replace( for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); i < sfp->hdr.count; i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) { - if (sfep->namelen == args->namelen && - sfep->name[0] == args->name[0] && - memcmp(args->name, sfep->name, args->namelen) == 0) { + if (xfs_da_compname(sfep->name, sfep->namelen, + args->name, args->namelen) == + XFS_CMP_EXACT) { #if XFS_BIG_INUMS || defined(DEBUG) ino = xfs_dir2_sf_get_inumber(sfp, xfs_dir2_sf_inumberp(sfep)); Index: kern_ci/fs/xfs/xfs_mount.h =================================================================== --- kern_ci.orig/fs/xfs/xfs_mount.h +++ kern_ci/fs/xfs/xfs_mount.h @@ -61,6 +61,7 @@ struct xfs_bmap_free; struct xfs_extdelta; struct xfs_swapext; struct xfs_mru_cache; +struct xfs_nameops; /* * Prototypes and functions for the Data Migration subsystem. @@ -312,6 +313,7 @@ typedef struct xfs_mount { __uint8_t m_inode_quiesce;/* call quiesce on new inodes. 
field governed by m_ilock */ __uint8_t m_sectbb_log; /* sectlog - BBSHIFT */ + struct xfs_nameops *m_dirnameops; /* vector of dir name ops */ int m_dirblksize; /* directory block sz--bytes */ int m_dirblkfsbs; /* directory block sz--fsbs */ xfs_dablk_t m_dirdatablk; /* blockno of dir data v2 */ -- From owner-xfs@oss.sgi.com Tue Apr 1 23:27:10 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 23:27:23 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42, J_CHICKENPOX_43,J_CHICKENPOX_51,J_CHICKENPOX_53,J_CHICKENPOX_61, J_CHICKENPOX_62,J_CHICKENPOX_65,J_CHICKENPOX_66 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326QYuC006086 for ; Tue, 1 Apr 2008 23:26:36 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23152; Wed, 2 Apr 2008 16:27:09 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 1161) id 4112E58C4C18; Wed, 2 Apr 2008 16:27:09 +1000 (EST) Message-Id: <20080402062709.011126702@chook.melbourne.sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 02 Apr 2008 16:25:13 +1000 From: Barry Naujok To: xfs@oss.sgi.com Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Content-Disposition: inline; filename=unicode_ci.patch X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15155 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs This is the core of the case-insensitive support 
- supporting and enforcing UTF-8 (Unicode) filenames. All filenames and user-level extended attribute names are checked for UTF-8 compliance and the hashes generated are always case-insensitive by utilising the Unicode 5.0 standard case-folding table from: http://www.unicode.org/Public/UNIDATA/CaseFolding.txt As the hash is always case-insensitive, this allows the user to mkfs.xfs the filesystem once and enable or disable (default) case-insensitive support by a mount option "-o ci". The mount option specifies which xfs_nameops.compname function to use. Also, the Unicode support is a CONFIG option so users who do not require this functionality can CONFIG it to N. As the case-folding table is stored on disk, this allows backwards and forwards compatibility and languages like Turkic to support true case-insensitivity with I and i. To create a Unicode filesystem with case-insensitive mount support, run: # mkfs.xfs -n utf8[=default|turkic] The final patches implement NLS support for XFS Unicode. Signed-off-by: Barry Naujok --- fs/xfs/Kconfig | 17 + fs/xfs/Makefile | 4 fs/xfs/linux-2.6/xfs_linux.h | 1 fs/xfs/linux-2.6/xfs_super.c | 14 + fs/xfs/linux-2.6/xfs_super.h | 7 fs/xfs/xfs_attr.c | 24 ++ fs/xfs/xfs_clnt.h | 2 fs/xfs/xfs_da_btree.c | 18 + fs/xfs/xfs_da_btree.h | 16 - fs/xfs/xfs_dir2.c | 13 - fs/xfs/xfs_dir2.h | 5 fs/xfs/xfs_fs.h | 1 fs/xfs/xfs_fsops.c | 4 fs/xfs/xfs_itable.c | 2 fs/xfs/xfs_mount.c | 21 + fs/xfs/xfs_mount.h | 7 fs/xfs/xfs_rename.c | 9 fs/xfs/xfs_sb.h | 29 ++ fs/xfs/xfs_unicode.c | 499 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_unicode.h | 81 ++++++ fs/xfs/xfs_vfsops.c | 16 + fs/xfs/xfs_vnodeops.c | 52 ++++ 22 files changed, 808 insertions(+), 34 deletions(-) Index: kern_ci/fs/xfs/Kconfig =================================================================== --- kern_ci.orig/fs/xfs/Kconfig +++ kern_ci/fs/xfs/Kconfig @@ -72,6 +72,21 @@ config XFS_POSIX_ACL If you don't know what Access Control Lists are, say N. 
+config XFS_UNICODE
+	bool "XFS Unicode support"
+	depends on XFS_FS
+	help
+	  Unicode support enforces UTF-8 filenames and user extended
+	  attribute names. This option is required for filesystems
+	  mkfs'ed with UTF-8 support. A Unicode filesystem guarantees
+	  that filenames will be the same regardless of the user's
+	  locale. For UTF-8 locales, no conversion is required.
+
+	  Unicode filesystems also allow the filesystem to be mounted with
+	  case-insensitive lookup support with the "-o ci" mount option.
+
+	  If you don't require UTF-8 enforcement, say N.
+
 config XFS_RT
 	bool "XFS Realtime subvolume support"
 	depends on XFS_FS
@@ -107,7 +122,7 @@ config XFS_TRACE
 	bool "XFS Tracing support (EXPERIMENTAL)"
 	depends on XFS_FS && EXPERIMENTAL
 	help
-	  Say Y here to get an XFS build with activity tracing enabled.
+	  Say Y here to get an XFS build with activity tracing enabled.
 	  Enabling this option will attach historical information to XFS
 	  inodes, buffers, certain locks, the log, the IO path, and a few
 	  other key areas within XFS. These traces can be examined
Index: kern_ci/fs/xfs/Makefile
===================================================================
--- kern_ci.orig/fs/xfs/Makefile
+++ kern_ci/fs/xfs/Makefile
@@ -30,11 +30,11 @@ obj-$(CONFIG_XFS_DMAPI) += dmapi/
 xfs-$(CONFIG_XFS_RT) += xfs_rtalloc.o
 xfs-$(CONFIG_XFS_POSIX_ACL) += xfs_acl.o
+xfs-$(CONFIG_XFS_UNICODE) += xfs_unicode.o
 xfs-$(CONFIG_PROC_FS) += $(XFS_LINUX)/xfs_stats.o
 xfs-$(CONFIG_SYSCTL) += $(XFS_LINUX)/xfs_sysctl.o
 xfs-$(CONFIG_COMPAT) += $(XFS_LINUX)/xfs_ioctl32.o
-
 xfs-y += xfs_alloc.o \
 	xfs_alloc_btree.o \
 	xfs_attr.o \
@@ -97,7 +97,7 @@ xfs-y += $(addprefix $(XFS_LINUX)/, \
 	xfs_lrw.o \
 	xfs_super.o \
 	xfs_vnode.o \
-	xfs_ksyms.o)
+	xfs_ksyms.o)
 # Objects in support/
 xfs-y += $(addprefix support/, \
Index: kern_ci/fs/xfs/linux-2.6/xfs_linux.h
===================================================================
--- kern_ci.orig/fs/xfs/linux-2.6/xfs_linux.h
+++ kern_ci/fs/xfs/linux-2.6/xfs_linux.h
@@ -76,6 +76,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
Index: kern_ci/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- kern_ci.orig/fs/xfs/linux-2.6/xfs_super.c
+++ kern_ci/fs/xfs/linux-2.6/xfs_super.c
@@ -46,6 +46,7 @@
 #include "xfs_acl.h"
 #include "xfs_attr.h"
 #include "xfs_buf_item.h"
+#include "xfs_unicode.h"
 #include "xfs_utils.h"
 #include "xfs_vnodeops.h"
 #include "xfs_vfsops.h"
@@ -124,6 +125,7 @@ xfs_args_allocate(
 #define MNTOPT_ATTR2 "attr2" /* do use attr2 attribute format */
 #define MNTOPT_NOATTR2 "noattr2" /* do not use attr2 attribute format */
 #define MNTOPT_FILESTREAM "filestreams" /* use filestreams allocator */
+#define MNTOPT_CILOOKUP "ci" /* case-insensitive dir lookup */
 #define MNTOPT_QUOTA "quota" /* disk quotas (user) */
 #define MNTOPT_NOQUOTA "noquota" /* no quotas */
 #define MNTOPT_USRQUOTA "usrquota" /* user quota enabled */
@@ -318,6 +320,14 @@ xfs_parseargs(
 			args->flags &= ~XFSMNT_ATTR2;
 		} else if (!strcmp(this_char, MNTOPT_FILESTREAM)) {
 			args->flags2 |= XFSMNT2_FILESTREAMS;
+		} else if (!strcmp(this_char, MNTOPT_CILOOKUP)) {
+			args->flags2 |= XFSMNT2_CILOOKUP;
+#ifndef CONFIG_XFS_UNICODE
+			cmn_err(CE_WARN,
+				"XFS: %s option requires Unicode support",
+				this_char);
+			return EINVAL;
+#endif
 		} else if (!strcmp(this_char, MNTOPT_NOQUOTA)) {
 			args->flags &= ~(XFSMNT_UQUOTAENF|XFSMNT_UQUOTA);
 			args->flags &= ~(XFSMNT_GQUOTAENF|XFSMNT_GQUOTA);
@@ -458,6 +468,7 @@ xfs_showargs(
 		{ XFS_MOUNT_OSYNCISOSYNC, "," MNTOPT_OSYNCISOSYNC },
 		{ XFS_MOUNT_ATTR2, "," MNTOPT_ATTR2 },
 		{ XFS_MOUNT_FILESTREAMS, "," MNTOPT_FILESTREAM },
+		{ XFS_MOUNT_CILOOKUP, "," MNTOPT_CILOOKUP },
 		{ XFS_MOUNT_DMAPI, "," MNTOPT_DMAPI },
 		{ XFS_MOUNT_GRPID, "," MNTOPT_GRPID },
 		{ 0, NULL }
@@ -567,7 +578,8 @@ xfs_set_inodeops(
 		break;
 	case S_IFDIR:
 		inode->i_op =
-			xfs_sb_version_hasoldci(&XFS_I(inode)->i_mount->m_sb) ?
+			xfs_sb_version_hasoldci(&XFS_I(inode)->i_mount->m_sb) ||
+			(XFS_I(inode)->i_mount->m_flags & XFS_MOUNT_CILOOKUP) ?
 				&xfs_dir_ci_inode_operations :
 				&xfs_dir_inode_operations;
 		inode->i_fop = &xfs_dir_file_operations;
Index: kern_ci/fs/xfs/linux-2.6/xfs_super.h
===================================================================
--- kern_ci.orig/fs/xfs/linux-2.6/xfs_super.h
+++ kern_ci/fs/xfs/linux-2.6/xfs_super.h
@@ -36,6 +36,12 @@
 # define ENOSECURITY EOPNOTSUPP
 #endif
+#ifdef CONFIG_XFS_UNICODE
+# define XFS_UNICODE_STRING "Unicode, "
+#else
+# define XFS_UNICODE_STRING
+#endif
+
 #ifdef CONFIG_XFS_RT
 # define XFS_REALTIME_STRING "realtime, "
 #else
@@ -66,6 +72,7 @@
 #define XFS_BUILD_OPTIONS XFS_ACL_STRING \
 	XFS_SECURITY_STRING \
+	XFS_UNICODE_STRING \
 	XFS_REALTIME_STRING \
 	XFS_BIGFS_STRING \
 	XFS_TRACE_STRING \
Index: kern_ci/fs/xfs/xfs_attr.c
===================================================================
--- kern_ci.orig/fs/xfs/xfs_attr.c
+++ kern_ci/fs/xfs/xfs_attr.c
@@ -50,6 +50,7 @@
 #include "xfs_acl.h"
 #include "xfs_rw.h"
 #include "xfs_vnodeops.h"
+#include "xfs_unicode.h"
 /*
  * xfs_attr.c
@@ -175,6 +176,13 @@ xfs_attr_get(
 	if (namelen >= MAXNAMELEN)
 		return(EFAULT); /* match IRIX behaviour */
+	/* Enforce UTF-8 only for user attr names */
+	if (xfs_sb_version_hasunicode(&ip->i_mount->m_sb) &&
+	    (flags & (ATTR_ROOT | ATTR_SECURE)) == 0) {
+		error = xfs_unicode_validate(name, namelen);
+		if (error)
+			return error;
+	}
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
 		return(EIO);
@@ -435,6 +443,14 @@ xfs_attr_set(
 	if (namelen >= MAXNAMELEN)
 		return EFAULT; /* match IRIX behaviour */
+	/* Enforce UTF-8 only for user attr names */
+	if (xfs_sb_version_hasunicode(&dp->i_mount->m_sb) &&
+	    (flags & (ATTR_ROOT | ATTR_SECURE)) == 0) {
+		int error = xfs_unicode_validate(name, namelen);
+		if (error)
+			return error;
+	}
+
 	XFS_STATS_INC(xs_attr_set);
 	if (XFS_FORCED_SHUTDOWN(dp->i_mount))
@@ -581,6 +597,14 @@ xfs_attr_remove(
 	if (namelen >= MAXNAMELEN)
 		return EFAULT; /* match IRIX behaviour */
+	/* Enforce UTF-8 only for user attr names */
+	if (xfs_sb_version_hasunicode(&dp->i_mount->m_sb) &&
+	    (flags & (ATTR_ROOT | ATTR_SECURE)) == 0) {
+		int error = xfs_unicode_validate(name, namelen);
+		if (error)
+			return error;
+	}
+
 	XFS_STATS_INC(xs_attr_remove);
 	if (XFS_FORCED_SHUTDOWN(dp->i_mount))
Index: kern_ci/fs/xfs/xfs_clnt.h
===================================================================
--- kern_ci.orig/fs/xfs/xfs_clnt.h
+++ kern_ci/fs/xfs/xfs_clnt.h
@@ -100,5 +100,7 @@ struct xfs_mount_args {
 						 * I/O size in stat(2) */
 #define XFSMNT2_FILESTREAMS 0x00000002 /* enable the filestreams
 						 * allocator */
+#define XFSMNT2_CILOOKUP 0x00000004 /* enable case-insensitive
+						 * filename lookup */
 #endif /* __XFS_CLNT_H__ */
Index: kern_ci/fs/xfs/xfs_da_btree.c
===================================================================
--- kern_ci.orig/fs/xfs/xfs_da_btree.c
+++ kern_ci/fs/xfs/xfs_da_btree.c
@@ -1530,16 +1530,22 @@ xfs_da_hashname(const uchar_t *name, int
 	}
 }
-xfs_dacmp_t
-xfs_da_compname(const uchar_t *name1, int len1, const uchar_t *name2, int len2)
+static xfs_dahash_t
+xfs_default_hashname(xfs_inode_t *inode, const uchar_t *name, int namelen)
 {
-	return (len1 == len2 && memcmp(name1, name2, len1) == 0) ?
-			XFS_CMP_EXACT : XFS_CMP_DIFFERENT;
+	return xfs_da_hashname(name, namelen);
+}
+
+static xfs_dacmp_t
+xfs_default_compname(xfs_inode_t *inode, const uchar_t *name1, int namelen1,
+		const uchar_t *name2, int namelen2)
+{
+	return xfs_da_compname(name1, namelen1, name2, namelen2);
 }
 struct xfs_nameops xfs_default_nameops = {
-	.hashname = xfs_da_hashname,
-	.compname = xfs_da_compname
+	.hashname = xfs_default_hashname,
+	.compname = xfs_default_compname
 };
 /*
Index: kern_ci/fs/xfs/xfs_da_btree.h
===================================================================
--- kern_ci.orig/fs/xfs/xfs_da_btree.h
+++ kern_ci/fs/xfs/xfs_da_btree.h
@@ -215,9 +215,10 @@ typedef struct xfs_da_state {
  * Name ops for directory and/or attr name operations
  */
-typedef xfs_dahash_t (*xfs_hashname_t)(const uchar_t *, int);
-typedef xfs_dacmp_t (*xfs_compname_t)(const uchar_t *, int,
-					const uchar_t *, int);
+typedef xfs_dahash_t (*xfs_hashname_t)(struct xfs_inode *, const uchar_t *,
+					int);
+typedef xfs_dacmp_t (*xfs_compname_t)(struct xfs_inode *, const uchar_t *,
+					int, const uchar_t *, int);
 typedef struct xfs_nameops {
 	xfs_hashname_t hashname;
@@ -282,8 +283,13 @@ int xfs_da_shrink_inode(xfs_da_args_t *a
 extern struct xfs_nameops xfs_default_nameops;
 uint xfs_da_hashname(const uchar_t *name_string, int name_length);
-xfs_dacmp_t xfs_da_compname(const uchar_t *name1, int len1,
-				const uchar_t *name2, int len2);
+
+static inline xfs_dacmp_t
+xfs_da_compname(const uchar_t *name1, int len1, const uchar_t *name2, int len2)
+{
+	return (len1 == len2 && memcmp(name1, name2, len1) == 0) ?
+			XFS_CMP_EXACT : XFS_CMP_DIFFERENT;
+}
 /* returns/frees a MAXNAMELEN buffer from a zone */
 extern struct kmem_zone *xfs_da_name_zone;
Index: kern_ci/fs/xfs/xfs_dir2.c
===================================================================
--- kern_ci.orig/fs/xfs/xfs_dir2.c
+++ kern_ci/fs/xfs/xfs_dir2.c
@@ -43,7 +43,7 @@
 #include "xfs_dir2_trace.h"
 #include "xfs_error.h"
 #include "xfs_vnodeops.h"
-
+#include "xfs_unicode.h"
 /*
  * V1/OLDCI case-insensitive support for directories
@@ -52,6 +52,7 @@
  */
 static xfs_dahash_t
 xfs_ascii_ci_hashname(
+	xfs_inode_t *inode,
 	const uchar_t *name,
 	int namelen)
 {
@@ -66,6 +67,7 @@ xfs_ascii_ci_hashname(
 static xfs_dacmp_t
 xfs_ascii_ci_compname(
+	xfs_inode_t *inode,
 	const uchar_t *name1,
 	int len1,
 	const uchar_t *name2,
@@ -113,8 +115,13 @@ xfs_dir_mount(
 		(mp->m_dirblksize - (uint)sizeof(xfs_da_node_hdr_t)) /
 		(uint)sizeof(xfs_da_node_entry_t);
 	mp->m_dir_magicpct = (mp->m_dirblksize * 37) / 100;
-	mp->m_dirnameops = xfs_sb_version_hasoldci(&mp->m_sb) ?
-		&xfs_ascii_ci_nameops : &xfs_default_nameops;
+
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		mp->m_dirnameops = (mp->m_flags & XFS_MOUNT_CILOOKUP) ?
+			&xfs_unicode_ci_nameops : &xfs_unicode_nameops;
+	} else
+		mp->m_dirnameops = xfs_sb_version_hasoldci(&mp->m_sb) ?
+			&xfs_ascii_ci_nameops : &xfs_default_nameops;
 }
 /*
Index: kern_ci/fs/xfs/xfs_dir2.h
===================================================================
--- kern_ci.orig/fs/xfs/xfs_dir2.h
+++ kern_ci/fs/xfs/xfs_dir2.h
@@ -88,10 +88,11 @@ extern int xfs_dir_canenter(struct xfs_t
 extern int xfs_dir_ino_validate(struct xfs_mount *mp, xfs_ino_t ino);
 #define xfs_dir_hashname(dp, n, l) \
-	((dp)->i_mount->m_dirnameops->hashname((n), (l)))
+	((dp)->i_mount->m_dirnameops->hashname((dp), (n), (l)))
 #define xfs_dir_compname(dp, n1, l1, n2, l2) \
-	((dp)->i_mount->m_dirnameops->compname((n1), (l1), (n2), (l2)))
+	((dp)->i_mount->m_dirnameops->compname((dp), (n1), (l1), \
+						(n2), (l2)))
 /*
  * Utility routines for v2 directories.
Index: kern_ci/fs/xfs/xfs_fs.h
===================================================================
--- kern_ci.orig/fs/xfs/xfs_fs.h
+++ kern_ci/fs/xfs/xfs_fs.h
@@ -241,6 +241,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_ATTR2 0x0400 /* inline attributes rework */
 #define XFS_FSOP_GEOM_FLAGS_DIRV2CI 0x1000 /* ASCII only CI names */
 #define XFS_FSOP_GEOM_FLAGS_LAZYSB 0x4000 /* lazy superblock counters */
+#define XFS_FSOP_GEOM_FLAGS_UNICODE 0x10000 /* unicode filenames */
 /*
Index: kern_ci/fs/xfs/xfs_fsops.c
===================================================================
--- kern_ci.orig/fs/xfs/xfs_fsops.c
+++ kern_ci/fs/xfs/xfs_fsops.c
@@ -100,7 +100,9 @@ xfs_fs_geometry(
 			(xfs_sb_version_haslazysbcount(&mp->m_sb) ?
 				XFS_FSOP_GEOM_FLAGS_LAZYSB : 0) |
 			(xfs_sb_version_hasattr2(&mp->m_sb) ?
-				XFS_FSOP_GEOM_FLAGS_ATTR2 : 0);
+				XFS_FSOP_GEOM_FLAGS_ATTR2 : 0) |
+			(xfs_sb_version_hasunicode(&mp->m_sb) ?
+				XFS_FSOP_GEOM_FLAGS_UNICODE : 0);
 		geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
 				mp->m_sb.sb_logsectsize : BBSIZE;
 		geo->rtsectsize = mp->m_sb.sb_blocksize;
Index: kern_ci/fs/xfs/xfs_itable.c
===================================================================
--- kern_ci.orig/fs/xfs/xfs_itable.c
+++ kern_ci/fs/xfs/xfs_itable.c
@@ -45,6 +45,8 @@ xfs_internal_inum(
 	xfs_ino_t ino)
 {
 	return (ino == mp->m_sb.sb_rbmino || ino == mp->m_sb.sb_rsumino ||
+		(xfs_sb_version_hasunicode(&mp->m_sb) &&
+		 ino == mp->m_sb.sb_cftino) ||
 		(xfs_sb_version_hasquota(&mp->m_sb) &&
 		 (ino == mp->m_sb.sb_uquotino || ino == mp->m_sb.sb_gquotino)));
 }
Index: kern_ci/fs/xfs/xfs_mount.c
===================================================================
--- kern_ci.orig/fs/xfs/xfs_mount.c
+++ kern_ci/fs/xfs/xfs_mount.c
@@ -44,6 +44,7 @@
 #include "xfs_quota.h"
 #include "xfs_fsops.h"
 #include "xfs_utils.h"
+#include "xfs_unicode.h"
 STATIC void xfs_mount_log_sb(xfs_mount_t *, __int64_t);
 STATIC int xfs_uuid_mount(xfs_mount_t *);
@@ -121,6 +122,7 @@ static const struct {
 	{ offsetof(xfs_sb_t, sb_logsunit), 0 },
 	{ offsetof(xfs_sb_t, sb_features2), 0 },
 	{ offsetof(xfs_sb_t, sb_bad_features2), 0 },
+	{ offsetof(xfs_sb_t, sb_cftino), 0 },
 	{ sizeof(xfs_sb_t), 0 }
 };
@@ -167,6 +169,7 @@ xfs_mount_free(
 			sizeof(xfs_perag_t) * mp->m_sb.sb_agcount);
 	}
+	xfs_unicode_free_cft(mp->m_cft);
 	spinlock_destroy(&mp->m_ail_lock);
 	spinlock_destroy(&mp->m_sb_lock);
 	mutex_destroy(&mp->m_ilock);
@@ -452,6 +455,7 @@ xfs_sb_from_disk(
 	to->sb_logsunit = be32_to_cpu(from->sb_logsunit);
 	to->sb_features2 = be32_to_cpu(from->sb_features2);
 	to->sb_bad_features2 = be32_to_cpu(from->sb_bad_features2);
+	to->sb_cftino = be64_to_cpu(from->sb_cftino);
 }
 /*
@@ -1175,6 +1179,18 @@ xfs_mountfs(
 	}
 	/*
+	 * Load in unicode case folding table from disk
+	 */
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		error = xfs_unicode_read_cft(mp);
+		if (error) {
+			cmn_err(CE_WARN,
+				"XFS: failed to read case folding table");
+			goto error4;
+		}
+	}
+
+	/*
 	 * If fs is not mounted readonly, then update the superblock changes.
 	 */
 	if (update_flags && !(mp->m_flags & XFS_MOUNT_RDONLY))
@@ -1229,7 +1245,8 @@ xfs_mountfs(
 	 * Free up the root inode.
 	 */
 	IRELE(rip);
- error3:
+	xfs_unicode_free_cft(mp->m_cft);
+error3:
 	xfs_log_unmount_dealloc(mp);
 error2:
 	for (agno = 0; agno < sbp->sb_agcount; agno++)
@@ -1956,7 +1973,7 @@ xfs_mount_log_sb(
  * 3. accurate counter sync requires m_sb_lock + per cpu locks
  * 4. modifying per-cpu counters requires holding per-cpu lock
  * 5. modifying global counters requires holding m_sb_lock
- * 6. enabling or disabling a counter requires holding the m_sb_lock
+ * 6. enabling or disabling a counter requires holding the m_sb_lock
  * and _none_ of the per-cpu locks.
  *
  * Disabled counters are only ever re-enabled by a balance operation
Index: kern_ci/fs/xfs/xfs_mount.h
===================================================================
--- kern_ci.orig/fs/xfs/xfs_mount.h
+++ kern_ci/fs/xfs/xfs_mount.h
@@ -62,6 +62,7 @@ struct xfs_extdelta;
 struct xfs_swapext;
 struct xfs_mru_cache;
 struct xfs_nameops;
+struct xfs_cft;
 /*
  * Prototypes and functions for the Data Migration subsystem.
@@ -314,6 +315,7 @@ typedef struct xfs_mount {
 						field governed by m_ilock */
 	__uint8_t m_sectbb_log; /* sectlog - BBSHIFT */
 	struct xfs_nameops *m_dirnameops; /* vector of dir name ops */
+	struct xfs_cft *m_cft; /* unicode case fold table */
 	int m_dirblksize; /* directory block sz--bytes */
 	int m_dirblkfsbs; /* directory block sz--fsbs */
 	xfs_dablk_t m_dirdatablk; /* blockno of dir data v2 */
@@ -379,7 +381,8 @@ typedef struct xfs_mount {
 						counters */
 #define XFS_MOUNT_FILESTREAMS (1ULL << 24) /* enable the filestreams
 						allocator */
-
+#define XFS_MOUNT_CILOOKUP (1ULL << 25) /* enable case-insensitive
+						file lookup */
 /*
  * Default minimum read and write sizes.
@@ -403,7 +406,7 @@ typedef struct xfs_mount {
 /*
  * Allow large block sizes to be reported to userspace programs if the
- * "largeio" mount option is used.
+ * "largeio" mount option is used.
  *
  * If compatibility mode is specified, simply return the basic unit of caching
  * so that we don't get inefficient read/modify/write I/O from user apps.
Index: kern_ci/fs/xfs/xfs_rename.c
===================================================================
--- kern_ci.orig/fs/xfs/xfs_rename.c
+++ kern_ci/fs/xfs/xfs_rename.c
@@ -39,6 +39,7 @@
 #include "xfs_utils.h"
 #include "xfs_trans_space.h"
 #include "xfs_vnodeops.h"
+#include "xfs_unicode.h"
 /*
@@ -248,6 +249,14 @@ xfs_rename(
 	xfs_itrace_entry(src_dp);
 	xfs_itrace_entry(target_dp);
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		error = xfs_unicode_validate(src_name, src_namelen);
+		if (error)
+			return error;
+		error = xfs_unicode_validate(target_name, target_namelen);
+		if (error)
+			return error;
+	}
 	if (DM_EVENT_ENABLED(src_dp, DM_EVENT_RENAME) ||
 	    DM_EVENT_ENABLED(target_dp, DM_EVENT_RENAME)) {
 		error = XFS_SEND_NAMESP(mp, DM_EVENT_RENAME,
Index: kern_ci/fs/xfs/xfs_sb.h
===================================================================
--- kern_ci.orig/fs/xfs/xfs_sb.h
+++ kern_ci/fs/xfs/xfs_sb.h
@@ -79,10 +79,18 @@ struct xfs_mount;
 #define XFS_SB_VERSION2_LAZYSBCOUNTBIT 0x00000002 /* Superblk counters */
 #define XFS_SB_VERSION2_RESERVED4BIT 0x00000004
 #define XFS_SB_VERSION2_ATTR2BIT 0x00000008 /* Inline attr rework */
+#define XFS_SB_VERSION2_UNICODEBIT 0x00000020 /* Unicode names */
-#define XFS_SB_VERSION2_OKREALFBITS \
+#ifdef CONFIG_XFS_UNICODE
+# define XFS_SB_VERSION2_OKREALFBITS \
 	(XFS_SB_VERSION2_LAZYSBCOUNTBIT | \
+	 XFS_SB_VERSION2_UNICODEBIT | \
 	 XFS_SB_VERSION2_ATTR2BIT)
+#else
+# define XFS_SB_VERSION2_OKREALFBITS \
+	(XFS_SB_VERSION2_LAZYSBCOUNTBIT | \
+	 XFS_SB_VERSION2_ATTR2BIT)
+#endif
 #define XFS_SB_VERSION2_OKSASHFBITS \
 	(0)
 #define XFS_SB_VERSION2_OKREALBITS \
@@ -156,6 +164,7 @@ typedef struct xfs_sb {
 	 * it for anything else.
 	 */
 	__uint32_t sb_bad_features2;
+	xfs_ino_t sb_cftino; /* unicode case folding table inode */
 	/* must be padded to 64 bit alignment */
 } xfs_sb_t;
@@ -225,7 +234,8 @@ typedef struct xfs_dsb {
 	 * for features2 bits. Easiest just to mark it bad and not use
 	 * it for anything else.
 	 */
-	__be32 sb_bad_features2;
+	__be32 sb_bad_features2;
+	__be64 sb_cftino; /* unicode case folding table inode */
 	/* must be padded to 64 bit alignment */
 } xfs_dsb_t;
@@ -246,7 +256,7 @@ typedef enum {
 	XFS_SBS_GQUOTINO, XFS_SBS_QFLAGS, XFS_SBS_FLAGS, XFS_SBS_SHARED_VN,
 	XFS_SBS_INOALIGNMT, XFS_SBS_UNIT, XFS_SBS_WIDTH, XFS_SBS_DIRBLKLOG,
 	XFS_SBS_LOGSECTLOG, XFS_SBS_LOGSECTSIZE, XFS_SBS_LOGSUNIT,
-	XFS_SBS_FEATURES2, XFS_SBS_BAD_FEATURES2,
+	XFS_SBS_FEATURES2, XFS_SBS_BAD_FEATURES2, XFS_SBS_CFTINO,
 	XFS_SBS_FIELDCOUNT
 } xfs_sb_field_t;
@@ -272,6 +282,7 @@ typedef enum {
 #define XFS_SB_FDBLOCKS XFS_SB_MVAL(FDBLOCKS)
 #define XFS_SB_FEATURES2 XFS_SB_MVAL(FEATURES2)
 #define XFS_SB_BAD_FEATURES2 XFS_SB_MVAL(BAD_FEATURES2)
+#define XFS_SB_CFTINO XFS_SB_MVAL(CFTINO)
 #define XFS_SB_NUM_BITS ((int)XFS_SBS_FIELDCOUNT)
 #define XFS_SB_ALL_BITS ((1LL << XFS_SB_NUM_BITS) - 1)
 #define XFS_SB_MOD_BITS \
@@ -279,7 +290,7 @@ typedef enum {
 	 XFS_SB_VERSIONNUM | XFS_SB_UQUOTINO | XFS_SB_GQUOTINO | \
 	 XFS_SB_QFLAGS | XFS_SB_SHARED_VN | XFS_SB_UNIT | XFS_SB_WIDTH | \
 	 XFS_SB_ICOUNT | XFS_SB_IFREE | XFS_SB_FDBLOCKS | XFS_SB_FEATURES2 | \
-	 XFS_SB_BAD_FEATURES2)
+	 XFS_SB_BAD_FEATURES2 | XFS_SB_CFTINO)
 /*
@@ -480,6 +491,16 @@ static inline void xfs_sb_version_addatt
 		((sbp)->sb_features2 | XFS_SB_VERSION2_ATTR2BIT)));
 }
+#ifdef CONFIG_XFS_UNICODE
+static inline int xfs_sb_version_hasunicode(xfs_sb_t *sbp)
+{
+	return (xfs_sb_version_hasmorebits(sbp) && \
+		((sbp)->sb_features2 & XFS_SB_VERSION2_UNICODEBIT));
+}
+#else
+# define xfs_sb_version_hasunicode(sbp) (0)
+#endif
+
 /*
  * end of superblock version macros
  */
Index: kern_ci/fs/xfs/xfs_unicode.c
===================================================================
--- /dev/null
+++ kern_ci/fs/xfs/xfs_unicode.c
@@ -0,0 +1,499 @@
+/*
+ * Copyright (c) 2007-2008 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_inum.h"
+#include "xfs_clnt.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_alloc.h"
+#include "xfs_dmapi.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_dir2_sf.h"
+#include "xfs_attr_sf.h"
+#include "xfs_dinode.h"
+#include "xfs_inode.h"
+#include "xfs_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_itable.h"
+#include "xfs_rtalloc.h"
+#include "xfs_error.h"
+#include "xfs_bmap.h"
+#include "xfs_rw.h"
+#include "xfs_unicode.h"
+
+#define MAX_FOLD_CHARS 4
+
+static inline int
+xfs_casefold(
+	const xfs_cft_t *cft,
+	__uint16_t c,
+	__uint16_t *fc)
+{
+	__uint16_t *table = XFS_CFT_PTR(cft, 0);
+	__uint16_t tmp = table[c >> 8];
+	int i;
+
+	if (!tmp) {
+		*fc = c;
+		return 1;
+	}
+	tmp = table[tmp + (c & 0xff)];
+	if ((tmp & 0xf000) != 0xe000) {
+		*fc = tmp;
+		return 1;
+	}
+	i = ((tmp >> 10) & 0x3) + 2;
+	ASSERT(i < cft->num_tables);
+	table = XFS_CFT_PTR(cft, i - 1) + ((tmp & 0x3ff) * i);
+
+	memcpy(fc, table, sizeof(__uint16_t) * i);
+
+	return i;
+}
+
+static inline int
+xfs_utf8_casefold(
+	const xfs_cft_t *cft,
+	const uchar_t **name,
+	int *namelen,
+	__uint16_t *fc)
+{
+	wchar_t uc;
+
+	if (*namelen == 0)
+		return 0;
+
+	if (**name & 0x80) {
+		int n = utf8_mbtowc(&uc, *name, *namelen);
+		if (n < 0) {
+			(*namelen)--;
+			*fc = *(*name)++;
+			return 1;
+		}
+		*name += n;
+		*namelen -= n;
+	} else {
+		uc = *(*name)++;
+		(*namelen)--;
+	}
+	return xfs_casefold(cft, uc, fc);
+}
+
+/*
+ * always generate a case-folded hash to allow mount-time selection of
+ * case-insensitive lookup (rather than mkfs time).
+ */
+xfs_dahash_t
+xfs_unicode_hashname(
+	xfs_inode_t *inode,
+	const uchar_t *name,
+	int namelen)
+{
+	xfs_dahash_t hash = 0;
+	__uint16_t fc[MAX_FOLD_CHARS];
+	int nfc;
+	int i;
+
+	while (namelen > 0) {
+		nfc = xfs_utf8_casefold(inode->i_mount->m_cft, &name, &namelen,
+				fc);
+		for (i = 0; i < nfc; i++)
+			hash = fc[i] ^ rol32(hash, 7);
+	}
+	return hash;
+}
+
+/*
+ * Perform a case-folding case-insensitive string comparison,
+ * returns either XFS_CMP_CASE or XFS_CMP_DIFFERENT.
+ */
+static xfs_dacmp_t
+xfs_unicode_casecmp(
+	xfs_cft_t *cft,
+	const uchar_t *name1,
+	int len1,
+	const uchar_t *name2,
+	int len2)
+{
+	__uint16_t fc1[MAX_FOLD_CHARS], fc2[MAX_FOLD_CHARS];
+	__uint16_t *pfc1, *pfc2;
+	int nfc1, nfc2;
+
+	nfc1 = xfs_utf8_casefold(cft, &name1, &len1, fc1);
+	pfc1 = fc1;
+	nfc2 = xfs_utf8_casefold(cft, &name2, &len2, fc2);
+	pfc2 = fc2;
+
+	while (nfc1 > 0 && nfc2 > 0) {
+		if (*pfc1 != *pfc2)
+			return XFS_CMP_DIFFERENT;
+		if (!--nfc1) {
+			nfc1 = xfs_utf8_casefold(cft, &name1, &len1, fc1);
+			pfc1 = fc1;
+		} else
+			pfc1++;
+		if (!--nfc2) {
+			nfc2 = xfs_utf8_casefold(cft, &name2, &len2, fc2);
+			pfc2 = fc2;
+		} else
+			pfc2++;
+	}
+	if (nfc1 != nfc2)
+		return XFS_CMP_DIFFERENT;
+	return XFS_CMP_CASE;
+
+}
+
+/*
+ * Compare two UTF-8 names to see if they are exactly the same or
+ * case-insensitive match.
+ */
+xfs_dacmp_t
+xfs_unicode_compname(
+	xfs_inode_t *inode,
+	const uchar_t *name1,
+	int len1,
+	const uchar_t *name2,
+	int len2)
+{
+	wchar_t uc1, uc2;
+	int n;
+
+	/*
+	 * If the lengths are different, go straight to the case-insensitive
+	 * comparison
+	 */
+	if (len1 != len2)
+		return xfs_unicode_casecmp(inode->i_mount->m_cft,
+				name1, len1, name2, len2);
+
+	/*
+	 * Start by comparing one-to-one UTF-8 chars. If we have a mismatch,
+	 * downgrade to case-insensitive comparison on the rest of the names.
+	 * At this stage, we only need to maintain one length variable.
+	 */
+	while (len1) {
+		/*
+		 * first do a direct compare, if different, try the
+		 * case-insensitive comparison on the remainder.
+		 */
+		if (*name1 != *name2)
+			return xfs_unicode_casecmp(inode->i_mount->m_cft,
+					name1, len1, name2, len1);
+		/*
+		 * if we are working on a UTF-8 sequence, take in all
+		 * appropriate chars and then compare.
+		 */
+		if (*name1 >= 0x80) {
+			n = utf8_mbtowc(&uc1, name1, len1);
+			if (n < 0)
+				return XFS_CMP_DIFFERENT; /* invalid */
+			utf8_mbtowc(&uc2, name2, len1);
+			/*
+			 * no need to check "n" here as the first char
+			 * determines the length of a UTF-8 sequence.
+			 */
+			if (uc1 != uc2)
+				return xfs_unicode_casecmp(
+						inode->i_mount->m_cft,
+						name1, len1, name2, len1);
+		} else {
+			n = 1;
+		}
+		name1 += n;
+		name2 += n;
+		len1 -= n;
+	}
+	/*
+	 * to get here, all chars must have matched
+	 */
+	return XFS_CMP_EXACT;
+}
+
+static xfs_dacmp_t
+xfs_default_compname(
+	xfs_inode_t *inode,
+	const uchar_t *name1,
+	int namelen1,
+	const uchar_t *name2,
+	int namelen2)
+{
+	return xfs_da_compname(name1, namelen1, name2, namelen2);
+}
+
+struct xfs_nameops xfs_unicode_nameops = {
+	.hashname = xfs_unicode_hashname,
+	.compname = xfs_default_compname,
+};
+
+struct xfs_nameops xfs_unicode_ci_nameops = {
+	.hashname = xfs_unicode_hashname,
+	.compname = xfs_unicode_compname,
+};
+
+int
+xfs_unicode_validate(
+	const uchar_t *name,
+	int namelen)
+{
+	wchar_t uc;
+	int i, nlen;
+
+	for (i = 0; i < namelen; i += nlen) {
+		if (name[i] >= 0xf0) {
+			cmn_err(CE_WARN, "xfs_unicode_validate: "
+				"UTF-8 char beyond U+FFFF\n");
+			return EINVAL;
+		}
+		/* utf8_mbtowc must fail on overlong sequences too */
+		nlen = utf8_mbtowc(&uc, name + i, namelen - i);
+		if (nlen < 0) {
+			cmn_err(CE_WARN, "xfs_unicode_validate: "
+				"invalid UTF-8 sequence\n");
+			return EILSEQ;
+		}
+		/* check for invalid/surrogate/private unicode chars */
+		if (uc >= 0xfffe || (uc >= 0xd800 && uc <= 0xf8ff)) {
+			cmn_err(CE_WARN, "xfs_unicode_validate: "
+				"unsupported UTF-8 char\n");
+			return EINVAL;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Unicode Case Fold Table management
+ */
+
+struct cft_item {
+	xfs_cft_t *table;
+	int size;
+	int refcount;
+};
+
+static mutex_t cft_lock;
+static int cft_size;
+static struct cft_item *cft_list;
+
+static xfs_cft_t *
+add_cft(
+	xfs_dcft_t *dcft,
+	int size)
+{
+	int found = 0;
+	int i, j;
+	xfs_cft_t *cft;
+	__be16 *duc;
+	__uint16_t *uc;
+
+	mutex_lock(&cft_lock);
+
+	for (i = 0; i < cft_size; i++) {
+		if (cft_list[i].size != size)
+			continue;
+		cft = cft_list[i].table;
+		if (cft->num_tables != be32_to_cpu(dcft->num_tables) ||
+		    cft->flags != be32_to_cpu(dcft->flags))
+			continue;
+		found = 1;
+		for (j = 0; j < cft->num_tables; j++) {
+			if (cft->table_offset[j] !=
+			    be32_to_cpu(dcft->table_offset[j])) {
+				found = 0;
+				break;
+			}
+		}
+		if (found) {
+			cft_list[i].refcount++;
+			mutex_unlock(&cft_lock);
+			return cft;
+		}
+	}
+
+	cft = vmalloc(size);
+	if (!cft) {
+		mutex_unlock(&cft_lock);
+		return NULL;
+	}
+	cft->magic = be32_to_cpu(dcft->magic);
+	cft->flags = be32_to_cpu(dcft->flags);
+	cft->num_tables = be32_to_cpu(dcft->num_tables);
+	ASSERT(cft->num_tables <= MAX_FOLD_CHARS);
+	for (i = 0; i < cft->num_tables; i++)
+		cft->table_offset[i] = be32_to_cpu(dcft->table_offset[i]);
+	j = (size - cft->table_offset[0]) / sizeof(__uint16_t);
+	uc = XFS_CFT_PTR(cft, 0);
+	duc = XFS_DCFT_PTR(dcft, 0);
+	for (i = 0; i < j; i++)
+		uc[i] = be16_to_cpu(duc[i]);
+
+	cft_list = kmem_realloc(cft_list,
+			(cft_size + 1) * sizeof(struct cft_item),
+			cft_size * sizeof(struct cft_item), KM_SLEEP);
+	cft_list[cft_size].table = cft;
+	cft_list[cft_size].size = size;
+	cft_list[cft_size].refcount = 1;
+	cft_size++;
+
+	mutex_unlock(&cft_lock);
+
+	return cft;
+}
+
+static void
+remove_cft(
+	const xfs_cft_t *cft)
+{
+	int i;
+
+	mutex_lock(&cft_lock);
+
+	for (i = 0; i < cft_size; i++) {
+		if (cft_list[i].table == cft) {
+			ASSERT(cft_list[i].refcount > 0);
+			cft_list[i].refcount--;
+			break;
+		}
+	}
+
+	mutex_unlock(&cft_lock);
+}
+
+
+int
+xfs_unicode_read_cft(
+	xfs_mount_t *mp)
+{
+	int error;
+	xfs_inode_t *cftip;
+	int size;
+	int nfsb;
+	int nmap;
+	xfs_bmbt_irec_t *mapp;
+	int n;
+	int byte_cnt;
+	xfs_buf_t *bp;
+	char *table;
+	xfs_dcft_t *dcft;
+
+	if (mp->m_sb.sb_cftino == NULLFSINO || mp->m_sb.sb_cftino == 0)
+		return EINVAL;
+	error = xfs_iget(mp, NULL, mp->m_sb.sb_cftino, 0, 0, &cftip, 0);
+	if (error)
+		return error;
+	ASSERT(cftip != NULL);
+
+	size = cftip->i_d.di_size;
+	nfsb = cftip->i_d.di_nblocks;
+
+	table = vmalloc(size);
+	if (!table) {
+		xfs_iput(cftip, 0);
+		return ENOMEM;
+	}
+	dcft = (xfs_dcft_t *)table;
+
+	nmap = nfsb;
+	mapp = kmem_alloc(nfsb * sizeof(xfs_bmbt_irec_t), KM_SLEEP);
+
+	error = xfs_bmapi(NULL, cftip, 0, nfsb, 0, NULL, 0, mapp, &nmap,
+			NULL, NULL);
+	if (error)
+		goto out;
+
+	for (n = 0; n < nmap; n++) {
+		byte_cnt = XFS_FSB_TO_B(mp, mapp[n].br_blockcount);
+
+		error = xfs_read_buf(mp, mp->m_ddev_targp,
+				XFS_FSB_TO_DADDR(mp, mapp[n].br_startblock),
+				BTOBB(byte_cnt), 0, &bp);
+		if (error)
+			goto out;
+
+		if (size < byte_cnt)
+			byte_cnt = size;
+		size -= byte_cnt;
+		memcpy(table, XFS_BUF_PTR(bp), byte_cnt);
+		table += byte_cnt;
+		xfs_buf_relse(bp);
+	}
+
+	/* verify case table read off disk */
+	if (!uuid_equal(&dcft->uuid, &mp->m_sb.sb_uuid)) {
+		error = EINVAL;
+		goto out;
+	}
+
+	/* clear UUID for in-memory copy/compare */
+	memset(&dcft->uuid, 0, sizeof(dcft->uuid));
+
+	mp->m_cft = add_cft(dcft, cftip->i_d.di_size);
+	if (mp->m_cft == NULL)
+		error = ENOMEM;
+
+out:
+	xfs_iput(cftip, 0);
+	kmem_free(mapp, nfsb * sizeof(xfs_bmbt_irec_t));
+	vfree(dcft);
+
+	return error;
+}
+
+void
+xfs_unicode_free_cft(
+	const xfs_cft_t *cft)
+{
+	if (cft)
+		remove_cft(cft);
+}
+
+void
+xfs_unicode_init(void)
+{
+	mutex_init(&cft_lock);
+}
+
+void
+xfs_unicode_uninit(void)
+{
+	int i;
+
+	mutex_lock(&cft_lock);
+
+	for (i = 0; i < cft_size; i++) {
+		ASSERT(cft_list[i].refcount == 0);
+		vfree(cft_list[i].table);
+	}
+	kmem_free(cft_list, cft_size * sizeof(struct cft_item));
+	cft_size = 0;
+	cft_list = NULL;
+
+	mutex_unlock(&cft_lock);
+	mutex_destroy(&cft_lock);
+}
Index: kern_ci/fs/xfs/xfs_unicode.h
===================================================================
--- /dev/null
+++ kern_ci/fs/xfs/xfs_unicode.h
@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2007-2008 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_UNICODE_H__
+#define __XFS_UNICODE_H__
+
+#define XFS_CFT_MAGIC 0x58434654 /* 'XCFT' */
+#define XFS_CFT_FLAG_TURKIC 0x00000001
+#define XFS_CFT_FLAG_MAX 0x00000001
+
+/*
+ * Case Fold Table - on disk version. Must match the incore version below.
+ */
+typedef struct xfs_dcft {
+	__be32 magic; /* validity check */
+	__be32 flags;
+	uuid_t uuid; /* UUID of the filesystem */
+	__be32 crc; /* for future support */
+	__be32 num_tables; /* single, double, etc */
+	__be32 table_offset[1];
+} xfs_dcft_t;
+
+/*
+ * Case Fold Table - in core version. Must match the ondisk version above.
+ */
+typedef struct xfs_cft {
+	__uint32_t magic;
+	__uint32_t flags;
+	uuid_t uuid; /* UUID of the filesystem */
+	__uint32_t crc;
+	__uint32_t num_tables; /* single, double, etc */
+	__uint32_t table_offset[1];/* num_tables sized */
+	/* 16-bit array tables immediately follow */
+} xfs_cft_t;
+
+#define XFS_CFT_PTR(t,n) (__uint16_t *)(((char *)(t)) + \
+				(t)->table_offset[n])
+#define XFS_DCFT_PTR(t,n) (__be16 *)(((char *)(t)) + \
+				be32_to_cpu((t)->table_offset[n]))
+
+#ifdef CONFIG_XFS_UNICODE
+
+extern struct xfs_nameops xfs_unicode_nameops;
+extern struct xfs_nameops xfs_unicode_ci_nameops;
+
+void xfs_unicode_init(void);
+void xfs_unicode_uninit(void);
+
+int xfs_unicode_validate(const uchar_t *name, int namelen);
+
+int xfs_unicode_read_cft(struct xfs_mount *mp);
+void xfs_unicode_free_cft(const xfs_cft_t *cft);
+
+#else
+
+#define xfs_unicode_nameops xfs_default_nameops
+#define xfs_unicode_ci_nameops xfs_default_nameops
+
+#define xfs_unicode_init()
+#define xfs_unicode_uninit()
+#define xfs_unicode_validate(n,l) 0
+#define xfs_unicode_read_cft(mp) (EOPNOTSUPP)
+#define xfs_unicode_free_cft(cft)
+
+#endif /* CONFIG_XFS_UNICODE */
+
+#endif /* __XFS_UNICODE_H__ */
Index: kern_ci/fs/xfs/xfs_vfsops.c
===================================================================
--- kern_ci.orig/fs/xfs/xfs_vfsops.c
+++ kern_ci/fs/xfs/xfs_vfsops.c
@@ -56,6 +56,7 @@
 #include "xfs_vnodeops.h"
 #include "xfs_vfsops.h"
 #include "xfs_utils.h"
+#include "xfs_unicode.h"
 int __init
@@ -82,6 +83,7 @@ xfs_init(void)
 	xfs_acl_zone_init(xfs_acl_zone, "xfs_acl");
 	xfs_mru_cache_init();
 	xfs_filestream_init();
+	xfs_unicode_init();
 	/*
 	 * The size of the zone allocated buf log item is the maximum
@@ -157,6 +159,7 @@ xfs_cleanup(void)
 	xfs_filestream_uninit();
 	xfs_mru_cache_uninit();
 	xfs_acl_zone_destroy(xfs_acl_zone);
+	xfs_unicode_uninit();
 #ifdef XFS_DIR2_TRACE
 	ktrace_free(xfs_dir2_trace_buf);
@@ -399,6 +402,19 @@ xfs_finish_flags(
 		mp->m_qflags |= XFS_OQUOTA_ENFD;
 	}
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		if (ap->flags2 & XFSMNT2_CILOOKUP)
+			mp->m_flags |= XFS_MOUNT_CILOOKUP;
+	} else {
+		/*
+		 * Check for mount options which require a Unicode FS
+		 */
+		if (ap->flags2 & XFSMNT2_CILOOKUP) {
+			cmn_err(CE_WARN,
+	"XFS: can't do case-insensitive mount on non-utf8 filesystem");
+			return XFS_ERROR(EINVAL);
+		}
+	}
 	return 0;
 }
Index: kern_ci/fs/xfs/xfs_vnodeops.c
===================================================================
--- kern_ci.orig/fs/xfs/xfs_vnodeops.c
+++ kern_ci/fs/xfs/xfs_vnodeops.c
@@ -52,6 +52,7 @@
 #include "xfs_log_priv.h"
 #include "xfs_filestream.h"
 #include "xfs_vnodeops.h"
+#include "xfs_unicode.h"
 int
 xfs_open(
@@ -1777,6 +1778,12 @@ xfs_lookup(
 	if (XFS_FORCED_SHUTDOWN(dp->i_mount))
 		return XFS_ERROR(EIO);
+	if (xfs_sb_version_hasunicode(&dp->i_mount->m_sb)) {
+		error = xfs_unicode_validate(d_name->name, d_name->len);
+		if (error)
+			return error;
+	}
+
 	name.name = (uchar_t *)d_name->name;
 	name.len = d_name->len;
 	rname.name = NULL;
@@ -1822,8 +1829,17 @@ xfs_create(
 	ASSERT(!*ipp);
 	xfs_itrace_entry(dp);
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return XFS_ERROR(EIO);
+
 	namelen = VNAMELEN(dentry);
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		error = xfs_unicode_validate(name, namelen);
+		if (error)
+			return error;
+	}
+
 	if (DM_EVENT_ENABLED(dp, DM_EVENT_CREATE)) {
 		error = XFS_SEND_NAMESP(mp, DM_EVENT_CREATE,
 				dp, DM_RIGHT_NULL, NULL,
@@ -1835,9 +1851,6 @@ xfs_create(
 		dm_event_sent = 1;
 	}
-	if (XFS_FORCED_SHUTDOWN(mp))
-		return XFS_ERROR(EIO);
-
 	/* Return through std_return after this point. */
 	udqp = gdqp = NULL;
@@ -2282,7 +2295,7 @@ xfs_remove(
 	xfs_inode_t *ip = VNAME_TO_INODE(dentry);
 	int namelen = VNAMELEN(dentry);
 	xfs_trans_t *tp = NULL;
-	int error = 0;
+	int error;
 	xfs_bmap_free_t free_list;
 	xfs_fsblock_t first_block;
 	int cancel_flags;
@@ -2295,6 +2308,12 @@ xfs_remove(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return XFS_ERROR(EIO);
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		error = xfs_unicode_validate(name, namelen);
+		if (error)
+			return error;
+	}
+
 	if (DM_EVENT_ENABLED(dp, DM_EVENT_REMOVE)) {
 		error = XFS_SEND_NAMESP(mp, DM_EVENT_REMOVE, dp,
 					DM_RIGHT_NULL, NULL, DM_RIGHT_NULL,
@@ -2504,6 +2523,12 @@ xfs_link(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return XFS_ERROR(EIO);
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		error = xfs_unicode_validate(target_name, target_namelen);
+		if (error)
+			return error;
+	}
+
 	if (DM_EVENT_ENABLED(tdp, DM_EVENT_LINK)) {
 		error = XFS_SEND_NAMESP(mp, DM_EVENT_LINK,
 					tdp, DM_RIGHT_NULL,
@@ -2661,6 +2686,12 @@ xfs_mkdir(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return XFS_ERROR(EIO);
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		error = xfs_unicode_validate(dir_name, dir_namelen);
+		if (error)
+			return error;
+	}
+
 	tp = NULL;
 	if (DM_EVENT_ENABLED(dp, DM_EVENT_CREATE)) {
@@ -2869,6 +2900,12 @@ xfs_rmdir(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return XFS_ERROR(EIO);
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		error = xfs_unicode_validate(name, namelen);
+		if (error)
+			return error;
+	}
+
 	if (DM_EVENT_ENABLED(dp, DM_EVENT_REMOVE)) {
 		error = XFS_SEND_NAMESP(mp, DM_EVENT_REMOVE,
 					dp, DM_RIGHT_NULL,
@@ -3097,7 +3134,6 @@ xfs_symlink(
 	int link_namelen;
 	*ipp = NULL;
-	error = 0;
 	ip = NULL;
 	tp = NULL;
@@ -3108,6 +3144,12 @@ xfs_symlink(
 	link_namelen = VNAMELEN(dentry);
+	if (xfs_sb_version_hasunicode(&mp->m_sb)) {
+		error = xfs_unicode_validate(link_name, link_namelen);
+		if (error)
+			return error;
+	}
+
 	/*
 	 * Check component lengths of the target path name.
*/ -- From owner-xfs@oss.sgi.com Tue Apr 1 23:56:34 2008 Received: with ECARTIS (v1.0.0; list xfs); Tue, 01 Apr 2008 23:57:08 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m326uTxI012058 for ; Tue, 1 Apr 2008 23:56:33 -0700 Received: from pc-bnaujok.melbourne.sgi.com (pc-bnaujok.melbourne.sgi.com [134.14.55.58]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA24280; Wed, 2 Apr 2008 16:57:01 +1000 Date: Wed, 02 Apr 2008 16:58:46 +1000 To: "David Chinner" , xfs-dev Subject: Re: [Patch] xfsqa: 091 needs to support sector size != 512 bytes From: "Barry Naujok" Organization: SGI Cc: xfs-oss Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 MIME-Version: 1.0 References: <20080402053451.GE103491721@sgi.com> Message-ID: In-Reply-To: <20080402053451.GE103491721@sgi.com> User-Agent: Opera Mail/9.24 (Win32) X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from Quoted-Printable to 8bit by oss.sgi.com id m326uYxI012064 X-archive-position: 15158 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs Not tested yet, but looks fine. On Wed, 02 Apr 2008 15:34:51 +1000, David Chinner wrote: > Test 091 assumes a direct I/O alignment of 512 bytes, > a hold over from 2.4 kernels. On 2.6. kernels, direct > I/O needs to be aligned to the sector size the filesystem > was mkfs'd with. > > Teach 091 about 2.6 kernels and grab the sector size from the > xfs_info output. 
> > Signed-off-by: Dave Chinner > --- > xfstests/091 | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > Index: xfs-cmds/xfstests/091 > =================================================================== > --- xfs-cmds.orig/xfstests/091 2007-03-19 08:49:37.000000000 +1100 > +++ xfs-cmds/xfstests/091 2008-04-02 15:27:39.266824430 +1000 > @@ -43,10 +43,20 @@ run_fsx() > psize=`$here/src/feature -s` > bsize=512 > -# 2.4 Linux kernels support bsize aligned direct I/O only > kernel=`uname -r | sed -e 's/\(2\..\).*/\1/'` > + > +# 2.4 Linux kernels support bsize aligned direct I/O only > [ "$HOSTOS" = "Linux" -a "$kernel" = "2.4" ] && bsize=$psize > +# 2.6 Linux kernels support sector aligned direct I/O only > +if [ "$HOSTOS" = "Linux" -a "$kernel" = "2.6" ]; then > + xfs_info $TEST_DIR | _filter_mkfs 2> $tmp.info > + if [ $? -eq 0 ]; then > + source $tmp.info > + bsize=$sectsz > + fi > +fi > + > # fsx usage: > # > # -N numops: total # operations to do > > From owner-xfs@oss.sgi.com Wed Apr 2 01:28:05 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 01:28:13 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m328S4cb002118 for ; Wed, 2 Apr 2008 01:28:05 -0700 X-ASG-Debug-ID: 1207124920-419202f40000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from smtp-out03.alice-dsl.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 203748EB221 for ; Wed, 2 Apr 2008 01:28:40 -0700 (PDT) Received: from smtp-out03.alice-dsl.net (smtp-out03.alice-dsl.net [88.44.63.5]) by cuda.sgi.com with ESMTP id au9jcMNyrAm4sE0g for ; Wed, 02 Apr 2008 01:28:40 -0700 (PDT) Received: from out.alice-dsl.de ([192.168.125.62]) by 
smtp-out03.alice-dsl.net with Microsoft SMTPSVC(6.0.3790.1830); Wed, 2 Apr 2008 10:21:26 +0200 Received: from basil.firstfloor.org ([78.53.156.28]) by out.alice-dsl.de with Microsoft SMTPSVC(6.0.3790.1830); Wed, 2 Apr 2008 10:21:26 +0200 Received: by basil.firstfloor.org (Postfix, from userid 1000) id 3F9FE1B4211; Wed, 2 Apr 2008 10:28:07 +0200 (CEST) To: David Chinner Cc: Lachlan McIlroy , xfs-dev , xfs-oss X-ASG-Orig-Subj: Re: [Patch] Cacheline align xlog_t Subject: Re: [Patch] Cacheline align xlog_t References: <20080401231552.GV103491721@sgi.com> <47F3293C.6090708@sgi.com> <20080402054403.GF103491721@sgi.com> From: Andi Kleen Date: 02 Apr 2008 10:28:07 +0200 In-Reply-To: <20080402054403.GF103491721@sgi.com> Message-ID: <87myocek4o.fsf@basil.nowhere.org> Lines: 21 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-OriginalArrivalTime: 02 Apr 2008 08:21:26.0660 (UTC) FILETIME=[8B83BC40:01C8949A] X-Barracuda-Connect: smtp-out03.alice-dsl.net[88.44.63.5] X-Barracuda-Start-Time: 1207124921 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46605 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15159 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: andi@firstfloor.org Precedence: bulk X-list: xfs David Chinner writes: > > This just means that the start of the structure is cacheline > aligned. 
I don't think the internal alignment commands force the > entire structure to be cacheline aligned, merely pad the structure > internally. In that case, even though the specific internal parts of > the structure are on separate cache lines, there's no guarantee that > all the related members are on the same cacheline. Hence I'm > explicitly stating the exact alignment I want for the structure.... Isn't the structure dynamically allocated anyways? The full type alignment really only matters for statics/globals where the linker can handle it. For the dynamic allocation you would rather need to make sure it starts at a cache line boundary explicitly because the allocator doesn't know the alignment of the target type, otherwise your careful padding might be useless. -Andi From owner-xfs@oss.sgi.com Wed Apr 2 04:33:36 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 04:34:04 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m32BXZNs000916 for ; Wed, 2 Apr 2008 04:33:36 -0700 X-ASG-Debug-ID: 1207136050-371c00c10000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from smtp7-g19.free.fr (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 56684729D05 for ; Wed, 2 Apr 2008 04:34:11 -0700 (PDT) Received: from smtp7-g19.free.fr (smtp7-g19.free.fr [212.27.42.64]) by cuda.sgi.com with ESMTP id JvviJI7HL8OTV0Cs for ; Wed, 02 Apr 2008 04:34:11 -0700 (PDT) Received: from smtp7-g19.free.fr (localhost [127.0.0.1]) by smtp7-g19.free.fr (Postfix) with ESMTP id B989F322823; Wed, 2 Apr 2008 13:33:39 +0200 (CEST) Received: from galadriel.home (pla78-1-82-235-234-79.fbx.proxad.net [82.235.234.79]) by smtp7-g19.free.fr (Postfix) with ESMTP id
58169322837; Wed, 2 Apr 2008 13:33:39 +0200 (CEST) Date: Wed, 2 Apr 2008 13:30:03 +0200 From: Emmanuel Florac To: David Chinner Cc: David Chinner , xfs@oss.sgi.com X-ASG-Orig-Subj: Re: Serious XFS crash Subject: Re: Serious XFS crash Message-ID: <20080402133003.4bb043e4@galadriel.home> In-Reply-To: <20080402055831.GG103491721@sgi.com> References: <20080325185453.3a1957dd@galadriel.home> <20080325233611.GW103491721@sgi.com> <20080401140035.46470306@galadriel.home> <20080402055831.GG103491721@sgi.com> Organization: Intellique X-Mailer: Claws Mail 2.9.1 (GTK+ 2.8.20; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="MP_N0nVzd=sq85jQ=VAox5cYeJ" X-Barracuda-Connect: smtp7-g19.free.fr[212.27.42.64] X-Barracuda-Start-Time: 1207136052 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46617 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15160 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: eflorac@intellique.com Precedence: bulk X-list: xfs --MP_N0nVzd=sq85jQ=VAox5cYeJ Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Le Wed, 2 Apr 2008 15:58:31 +1000 vous =E9criviez: > The log is rather garbled - can you repost? Also, XFS usually outputs > an error message before the stack trace; can you make sure you > paste that as well (if it exists)? Well I attached the relevant part of kern.log; the message just before the crash is not very clear... 
You can see the other messages relevant to the disk error too. --=20 -------------------------------------------------- Emmanuel Florac www.intellique.com=20=20=20 -------------------------------------------------- --MP_N0nVzd=sq85jQ=VAox5cYeJ Content-Type: text/plain; name=log Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=log TWFyICA2IDA2OjI1OjA0IHN5c3RlbTMga2VybmVsOiAzdy05eHh4OiBzY3Np MDogQUVOOiBXQVJOSU5HICgweDA0OjB4MDAyMyk6IFNlY3RvciByZXBhaXIg Y29tcGxldGVkOnBvcnQ9NiwgTEJBPTB4RTZFMkEuDQpNYXIgIDYgMDY6MjU6 MDQgc3lzdGVtMyBrZXJuZWw6IFJlaXNlckZTOiB3YXJuaW5nOiBpc190cmVl X25vZGU6IG5vZGUgbGV2ZWwgMjg3ODQgZG9lcyBub3QgbWF0Y2ggdG8gdGhl IGV4cGVjdGVkIG9uZSAxDQpNYXIgIDYgMDY6MjU6MDQgc3lzdGVtMyBrZXJu ZWw6IFJlaXNlckZTOiBzZGExOiB3YXJuaW5nOiB2cy01MTUwOiBzZWFyY2hf Ynlfa2V5OiBpbnZhbGlkIGZvcm1hdCBmb3VuZCBpbiBibG9jayA3NTM2NzEu IEZzY2s/DQpNYXIgIDYgMDY6MjU6MDQgc3lzdGVtMyBrZXJuZWw6IFJlaXNl ckZTOiBzZGExOiB3YXJuaW5nOiB2cy0xMzA3MDogcmVpc2VyZnNfcmVhZF9s b2NrZWRfaW5vZGU6IGkvbyBmYWlsdXJlIG9jY3VycmVkIHRyeWluZyB0byBm aW5kIHN0YXQgZGF0YSBvZiBbMTg0MDQgMTg0NjMgMHgwIFNEXQ0KTWFyICA2 IDEwOjQyOjQ2IHN5c3RlbTMga2VybmVsOiAweDA6IDAwIDAwIDAwIDAwIDAw IDAwIDAwIDAwIDAwIDAwIDAwIDAwIDAwIDAwIDAwIDAwIA0KTWFyICA2IDEw OjQyOjQ2IHN5c3RlbTMga2VybmVsOiBGaWxlc3lzdGVtICJtZDAiOiBYRlMg aW50ZXJuYWwgZXJyb3IgeGZzX2FsbG9jX3JlYWRfYWdmIGF0IGxpbmUgMjE5 MCBvZiBmaWxlIGZzL3hmcy94ZnNfYWxsb2MuYy4gIENhbGxlciAweGMwMWY0 Yjg4DQpNYXIgIDYgMTA6NDI6NDYgc3lzdGVtMyBrZXJuZWw6ICBbeGZzX2Fs bG9jX3JlYWRfYWdmKzI0NC80MzJdIHhmc19hbGxvY19yZWFkX2FnZisweGY0 LzB4MWIwDQpNYXIgIDYgMTA6NDI6NDYgc3lzdGVtMyBrZXJuZWw6ICBbeGZz X2FsbG9jX2ZpeF9mcmVlbGlzdCsxMDAwLzExMjBdIHhmc19hbGxvY19maXhf ZnJlZWxpc3QrMHgzZTgvMHg0NjANCk1hciAgNiAxMDo0Mjo0NiBzeXN0ZW0z IGxhc3QgbWVzc2FnZSByZXBlYXRlZCAyIHRpbWVzDQpNYXIgIDYgMTA6NDI6 NDYgc3lzdGVtMyBrZXJuZWw6ICBbX3hmc190cmFuc19jb21taXQrNDg5Lzky OF0gX3hmc190cmFuc19jb21taXQrMHgxZTkvMHgzYTANCk1hciAgNiAxMDo0 Mjo0NiBzeXN0ZW0zIGtlcm5lbDogIFt4ZnNfZnJlZV9leHRlbnQrMTUyLzIy 
NF0geGZzX2ZyZWVfZXh0ZW50KzB4OTgvMHhlMA0KTWFyICA2IDEwOjQyOjQ2 IHN5c3RlbTMga2VybmVsOiAgW3hmc19ibWFwX2ZpbmlzaCsyNjMvNDAwXSB4 ZnNfYm1hcF9maW5pc2grMHgxMDcvMHgxOTANCk1hciAgNiAxMDo0Mjo0NiBz eXN0ZW0zIGtlcm5lbDogIFt4ZnNfaXRydW5jYXRlX2ZpbmlzaCs1NDQvOTc2 XSB4ZnNfaXRydW5jYXRlX2ZpbmlzaCsweDIyMC8weDNkMA0KTWFyICA2IDEw OjQyOjQ2IHN5c3RlbTMga2VybmVsOiAgW3hmc190cmFuc19pam9pbis0My8x MjhdIHhmc190cmFuc19pam9pbisweDJiLzB4ODANCk1hciAgNiAxMDo0Mjo0 NiBzeXN0ZW0zIGtlcm5lbDogIFt4ZnNfaW5hY3RpdmUrMTE5NS8xMjk2XSB4 ZnNfaW5hY3RpdmUrMHg0YWIvMHg1MTANCk1hciAgNiAxMDo0Mjo0NiBzeXN0 ZW0zIGtlcm5lbDogIFt4ZnNfZnNfY2xlYXJfaW5vZGUrMTU2LzE5Ml0geGZz X2ZzX2NsZWFyX2lub2RlKzB4OWMvMHhjMA0KTWFyICA2IDEwOjQyOjQ2IHN5 c3RlbTMga2VybmVsOiAgW2ludmFsaWRhdGVfaW5vZGVfYnVmZmVycysyMS8x MTJdIGludmFsaWRhdGVfaW5vZGVfYnVmZmVycysweDE1LzB4NzANCk1hciAg NiAxMDo0Mjo0NiBzeXN0ZW0zIGtlcm5lbDogIFtjbGVhcl9pbm9kZSsyMTIv MzIwXSBjbGVhcl9pbm9kZSsweGQ0LzB4MTQwDQpNYXIgIDYgMTA6NDI6NDYg c3lzdGVtMyBrZXJuZWw6ICBbdHJ1bmNhdGVfaW5vZGVfcGFnZXMrMjMvMzJd IHRydW5jYXRlX2lub2RlX3BhZ2VzKzB4MTcvMHgyMA0KTWFyICA2IDEwOjQy OjQ2IHN5c3RlbTMga2VybmVsOiAgW2dlbmVyaWNfZGVsZXRlX2lub2RlKzI2 NC8yNzJdIGdlbmVyaWNfZGVsZXRlX2lub2RlKzB4MTA4LzB4MTEwDQpNYXIg IDYgMTA6NDI6NDYgc3lzdGVtMyBrZXJuZWw6ICBbaXB1dCs4My8xMTJdIGlw dXQrMHg1My8weDcwDQpNYXIgIDYgMTA6NDI6NDYgc3lzdGVtMyBrZXJuZWw6 ICBbZG9fdW5saW5rYXQrMTg2LzI3Ml0gZG9fdW5saW5rYXQrMHhiYS8weDEx MA0KTWFyICA2IDEwOjQyOjQ2IHN5c3RlbTMga2VybmVsOiAgW3N5c19mY250 bDY0Kzg5LzE0NF0gc3lzX2ZjbnRsNjQrMHg1OS8weDkwDQpNYXIgIDYgMTA6 NDI6NDYgc3lzdGVtMyBrZXJuZWw6ICBbc3lzY2FsbF9jYWxsKzcvMTFdIHN5 c2NhbGxfY2FsbCsweDcvMHhiDQpNYXIgIDYgMTA6NDI6NDYgc3lzdGVtMyBr ZXJuZWw6IHhmc19mb3JjZV9zaHV0ZG93bihtZDAsMHg4KSBjYWxsZWQgZnJv bSBsaW5lIDQyNjcgb2YgZmlsZSBmcy94ZnMveGZzX2JtYXAuYy4gIFJldHVy biBhZGRyZXNzID0gMHhjMDI1NmIyOQ0KTWFyICA2IDEwOjQyOjQ2IHN5c3Rl bTMga2VybmVsOiBGaWxlc3lzdGVtICJtZDAiOiBDb3JydXB0aW9uIG9mIGlu LW1lbW9yeSBkYXRhIGRldGVjdGVkLiAgU2h1dHRpbmcgZG93biBmaWxlc3lz dGVtOiBtZDANCk1hciAgNiAxMDo0Mjo0NiBzeXN0ZW0zIGtlcm5lbDogUGxl 
YXNlIHVtb3VudCB0aGUgZmlsZXN5c3RlbSwgYW5kIHJlY3RpZnkgdGhlIHBy b2JsZW0ocykNCk1hciAgNiAxMDo1MToxOSBzeXN0ZW0zIGtlcm5lbDogM3ct OXh4eDogc2NzaTA6IEFFTjogV0FSTklORyAoMHgwNDoweDAwMjMpOiBTZWN0 b3IgcmVwYWlyIGNvbXBsZXRlZDpwb3J0PTYsIExCQT0weEU2RTAwLg0KTWFy ICA2IDEwOjUxOjIwIHN5c3RlbTMga2VybmVsOiAzdy05eHh4OiBzY3NpMDog QUVOOiBXQVJOSU5HICgweDA0OjB4MDAyMyk6IFNlY3RvciByZXBhaXIgY29t cGxldGVkOnBvcnQ9NiwgTEJBPTB4RTZEQ0EuDQo= --MP_N0nVzd=sq85jQ=VAox5cYeJ-- From owner-xfs@oss.sgi.com Wed Apr 2 05:16:28 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 05:16:36 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,STOX_REPLY_TYPE autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m32CGRm6007138 for ; Wed, 2 Apr 2008 05:16:28 -0700 X-ASG-Debug-ID: 1207138622-3678018e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from tyo201.gate.nec.co.jp (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 7CA301069942 for ; Wed, 2 Apr 2008 05:17:02 -0700 (PDT) Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [202.32.8.193]) by cuda.sgi.com with ESMTP id 0QHbnAYc1hOOM8JM for ; Wed, 02 Apr 2008 05:17:02 -0700 (PDT) Received: from mailgate3.nec.co.jp (mailgate54B.nec.co.jp [10.7.69.195]) by tyo201.gate.nec.co.jp (8.13.8/8.13.4) with ESMTP id m32CH0LP029798; Wed, 2 Apr 2008 21:17:00 +0900 (JST) Received: (from root@localhost) by mailgate3.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id m32CH0w05545; Wed, 2 Apr 2008 21:17:00 +0900 (JST) Received: from togyo.jp.nec.com (togyo.jp.nec.com [10.26.220.4]) by mailsv3.nec.co.jp (8.13.8/8.13.4) with ESMTP id m32CH0wa022192; Wed, 2 Apr 2008 21:17:00 +0900 (JST) Received: from TNESB07336 ([10.64.168.65] [10.64.168.65]) by mail.jp.nec.com with ESMTP; Wed, 2 Apr 2008 21:16:59 +0900 
Message-Id: <3AF481B79787436E8E20D6C597D5DD1C@nsl.ad.nec.co.jp> From: "Takashi Sato" To: "David Chinner" Cc: , , , , References: <20080328180736t-sato@mail.jp.nec.com> <20080331000057.GI108924158@sgi.com> <2530BB4B166747659C8F65C9C3DE7CFB@nsl.ad.nec.co.jp> <20080402062147.GH103491721@sgi.com> In-Reply-To: <20080402062147.GH103491721@sgi.com> X-ASG-Orig-Subj: Re: [RFC PATCH 2/2] Add timeout feature Subject: Re: [RFC PATCH 2/2] Add timeout feature Date: Wed, 2 Apr 2008 21:16:59 +0900 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Windows Mail 6.0.6000.16480 X-MimeOLE: Produced By Microsoft MimeOLE V6.0.6000.16545 X-Barracuda-Connect: TYO201.gate.nec.co.jp[202.32.8.193] X-Barracuda-Start-Time: 1207138623 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46621 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15161 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: t-sato@yk.jp.nec.com Precedence: bulk X-list: xfs Hi, David Chinner wrote: >> Exactly my timeout feature is only for an application, not for >> freeze_bdev(). >> I think it is needed for the situation we can't unfreeze from userspace. >> (e.g. Freezing the root filesystem) > > Ummm - why can't you unfreeze the root fs from userspace? freezing > only prevents modification to the filesystem. 
A frozen filesystem is > effectively a read-only filesystem... > > On XFS: > > # xfs_freeze -f / > # echo $? > 0 > # xfs_freeze -u / > # echo $? > 0 Yes. If we have already logged in, we can unfreeze as above. But if not, we cannot log in and unfreeze because the modification of /var/log/wtmp is blocked in the log-in procedure. The timeout feature will work in such case. Cheers, Takashi From owner-xfs@oss.sgi.com Wed Apr 2 15:07:30 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 15:07:40 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m32M7PR1003507 for ; Wed, 2 Apr 2008 15:07:29 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA22127; Thu, 3 Apr 2008 08:07:54 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m32M7qsT119833493; Thu, 3 Apr 2008 08:07:53 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m32M7oWn119843479; Thu, 3 Apr 2008 08:07:50 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 3 Apr 2008 08:07:50 +1000 From: David Chinner To: Emmanuel Florac Cc: David Chinner , xfs@oss.sgi.com Subject: Re: Serious XFS crash Message-ID: <20080402220750.GJ103491721@sgi.com> References: <20080325185453.3a1957dd@galadriel.home> <20080325233611.GW103491721@sgi.com> <20080401140035.46470306@galadriel.home> <20080402055831.GG103491721@sgi.com> <20080402133003.4bb043e4@galadriel.home> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 
Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20080402133003.4bb043e4@galadriel.home> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15162 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 01:30:03PM +0200, Emmanuel Florac wrote: > Le Wed, 2 Apr 2008 15:58:31 +1000 vous écriviez: > > > The log is rather garbled - can you repost? Also, XFS usually outputs > > an error message before the stack trace; can you make sure you > > paste that as well (if it exists)? > > Well I attached the relevant part of kern.log; the message just before > the crash is not very clear... You can see the other messages relevant > to the disk error too. Like the fact reiser is also complaining about corrupted blocks? > Mar 6 06:25:04 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6E2A. > Mar 6 06:25:04 system3 kernel: ReiserFS: warning: is_tree_node: node level 28784 does not match to the expected one 1 > Mar 6 06:25:04 system3 kernel: ReiserFS: sda1: warning: vs-5150: search_by_key: invalid format found in block 753671. Fsck? > Mar 6 06:25:04 system3 kernel: ReiserFS: sda1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [18404 18463 0x0 SD] and: > Mar 6 10:42:46 system3 kernel: 0x0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > Mar 6 10:42:46 system3 kernel: Filesystem "md0": XFS internal error xfs_alloc_read_agf at line 2190 of file fs/xfs/xfs_alloc.c. Caller 0xc01f4b88 That's an AGF made up of zeros instead of real metadata. Something has trashed it - perhaps a "sector repair"? 
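A block of zeros fails the first sanity check any reader of an AGF header would make. A minimal userspace sketch, assuming only that the on-disk header begins with the big-endian magic 'XAGF' (0x58414746); the helper names are made up for illustration and this is not the kernel's xfs_alloc_read_agf() logic:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 'XAGF' in ASCII; stored big-endian at the start of the on-disk header. */
#define AGF_MAGIC 0x58414746u

/* Decode a big-endian 32-bit value from raw bytes. */
static uint32_t be32_decode(const unsigned char *p)
{
	assert(p != NULL);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: 1 if the buffer starts with the AGF magic, else 0. */
static int agf_magic_ok(const unsigned char *buf, size_t len)
{
	return len >= 4 && be32_decode(buf) == AGF_MAGIC;
}
```

Fed the sixteen 00 bytes from the log line above, agf_magic_ok() returns 0; a buffer starting with the bytes 'X' 'A' 'G' 'F' passes.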
> Mar 6 10:42:46 system3 kernel: Please umount the filesystem, and rectify the problem(s) > Mar 6 10:51:19 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6E00. > Mar 6 10:51:20 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6DCA. I'd go and find whatever disk is located at LBA 0xE6DCA-0xE6E2A and replace it - if there are that many repairs needed on it, it's likely to be failing.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Apr 2 15:23:21 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 15:23:29 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m32MNH2a006373 for ; Wed, 2 Apr 2008 15:23:19 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA22663; Thu, 3 Apr 2008 08:23:52 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m32MNosT119784846; Thu, 3 Apr 2008 08:23:52 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m32MNl6X119772053; Thu, 3 Apr 2008 08:23:47 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 3 Apr 2008 08:23:47 +1000 From: David Chinner To: Andi Kleen Cc: David Chinner , Lachlan McIlroy , xfs-dev , xfs-oss Subject: Re: [Patch] Cacheline align xlog_t Message-ID: <20080402222347.GK103491721@sgi.com> References: <20080401231552.GV103491721@sgi.com> 
<47F3293C.6090708@sgi.com> <20080402054403.GF103491721@sgi.com> <87myocek4o.fsf@basil.nowhere.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87myocek4o.fsf@basil.nowhere.org> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15163 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 10:28:07AM +0200, Andi Kleen wrote: > David Chinner writes: > > > > This just means that the start of the structure is cacheline > > aligned. I don't think the internal alignment commands force the > > entire structure to be cacheline aligned, merely pad the structure > > internally. In that case, even though the specific internal parts of > > the structure are on separate cache lines, there's no guarantee that > > all the related members are on the same cacheline. Hence I'm > > explicitly stating the exact alignment I want for the structure.... > > Isn't the structure dynamically allocated anyways? > The full type alignment really only matters for statics/globals > where the linker can handle it. Ah, right you are. My bad. > For the dynamic allocation you would rather need to make sure it > starts at a cache line boundary explicitly because the allocator doesn't > know the alignment of the target type, otherwise your careful > padding might be useless. Yup. Is there an allocator function that gives us cacheline aligned allocation (apart from a slab initialised with SLAB_HWCACHE_ALIGN)? There isn't one, right? Cheers, Dave.
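For reference, a minimal userspace sketch of the point about dynamic allocation: internal padding only keeps members on separate cachelines if the base address itself is cacheline aligned. Everything here is an assumption made for illustration: the 64-byte line size, the made-up struct standing in for xlog_t, and posix_memalign() standing in for whatever the kernel side would use. It says nothing about which kernel allocator is appropriate.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* for posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE 64	/* assumed line size; real hardware varies */

/* Stand-in for a padded structure such as xlog_t; fields are made up. */
struct padded_log {
	int hot_a;
	char pad[CACHELINE - sizeof(int)];
	int hot_b;	/* sits exactly one cacheline after hot_a */
};

static int is_cacheline_aligned(const void *p)
{
	return ((uintptr_t)p % CACHELINE) == 0;
}

/* Allocate with an explicitly cacheline-aligned base; NULL on failure. */
static struct padded_log *alloc_aligned_log(void)
{
	void *p = NULL;

	if (posix_memalign(&p, CACHELINE, sizeof(struct padded_log)) != 0)
		return NULL;
	assert(is_cacheline_aligned(p));
	return p;
}
```

With the base aligned, hot_a and hot_b each start a cacheline. With a plain malloc() the same padding gives no such guarantee, since the returned address may fall anywhere within a line.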
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Apr 2 15:23:52 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 15:24:02 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m32MNodH006515 for ; Wed, 2 Apr 2008 15:23:52 -0700 X-ASG-Debug-ID: 1207175065-1c15018f0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from smtp7-g19.free.fr (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 9B0B0730A6B for ; Wed, 2 Apr 2008 15:24:25 -0700 (PDT) Received: from smtp7-g19.free.fr (smtp7-g19.free.fr [212.27.42.64]) by cuda.sgi.com with ESMTP id hnBAH6MHZKYTIufW for ; Wed, 02 Apr 2008 15:24:25 -0700 (PDT) Received: from smtp7-g19.free.fr (localhost [127.0.0.1]) by smtp7-g19.free.fr (Postfix) with ESMTP id ACAC73227FB; Thu, 3 Apr 2008 00:24:24 +0200 (CEST) Received: from galadriel.home (pla78-1-82-235-234-79.fbx.proxad.net [82.235.234.79]) by smtp7-g19.free.fr (Postfix) with ESMTP id 648033227E9; Thu, 3 Apr 2008 00:24:24 +0200 (CEST) Date: Thu, 3 Apr 2008 00:22:48 +0200 From: Emmanuel Florac To: David Chinner Cc: xfs@oss.sgi.com X-ASG-Orig-Subj: Re: Serious XFS crash Subject: Re: Serious XFS crash Message-ID: <20080403002248.4bd263e6@galadriel.home> In-Reply-To: <20080402220750.GJ103491721@sgi.com> References: <20080325185453.3a1957dd@galadriel.home> <20080325233611.GW103491721@sgi.com> <20080401140035.46470306@galadriel.home> <20080402055831.GG103491721@sgi.com> <20080402133003.4bb043e4@galadriel.home> <20080402220750.GJ103491721@sgi.com> Organization: Intellique X-Mailer: Claws Mail 2.9.1 (GTK+ 2.8.20; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 
X-Barracuda-Connect: smtp7-g19.free.fr[212.27.42.64] X-Barracuda-Start-Time: 1207175066 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46662 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id m32MNrdH006521 X-archive-position: 15164 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: eflorac@intellique.com Precedence: bulk X-list: xfs Le Thu, 3 Apr 2008 08:07:50 +1000 vous écriviez: > I'd go and find whatever disk is located at LBA 0xE6DCA-0xE6E2A and > replace it - if there are that many repairs needed on it, it's likely > to be failing.... > Oh, it failed and I changed it. However it's a RAID-5 and though it appeared corrected, as you've seen the XFS fs crashed for no apparent reason (there was little or no activity at the time of the march 23rd crash) later. I was wondering if it could be related, for instance if some garbage may have remained hidden somewhere and break it later, like a standing nail waiting for someone to step on it... 
-- -------------------------------------------------- Emmanuel Florac www.intellique.com -------------------------------------------------- From owner-xfs@oss.sgi.com Wed Apr 2 17:22:14 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 17:22:23 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m330MC5j027016 for ; Wed, 2 Apr 2008 17:22:14 -0700 X-ASG-Debug-ID: 1207182167-2f2802790000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from filer.fsl.cs.sunysb.edu (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 9B0E68EE190; Wed, 2 Apr 2008 17:22:47 -0700 (PDT) Received: from filer.fsl.cs.sunysb.edu (filer.fsl.cs.sunysb.edu [130.245.126.2]) by cuda.sgi.com with ESMTP id WjA1HFUXL6BH05FS; Wed, 02 Apr 2008 17:22:47 -0700 (PDT) Received: from josefsipek.net (baal.fsl.cs.sunysb.edu [130.245.126.78]) by filer.fsl.cs.sunysb.edu (8.12.11.20060308/8.13.1) with ESMTP id m330MiPl025856; Wed, 2 Apr 2008 20:22:44 -0400 Received: by josefsipek.net (Postfix, from userid 1000) id 933C41C00E74; Wed, 2 Apr 2008 20:22:46 -0400 (EDT) Date: Wed, 2 Apr 2008 20:22:46 -0400 From: "Josef 'Jeff' Sipek" To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 1/7] XFS: Name operation vector for hash and compare Subject: Re: [PATCH 1/7] XFS: Name operation vector for hash and compare Message-ID: <20080403002246.GB5211@josefsipek.net> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062707.797672682@chook.melbourne.sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402062707.797672682@chook.melbourne.sgi.com> User-Agent: Mutt/1.5.16 
(2007-06-11) X-Barracuda-Connect: filer.fsl.cs.sunysb.edu[130.245.126.2] X-Barracuda-Start-Time: 1207182168 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46669 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15165 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jeffpc@josefsipek.net Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 04:25:09PM +1000, Barry Naujok wrote: ... > Index: kern_ci/fs/xfs/xfs_da_btree.h > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_da_btree.h > +++ kern_ci/fs/xfs/xfs_da_btree.h > @@ -99,6 +99,15 @@ typedef struct xfs_da_node_entry xfs_da_ > *========================================================================*/ > > /* > + * Search comparison results > + */ > +typedef enum { > + XFS_CMP_DIFFERENT, /* names are completely different */ > + XFS_CMP_EXACT, /* names are exactly the same */ > + XFS_CMP_CASE /* names are same but differ in case */ > +} xfs_dacmp_t; It is somewhat unfortunate that the "matches" case has multiple values. memcmp, strcmp, etc. return 0 if the two match, and you make >0 a match, and 0 if they don't. This is really a nitpick, and I don't think there is a way around...if everyone uses the enum all should be fine. ... 
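A minimal standalone sketch of the return-value point above, assuming nothing beyond the enum quoted from the patch. The names demo_compname() and demo_names_match() are hypothetical, chosen only for illustration; they are not part of the patch. It shows why callers have to test against the enum constants rather than treat the result the way they would a memcmp() result:

```c
#include <assert.h>
#include <string.h>

/* Same three values as the enum in the quoted patch (implicitly 0, 1, 2). */
typedef enum {
	XFS_CMP_DIFFERENT,	/* names are completely different */
	XFS_CMP_EXACT,		/* names are exactly the same */
	XFS_CMP_CASE		/* names are same but differ in case */
} xfs_dacmp_t;

/*
 * Hypothetical helper: byte-exact comparison that reports its result
 * through the enum, in the same shape as the default comparison in
 * the patch. It can only ever return EXACT or DIFFERENT.
 */
static xfs_dacmp_t
demo_compname(const unsigned char *n1, int l1,
	      const unsigned char *n2, int l2)
{
	assert(l1 >= 0 && l2 >= 0);
	return (l1 == l2 && memcmp(n1, n2, (size_t)l1) == 0) ?
		XFS_CMP_EXACT : XFS_CMP_DIFFERENT;
}

/*
 * Hypothetical helper: "is this any kind of match?". Written against
 * the enum constant so the caller never relies on the numeric values.
 */
static int
demo_names_match(xfs_dacmp_t cmp)
{
	return cmp != XFS_CMP_DIFFERENT;
}
```

Because XFS_CMP_DIFFERENT is 0 here, `if (!cmp)` would mean "no match", the opposite of the memcmp()/strcmp() idiom, which is why spelling out the constant seems the safer habit.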
> +/* > + * Name ops for directory and/or attr name operations > + */ > + > +typedef xfs_dahash_t (*xfs_hashname_t)(const uchar_t *, int); > +typedef xfs_dacmp_t (*xfs_compname_t)(const uchar_t *, int, > + const uchar_t *, int); Why have typedefs for function pointers? Sometimes, they even cause problems (I remember Eric finding a nasty 64-bit bug related to a function pointer typedef). Since IRIX isn't on the supported OS list anymore, what's the policy with coding style within XFS? ... > Index: kern_ci/fs/xfs/xfs_dir2.h > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_dir2.h > +++ kern_ci/fs/xfs/xfs_dir2.h > @@ -85,6 +85,12 @@ extern int xfs_dir_canenter(struct xfs_t > char *name, int namelen); > extern int xfs_dir_ino_validate(struct xfs_mount *mp, xfs_ino_t ino); > > +#define xfs_dir_hashname(dp, n, l) \ > + ((dp)->i_mount->m_dirnameops->hashname((n), (l))) > + > +#define xfs_dir_compname(dp, n1, l1, n2, l2) \ > + ((dp)->i_mount->m_dirnameops->compname((n1), (l1), (n2), (l2))) #define vs. static inline... I guess this comes back to my question before...what is the coding style direction you want XFS to go in? More Linux-like (static inline)? or keep it more IRIX-like (#define)? ... > --- kern_ci.orig/fs/xfs/xfs_dir2_block.c > +++ kern_ci/fs/xfs/xfs_dir2_block.c ... > @@ -698,19 +699,33 @@ xfs_dir2_block_lookup_int( > ((char *)block + xfs_dir2_dataptr_to_off(mp, addr)); > /* > * Compare, if it's right give back buffer & entry number. > + * > + * lookup case - use nameops; > + * > + * replace/remove case - as lookup has been already been > + * performed, look for an exact match using the fast method > */ > - if (dep->namelen == args->namelen && > - dep->name[0] == args->name[0] && > - memcmp(dep->name, args->name, args->namelen) == 0) { > + cmp = args->oknoent ? 
> + xfs_dir_compname(dp, dep->name, dep->namelen,
> + args->name, args->namelen) :
> + xfs_da_compname(dep->name, dep->namelen,
> + args->name, args->namelen);

Initial reaction: What's going on here?

if oknoent:
	use the mount-determined cmp function
else:
	use case-sensitive

That combined with the comment above makes it understandable...but what does
"oknoent" have to do with the whole thing? Wouldn't "exact_match" be a better
name?

Aside from the oknoent rename, I might even turn the ?: into an if-else.

> + if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) {
> + args->cmpresult = cmp;
> *bpp = bp;
> *entno = mid;
> - return 0;
> + if (cmp == XFS_CMP_EXACT)
> + return 0;
> }

I'd put a comment above this block reminding the reader that if you get
XFS_CMP_CASE, you keep scanning in case a later entry is XFS_CMP_EXACT.

...

> @@ -1391,19 +1394,49 @@ xfs_dir2_leaf_lookup_int(
> xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address)));
> /*
> * If it matches then return it.
> + *
> + * lookup case - use nameops;
> + *
> + * replace/remove case - as lookup has been already been
> + * performed, look for an exact match using the fast method
> */
> - if (dep->namelen == args->namelen &&
> - dep->name[0] == args->name[0] &&
> - memcmp(dep->name, args->name, args->namelen) == 0) {
> - *dbpp = dbp;
> + cmp = args->oknoent ?
> + xfs_dir_compname(dp, dep->name, dep->namelen,
> + args->name, args->namelen) :
> + xfs_da_compname(dep->name, dep->namelen,
> + args->name, args->namelen);

Same as above.

This code is very similar to the above...maybe they should be factored out in
some cleanup patch series.

...

> @@ -578,19 +579,27 @@ xfs_dir2_leafn_lookup_int(
> /*
> * Compare the entry, return it if it matches.
> */
> - if (dep->namelen == args->namelen &&
> - dep->name[0] == args->name[0] &&
> - memcmp(dep->name, args->name, args->namelen) == 0) {
> + cmp = args->oknoent ?
> + xfs_dir_compname(dp, dep->name, dep->namelen, > + args->name, args->namelen): > + xfs_da_compname(dep->name, dep->namelen, > + args->name, args->namelen); And again, the same applies. :) ... > + } > } > } > } Side note: That's a lot of nesting...yuck :) Josef 'Jeff' Sipek. -- UNIX is user-friendly ... it's just selective about who it's friends are From owner-xfs@oss.sgi.com Wed Apr 2 17:39:53 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 17:40:00 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_43, J_CHICKENPOX_72 autolearn=no version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m330dpHt029476 for ; Wed, 2 Apr 2008 17:39:53 -0700 X-ASG-Debug-ID: 1207183227-353502bf0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from filer.fsl.cs.sunysb.edu (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 444658EDFE8; Wed, 2 Apr 2008 17:40:27 -0700 (PDT) Received: from filer.fsl.cs.sunysb.edu (filer.fsl.cs.sunysb.edu [130.245.126.2]) by cuda.sgi.com with ESMTP id doLVljxsRXvXV5Rx; Wed, 02 Apr 2008 17:40:27 -0700 (PDT) Received: from josefsipek.net (baal.fsl.cs.sunysb.edu [130.245.126.78]) by filer.fsl.cs.sunysb.edu (8.12.11.20060308/8.13.1) with ESMTP id m330ZcZW027519; Wed, 2 Apr 2008 20:35:38 -0400 Received: by josefsipek.net (Postfix, from userid 1000) id 4EC8D1C00E74; Wed, 2 Apr 2008 20:35:39 -0400 (EDT) Date: Wed, 2 Apr 2008 20:35:39 -0400 From: "Josef 'Jeff' Sipek" To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 2/7] XFS: ASCII case-insensitive support Subject: Re: [PATCH 2/7] XFS: ASCII case-insensitive support Message-ID: <20080403003539.GC5211@josefsipek.net> References: <20080402062508.017738664@chook.melbourne.sgi.com> 
 <20080402062708.071715758@chook.melbourne.sgi.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080402062708.071715758@chook.melbourne.sgi.com>
User-Agent: Mutt/1.5.16 (2007-06-11)
X-Barracuda-Connect: filer.fsl.cs.sunysb.edu[130.245.126.2]
X-Barracuda-Start-Time: 1207183228
X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com
X-Barracuda-Spam-Score: -2.02
X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=
X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46671
	Rule breakdown below
	 pts rule name description
	---- ---------------------- --------------------------------------------------
X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 15166
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: jeffpc@josefsipek.net
Precedence: bulk
X-list: xfs

On Wed, Apr 02, 2008 at 04:25:10PM +1000, Barry Naujok wrote:
> Implement ASCII case-insensitive support. Its primary purpose
> is for supporting existing filesystems that already use this
> case-insensitive mode migrated from IRIX. But, if you only need
> ASCII-only case-insensitive support (i.e. English only) and will
> never use another language, then this mode is perfectly adequate.
>
> ASCII-CI is implemented by generating hashes based on lower-case
> letters and doing lower-case compares. It implements a new
> xfs_nameops vector for doing the hashes and comparisons for
> all filename operations.
>
> It also overrides the Linux dentry cache operations with its
> own hash and compare functions (the same as used in the xfs_nameops
> vector).
> > To create a filesystem with this CI mode, use: > # mkfs.xfs -n version=ci Since you have to mkfs anyway, why not just use the unicode mkfs option, and the ci mount option. Then, you can just drop this patch :) Josef 'Jeff' Sipek. -- Ready; T=0.01/0.01 20:32:39 From owner-xfs@oss.sgi.com Wed Apr 2 17:49:16 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 17:49:23 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m330nBpS031090 for ; Wed, 2 Apr 2008 17:49:15 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA27180; Thu, 3 Apr 2008 10:49:39 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m330nbsT119762026; Thu, 3 Apr 2008 10:49:38 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m330nYeY119750342; Thu, 3 Apr 2008 10:49:34 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 3 Apr 2008 10:49:34 +1000 From: David Chinner To: Emmanuel Florac Cc: David Chinner , xfs@oss.sgi.com Subject: Re: Serious XFS crash Message-ID: <20080403004934.GM103491721@sgi.com> References: <20080325185453.3a1957dd@galadriel.home> <20080325233611.GW103491721@sgi.com> <20080401140035.46470306@galadriel.home> <20080402055831.GG103491721@sgi.com> <20080402133003.4bb043e4@galadriel.home> <20080402220750.GJ103491721@sgi.com> <20080403002248.4bd263e6@galadriel.home> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline 
Content-Transfer-Encoding: 8bit
In-Reply-To: <20080403002248.4bd263e6@galadriel.home>
User-Agent: Mutt/1.4.2.1i
X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 15167
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: dgc@sgi.com
Precedence: bulk
X-list: xfs

On Thu, Apr 03, 2008 at 12:22:48AM +0200, Emmanuel Florac wrote:
> On Thu, 3 Apr 2008 08:07:50 +1000 you wrote:
>
> > I'd go and find whatever disk is located at LBA 0xE6DCA-0xE6E2A and
> > replace it - if there are that many repairs needed on it, it's likely
> > to be failing....
> >
>
> Oh, it failed and I changed it. However it's a RAID-5 and though it
> appeared corrected, as you've seen the XFS fs crashed for no apparent
> reason (there was little or no activity at the time of the March 23rd
> crash) later. I was wondering if it could be related, for instance if
> some garbage may have remained hidden somewhere and broken it later,
> like a standing nail waiting for someone to step on it...

Yes, entirely possible.

Cheers,

Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Apr 2 18:05:50 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 18:05:58 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m3315oWx001067 for ; Wed, 2 Apr 2008 18:05:50 -0700 X-ASG-Debug-ID: 1207184783-1d1e032e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from sandeen.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 95BB0731FE1 for ; Wed, 2 Apr 2008 18:06:23 -0700 (PDT) Received: from sandeen.net (sandeen.net [209.173.210.139]) by cuda.sgi.com with ESMTP id IVcFAq3MqrmmpOkK for ; Wed, 02 Apr 2008 18:06:23 -0700 (PDT) Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id 2DD1E18003EEC; Wed, 2 Apr 2008 20:06:22 -0500 (CDT) Message-ID: <47F42D8D.3030406@sandeen.net> Date: Wed, 02 Apr 2008 20:06:21 -0500 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: David Chinner CC: xfs-oss X-ASG-Orig-Subj: [PATCH V2] combined features2 fixup patches (updating/rewriting what was sent in other threads) Subject: [PATCH V2] combined features2 fixup patches (updating/rewriting what was sent in other threads) References: <47F0546C.9070709@sandeen.net> <20080402002940.GZ103491721@sgi.com> In-Reply-To: <20080402002940.GZ103491721@sgi.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Barracuda-Connect: sandeen.net[209.173.210.139] X-Barracuda-Start-Time: 1207184786 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 
-2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46672 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15168 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs (Addressing Dave's review point) Ensure "both" features2 slots are consistent, and set mp attr2 flag. Since older kernels may look in the sb_bad_features2 slot for flags, rather than zeroing it out on fixup, we should make it equal to the sb_features2 value. Also, if the ATTR2 flag was not found prior to features2 fixup, it was not set in the mount flags, so re-check after the fixup so that the current session will use the feature. Also fix up the comments to reflect these changes. Signed-off-by: Eric Sandeen --- Index: linux-2.6-xfs/fs/xfs/xfs_mount.c =================================================================== --- linux-2.6-xfs.orig/fs/xfs/xfs_mount.c +++ linux-2.6-xfs/fs/xfs/xfs_mount.c @@ -967,23 +967,32 @@ xfs_mountfs( xfs_mount_common(mp, sbp); /* - * Check for a bad features2 field alignment. This happened on - * some platforms due to xfs_sb_t not being 64bit size aligned - * when sb_features was added and hence the compiler put it in - * the wrong place. + * Check for a mismatched features2 values. Older kernels + * read & wrote into the wrong sb offset for sb_features2 + * on some platforms due to xfs_sb_t not being 64bit size aligned + * when sb_features2 was added, which made older superblock + * reading/writing routines swap it as a 64-bit value. 
* - * If we detect a bad field, we or the set bits into the existing - * features2 field in case it has already been modified and we - * don't want to lose any features. Zero the bad one and mark - * the two fields as needing updates once the transaction subsystem - * is online. + * For backwards compatibility, we make both slots equal. + * + * If we detect a mismatched field, we OR the set bits into the + * existing features2 field in case it has already been modified; we + * don't want to lose any features. We then update the bad location + * with the ORed value so that older kernels will see any features2 + * flags, and mark the two fields as needing updates once the + * transaction subsystem is online. */ - if (xfs_sb_has_bad_features2(sbp)) { + if (xfs_sb_has_mismatched_features2(sbp)) { cmn_err(CE_WARN, "XFS: correcting sb_features alignment problem"); sbp->sb_features2 |= sbp->sb_bad_features2; - sbp->sb_bad_features2 = 0; + sbp->sb_bad_features2 = sbp->sb_features2; update_flags |= XFS_SB_FEATURES2 | XFS_SB_BAD_FEATURES2; + /* + * Re-check for ATTR2 from the bad_features2 slot. + */ + if (xfs_sb_version_hasattr2(&mp->m_sb)) + mp->m_flags |= XFS_MOUNT_ATTR2; } /* @@ -1890,7 +1899,8 @@ xfs_uuid_unmount( /* * Used to log changes to the superblock unit and width fields which could - * be altered by the mount options. Only the first superblock is updated. + * be altered by the mount options, as well as any potential sb_features2 + * fixup. Only the first superblock is updated. */ STATIC void xfs_mount_log_sb( Index: linux-2.6-xfs/fs/xfs/xfs_sb.h =================================================================== --- linux-2.6-xfs.orig/fs/xfs/xfs_sb.h +++ linux-2.6-xfs/fs/xfs/xfs_sb.h @@ -320,11 +320,12 @@ static inline int xfs_sb_good_version(xf #endif /* __KERNEL__ */ /* - * Detect a bad features2 field + * Detect a mismatched features2 field. Older kernels read/wrote + * this into the wrong slot, so to be safe we keep them in sync. 
*/ -static inline int xfs_sb_has_bad_features2(xfs_sb_t *sbp) +static inline int xfs_sb_has_mismatched_features2(xfs_sb_t *sbp) { - return (sbp->sb_bad_features2 != 0); + return (sbp->sb_bad_features2 != sbp->sb_features2); } static inline unsigned xfs_sb_version_tonew(unsigned v) From owner-xfs@oss.sgi.com Wed Apr 2 18:28:47 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 18:28:58 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m331Shv4004533 for ; Wed, 2 Apr 2008 18:28:45 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA28254; Thu, 3 Apr 2008 11:29:15 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m331TDsT118615431; Thu, 3 Apr 2008 11:29:14 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m331TC8L119826180; Thu, 3 Apr 2008 11:29:12 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 3 Apr 2008 11:29:12 +1000 From: David Chinner To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 1/7] XFS: Name operation vector for hash and compare Message-ID: <20080403012912.GO103491721@sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062707.797672682@chook.melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402062707.797672682@chook.melbourne.sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 
 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 15169
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: dgc@sgi.com
Precedence: bulk
X-list: xfs

On Wed, Apr 02, 2008 at 04:25:09PM +1000, Barry Naujok wrote:
> Adds two pieces of functionality for the basis of case-insensitive
> support in XFS:
>
> 1. A comparison result enumerated type: xfs_dacmp_t. It represents an
> exact match, case-insensitive match or no match at all. This patch
> only implements different and exact results.
>
> 2. xfs_nameops vector for specifying how to perform the hash generation
> of filenames and comparison methods. In this patch the hash vector
> points to the existing xfs_da_hashname function and the comparison
> method does a length compare, and if the same, does a memcmp and
> returns the xfs_dacmp_t result.
>
> All filename functions that use the hash (create, lookup, remove, rename,
> etc) now use the xfs_nameops.hashname function and all directory lookup
> functions also use the xfs_nameops.compname function.

Ok, so internally I see this is not the case. I'll comment on that
inline.

> The lookup functions also handle case-insensitive results even though
> the default comparison function cannot return that. An important
> aspect of the lookup functions is that an exact match always has
> precedence over a case-insensitive one. So when a case-insensitive match
> is found, we have to keep looking just in case there is an exact
> match. In the meantime, the info for the first case-insensitive match
> is retained if no exact match is found.
>
> Signed-off-by: Barry Naujok
......
> }
>
> +xfs_dacmp_t
> +xfs_da_compname(const uchar_t *name1, int len1, const uchar_t *name2, int len2)
> +{
> + return (len1 == len2 && memcmp(name1, name2, len1) == 0) ?
> + XFS_CMP_EXACT : XFS_CMP_DIFFERENT;
> +}
> +
> +struct xfs_nameops xfs_default_nameops = {

const.
> #ifdef __KERNEL__ > /*======================================================================== > @@ -248,7 +271,12 @@ xfs_daddr_t xfs_da_reada_buf(struct xfs_ > int xfs_da_shrink_inode(xfs_da_args_t *args, xfs_dablk_t dead_blkno, > xfs_dabuf_t *dead_buf); > > +extern struct xfs_nameops xfs_default_nameops; Does this need global visibility? It's only needed in xfs_dir_mount(), right? > Index: kern_ci/fs/xfs/xfs_dir2.h > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_dir2.h > +++ kern_ci/fs/xfs/xfs_dir2.h > @@ -85,6 +85,12 @@ extern int xfs_dir_canenter(struct xfs_t > char *name, int namelen); > extern int xfs_dir_ino_validate(struct xfs_mount *mp, xfs_ino_t ino); > > +#define xfs_dir_hashname(dp, n, l) \ > + ((dp)->i_mount->m_dirnameops->hashname((n), (l))) > + > +#define xfs_dir_compname(dp, n1, l1, n2, l2) \ > + ((dp)->i_mount->m_dirnameops->compname((n1), (l1), (n2), (l2))) > + Static inline functions, please. > /* > * Utility routines for v2 directories. > */ > Index: kern_ci/fs/xfs/xfs_dir2_block.c > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_dir2_block.c > +++ kern_ci/fs/xfs/xfs_dir2_block.c > @@ -643,6 +643,7 @@ xfs_dir2_block_lookup_int( > int mid; /* binary search current idx */ > xfs_mount_t *mp; /* filesystem mount point */ > xfs_trans_t *tp; /* transaction pointer */ > + xfs_dacmp_t cmp; /* comparison result */ > > dp = args->dp; > tp = args->trans; > @@ -698,19 +699,33 @@ xfs_dir2_block_lookup_int( > ((char *)block + xfs_dir2_dataptr_to_off(mp, addr)); > /* > * Compare, if it's right give back buffer & entry number. > + * > + * lookup case - use nameops; > + * > + * replace/remove case - as lookup has been already been > + * performed, look for an exact match using the fast method > */ > - if (dep->namelen == args->namelen && > - dep->name[0] == args->name[0] && > - memcmp(dep->name, args->name, args->namelen) == 0) { > + cmp = args->oknoent ? 
> + xfs_dir_compname(dp, dep->name, dep->namelen, > + args->name, args->namelen) : > + xfs_da_compname(dep->name, dep->namelen, > + args->name, args->namelen); Why add this "fast path"? All you're saving here is a few instructions but making the code much harder to follow. cmp = xfs_dir_compname(dp, dep->name, dep->namelen, args->name, args->namelen); Will do exactly the same thing and I'd much prefer readable code over prematurely optimised code any day of the week.... > + if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) { > + args->cmpresult = cmp; > *bpp = bp; > *entno = mid; > - return 0; > + if (cmp == XFS_CMP_EXACT) > + return 0; > } > - } while (++mid < be32_to_cpu(btp->count) && be32_to_cpu(blp[mid].hashval) == hash); > + } while (++mid < be32_to_cpu(btp->count) && > + be32_to_cpu(blp[mid].hashval) == hash); > + > + ASSERT(args->oknoent); > + if (args->cmpresult == XFS_CMP_CASE) > + return 0; So if we find multiple case matches, we'll take the last we find? > /* > * No match, release the buffer and return ENOENT. > */ > - ASSERT(args->oknoent); > xfs_da_brelse(tp, bp); > return XFS_ERROR(ENOENT); Should we really be promoting that assert to before we return a successful case match? > @@ -1391,19 +1394,49 @@ xfs_dir2_leaf_lookup_int( > xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address))); > /* > * If it matches then return it. > + * > + * lookup case - use nameops; > + * > + * replace/remove case - as lookup has been already been > + * performed, look for an exact match using the fast method > */ > - if (dep->namelen == args->namelen && > - dep->name[0] == args->name[0] && > - memcmp(dep->name, args->name, args->namelen) == 0) { > - *dbpp = dbp; > + cmp = args->oknoent ? > + xfs_dir_compname(dp, dep->name, dep->namelen, > + args->name, args->namelen) : > + xfs_da_compname(dep->name, dep->namelen, > + args->name, args->namelen); Same again. 
> + if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) { > + args->cmpresult = cmp; > *indexp = index; > - return 0; > + if (cmp == XFS_CMP_EXACT) { > + /* > + * case exact match: release the case-insens. > + * match buffer if it exists and return the > + * current data buffer. > + */ > + if (cbp && cbp != dbp) > + xfs_da_brelse(tp, cbp); > + *dbpp = dbp; > + return 0; > + } > + cbp = dbp; > } > } > + ASSERT(args->oknoent); > + if (args->cmpresult == XFS_CMP_CASE) { > + /* > + * case-insensitive match: release current buffer and > + * return the buffer with the case-insensitive match. > + */ > + if (cbp != dbp) > + xfs_da_brelse(tp, dbp); > + *dbpp = cbp; > + return 0; > + } > /* > * No match found, return ENOENT. > */ > - ASSERT(args->oknoent); Same question about promoting the assert.... > @@ -578,19 +579,27 @@ xfs_dir2_leafn_lookup_int( > /* > * Compare the entry, return it if it matches. > */ > - if (dep->namelen == args->namelen && > - dep->name[0] == args->name[0] && > - memcmp(dep->name, args->name, args->namelen) == 0) { > + cmp = args->oknoent ? > + xfs_dir_compname(dp, dep->name, dep->namelen, > + args->name, args->namelen): > + xfs_da_compname(dep->name, dep->namelen, > + args->name, args->namelen); Same again. > @@ -907,9 +914,8 @@ xfs_dir2_sf_removename( > for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); > i < sfp->hdr.count; > i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) { > - if (sfep->namelen == args->namelen && > - sfep->name[0] == args->name[0] && > - memcmp(sfep->name, args->name, args->namelen) == 0) { > + if (xfs_da_compname(sfep->name, sfep->namelen, > + args->name, args->namelen) == XFS_CMP_EXACT) { > ASSERT(xfs_dir2_sf_get_inumber(sfp, > xfs_dir2_sf_inumberp(sfep)) == > args->inumber); This only checks for an exact match - what is supposed to happen with a XFS_CMP_CASE return? 
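The lookup rule debated earlier in this review (exact match wins, otherwise a case-insensitive match is returned) can be modelled in a small standalone sketch. Everything here is an assumption made for illustration: demo_ci_compname() is a hypothetical ASCII-only comparator, demo_lookup() scans a plain array of names instead of directory entries sharing a hash value, and only the `cmp != XFS_CMP_DIFFERENT && cmp != cmpresult` test is taken from the quoted block-form hunk.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { XFS_CMP_DIFFERENT, XFS_CMP_EXACT, XFS_CMP_CASE } xfs_dacmp_t;

/*
 * Hypothetical comparator: EXACT if byte-identical, CASE if the names
 * differ only in ASCII case, DIFFERENT otherwise.
 */
static xfs_dacmp_t
demo_ci_compname(const char *n1, int l1, const char *n2, int l2)
{
	xfs_dacmp_t result = XFS_CMP_EXACT;
	int i;

	if (l1 != l2)
		return XFS_CMP_DIFFERENT;
	for (i = 0; i < l1; i++) {
		if (n1[i] == n2[i])
			continue;
		if (tolower((unsigned char)n1[i]) !=
		    tolower((unsigned char)n2[i]))
			return XFS_CMP_DIFFERENT;
		result = XFS_CMP_CASE;
	}
	return result;
}

/*
 * Hypothetical model of the scan loop: returns the index of the chosen
 * entry, or -1 when nothing matches (the ENOENT case).
 */
static int
demo_lookup(const char *const *names, int count, const char *name)
{
	xfs_dacmp_t cmpresult = XFS_CMP_DIFFERENT;
	int len = (int)strlen(name);
	int found = -1;
	int i;

	assert(count >= 0);
	for (i = 0; i < count; i++) {
		xfs_dacmp_t cmp = demo_ci_compname(names[i],
				(int)strlen(names[i]), name, len);

		/* Same condition as in the quoted hunk. */
		if (cmp != XFS_CMP_DIFFERENT && cmp != cmpresult) {
			cmpresult = cmp;
			found = i;
			if (cmp == XFS_CMP_EXACT)
				return i;	/* exact match ends the scan */
		}
		/* On XFS_CMP_CASE keep scanning for a later exact match. */
	}
	return found;
}
```

In this model a second XFS_CMP_CASE result equals the stored cmpresult and is skipped, so the first case-insensitive match is the one retained; whether the real directory code behaves identically depends on details not shown in the sketch.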
> @@ -1044,9 +1050,9 @@ xfs_dir2_sf_replace( > for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); > i < sfp->hdr.count; > i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) { > - if (sfep->namelen == args->namelen && > - sfep->name[0] == args->name[0] && > - memcmp(args->name, sfep->name, args->namelen) == 0) { > + if (xfs_da_compname(sfep->name, sfep->namelen, > + args->name, args->namelen) == > + XFS_CMP_EXACT) { ditto. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Apr 2 18:32:08 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 18:32:25 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m331W4RU005120 for ; Wed, 2 Apr 2008 18:32:06 -0700 Received: from pc-bnaujok.melbourne.sgi.com (pc-bnaujok.melbourne.sgi.com [134.14.55.58]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA28516; Thu, 3 Apr 2008 11:32:35 +1000 To: "Josef 'Jeff' Sipek" Subject: Re: [PATCH 7/7] XFS: NLS config option From: "Barry Naujok" Organization: SGI Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 MIME-Version: 1.0 References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062709.577869936@chook.melbourne.sgi.com> <20080403012610.GD5211@josefsipek.net> Date: Thu, 03 Apr 2008 11:38:11 +1000 Message-ID: In-Reply-To: <20080403012610.GD5211@josefsipek.net> User-Agent: Opera Mail/9.24 (Win32) X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from Quoted-Printable to 8bit by oss.sgi.com id m331W8RU005134 
X-archive-position: 15170 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs On Thu, 03 Apr 2008 11:26:10 +1000, Josef 'Jeff' Sipek wrote: > On Wed, Apr 02, 2008 at 04:25:15PM +1000, Barry Naujok wrote: >> This optional patch implements the NLS support as a CONFIG option. > > Any reason this is a separate patch, and not part of the previous > patches? > > ... >> --- kern_ci.orig/fs/xfs/Kconfig >> +++ kern_ci/fs/xfs/Kconfig >> @@ -87,6 +87,16 @@ config XFS_UNICODE >> >> If you don't require UTF-8 enforcement, say N. >> >> +config XFS_UNICODE_NLS >> + bool "XFS NLS Unicode support >> + depends on XFS_UNICODE >> + help >> + NLS (Native Language Support) allows non-UTF8 locales to >> + interact with XFS Unicode support. To specify the character >> + set being used, use the "-n nls=" mount option. > > "mount option"? Or was that supposed to say mkfs? Hmm... typo: "-o nls=" in mount :) > from mount(8) manpage: > > -n Mount without writing in /etc/mtab. This is necessary for > example when /etc is on a read-only file system. > > ... 
>> Index: kern_ci/fs/xfs/xfs_unicode.h >> =================================================================== >> --- kern_ci.orig/fs/xfs/xfs_unicode.h >> +++ kern_ci/fs/xfs/xfs_unicode.h >> @@ -65,6 +65,8 @@ int xfs_unicode_validate(const uchar_t * >> int xfs_unicode_read_cft(struct xfs_mount *mp); >> void xfs_unicode_free_cft(const xfs_cft_t *cft); >> >> +#ifdef CONFIG_XFS_UNICODE_NLS >> + >> #define xfs_is_using_nls(mp) ((mp)->m_nls != NULL) >> >> int xfs_unicode_to_nls(struct xfs_mount *mp, const uchar_t *uni_name, >> @@ -73,7 +75,20 @@ int xfs_nls_to_unicode(struct xfs_mount >> int nls_namelen, const uchar_t **uni_name, int *uni_namelen); >> void xfs_unicode_nls_free(const uchar_t *src_name, const uchar_t >> *conv_name); >> >> -#else >> +#else /* CONFIG_XFS_UNICODE_NLS */ >> + >> +#define xfs_is_using_nls(mp) 0 >> + >> +#define xfs_unicode_to_nls(mp, uname, ulen, pnname, pnlen) \ >> + ((*(pnname)) = (uname), (*(pnlen)) = (ulen), 0) >> +#define xfs_nls_to_unicode(mp, nname, nlen, puname, pulen) \ >> + ((*(puname)) = (nname), (*(pulen)) = (nlen), \ >> + xfs_unicode_validate(nname, nlen)) > > While I commend your use of the comma operator, I really think those > should > be static inlines :) > > Josef 'Jeff' Sipek. 
> From owner-xfs@oss.sgi.com Wed Apr 2 18:39:30 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 18:39:43 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m331dNOs006463 for ; Wed, 2 Apr 2008 18:39:28 -0700 Received: from pc-bnaujok.melbourne.sgi.com (pc-bnaujok.melbourne.sgi.com [134.14.55.58]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA28728; Thu, 3 Apr 2008 11:39:54 +1000 Date: Thu, 03 Apr 2008 11:45:33 +1000 To: "David Chinner" Subject: Re: [PATCH 1/7] XFS: Name operation vector for hash and compare From: "Barry Naujok" Organization: SGI Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 MIME-Version: 1.0 References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062707.797672682@chook.melbourne.sgi.com> <20080403012912.GO103491721@sgi.com> Message-ID: In-Reply-To: <20080403012912.GO103491721@sgi.com> User-Agent: Opera Mail/9.24 (Win32) X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from Quoted-Printable to 8bit by oss.sgi.com id m331dUOs006498 X-archive-position: 15171 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs On Thu, 03 Apr 2008 11:29:12 +1000, David Chinner wrote: > On Wed, Apr 02, 2008 at 04:25:09PM +1000, Barry Naujok wrote: >> Adds two pieces of functionality for the basis of case-insensitive >> support in XFS: >> >> 1. A comparison result enumerated type: xfs_dacmp_t. 
It represents an >> exact match, case-insensitive match or no match at all. This patch >> only implements different and exact results. >> >> 2. xfs_nameops vector for specifying how to perform the hash generation >> of filenames and comparison methods. In this patch the hash vector >> points to the existing xfs_da_hashname function and the comparison >> method does a length compare, and if the same, does a memcmp and >> returns the xfs_dacmp_t result. >> >> All filename functions that use the hash (create, lookup, remove, rename, >> etc) now use the xfs_nameops.hashname function and all directory lookup >> functions also use the xfs_nameops.compname function. > > Ok, so internally I see this is not the case. I'll comment on that > inline. Ah yes. Remove and rename rely on an exact match. Forgot about that when documenting this patch. >> The lookup functions also handle case-insensitive results even though >> the default comparison function cannot return that. An important >> aspect of the lookup functions is that an exact match always has >> precedence over a case-insensitive one. So while a case-insensitive match >> is found, we have to keep looking just in case there is an exact >> match. In the meantime, the info for the first case-insensitive match >> is retained if no exact match is found. >> >> Signed-off-by: Barry Naujok > ...... >> } >> >> +xfs_dacmp_t >> +xfs_da_compname(const uchar_t *name1, int len1, const uchar_t *name2, >> int len2) >> +{ >> + return (len1 == len2 && memcmp(name1, name2, len1) == 0) ? >> + XFS_CMP_EXACT : XFS_CMP_DIFFERENT; >> +} >> + >> +struct xfs_nameops xfs_default_nameops = { > > const. 
> >> #ifdef __KERNEL__ >> /*======================================================================== >> @@ -248,7 +271,12 @@ xfs_daddr_t xfs_da_reada_buf(struct xfs_ >> int xfs_da_shrink_inode(xfs_da_args_t *args, xfs_dablk_t dead_blkno, >> xfs_dabuf_t *dead_buf); >> >> +extern struct xfs_nameops xfs_default_nameops; > > Does this need global visibility? It's only needed in xfs_dir_mount(), > right? Good point, I'll fix this. >> Index: kern_ci/fs/xfs/xfs_dir2.h >> =================================================================== >> --- kern_ci.orig/fs/xfs/xfs_dir2.h >> +++ kern_ci/fs/xfs/xfs_dir2.h >> @@ -85,6 +85,12 @@ extern int xfs_dir_canenter(struct xfs_t >> char *name, int namelen); >> extern int xfs_dir_ino_validate(struct xfs_mount *mp, xfs_ino_t ino); >> >> +#define xfs_dir_hashname(dp, n, l) \ >> + ((dp)->i_mount->m_dirnameops->hashname((n), (l))) >> + >> +#define xfs_dir_compname(dp, n1, l1, n2, l2) \ >> + ((dp)->i_mount->m_dirnameops->compname((n1), (l1), (n2), (l2))) >> + > > Static inline functions, please. Ok. >> /* >> * Utility routines for v2 directories. >> */ >> Index: kern_ci/fs/xfs/xfs_dir2_block.c >> =================================================================== >> --- kern_ci.orig/fs/xfs/xfs_dir2_block.c >> +++ kern_ci/fs/xfs/xfs_dir2_block.c >> @@ -643,6 +643,7 @@ xfs_dir2_block_lookup_int( >> int mid; /* binary search current idx */ >> xfs_mount_t *mp; /* filesystem mount point */ >> xfs_trans_t *tp; /* transaction pointer */ >> + xfs_dacmp_t cmp; /* comparison result */ >> >> dp = args->dp; >> tp = args->trans; >> @@ -698,19 +699,33 @@ xfs_dir2_block_lookup_int( >> ((char *)block + xfs_dir2_dataptr_to_off(mp, addr)); >> /* >> * Compare, if it's right give back buffer & entry number. 
>> + * >> + * lookup case - use nameops; >> + * >> + * replace/remove case - as lookup has already been >> + * performed, look for an exact match using the fast method >> */ >> - if (dep->namelen == args->namelen && >> - dep->name[0] == args->name[0] && >> - memcmp(dep->name, args->name, args->namelen) == 0) { >> + cmp = args->oknoent ? >> + xfs_dir_compname(dp, dep->name, dep->namelen, >> + args->name, args->namelen) : >> + xfs_da_compname(dep->name, dep->namelen, >> + args->name, args->namelen); > > Why add this "fast path"? All you're saving here is a few > instructions but making the code much harder to follow. > > cmp = xfs_dir_compname(dp, dep->name, dep->namelen, > args->name, args->namelen); > > Will do exactly the same thing and I'd much prefer readable code > over prematurely optimised code any day of the week.... Ok, I'll change that code (might make it more CONFIG_XFS_CI capable ;) ) >> + if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) { >> + args->cmpresult = cmp; >> *bpp = bp; >> *entno = mid; >> - return 0; >> + if (cmp == XFS_CMP_EXACT) >> + return 0; >> } >> - } while (++mid < be32_to_cpu(btp->count) && >> be32_to_cpu(blp[mid].hashval) == hash); >> + } while (++mid < be32_to_cpu(btp->count) && >> + be32_to_cpu(blp[mid].hashval) == hash); >> + >> + ASSERT(args->oknoent); >> + if (args->cmpresult == XFS_CMP_CASE) >> + return 0; > > So if we find multiple case matches, we'll take the last we find? No, the first, as *bpp and *entno are only set for the first case-insensitive match or overridden for an exact match. >> /* >> * No match, release the buffer and return ENOENT. >> */ >> - ASSERT(args->oknoent); >> xfs_da_brelse(tp, bp); >> return XFS_ERROR(ENOENT); > > Should we really be promoting that assert to before we return a > successful > case match? Yes, as a !args->oknoent lookup has to find an exact match. It's a big failure otherwise (ie. remove/rename case). 
>> @@ -907,9 +914,8 @@ xfs_dir2_sf_removename( >> for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); >> i < sfp->hdr.count; >> i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) { >> - if (sfep->namelen == args->namelen && >> - sfep->name[0] == args->name[0] && >> - memcmp(sfep->name, args->name, args->namelen) == 0) { >> + if (xfs_da_compname(sfep->name, sfep->namelen, >> + args->name, args->namelen) == XFS_CMP_EXACT) { >> ASSERT(xfs_dir2_sf_get_inumber(sfp, >> xfs_dir2_sf_inumberp(sfep)) == >> args->inumber); > > This only checks for an exact match - what is supposed to happen > with a XFS_CMP_CASE return? > >> @@ -1044,9 +1050,9 @@ xfs_dir2_sf_replace( >> for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); >> i < sfp->hdr.count; >> i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) { >> - if (sfep->namelen == args->namelen && >> - sfep->name[0] == args->name[0] && >> - memcmp(args->name, sfep->name, args->namelen) == 0) { >> + if (xfs_da_compname(sfep->name, sfep->namelen, >> + args->name, args->namelen) == >> + XFS_CMP_EXACT) { > > ditto. Like I stated above, remove/rename (replace) require an exact match. 
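[Archive note] The lookup rule discussed in this thread (an exact match always wins, otherwise the first case-insensitive match is kept) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the XFS code: the names cmp_result, nameops, ci_hash, ci_compare, dir_hashname, dir_compname and lookup_name are hypothetical stand-ins, the hash is an arbitrary toy, and the case-folding compare is ASCII-only. The wrappers are written as static inlines rather than macros, as the review requests.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Three outcomes, modelled on the description of xfs_dacmp_t (names are hypothetical). */
enum cmp_result { CMP_DIFFERENT, CMP_EXACT, CMP_CASE };

/* Hypothetical name-operations vector: one hash callback, one compare callback. */
struct nameops {
	unsigned int (*hashname)(const unsigned char *name, int len);
	enum cmp_result (*compname)(const unsigned char *n1, int l1,
				    const unsigned char *n2, int l2);
};

/* Toy hash over lower-cased bytes; NOT the XFS directory hash. */
static unsigned int ci_hash(const unsigned char *name, int len)
{
	unsigned int h = 0;

	for (int i = 0; i < len; i++)
		h = h * 31 + (unsigned int)tolower(name[i]);
	return h;
}

/* ASCII-only compare: exact, case-insensitive, or different. */
static enum cmp_result ci_compare(const unsigned char *n1, int l1,
				  const unsigned char *n2, int l2)
{
	if (l1 != l2)
		return CMP_DIFFERENT;
	if (memcmp(n1, n2, (size_t)l1) == 0)
		return CMP_EXACT;
	for (int i = 0; i < l1; i++)
		if (tolower(n1[i]) != tolower(n2[i]))
			return CMP_DIFFERENT;
	return CMP_CASE;
}

const struct nameops ci_nameops = { ci_hash, ci_compare };

/* Static inline wrappers instead of macros: arguments are type-checked. */
static inline unsigned int dir_hashname(const struct nameops *ops,
					const unsigned char *n, int l)
{
	return ops->hashname(n, l);
}

static inline enum cmp_result dir_compname(const struct nameops *ops,
					   const unsigned char *n1, int l1,
					   const unsigned char *n2, int l2)
{
	return ops->compname(n1, l1, n2, l2);
}

/*
 * Scan all candidates. Remember the first case-insensitive match, but keep
 * looking in case an exact match follows; an exact match returns at once.
 * Returns the index of the chosen entry, or -1 if nothing matched.
 */
int lookup_name(const struct nameops *ops, const char *const *names, int count,
		const char *want, enum cmp_result *result)
{
	int found = -1;
	int wantlen = (int)strlen(want);

	assert(count >= 0);
	*result = CMP_DIFFERENT;
	for (int i = 0; i < count; i++) {
		enum cmp_result cmp = dir_compname(ops,
				(const unsigned char *)names[i], (int)strlen(names[i]),
				(const unsigned char *)want, wantlen);

		if (cmp != CMP_DIFFERENT && cmp != *result) {
			*result = cmp;
			found = i;
			if (cmp == CMP_EXACT)
				return found;
		}
	}
	return found;
}
```

With the entries "readme", "README" and "ReadMe", looking up "Readme" in this sketch selects the first entry as a case-insensitive match, while looking up "ReadMe" skips past the two earlier case-insensitive candidates and returns the exact one. The `cmp != *result` test is what keeps a second case-insensitive hit from replacing the first.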
From owner-xfs@oss.sgi.com Wed Apr 2 18:43:09 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 18:43:19 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m331h8kb007358 for ; Wed, 2 Apr 2008 18:43:09 -0700 X-ASG-Debug-ID: 1207187022-419400b30000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from filer.fsl.cs.sunysb.edu (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 558B88EE8ED; Wed, 2 Apr 2008 18:43:42 -0700 (PDT) Received: from filer.fsl.cs.sunysb.edu (filer.fsl.cs.sunysb.edu [130.245.126.2]) by cuda.sgi.com with ESMTP id sD50uAQTrVQ5wOLt; Wed, 02 Apr 2008 18:43:42 -0700 (PDT) Received: from josefsipek.net (baal.fsl.cs.sunysb.edu [130.245.126.78]) by filer.fsl.cs.sunysb.edu (8.12.11.20060308/8.13.1) with ESMTP id m331Q8CI002431; Wed, 2 Apr 2008 21:26:08 -0400 Received: by josefsipek.net (Postfix, from userid 1000) id 3E09D1C00E74; Wed, 2 Apr 2008 21:26:10 -0400 (EDT) Date: Wed, 2 Apr 2008 21:26:10 -0400 From: "Josef 'Jeff' Sipek" To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 7/7] XFS: NLS config option Subject: Re: [PATCH 7/7] XFS: NLS config option Message-ID: <20080403012610.GD5211@josefsipek.net> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062709.577869936@chook.melbourne.sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402062709.577869936@chook.melbourne.sgi.com> User-Agent: Mutt/1.5.16 (2007-06-11) X-Barracuda-Connect: filer.fsl.cs.sunysb.edu[130.245.126.2] X-Barracuda-Start-Time: 1207187025 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 
X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46675 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15172 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jeffpc@josefsipek.net Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 04:25:15PM +1000, Barry Naujok wrote: > This optional patch implements the NLS support as a CONFIG option. Any reason this is a separate patch, and not part of the previous patches? ... > --- kern_ci.orig/fs/xfs/Kconfig > +++ kern_ci/fs/xfs/Kconfig > @@ -87,6 +87,16 @@ config XFS_UNICODE > > If you don't require UTF-8 enforcement, say N. > > +config XFS_UNICODE_NLS > + bool "XFS NLS Unicode support > + depends on XFS_UNICODE > + help > + NLS (Native Language Support) allows non-UTF8 locales to > + interact with XFS Unicode support. To specify the character > + set being used, use the "-n nls=" mount option. "mount option"? Or was that supposed to say mkfs? from mount(8) manpage: -n Mount without writing in /etc/mtab. This is necessary for example when /etc is on a read-only file system. ... 
> Index: kern_ci/fs/xfs/xfs_unicode.h > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_unicode.h > +++ kern_ci/fs/xfs/xfs_unicode.h > @@ -65,6 +65,8 @@ int xfs_unicode_validate(const uchar_t * > int xfs_unicode_read_cft(struct xfs_mount *mp); > void xfs_unicode_free_cft(const xfs_cft_t *cft); > > +#ifdef CONFIG_XFS_UNICODE_NLS > + > #define xfs_is_using_nls(mp) ((mp)->m_nls != NULL) > > int xfs_unicode_to_nls(struct xfs_mount *mp, const uchar_t *uni_name, > @@ -73,7 +75,20 @@ int xfs_nls_to_unicode(struct xfs_mount > int nls_namelen, const uchar_t **uni_name, int *uni_namelen); > void xfs_unicode_nls_free(const uchar_t *src_name, const uchar_t *conv_name); > > -#else > +#else /* CONFIG_XFS_UNICODE_NLS */ > + > +#define xfs_is_using_nls(mp) 0 > + > +#define xfs_unicode_to_nls(mp, uname, ulen, pnname, pnlen) \ > + ((*(pnname)) = (uname), (*(pnlen)) = (ulen), 0) > +#define xfs_nls_to_unicode(mp, nname, nlen, puname, pulen) \ > + ((*(puname)) = (nname), (*(pulen)) = (nlen), \ > + xfs_unicode_validate(nname, nlen)) While I commend your use of the comma operator, I really think those should be static inlines :) Josef 'Jeff' Sipek. -- Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. 
- Albert Einstein From owner-xfs@oss.sgi.com Wed Apr 2 18:53:09 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 18:53:15 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_43, J_CHICKENPOX_72 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m331r3Cv009081 for ; Wed, 2 Apr 2008 18:53:06 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA29194; Thu, 3 Apr 2008 11:53:34 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m331rXsT119906104; Thu, 3 Apr 2008 11:53:34 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m331rVY5115191785; Thu, 3 Apr 2008 11:53:31 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 3 Apr 2008 11:53:31 +1000 From: David Chinner To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 2/7] XFS: ASCII case-insensitive support Message-ID: <20080403015331.GP103491721@sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.071715758@chook.melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402062708.071715758@chook.melbourne.sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15173 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: 
xfs On Wed, Apr 02, 2008 at 04:25:10PM +1000, Barry Naujok wrote: > Implement ASCII case-insensitive support. Its primary purpose > is for supporting existing filesystems that already use this > case-insensitive mode migrated from IRIX. But, if you only need > ASCII-only case-insensitive support (ie. English only) and will > never use another language, then this mode is perfectly adequate. > > ASCII-CI is implemented by generating hashes based on lower-case > letters and doing lower-case compares. It implements a new > xfs_nameops vector for doing the hashes and comparisons for > all filename operations. > > It also overrides the Linux dentry cache operations with its > own hash and compare functions (the same as used in the xfs_nameops > vector). > > To create a filesystem with this CI mode, use: > # mkfs.xfs -n version=ci > > Signed-off-by: Barry Naujok > > --- > fs/xfs/linux-2.6/xfs_iops.c | 46 +++++++++++++++++++++++++++++++++++++- > fs/xfs/linux-2.6/xfs_linux.h | 1 > fs/xfs/linux-2.6/xfs_super.c | 4 +++ > fs/xfs/xfs_dir2.c | 52 ++++++++++++++++++++++++++++++++++++++++++- > fs/xfs/xfs_fs.h | 1 > fs/xfs/xfs_fsops.c | 4 ++- > fs/xfs/xfs_sb.h | 10 +++++++- > 7 files changed, 114 insertions(+), 4 deletions(-) > > Index: kern_ci/fs/xfs/linux-2.6/xfs_iops.c > =================================================================== > --- kern_ci.orig/fs/xfs/linux-2.6/xfs_iops.c > +++ kern_ci/fs/xfs/linux-2.6/xfs_iops.c > @@ -47,6 +47,7 @@ > #include "xfs_buf_item.h" > #include "xfs_utils.h" > #include "xfs_vnodeops.h" > +#include "xfs_da_btree.h" > > #include > #include > @@ -54,6 +55,8 @@ > #include > #include > > +struct dentry_operations xfs_ci_dentry_operations; static? 
> + > +STATIC int > +xfs_ci_dentry_hash( > + struct dentry *dir, > + struct qstr *this) > +{ > + this->hash = xfs_dir_hashname(XFS_I(dir->d_inode), > + this->name, this->len); > + return 0; > +} > + > +STATIC int > +xfs_ci_dentry_compare( > + struct dentry *dir, > + struct qstr *a, > + struct qstr *b) > +{ > + int result = xfs_dir_compname(XFS_I(dir->d_inode), a->name, a->len, > + b->name, b->len) == XFS_CMP_DIFFERENT; > + /* > + * result == 0 if a match is found, and if so, copy the name in "b" > + * to "a" to cope with negative dentries getting the correct name. > + */ > + if (result == 0) > + memcpy((unsigned char *)a->name, b->name, a->len); > + return result; > +} large comment in the middle of a 5 line function? Move it above the function. Also should not need a cast in memcpy().... /* * xfs_dir_compname will return 0 if a match is found. If so, we * need to copy the name in "b" to "a" to cope with negative dentries * getting the correct name. */ STATIC int xfs_ci_dentry_compare( struct dentry *dir, struct qstr *a, struct qstr *b) { int result; result = xfs_dir_compname(XFS_I(dir->d_inode), a->name, a->len, b->name, b->len) == XFS_CMP_DIFFERENT; if (!result) memcpy(a->name, b->name, a->len); return result; } > + > +struct dentry_operations xfs_ci_dentry_operations = > +{ > + .d_hash = xfs_ci_dentry_hash, > + .d_compare = xfs_ci_dentry_compare, > +}; static. You should probably move these functions and declarations to before xfs_ci_dentry_operations is used so you can avoid the forward declaration.... 
> =================================================================== > --- kern_ci.orig/fs/xfs/linux-2.6/xfs_super.c > +++ kern_ci/fs/xfs/linux-2.6/xfs_super.c > @@ -67,6 +67,8 @@ static kmem_zone_t *xfs_vnode_zone; > static kmem_zone_t *xfs_ioend_zone; > mempool_t *xfs_ioend_pool; > > +extern struct dentry_operations xfs_ci_dentry_operations; > + > STATIC struct xfs_mount_args * > xfs_args_allocate( > struct super_block *sb, > @@ -1359,6 +1361,8 @@ xfs_fs_fill_super( > error = ENOMEM; > goto fail_vnrele; > } > + if (xfs_sb_version_hasoldci(&mp->m_sb)) > + sb->s_root->d_op = &xfs_ci_dentry_operations; Write a helper function for this: xfs_set_ci_dentry_ops(mp, dentry) rather than exporting the xfs_ci_dentry_operations structure. > > +/* > + * V1/OLDCI case-insensitive support for directories > + * > + * This is ASCII only case support, ie. A-Z. > + */ I'd mention that this is legacy code for supporting the Irix format CI. > @@ -629,7 +631,7 @@ xfs_fs_goingdown( > xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT); > thaw_bdev(sb->s_bdev, sb); > } > - > + > break; random whitespace change? > Index: kern_ci/fs/xfs/xfs_sb.h > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_sb.h > +++ kern_ci/fs/xfs/xfs_sb.h > @@ -46,10 +46,12 @@ struct xfs_mount; > #define XFS_SB_VERSION_SECTORBIT 0x0800 > #define XFS_SB_VERSION_EXTFLGBIT 0x1000 > #define XFS_SB_VERSION_DIRV2BIT 0x2000 > +#define XFS_SB_VERSION_OLDCIBIT 0x4000 /* ASCII only case-insens. */ > #define XFS_SB_VERSION_MOREBITSBIT 0x8000 > #define XFS_SB_VERSION_OKSASHFBITS \ Whitespace. But it's a shame you're being sensible about this - I kinda liked the Irix name for this feature (XFS_SB_VERSION_BORGBIT). :) Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Apr 2 18:55:01 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 18:55:18 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m331sxkj009624 for ; Wed, 2 Apr 2008 18:55:01 -0700 X-ASG-Debug-ID: 1207187734-418201080000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from filer.fsl.cs.sunysb.edu (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id DE4DF8EEF36; Wed, 2 Apr 2008 18:55:34 -0700 (PDT) Received: from filer.fsl.cs.sunysb.edu (filer.fsl.cs.sunysb.edu [130.245.126.2]) by cuda.sgi.com with ESMTP id p6HSj12X9o3KBTNo; Wed, 02 Apr 2008 18:55:34 -0700 (PDT) Received: from josefsipek.net (baal.fsl.cs.sunysb.edu [130.245.126.78]) by filer.fsl.cs.sunysb.edu (8.12.11.20060308/8.13.1) with ESMTP id m331pKJv005900; Wed, 2 Apr 2008 21:51:20 -0400 Received: by josefsipek.net (Postfix, from userid 1000) id 7A59D1C00E74; Wed, 2 Apr 2008 21:51:22 -0400 (EDT) Date: Wed, 2 Apr 2008 21:51:22 -0400 From: "Josef 'Jeff' Sipek" To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 3/7] XFS: Refactor node format directory lookup/addname Subject: Re: [PATCH 3/7] XFS: Refactor node format directory lookup/addname Message-ID: <20080403015122.GE5211@josefsipek.net> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.380299192@chook.melbourne.sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402062708.380299192@chook.melbourne.sgi.com> User-Agent: Mutt/1.5.16 (2007-06-11) X-Barracuda-Connect: 
filer.fsl.cs.sunysb.edu[130.245.126.2] X-Barracuda-Start-Time: 1207187734 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46675 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15174 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jeffpc@josefsipek.net Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 04:25:11PM +1000, Barry Naujok wrote: ... > --- kern_ci.orig/fs/xfs/xfs_dir2_node.c > +++ kern_ci/fs/xfs/xfs_dir2_node.c ... > @@ -432,27 +429,15 @@ xfs_dir2_leafn_lookup_int( > /* > * Do we have a buffer coming in? > */ > - if (state->extravalid) > - curbp = state->extrablk.bp; > - else > - curbp = NULL; > + curbp = state->extravalid ? state->extrablk.bp : NULL; > /* > * For addname, it's a free block buffer, get the block number. > */ > - if (args->addname) { > - curfdb = curbp ? state->extrablk.blkno : -1; > - curdb = -1; > - length = xfs_dir2_data_entsize(args->namelen); > - if ((free = (curbp ? curbp->data : NULL))) > - ASSERT(be32_to_cpu(free->hdr.magic) == XFS_DIR2_FREE_MAGIC); > - } > - /* > - * For others, it's a data block buffer, get the block number. > - */ > - else { > - curfdb = -1; > - curdb = curbp ? state->extrablk.blkno : -1; > - } > + curfdb = curbp ? state->extrablk.blkno : -1; > + free = curbp ? 
curbp->data : NULL; The previous 3 lines can be cleaned up as: if (state->extravalid) curbp = state->extrablk.bp; else curbp = NULL; if (curbp) { curfdb = state->extrablk.blkno; free = curbp->data; } else { curfdb = -1; free = NULL; } or, if (state->extravalid && state->extrablk.bp == NULL) is _ALWAYS_ false (which seems to be the case), you can do: if (state->extravalid) { curbp = state->extrablk.bp; curfdb = state->extrablk.blkno; free = curbp->data; } else { curbp = NULL; curfdb = -1; free = NULL; } ... > +static int > +xfs_dir2_leafn_lookup_for_entry( > + xfs_dabuf_t *bp, /* leaf buffer */ > + xfs_da_args_t *args, /* operation arguments */ > + int *indexp, /* out: leaf entry index */ > + xfs_da_state_t *state) /* state to fill in */ > +{ > + xfs_dabuf_t *curbp; /* current data/free buffer */ > + xfs_dir2_db_t curdb; /* current data block number */ > + xfs_dir2_data_entry_t *dep; /* data block entry */ > + xfs_inode_t *dp; /* incore directory inode */ > + int error; /* error return value */ > + int index; /* leaf entry index */ > + xfs_dir2_leaf_t *leaf; /* leaf structure */ > + xfs_dir2_leaf_entry_t *lep; /* leaf entry */ > + xfs_mount_t *mp; /* filesystem mount point */ > + xfs_dir2_db_t newdb; /* new data block number */ > + xfs_trans_t *tp; /* transaction pointer */ > + xfs_dacmp_t cmp; /* comparison result */ > + xfs_dabuf_t *ci_bp = NULL; /* buffer with CI match */ Did you try to check the stack usage (scripts/checkstack.pl)? > + dp = args->dp; > + tp = args->trans; > + mp = dp->i_mount; > + leaf = bp->data; > + ASSERT(be16_to_cpu(leaf->hdr.info.magic) == XFS_DIR2_LEAFN_MAGIC); > +#ifdef __KERNEL__ > + ASSERT(be16_to_cpu(leaf->hdr.count) > 0); > +#endif What's this #ifdef for? > + dep = (xfs_dir2_data_entry_t *)((char *)curbp->data + > + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address))); Perhaps a static inline to do this calculation more cleanly (assuming it's done elsewhere as well). ... > + /* > + * Compare the entry, return it if it matches. 
> + */ > + cmp = args->oknoent ? > + xfs_dir_compname(dp, dep->name, dep->namelen, > + args->name, args->namelen): > + xfs_da_compname(dep->name, dep->namelen, > + args->name, args->namelen); same as comment for 1/7. Josef 'Jeff' Sipek. -- My public GPG key can be found at http://www.josefsipek.net/gpg/public-0xC7958FFE.txt From owner-xfs@oss.sgi.com Wed Apr 2 19:50:17 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 19:50:26 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m332oDTX017472 for ; Wed, 2 Apr 2008 19:50:17 -0700 X-ASG-Debug-ID: 1207191049-3d3902c40000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from filer.fsl.cs.sunysb.edu (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 7A8028F531D; Wed, 2 Apr 2008 19:50:49 -0700 (PDT) Received: from filer.fsl.cs.sunysb.edu (filer.fsl.cs.sunysb.edu [130.245.126.2]) by cuda.sgi.com with ESMTP id ro5Uvm3cpJTNlxn4; Wed, 02 Apr 2008 19:50:49 -0700 (PDT) Received: from josefsipek.net (baal.fsl.cs.sunysb.edu [130.245.126.78]) by filer.fsl.cs.sunysb.edu (8.12.11.20060308/8.13.1) with ESMTP id m332YWXC011125; Wed, 2 Apr 2008 22:34:33 -0400 Received: by josefsipek.net (Postfix, from userid 1000) id BBBC31C00E74; Wed, 2 Apr 2008 22:34:34 -0400 (EDT) Date: Wed, 2 Apr 2008 22:34:34 -0400 From: "Josef 'Jeff' Sipek" To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 4/7] XFS: Return case-insensitive match for dentry cache Subject: Re: [PATCH 4/7] XFS: Return case-insensitive match for dentry cache Message-ID: <20080403023434.GF5211@josefsipek.net> References: <20080402062508.017738664@chook.melbourne.sgi.com> 
<20080402062708.654277049@chook.melbourne.sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402062708.654277049@chook.melbourne.sgi.com> User-Agent: Mutt/1.5.16 (2007-06-11) X-Barracuda-Connect: filer.fsl.cs.sunysb.edu[130.245.126.2] X-Barracuda-Start-Time: 1207191050 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46677 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15175 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jeffpc@josefsipek.net Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 04:25:12PM +1000, Barry Naujok wrote: ... > + /* > + * Directory with a 'disconnected' dentry; get a reference to the > + * 'disconnected' dentry. > + */ > + dentry = list_entry(inode->i_dentry.next, struct dentry, d_alias); list_first_entry does the .next for you. ... > --- kern_ci.orig/fs/xfs/xfs_da_btree.c > +++ kern_ci/fs/xfs/xfs_da_btree.c > @@ -2176,6 +2176,22 @@ xfs_da_reada_buf( > return rval; > } > > + > +kmem_zone_t *xfs_da_name_zone; > + > +uchar_t * > +xfs_da_name_alloc(void) > +{ > + return kmem_zone_zalloc(xfs_da_name_zone, KM_SLEEP); > +} > + > +void > +xfs_da_name_free(const uchar_t *name) Since you don't care about the type anyway, you might want to make it void*, and remove the cast from the lookup_ci code. > +{ > + kmem_zone_free(xfs_da_name_zone, (void *)name); No need for the cast. 
> --- kern_ci.orig/fs/xfs/xfs_dir2_leaf.c > +++ kern_ci/fs/xfs/xfs_dir2_leaf.c > @@ -1301,6 +1301,15 @@ xfs_dir2_leaf_lookup( > * Return the found inode number. > */ > args->inumber = be64_to_cpu(dep->inumber); > + /* > + * If a case-insensitive match, allocate a buffer and copy the actual > + * name into the buffer. Return it via args->value. > + */ > + if (args->cmpresult == XFS_CMP_CASE) { > + args->value = xfs_da_name_alloc(); > + memcpy(args->value, dep->name, dep->namelen); > + args->valuelen = dep->namelen; Perhaps having a static inline xfs_da_name_dup(...) would be useful... ... > --- kern_ci.orig/fs/xfs/xfs_vnodeops.c > +++ kern_ci/fs/xfs/xfs_vnodeops.c > @@ -1762,24 +1762,33 @@ xfs_inactive( > int > xfs_lookup( > xfs_inode_t *dp, > - bhv_vname_t *dentry, > - xfs_inode_t **ipp) > + bhv_vstr_t *d_name, > + xfs_inode_t **ipp, > + bhv_vstr_t *ci_name) > { > xfs_inode_t *ip; > xfs_ino_t e_inum; > int error; > uint lock_mode; > + xfs_name_t name, rname; > > xfs_itrace_entry(dp); > > if (XFS_FORCED_SHUTDOWN(dp->i_mount)) > return XFS_ERROR(EIO); > > + name.name = (uchar_t *)d_name->name; d_name->name is: const unsigned char* name.name is: const uchar_t* Is there any reason why you use uchar_t - beyond the other parts of XFS use it? (I guess this is the same question that I asked before - coding style.) xfs_types.h defines uchar_t as unsigned char... Josef 'Jeff' Sipek. -- Defenestration n. (formal or joc.): The act of removing Windows from your computer in disgust, usually followed by the installation of Linux or some other Unix-like operating system. 
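[Archive note] Two of the suggestions in the message above, a free routine that takes void * so callers need no cast, and a small "dup" helper for returning the on-disk name of a case-insensitive match, might look roughly like the following in userspace C. This is a sketch under stated assumptions: name_alloc, name_free, name_dup and NAME_BUF_SIZE are hypothetical names, and calloc/free merely stand in for the zone allocator used in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed buffer size, standing in for the zone's object size. */
#define NAME_BUF_SIZE 256

/* Stand-in for a zeroing zone allocation: returns a zeroed fixed-size buffer. */
static void *name_alloc(void)
{
	return calloc(1, NAME_BUF_SIZE);
}

/* Taking void * means neither the caller nor this function needs a cast. */
void name_free(void *name)
{
	free(name);
}

/*
 * Copy namelen bytes of name into a freshly allocated buffer and report the
 * length through *lenp. Returns NULL if the allocation fails. The buffer is
 * zero-filled, so the copy is also NUL-terminated.
 */
unsigned char *name_dup(const unsigned char *name, int namelen, int *lenp)
{
	unsigned char *copy;

	assert(namelen >= 0 && namelen < NAME_BUF_SIZE);
	copy = name_alloc();
	if (copy == NULL)
		return NULL;
	memcpy(copy, name, (size_t)namelen);
	*lenp = namelen;
	return copy;
}
```

A caller in this sketch would pair name_dup() with name_free() once the returned name has been consumed. Folding the allocate, copy and set-length steps into one helper keeps each lookup path that reports a case-insensitive match down to a single call.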
From owner-xfs@oss.sgi.com Wed Apr 2 21:03:17 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 21:03:37 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m3343DjF031328 for ; Wed, 2 Apr 2008 21:03:15 -0700 Received: from pc-bnaujok.melbourne.sgi.com (pc-bnaujok.melbourne.sgi.com [134.14.55.58]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA02358; Thu, 3 Apr 2008 14:03:43 +1000 Date: Thu, 03 Apr 2008 14:04:05 +1000 To: "Josef 'Jeff' Sipek" Subject: Re: [PATCH 3/7] XFS: Refactor node format directory lookup/addname From: "Barry Naujok" Organization: SGI Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 MIME-Version: 1.0 References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.380299192@chook.melbourne.sgi.com> <20080403015122.GE5211@josefsipek.net> Message-ID: In-Reply-To: <20080403015122.GE5211@josefsipek.net> User-Agent: Opera Mail/9.24 (Win32) X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from Quoted-Printable to 8bit by oss.sgi.com id m3343HjF031342 X-archive-position: 15176 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs On Thu, 03 Apr 2008 11:51:22 +1000, Josef 'Jeff' Sipek wrote: >> +static int >> +xfs_dir2_leafn_lookup_for_entry( >> + xfs_dabuf_t *bp, /* leaf buffer */ >> + xfs_da_args_t *args, /* operation arguments */ >> + int *indexp, /* out: leaf entry index */ >> + xfs_da_state_t *state) /* state to 
fill in */ >> +{ >> + xfs_dabuf_t *curbp; /* current data/free buffer */ >> + xfs_dir2_db_t curdb; /* current data block number */ >> + xfs_dir2_data_entry_t *dep; /* data block entry */ >> + xfs_inode_t *dp; /* incore directory inode */ >> + int error; /* error return value */ >> + int index; /* leaf entry index */ >> + xfs_dir2_leaf_t *leaf; /* leaf structure */ >> + xfs_dir2_leaf_entry_t *lep; /* leaf entry */ >> + xfs_mount_t *mp; /* filesystem mount point */ >> + xfs_dir2_db_t newdb; /* new data block number */ >> + xfs_trans_t *tp; /* transaction pointer */ >> + xfs_dacmp_t cmp; /* comparison result */ >> + xfs_dabuf_t *ci_bp = NULL; /* buffer with CI match */ > > Did you try to check the stack usage (scripts/checkstack.pl)? on x86_64: nameops.patch -> no difference ascii_ci.patch -> no difference refactor_leafn_lookup.patch (this one) -> no difference return_name.patch -> xfs_dir_lookup from 152 down to 144 :) unicode_ci.patch -> xfs_mkdir from 152 down to 136 :) -> new xfs_unicode_read_cft @ 120 nls_support.patch -> xfs_dir2_leaf_getdents from 136 up to 200 (ouch!) -> xfs_mkdir from 136 back to 152! -> xfs_create from 152 up to 168 -> xfs_rmdir from 104 down to < 100? This seems to be better than the stack usage Eric posted back in response to my last patch set. 
From owner-xfs@oss.sgi.com Wed Apr 2 21:09:49 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 21:10:03 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_75 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m3349jjQ032442 for ; Wed, 2 Apr 2008 21:09:47 -0700 Received: from pc-bnaujok.melbourne.sgi.com (pc-bnaujok.melbourne.sgi.com [134.14.55.58]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA02526; Thu, 3 Apr 2008 14:10:13 +1000 To: "Barry Naujok" , "Josef 'Jeff' Sipek" Subject: Re: [PATCH 3/7] XFS: Refactor node format directory lookup/addname From: "Barry Naujok" Organization: SGI Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 MIME-Version: 1.0 References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.380299192@chook.melbourne.sgi.com> <20080403015122.GE5211@josefsipek.net> Date: Thu, 03 Apr 2008 14:10:38 +1000 Message-ID: In-Reply-To: User-Agent: Opera Mail/9.24 (Win32) X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from Quoted-Printable to 8bit by oss.sgi.com id m3349njQ032450 X-archive-position: 15177 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs On Thu, 03 Apr 2008 14:04:05 +1000, Barry Naujok wrote: > On Thu, 03 Apr 2008 11:51:22 +1000, Josef 'Jeff' Sipek > wrote: > >>> +static int >>> +xfs_dir2_leafn_lookup_for_entry( >>> + xfs_dabuf_t *bp, /* leaf buffer */ >>> + xfs_da_args_t *args, /* operation arguments */ >>> + int *indexp, /* out: 
leaf entry index */ >>> + xfs_da_state_t *state) /* state to fill in */ >>> +{ >>> + xfs_dabuf_t *curbp; /* current data/free buffer */ >>> + xfs_dir2_db_t curdb; /* current data block number */ >>> + xfs_dir2_data_entry_t *dep; /* data block entry */ >>> + xfs_inode_t *dp; /* incore directory inode */ >>> + int error; /* error return value */ >>> + int index; /* leaf entry index */ >>> + xfs_dir2_leaf_t *leaf; /* leaf structure */ >>> + xfs_dir2_leaf_entry_t *lep; /* leaf entry */ >>> + xfs_mount_t *mp; /* filesystem mount point */ >>> + xfs_dir2_db_t newdb; /* new data block number */ >>> + xfs_trans_t *tp; /* transaction pointer */ >>> + xfs_dacmp_t cmp; /* comparison result */ >>> + xfs_dabuf_t *ci_bp = NULL; /* buffer with CI match */ >> >> Did you try to check the stack usage (scripts/checkstack.pl)? > > on x86_64: > > nameops.patch > -> no difference > > ascii_ci.patch > -> no difference > > refactor_leafn_lookup.patch (this one) > -> no difference > > return_name.patch > -> xfs_dir_lookup from 152 down to 144 :) > > unicode_ci.patch > -> xfs_mkdir from 152 down to 136 :) > -> new xfs_unicode_read_cft @ 120 > > nls_support.patch > -> xfs_dir2_leaf_getdents from 136 up to 200 (ouch!) BTW. The CONFIG_XFS_UNICODE_NLS patch, setting that to "N" brings this back to 136 and no other changes. > -> xfs_mkdir from 136 back to 152! > -> xfs_create from 152 up to 168 > -> xfs_rmdir from 104 down to < 100? > > This seems to be better than the stack usage Eric posted back in > response to my last patch set. 
> > > > From owner-xfs@oss.sgi.com Wed Apr 2 21:33:05 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 21:33:18 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m334X0Zt003411 for ; Wed, 2 Apr 2008 21:33:03 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA03200; Thu, 3 Apr 2008 14:33:32 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m334XUsT119895338; Thu, 3 Apr 2008 14:33:31 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m334XTsW119875760; Thu, 3 Apr 2008 14:33:29 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 3 Apr 2008 14:33:29 +1000 From: David Chinner To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 3/7] XFS: Refactor node format directory lookup/addname Message-ID: <20080403043329.GQ103491721@sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.380299192@chook.melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402062708.380299192@chook.melbourne.sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15178 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 
04:25:11PM +1000, Barry Naujok wrote: > The next step for case-insensitive support is to avoid pollution of > the dentry cache with entries pointing to the same inode, but with > names that only differ in case. > > To perform this, we will need to pass the actual filename that > matched back up to the XFS/VFS interface and make sure the dentry > cache only contains entries with the actual case-sensitive name. > > But, before we can do this, it was found that the directory lookup > code with multiple leaves was shared with code adding a name to > that directory. Most of xfs_dir2_leafn_lookup_int() could be broken > into two functions determined by if (args->addname) { } else { }. > > For the following patch, only the lookup case needs to handle the > various xfs_nameops, with case-insensitive match handling in > addition to returning the actual name. > > So, this patch separates xfs_dir2_leafn_lookup_int() into > xfs_dir2_leafn_lookup_for_addname() and xfs_dir2_leafn_lookup_for_entry(). > > xfs_dir2_leafn_lookup_for_addname() iterates through the data blocks looking > for a suitable empty space to insert the name while > xfs_dir2_leafn_lookup_for_entry() uses the xfs_nameops to find the entry. > > xfs_dir2_leafn_lookup_for_entry() path also retains the data block where > the first case-insensitive match occurred as in the next patch which will > return the name, the name is obtained from that block. > > Signed-off-by: Barry Naujok > > --- > fs/xfs/xfs_dir2_node.c | 373 +++++++++++++++++++++++++++++-------------------- > 1 file changed, 225 insertions(+), 148 deletions(-) > > Index: kern_ci/fs/xfs/xfs_dir2_node.c > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_dir2_node.c > +++ kern_ci/fs/xfs/xfs_dir2_node.c > @@ -387,12 +387,11 @@ xfs_dir2_leafn_lasthash( > } > > /* > - * Look up a leaf entry in a node-format leaf block. 
> - * If this is an addname then the extrablk in state is a freespace block, > - * otherwise it's a data block. > + * Look up a leaf entry for space to add a name in a node-format leaf block. > + * The extrablk in state is a freespace block. > */ > -int > -xfs_dir2_leafn_lookup_int( > +static int STATIC (and for the other new function) > +xfs_dir2_leafn_lookup_for_addname( > xfs_dabuf_t *bp, /* leaf buffer */ > xfs_da_args_t *args, /* operation arguments */ > int *indexp, /* out: leaf entry index */ .... > @@ -1785,6 +1857,11 @@ xfs_dir2_node_lookup( > if (error) > rval = error; > /* > + * If case-insensitive match was found in a leaf, return EEXIST. > + */ > + else if (rval == ENOENT && args->cmpresult == XFS_CMP_CASE) > + rval = EEXIST; Can you put the comment inside the if branch? if (error) { rval = error; } else if (rval == ENOENT && args->cmpresult == XFS_CMP_CASE) { /* found a case-insensitive match in a leaf */ rval = EEXIST; } I think Josef got the others... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Apr 2 21:49:47 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 21:49:59 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m334naBs005547 for ; Wed, 2 Apr 2008 21:49:46 -0700 Received: from pc-bnaujok.melbourne.sgi.com (pc-bnaujok.melbourne.sgi.com [134.14.55.58]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA03569; Thu, 3 Apr 2008 14:50:04 +1000 Date: Thu, 03 Apr 2008 14:50:39 +1000 To: "Josef 'Jeff' Sipek" Subject: Re: [PATCH 1/7] XFS: Name operation vector for hash and compare From: "Barry Naujok" Organization: SGI Cc: xfs@oss.sgi.com, 
linux-fsdevel@vger.kernel.org Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 MIME-Version: 1.0 References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062707.797672682@chook.melbourne.sgi.com> <20080403002246.GB5211@josefsipek.net> Message-ID: In-Reply-To: <20080403002246.GB5211@josefsipek.net> User-Agent: Opera Mail/9.24 (Win32) X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from Quoted-Printable to 8bit by oss.sgi.com id m334nlBs005570 X-archive-position: 15179 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs On Thu, 03 Apr 2008 10:22:46 +1000, Josef 'Jeff' Sipek wrote: > On Wed, Apr 02, 2008 at 04:25:09PM +1000, Barry Naujok wrote: > ... >> +/* >> + * Name ops for directory and/or attr name operations >> + */ >> + >> +typedef xfs_dahash_t (*xfs_hashname_t)(const uchar_t *, int); >> +typedef xfs_dacmp_t (*xfs_compname_t)(const uchar_t *, int, >> + const uchar_t *, int); > > Why have typedefs for function pointers? Sometimes, they even cause > problems > (I remember Eric finding a nasty 64-bit bug related to a function pointer > typedef). > > Since IRIX isn't on the supported OS list anymore, what's the policy with > coding style within XFS? Ok, I have fixed it: +/* + * Name ops for directory and/or attr name operations + */ +struct xfs_nameops { + xfs_dahash_t (*hashname)(const uchar_t *, int); + xfs_dacmp_t (*compname)(const uchar_t *, int, const uchar_t *, int); +}; > ... 
>> Index: kern_ci/fs/xfs/xfs_dir2.h >> =================================================================== >> --- kern_ci.orig/fs/xfs/xfs_dir2.h >> +++ kern_ci/fs/xfs/xfs_dir2.h >> @@ -85,6 +85,12 @@ extern int xfs_dir_canenter(struct xfs_t >> char *name, int namelen); >> extern int xfs_dir_ino_validate(struct xfs_mount *mp, xfs_ino_t ino); >> >> +#define xfs_dir_hashname(dp, n, l) \ >> + ((dp)->i_mount->m_dirnameops->hashname((n), (l))) >> + >> +#define xfs_dir_compname(dp, n1, l1, n2, l2) \ >> + ((dp)->i_mount->m_dirnameops->compname((n1), (l1), (n2), (l2))) > > #define vs. static inline... > > I guess this comes back to my question before...what is the coding style > direction you want XFS to go in? More Linux-like (static inline)? or > keep it > more IRIX-like (#define)? Nasty gotcha in this scenario, I have added a comment before them: +/* + * Macros are used calling for the xfs_inode's xfs_mount's name operations as + * in most cases, xfs_dir2.h is included before xfs_inode.h and xfs_mount.h. + */ +#define xfs_dir_hashname(dp, n, l) \ + ((dp)->i_mount->m_dirnameops->hashname((n), (l))) + +#define xfs_dir_compname(dp, n1, l1, n2, l2) \ + ((dp)->i_mount->m_dirnameops->compname((n1), (l1), (n2), (l2))) The alternative is reorganising the #includes in most of the .c files! 
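To make the exchange above concrete, here is a small user-space sketch of a name-operations vector (a struct of function pointers, without typedefs) reached through static inline wrappers rather than macros. All type and function names ending in _sketch are hypothetical, the hash is a toy and not the XFS directory hash, and the three-way compare result is only modelled on the XFS_CMP_* values mentioned in the thread. The inline wrappers compile here because the inode and mount types are complete at that point, which is exactly the include-order condition the reply says xfs_dir2.h cannot rely on.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Compare results, modelled loosely on the XFS_CMP_* values in the thread. */
enum name_cmp_sketch { CMP_DIFFERENT, CMP_EXACT, CMP_CASE };

/* Operation vector: plain function-pointer members, no typedefs. */
struct nameops_sketch {
	uint32_t (*hashname)(const unsigned char *, int);
	enum name_cmp_sketch (*compname)(const unsigned char *, int,
					 const unsigned char *, int);
};

/* Toy hash (NOT the XFS hash); optionally folds ASCII case first. */
static uint32_t toy_hash(const unsigned char *name, int len, int fold)
{
	uint32_t h = 0;
	for (int i = 0; i < len; i++)
		h = h * 31 + (fold ? (unsigned char)tolower(name[i]) : name[i]);
	return h;
}

static uint32_t exact_hashname(const unsigned char *n, int l) { return toy_hash(n, l, 0); }
static uint32_t ci_hashname(const unsigned char *n, int l) { return toy_hash(n, l, 1); }

static enum name_cmp_sketch
exact_compname(const unsigned char *a, int alen, const unsigned char *b, int blen)
{
	return (alen == blen && memcmp(a, b, (size_t)alen) == 0) ?
		CMP_EXACT : CMP_DIFFERENT;
}

/* ASCII case-insensitive compare that still reports an exact match. */
static enum name_cmp_sketch
ci_compname(const unsigned char *a, int alen, const unsigned char *b, int blen)
{
	enum name_cmp_sketch result = CMP_EXACT;

	if (alen != blen)
		return CMP_DIFFERENT;
	for (int i = 0; i < alen; i++) {
		if (a[i] == b[i])
			continue;
		if (tolower(a[i]) != tolower(b[i]))
			return CMP_DIFFERENT;
		result = CMP_CASE;	/* differs only in case so far */
	}
	return result;
}

const struct nameops_sketch exact_nameops = { exact_hashname, exact_compname };
const struct nameops_sketch ci_nameops = { ci_hashname, ci_compname };

/* Hypothetical stand-ins for the mount and inode structures. */
struct mount_sketch { const struct nameops_sketch *dirnameops; };
struct inode_sketch { struct mount_sketch *mount; };

/* Inline wrappers: need both struct definitions above to be visible. */
static inline uint32_t
dir_hashname(const struct inode_sketch *dp, const unsigned char *n, int l)
{
	return dp->mount->dirnameops->hashname(n, l);
}

static inline enum name_cmp_sketch
dir_compname(const struct inode_sketch *dp, const unsigned char *n1, int l1,
	     const unsigned char *n2, int l2)
{
	return dp->mount->dirnameops->compname(n1, l1, n2, l2);
}
```

A macro only needs the member names to resolve where it is expanded, whereas the inline functions need the full definitions where they are declared; that is the trade-off behind keeping the #define form in the patch.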
From owner-xfs@oss.sgi.com Wed Apr 2 22:21:55 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 22:22:05 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42, J_CHICKENPOX_45,J_CHICKENPOX_47,J_CHICKENPOX_48 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m335LmG7010509 for ; Wed, 2 Apr 2008 22:21:52 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA04298; Thu, 3 Apr 2008 15:22:13 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id m335MBsT119798854; Thu, 3 Apr 2008 15:22:13 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id m335M9L6119781372; Thu, 3 Apr 2008 15:22:09 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 3 Apr 2008 15:22:09 +1000 From: David Chinner To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 4/7] XFS: Return case-insensitive match for dentry cache Message-ID: <20080403052209.GR103491721@sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.654277049@chook.melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402062708.654277049@chook.melbourne.sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15180 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: 
dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 04:25:12PM +1000, Barry Naujok wrote: > This implements the code to store the actual filename found > during a lookup in the dentry cache and to avoid multiple entries > in the dcache pointing to the same inode. > > It also introduces a new type, xfs_name, which is similar to the > dentry cache's qstr type. It contains a pointer to a zone allocated > string (MAXNAMELEN sized) and the length of the actual name. This > string does not need to be NULL terminated (a counted string). > > xfs_name_t is only used in the lookup path for this patch, but may > be used in other locations too if desired. It maybe desirable not > to use xfs_name_t at all in the lookup functions but stick to > separate parameters (which will mean 7 instead of 5 arguments). > > To avoid polluting the dcache, we implement a new directory inode > operations for lookup. xfs_vn_ci_lookup() interacts directly with > the dcache and the code was derived from ntfs_lookup() in > fs/ntfs/namei.c. The dentry hash and compare overrides introduced > in the ASCII-CI patch has been removed. > > The "actual name" is only allocated and returned for a case- > insensitive match and not an actual match. > +STATIC struct dentry * > +xfs_vn_ci_lookup( > + struct inode *dir, > + struct dentry *dentry, > + struct nameidata *nd) > +{ > + struct xfs_inode *cip; > + int error; > struct dentry *result; > + struct qstr ci_name = {0, 0, NULL}; > + struct inode *inode; > > if (dentry->d_name.len >= MAXNAMELEN) > return ERR_PTR(-ENAMETOOLONG); > > - if (xfs_sb_version_hasoldci(&mp->m_sb)) > - dentry->d_op = &xfs_ci_dentry_operations; > + error = xfs_lookup(XFS_I(dir), &dentry->d_name, &cip, &ci_name); Bit confusing with cip = "child inode" and ci_name = "case insensitive". i.e. same prefix, different meanings... 
> > - error = xfs_lookup(XFS_I(dir), dentry, &cip); > if (unlikely(error)) { > if (unlikely(error != ENOENT)) > return ERR_PTR(-error); > d_add(dentry, NULL); > return NULL; > } > + inode = cip->i_vnode; > + > + /* if exact match, just splice and exit */ > + if (!ci_name.name) { > + result = d_splice_alias(inode, dentry); > + return result; > + } if (!ci_name.name) return d_splice_alias(inode, dentry); > > - result = d_splice_alias(cip->i_vnode, dentry); > - if (result) > - result->d_op = dentry->d_op; > - return result; > + /* > + * case-insensitive match, create a dentry to return and fill it > + * in with the correctly cased name. Parameter "dentry" is not > + * used anymore and the caller will free it. > + * Derived from fs/ntfs/namei.c > + */ > + > + ci_name.hash = full_name_hash(ci_name.name, ci_name.len); > + > + /* Does an existing dentry match? */ > + result = d_lookup(dentry->d_parent, &ci_name); > + if (!result) { > + /* if not, create one */ > + result = d_alloc(dentry->d_parent, &ci_name); > + xfs_da_name_free((char *)ci_name.name); > + if (!result) > + return ERR_PTR(-ENOMEM); > + dentry = d_splice_alias(inode, result); > + if (dentry) { > + dput(result); > + return dentry; > + } > + return result; > + } This looks like it came from the ntfs code - I find that much easier to follow with "real_dent" and "new_dent" instead of "result" and "dentry" respectively. > + xfs_da_name_free((char *)ci_name.name); > + > + /* an existing dentry matches, use it */ Ah, I see the rest of this is basically a copy and paste of the ntfs code (without some of the useful comments). I think a generic helper function is in order here that contains all the comments from the ntfs code.... > + > + if (result->d_inode) { > + /* > + * already an inode attached, deref the inode that was > + * refcounted with xfs_lookup and return the dentry. > + */ > + if (unlikely(result->d_inode != inode)) { > + /* This can happen because bad inodes are unhashed. 
*/ > + BUG_ON(!is_bad_inode(inode)); > + BUG_ON(!is_bad_inode(result->d_inode)); Bit drastic - how about failing the lookup and returning EIO in this case? > + } > + iput(inode); > + return result; > + } ..... > Index: kern_ci/fs/xfs/linux-2.6/xfs_super.c > =================================================================== > --- kern_ci.orig/fs/xfs/linux-2.6/xfs_super.c > +++ kern_ci/fs/xfs/linux-2.6/xfs_super.c > @@ -566,7 +566,10 @@ xfs_set_inodeops( > inode->i_mapping->a_ops = &xfs_address_space_operations; > break; > case S_IFDIR: > - inode->i_op = &xfs_dir_inode_operations; > + inode->i_op = > + xfs_sb_version_hasoldci(&XFS_I(inode)->i_mount->m_sb) ? + xfs_sb_version_hasoldci(&XFS_M(inode->i_sb)->m_sb) ? > + xfs_ino_t *inum, /* out: inode number */ > + xfs_name_t *ci_name) /* out: actual name if different */ > { > xfs_da_args_t args; > int rval; > @@ -259,9 +260,9 @@ xfs_dir_lookup( > ASSERT((dp->i_d.di_mode & S_IFMT) == S_IFDIR); > XFS_STATS_INC(xs_dir_lookup); > > - args.name = name; > - args.namelen = namelen; > - args.hashval = xfs_dir_hashname(dp, name, namelen); > + args.name = name->name; > + args.namelen = name->len; > + args.hashval = xfs_dir_hashname(dp, name->name, name->len); > args.inumber = 0; > args.dp = dp; > args.firstblock = NULL; > @@ -272,6 +273,8 @@ xfs_dir_lookup( > args.justcheck = args.addname = 0; > args.oknoent = 1; > args.cmpresult = XFS_CMP_DIFFERENT; > + args.value = NULL; > + args.valuelen = 0; Rather than initialising more of the args to zero (already 7 members explicitly initialised to zero or NULL), change it to: memset(&args, 0, sizeof(xfs_da_args_t)); args.name = name->name; args.namelen = name->len; args.hashval = xfs_dir_hashname(dp, name->name, name->len); args.dp = dp; args.whichfork = XFS_DATA_FORK; args.trans = tp; args.oknoent = 1; args.cmpresult = XFS_CMP_DIFFERENT; > > if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL) > rval = xfs_dir2_sf_lookup(&args); > @@ -287,8 +290,17 @@ xfs_dir_lookup( > rval = 
xfs_dir2_node_lookup(&args); > if (rval == EEXIST) > rval = 0; > - if (rval == 0) > + if (rval == 0) { if (!rval) { > Index: kern_ci/fs/xfs/xfs_dir2_block.c > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_dir2_block.c > +++ kern_ci/fs/xfs/xfs_dir2_block.c > @@ -616,6 +616,15 @@ xfs_dir2_block_lookup( > * Fill in inode number, release the block. > */ > args->inumber = be64_to_cpu(dep->inumber); > + /* > + * If a case-insensitive match, allocate a buffer and copy the actual > + * name into the buffer. Return it via args->value. > + */ > + if (args->cmpresult == XFS_CMP_CASE) { > + args->value = xfs_da_name_alloc(); > + memcpy(args->value, dep->name, dep->namelen); > + args->valuelen = dep->namelen; xfs_da_ci_name_dup(); > + } > xfs_da_brelse(args->trans, bp); > return XFS_ERROR(EEXIST); > } > Index: kern_ci/fs/xfs/xfs_dir2_leaf.c > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_dir2_leaf.c > +++ kern_ci/fs/xfs/xfs_dir2_leaf.c > @@ -1301,6 +1301,15 @@ xfs_dir2_leaf_lookup( > * Return the found inode number. > */ > args->inumber = be64_to_cpu(dep->inumber); > + /* > + * If a case-insensitive match, allocate a buffer and copy the actual > + * name into the buffer. Return it via args->value. > + */ > + if (args->cmpresult == XFS_CMP_CASE) { > + args->value = xfs_da_name_alloc(); > + memcpy(args->value, dep->name, dep->namelen); > + args->valuelen = dep->namelen; xfs_da_ci_name_dup(); > + } > xfs_da_brelse(tp, dbp); > xfs_da_brelse(tp, lbp); > return XFS_ERROR(EEXIST); > Index: kern_ci/fs/xfs/xfs_dir2_node.c > =================================================================== > --- kern_ci.orig/fs/xfs/xfs_dir2_node.c > +++ kern_ci/fs/xfs/xfs_dir2_node.c > @@ -643,6 +643,8 @@ xfs_dir2_leafn_lookup_for_entry( > xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address))); > /* > * Compare the entry, return it if it matches. 
> + * "oknoent" is set for lookup and clear for > + * remove and replace. > */ That should have been in an earlier patch.... > cmp = args->oknoent ? > xfs_dir_compname(dp, dep->name, dep->namelen, > @@ -1857,10 +1859,22 @@ xfs_dir2_node_lookup( > if (error) > rval = error; > /* > - * If case-insensitive match was found in a leaf, return EEXIST. > - */ > - else if (rval == ENOENT && args->cmpresult == XFS_CMP_CASE) > + * If case-insensitive match was found (xfs_dir2_leafn_lookup_int > + * returns ENOENT for a case-insensitive match, but sets > + * args->cmpresult to XFS_CMP_CASE): > + * - Allocate a buffer and copy the actual name into the buffer and > + * return it via args->value. > + * - set rval to EEXIST > + */ > + else if (rval == ENOENT && args->cmpresult == XFS_CMP_CASE) { > + xfs_dir2_data_entry_t *dep = (xfs_dir2_data_entry_t *) > + ((char *)state->extrablk.bp->data + > + state->extrablk.index); > + args->value = xfs_da_name_alloc(); > + memcpy(args->value, dep->name, dep->namelen); > + args->valuelen = dep->namelen; > rval = EEXIST; > + } Yeah, more reason to move the comment inside the if block.... oh, and xfs_da_ci_name_dup().... > - if (args->cmpresult == XFS_CMP_CASE) > + if (args->cmpresult == XFS_CMP_CASE) { > + /* > + * If a case-insensitive match, allocate a buffer and copy the > + * actual name into the buffer and return it via args->value. > + */ > + args->value = xfs_da_name_alloc(); > + memcpy(args->value, ci_sfep->name, ci_sfep->namelen); > + args->valuelen = ci_sfep->namelen; xfs_da_ci_name_dup() > --- kern_ci.orig/fs/xfs/xfs_utils.c > +++ kern_ci/fs/xfs/xfs_utils.c > @@ -24,6 +24,7 @@ > #include "xfs_trans.h" > #include "xfs_sb.h" > #include "xfs_ag.h" > +#include "xfs_da_btree.h" What's that needed for? What ever it is, i think you've put it in the wrong header file.... Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Apr 2 22:41:26 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 22:41:36 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m335fPl1013122 for ; Wed, 2 Apr 2008 22:41:26 -0700 X-ASG-Debug-ID: 1207201319-5b1a017b0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from smtps.tip.net.au (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 612B67336D5 for ; Wed, 2 Apr 2008 22:42:00 -0700 (PDT) Received: from smtps.tip.net.au (chilli.pcug.org.au [203.10.76.44]) by cuda.sgi.com with ESMTP id GpkLCf7mUexoKBfv for ; Wed, 02 Apr 2008 22:42:00 -0700 (PDT) Received: from ash.ozlabs.ibm.com (bh02i525f01.au.ibm.com [202.81.18.30]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by smtps.tip.net.au (Postfix) with ESMTP id A1307368002; Thu, 3 Apr 2008 16:41:26 +1100 (EST) Date: Thu, 3 Apr 2008 16:41:20 +1100 From: Stephen Rothwell To: David Chinner Cc: Barry Naujok , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 4/7] XFS: Return case-insensitive match for dentry cache Subject: Re: [PATCH 4/7] XFS: Return case-insensitive match for dentry cache Message-Id: <20080403164120.87e2e44b.sfr@canb.auug.org.au> In-Reply-To: <20080403052209.GR103491721@sgi.com> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.654277049@chook.melbourne.sgi.com> <20080403052209.GR103491721@sgi.com> X-Mailer: Sylpheed 2.5.0beta1 (GTK+ 2.12.9; i486-pc-linux-gnu) Mime-Version: 1.0 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg="PGP-SHA1"; 
boundary="Signature=_Thu__3_Apr_2008_16_41_20_+1100_c+Qi.M.Lqw5pZLpj" X-Barracuda-Connect: chilli.pcug.org.au[203.10.76.44] X-Barracuda-Start-Time: 1207201322 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46690 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15181 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sfr@canb.auug.org.au Precedence: bulk X-list: xfs --Signature=_Thu__3_Apr_2008_16_41_20_+1100_c+Qi.M.Lqw5pZLpj Content-Type: text/plain; charset=US-ASCII Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi all, On Thu, 3 Apr 2008 15:22:09 +1000 David Chinner wrote: > > On Wed, Apr 02, 2008 at 04:25:12PM +1000, Barry Naujok wrote: > > This implements the code to store the actual filename found > > during a lookup in the dentry cache and to avoid multiple entries > > in the dcache pointing to the same inode. I haven't really been following this, but I was wondering if this could be made generic and used in the CIFS code as well. They currently (I think) have an awful hack where they update the name in the dentry (which throws a warning about dropping a const attribute in the memcpy). 
--=20 Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ --Signature=_Thu__3_Apr_2008_16_41_20_+1100_c+Qi.M.Lqw5pZLpj Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFH9G4FTgG2atn1QN8RAlVEAJ93zSowne9T1LxgC2zOGYJNVJZ/qwCeMO1U Xe9szYD4r2MkJMjvBeF6nBA= =ECoH -----END PGP SIGNATURE----- --Signature=_Thu__3_Apr_2008_16_41_20_+1100_c+Qi.M.Lqw5pZLpj-- From owner-xfs@oss.sgi.com Wed Apr 2 23:42:11 2008 Received: with ECARTIS (v1.0.0; list xfs); Wed, 02 Apr 2008 23:42:20 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m336gAd3021249 for ; Wed, 2 Apr 2008 23:42:10 -0700 X-ASG-Debug-ID: 1207204966-4b1501120000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from one.firstfloor.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 0DDF28FADE9 for ; Wed, 2 Apr 2008 23:42:46 -0700 (PDT) Received: from one.firstfloor.org (one.firstfloor.org [213.235.205.2]) by cuda.sgi.com with ESMTP id ECsE6LeqMFolmMKb for ; Wed, 02 Apr 2008 23:42:46 -0700 (PDT) Received: by one.firstfloor.org (Postfix, from userid 503) id EC9F018902B0; Thu, 3 Apr 2008 08:46:08 +0200 (CEST) Date: Thu, 3 Apr 2008 08:46:08 +0200 From: Andi Kleen To: David Chinner Cc: Andi Kleen , Lachlan McIlroy , xfs-dev , xfs-oss X-ASG-Orig-Subj: Re: [Patch] Cacheline align xlog_t Subject: Re: [Patch] Cacheline align xlog_t Message-ID: <20080403064608.GS29105@one.firstfloor.org> References: <20080401231552.GV103491721@sgi.com> <47F3293C.6090708@sgi.com> <20080402054403.GF103491721@sgi.com> <87myocek4o.fsf@basil.nowhere.org> <20080402222347.GK103491721@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402222347.GK103491721@sgi.com> User-Agent: Mutt/1.4.2.1i X-Barracuda-Connect: one.firstfloor.org[213.235.205.2] X-Barracuda-Start-Time: 1207204967 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46695 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15182 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: andi@firstfloor.org Precedence: bulk X-list: xfs On Thu, Apr 03, 2008 at 08:23:47AM +1000, David Chinner wrote: > > For the dynamic allocation you would rather need to make sure it > > starts at a cache line boundary explicitly because the allocator doesn't > > know the alignment of the target type, otherwise your careful > > padding might be useless. > > Yup. Is there an allocator function gives us cacheline aligned > allocation __get_free_pages() @) [ok not serious] > (apart from a slab initialised with SLAB_HWCACHE_ALIGN)? That too yes. > There isn't one, right? You can always align yourself with kmalloc (or any other arbitrary size allocator) with the standard technique: get L1_CACHE_BYTES-1 or possibly better cache_line_size() - 1 bytes more and then align the pointer manually with ALIGN. Only tricky part is that you have to undo the alignment before freeing. 
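The over-allocate-and-align technique described above can be sketched in user space as follows. This is only an illustration under stated assumptions: malloc()/free() stand in for kmalloc()/kfree(), CACHE_LINE_SKETCH is an assumed 64-byte line size standing in for L1_CACHE_BYTES, and the function names are hypothetical. One way to "undo the alignment before freeing" is to stash the original pointer just below the aligned address, which is what this sketch does; it costs an extra sizeof(void *) bytes on top of the alignment slack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size (power of two); stands in for L1_CACHE_BYTES. */
#define CACHE_LINE_SKETCH 64

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up_sketch(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Allocate size bytes starting on a cache line boundary.
 * Extra space: up to CACHE_LINE_SKETCH - 1 bytes of alignment slack,
 * plus one pointer slot used to remember what malloc() really returned.
 */
void *cacheline_alloc_sketch(size_t size)
{
	void *raw = malloc(size + CACHE_LINE_SKETCH - 1 + sizeof(void *));
	uintptr_t aligned;

	if (!raw)
		return NULL;

	/* Leave room for the stashed pointer, then round up. */
	aligned = align_up_sketch((uintptr_t)raw + sizeof(void *),
				  CACHE_LINE_SKETCH);
	((void **)aligned)[-1] = raw;
	return (void *)aligned;
}

/* Recover the original pointer and release the whole allocation. */
void cacheline_free_sketch(void *ptr)
{
	if (ptr)
		free(((void **)ptr)[-1]);
}
```

If the caller already keeps the unaligned pointer somewhere else, the stash slot can be dropped and only the L1_CACHE_BYTES-1 extra bytes mentioned above are needed.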
-Andi From owner-xfs@oss.sgi.com Thu Apr 3 10:09:53 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 10:10:04 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33H9qZh021739 for ; Thu, 3 Apr 2008 10:09:53 -0700 X-ASG-Debug-ID: 1207242616-766f00000000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 13114902F3A for ; Thu, 3 Apr 2008 10:10:16 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id jtIPeQXFmZWrjFKc for ; Thu, 03 Apr 2008 10:10:16 -0700 (PDT) Received: from hch by bombadil.infradead.org with local (Exim 4.68 #1 (Red Hat Linux)) id 1JhSwb-0001hS-4i; Thu, 03 Apr 2008 17:09:45 +0000 Date: Thu, 3 Apr 2008 13:09:45 -0400 From: Christoph Hellwig To: David Chinner Cc: Barry Naujok , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 2/7] XFS: ASCII case-insensitive support Subject: Re: [PATCH 2/7] XFS: ASCII case-insensitive support Message-ID: <20080403170945.GA22385@infradead.org> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.071715758@chook.melbourne.sgi.com> <20080403015331.GP103491721@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080403015331.GP103491721@sgi.com> User-Agent: Mutt/1.5.17 (2007-11-01) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1207242617 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 
-2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46736 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15184 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Apr 03, 2008 at 11:53:31AM +1000, David Chinner wrote: > > + if (xfs_sb_version_hasoldci(&mp->m_sb)) > > + sb->s_root->d_op = &xfs_ci_dentry_operations; > > Write a helper function for this: xfs_set_ci_dentry_ops(mp, dentry) > rather than exporting the xfs_ci_dentry_operations structure. Yes, please. Also, the export ops calling d_alloc_anon need to update the dentry ops as well and should be using this helper. > > +#define XFS_SB_VERSION_OLDCIBIT 0x4000 /* ASCII only case-insens. */ > > #define XFS_SB_VERSION_MOREBITSBIT 0x8000 > > #define XFS_SB_VERSION_OKSASHFBITS \ > > Whitespace. > > But it's a shame you're being sensible about this - I kinda liked > the Irix name for this feature (XFS_SB_VERSION_BORGBIT). :) So what exactly prevents us from using it in Linux? Please use the old name. 
From owner-xfs@oss.sgi.com Thu Apr 3 10:36:13 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 10:36:22 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33HaAPQ030015 for ; Thu, 3 Apr 2008 10:36:13 -0700 X-ASG-Debug-ID: 1207243283-7673005d0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id B97BB903C0B for ; Thu, 3 Apr 2008 10:21:23 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id ZMEbgoFbpoefCx1l for ; Thu, 03 Apr 2008 10:21:23 -0700 (PDT) Received: from hch by bombadil.infradead.org with local (Exim 4.68 #1 (Red Hat Linux)) id 1JhQrm-0007Yd-9U; Thu, 03 Apr 2008 14:56:38 +0000 Date: Thu, 3 Apr 2008 10:56:38 -0400 From: Christoph Hellwig To: Stephen Rothwell Cc: David Chinner , Barry Naujok , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 4/7] XFS: Return case-insensitive match for dentry cache Subject: Re: [PATCH 4/7] XFS: Return case-insensitive match for dentry cache Message-ID: <20080403145638.GA3373@infradead.org> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.654277049@chook.melbourne.sgi.com> <20080403052209.GR103491721@sgi.com> <20080403164120.87e2e44b.sfr@canb.auug.org.au> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080403164120.87e2e44b.sfr@canb.auug.org.au> User-Agent: Mutt/1.5.17 (2007-11-01) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: 
bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1207243283 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46736 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15187 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Apr 03, 2008 at 04:41:20PM +1100, Stephen Rothwell wrote: > I haven't really been following this, but I was wondering if this could > be made generic and used in the CIFS code as well. They currently (I > think) have an awful hack where they update the name in the dentry > (which throws a warning about dropping a const attribute in the memcpy). Yes, it should. The new higher-level lookup code added here should probably be a helper in dcache.c, although that'll need a new abstraction for the Unicode table handling. 
From owner-xfs@oss.sgi.com Thu Apr 3 12:06:12 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 12:06:19 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33J6BVN023216 for ; Thu, 3 Apr 2008 12:06:12 -0700 X-ASG-Debug-ID: 1207248461-22be01530000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from lists.samba.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 37E8C90B8D8 for ; Thu, 3 Apr 2008 11:47:41 -0700 (PDT) Received: from lists.samba.org (mail.samba.org [66.70.73.150]) by cuda.sgi.com with ESMTP id J6icIARE6dnAPTcC for ; Thu, 03 Apr 2008 11:47:41 -0700 (PDT) Received: by lists.samba.org (Postfix, from userid 549) id 36F12163922; Thu, 3 Apr 2008 18:47:41 +0000 (GMT) Date: Thu, 3 Apr 2008 11:47:39 -0700 From: Jeremy Allison To: Christoph Hellwig Cc: Jeremy Allison , Barry Naujok , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Subject: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Message-ID: <20080403184739.GB6100@samba1> Reply-To: Jeremy Allison References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062709.011126702@chook.melbourne.sgi.com> <20080403171450.GB22385@infradead.org> <20080403172400.GC22812@samba1> <20080403184333.GA30595@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080403184333.GA30595@infradead.org> User-Agent: Mutt/1.5.11 X-Barracuda-Connect: mail.samba.org[66.70.73.150] X-Barracuda-Start-Time: 1207248462 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at 
sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46740 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15188 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jra@samba.org Precedence: bulk X-list: xfs On Thu, Apr 03, 2008 at 02:43:33PM -0400, Christoph Hellwig wrote: > On Thu, Apr 03, 2008 at 10:24:00AM -0700, Jeremy Allison wrote: > > On Thu, Apr 03, 2008 at 01:14:50PM -0400, Christoph Hellwig wrote: > > > Validating file names is not the filesystem job. In fact it's utterly > > > stupid, a unix filename is a sequence of bytes without special meaning > > > except for ., .., / and \0 > > > > This patch will be extremely useful for users who are serving > > Windows clients using Samba. It allow admins to turn off the > > userspace case insensitivity we have to emulate and be a significant > > speed increase. > > CI filenames can work perfectly fine without adding validation of file > names by treating non-conformant bytestreams as not having lower/upper > case variants. Sorry, then I'm not understanding your objection to this patch (and I don't think I understood that sentence :-). Jeremy. 
From owner-xfs@oss.sgi.com Thu Apr 3 12:06:12 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 12:06:21 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33J6BEY023215 for ; Thu, 3 Apr 2008 12:06:12 -0700 X-ASG-Debug-ID: 1207248926-60ff03dd0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 1AB5A106CCA0; Thu, 3 Apr 2008 11:55:26 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id IT9StmSFTD8Zia2O; Thu, 03 Apr 2008 11:55:26 -0700 (PDT) Received: from hch by bombadil.infradead.org with local (Exim 4.68 #1 (Red Hat Linux)) id 1JhUas-0004lT-3J; Thu, 03 Apr 2008 18:55:26 +0000 Date: Thu, 3 Apr 2008 14:55:26 -0400 From: Christoph Hellwig To: Jeremy Allison Cc: Christoph Hellwig , Barry Naujok , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Subject: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Message-ID: <20080403185526.GA6045@infradead.org> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062709.011126702@chook.melbourne.sgi.com> <20080403171450.GB22385@infradead.org> <20080403172400.GC22812@samba1> <20080403184333.GA30595@infradead.org> <20080403184739.GB6100@samba1> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080403184739.GB6100@samba1> User-Agent: Mutt/1.5.17 (2007-11-01) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html 
X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1207248927 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46740 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15189 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Apr 03, 2008 at 11:47:39AM -0700, Jeremy Allison wrote: > > CI filenames can work perfectly fine without adding validation of file > > names by treating non-conformant bytestreams as not having lower/upper > > case variants. > > Sorry, then I'm not understanding your objection to this patch (and I > don't think I understood that sentence :-). I objected to the part of the patch I've quoted (and the bits related to it), not all of it. That's how we do reviews in kernel land; not sure how Samba handles it, if you have a binary object/don't object policy. 
From owner-xfs@oss.sgi.com Thu Apr 3 13:56:14 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 13:56:49 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33KuDt3005138 for ; Thu, 3 Apr 2008 13:56:14 -0700 X-ASG-Debug-ID: 1207246949-404b011c0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from lists.samba.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id CD202739452 for ; Thu, 3 Apr 2008 11:22:29 -0700 (PDT) Received: from lists.samba.org (mail.samba.org [66.70.73.150]) by cuda.sgi.com with ESMTP id eJUFtWL76tQmunTS for ; Thu, 03 Apr 2008 11:22:29 -0700 (PDT) Received: by lists.samba.org (Postfix, from userid 549) id 0D051163945; Thu, 3 Apr 2008 18:22:28 +0000 (GMT) Date: Thu, 3 Apr 2008 11:22:26 -0700 From: Jeremy Allison To: Eric Sandeen Cc: Jeremy Allison , Christoph Hellwig , Barry Naujok , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Subject: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Message-ID: <20080403182226.GA6100@samba1> Reply-To: Jeremy Allison References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062709.011126702@chook.melbourne.sgi.com> <20080403171450.GB22385@infradead.org> <20080403172400.GC22812@samba1> <47F51DDE.8070501@sandeen.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47F51DDE.8070501@sandeen.net> User-Agent: Mutt/1.5.11 X-Barracuda-Connect: mail.samba.org[66.70.73.150] X-Barracuda-Start-Time: 1207246949 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at 
sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46739 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15190 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jra@samba.org Precedence: bulk X-list: xfs On Thu, Apr 03, 2008 at 01:11:42PM -0500, Eric Sandeen wrote: > Jeremy Allison wrote: > > On Thu, Apr 03, 2008 at 01:14:50PM -0400, Christoph Hellwig wrote: > >> Validating file names is not the filesystem job. In fact it's utterly > >> stupid, a unix filename is a sequence of bytes without special meaning > >> except for ., .., / and \0 > > > > This patch will be extremely useful for users who are serving > > Windows clients using Samba. It allow admins to turn off the > > userspace case insensitivity we have to emulate and be a significant > > speed increase. > > I'd like to see the numbers... Simo tested an earlier version of this > patch, and it was not faster.... Jeremy, what would be a representative > test setup to use? It very much depends on the usage case. We have many users who have large numbers of files per directory, and not having to search these in userspace when we get a stat cache miss is helpful. Just running a generic "netbench" test won't show any difference, as that test uses separate directories for each client with small numbers of files per directory. There's a reason I wrote this HOWTO (having to use an alternate link as samba.org seems to be down right now): http://man.chinaunix.net/newsoft/samba/docs/man/Samba-HOWTO-Collection/largefile.html Jeremy. 
From owner-xfs@oss.sgi.com Thu Apr 3 14:36:16 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 14:36:24 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_15 autolearn=no version=3.3.0-r574664 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33LaEIS017625 for ; Thu, 3 Apr 2008 14:36:16 -0700 X-ASG-Debug-ID: 1207249037-3f0a01a80000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id CD2D37399F6; Thu, 3 Apr 2008 11:57:17 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id v1304MOulnCZdtf9; Thu, 03 Apr 2008 11:57:17 -0700 (PDT) Received: from hch by bombadil.infradead.org with local (Exim 4.68 #1 (Red Hat Linux)) id 1JhUcf-0008L9-4O; Thu, 03 Apr 2008 18:57:17 +0000 Date: Thu, 3 Apr 2008 14:57:17 -0400 From: Christoph Hellwig To: Jeremy Allison Cc: Christoph Hellwig , Barry Naujok , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Subject: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Message-ID: <20080403185717.GB6045@infradead.org> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062709.011126702@chook.melbourne.sgi.com> <20080403171450.GB22385@infradead.org> <20080403172400.GC22812@samba1> <20080403184333.GA30595@infradead.org> <20080403184739.GB6100@samba1> <20080403185526.GA6045@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080403185526.GA6045@infradead.org> User-Agent: Mutt/1.5.17 (2007-11-01) X-SRS-Rewrite: SMTP reverse-path rewritten from by 
bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1207249037 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46743 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15191 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Apr 03, 2008 at 02:55:26PM -0400, Christoph Hellwig wrote: > On Thu, Apr 03, 2008 at 11:47:39AM -0700, Jeremy Allison wrote: > > > CI filenames can work perfectly fine without adding validation of file > > > names by treating non-conformant bytestreams as not having lower/upper > > > case variants. > > > > Sorry, then I'm not understanding your objection to this patch (and I > > don't think I understood that sentence :-). > > I objected to the part of the patch I've quoted (and the bits related to > it), not all of it. That's how we do reviews in kernel land; not sure > how Samba handles it, if you have a binary object/don't object policy. Oops, looks like the quote actually got deleted accidentally. Sorry, I'll take that comment back. The parts I object to are the various calls to xfs_unicode_validate in the namespace operations. 
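[Editorial illustration, not part of the original message.] The idea of "treating non-conformant bytestreams as not having lower/upper case variants" can be sketched with a deliberately simplified comparison. The assumption here is ASCII-only folding: only the bytes 'A'..'Z' are folded, and every other byte value (including bytes that would form invalid UTF-8) is compared exactly, so no name is ever rejected. The function names are hypothetical, and a real Unicode implementation would decode code points and consult a case table instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fold only ASCII 'A'..'Z'; any other byte is returned unchanged. */
static unsigned char sketch_fold_byte(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/*
 * Case-insensitive name comparison that performs no validation:
 * bytes without a known case variant must simply match exactly.
 */
static bool sketch_ci_names_equal(const unsigned char *a, size_t alen,
				  const unsigned char *b, size_t blen)
{
	if (alen != blen)
		return false;
	for (size_t i = 0; i < alen; i++) {
		if (sketch_fold_byte(a[i]) != sketch_fold_byte(b[i]))
			return false;
	}
	return true;
}
```

With this shape, a name containing a stray 0xff byte is still a legal name; it just matches only names that carry the same byte in the same position.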
From owner-xfs@oss.sgi.com Thu Apr 3 15:50:48 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 15:50:55 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33MoiWU005940 for ; Thu, 3 Apr 2008 15:50:48 -0700 X-ASG-Debug-ID: 1207263080-6d1503220000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 9CDC8910C8C; Thu, 3 Apr 2008 15:51:20 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id JVazLVSrW9p9pxJ4; Thu, 03 Apr 2008 15:51:20 -0700 (PDT) Received: from hch by bombadil.infradead.org with local (Exim 4.68 #1 (Red Hat Linux)) id 1JhYHA-0006dt-7l; Thu, 03 Apr 2008 22:51:20 +0000 Date: Thu, 3 Apr 2008 18:51:20 -0400 From: Christoph Hellwig To: David Chinner Cc: Barry Naujok , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 1/7] XFS: Name operation vector for hash and compare Subject: Re: [PATCH 1/7] XFS: Name operation vector for hash and compare Message-ID: <20080403225120.GA448@infradead.org> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062707.797672682@chook.melbourne.sgi.com> <20080403012912.GO103491721@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080403012912.GO103491721@sgi.com> User-Agent: Mutt/1.5.17 (2007-11-01) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1207263081 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 
-2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46758 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15196 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Apr 03, 2008 at 11:29:12AM +1000, David Chinner wrote: > > +#define xfs_dir_hashname(dp, n, l) \ > > + ((dp)->i_mount->m_dirnameops->hashname((n), (l))) > > + > > +#define xfs_dir_compname(dp, n1, l1, n2, l2) \ > > + ((dp)->i_mount->m_dirnameops->compname((n1), (l1), (n2), (l2))) > > + > > Static inline functions, please. Or kill them completely. I find the common Linux style that just open-codes method invocations a lot more readable. 
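[Editorial illustration, not part of the original message.] The two alternatives to the macros discussed above, a static inline wrapper and an open-coded call through the operations vector, might look roughly like this. All struct and function names prefixed with sketch_ are hypothetical, simplified stand-ins and not the real XFS definitions; the hash function and the "1 means equal" return convention of compname are arbitrary choices made only so the example is self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified name-operations vector. */
struct sketch_nameops {
	unsigned int (*hashname)(const unsigned char *name, size_t len);
	int (*compname)(const unsigned char *n1, size_t l1,
			const unsigned char *n2, size_t l2);
};

struct sketch_mount { const struct sketch_nameops *m_dirnameops; };
struct sketch_inode { struct sketch_mount *i_mount; };

/* Static inline wrapper: arguments are type-checked, unlike the macro. */
static inline unsigned int
sketch_dir_hashname(struct sketch_inode *dp, const unsigned char *name,
		    size_t len)
{
	return dp->i_mount->m_dirnameops->hashname(name, len);
}

/* Arbitrary example hash, only here to have something to call. */
static unsigned int sketch_hash(const unsigned char *name, size_t len)
{
	unsigned int hash = 0;

	for (size_t i = 0; i < len; i++)
		hash = hash * 31 + name[i];
	return hash;
}

/* Exact byte comparison; returns 1 when the names are equal, else 0. */
static int sketch_comp(const unsigned char *n1, size_t l1,
		       const unsigned char *n2, size_t l2)
{
	return l1 == l2 && memcmp(n1, n2, l1) == 0;
}

static const struct sketch_nameops sketch_default_nameops = {
	.hashname = sketch_hash,
	.compname = sketch_comp,
};
```

The open-coded form is simply dp->i_mount->m_dirnameops->hashname(name, len) written at each call site; it produces the same result as the wrapper and leaves the indirection visible to the reader.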
From owner-xfs@oss.sgi.com Thu Apr 3 15:55:23 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 15:55:36 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33MtKSX007413 for ; Thu, 3 Apr 2008 15:55:23 -0700 X-ASG-Debug-ID: 1207263357-7f38033e0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from bombadil.infradead.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D267F73B78F; Thu, 3 Apr 2008 15:55:57 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34]) by cuda.sgi.com with ESMTP id a7MXEzCFhIVTSZ1U; Thu, 03 Apr 2008 15:55:57 -0700 (PDT) Received: from hch by bombadil.infradead.org with local (Exim 4.68 #1 (Red Hat Linux)) id 1JhYLc-0000jc-Nt; Thu, 03 Apr 2008 22:55:56 +0000 Date: Thu, 3 Apr 2008 18:55:56 -0400 From: Christoph Hellwig To: Barry Naujok Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 2/7] XFS: ASCII case-insensitive support Subject: Re: [PATCH 2/7] XFS: ASCII case-insensitive support Message-ID: <20080403225556.GB448@infradead.org> References: <20080402062508.017738664@chook.melbourne.sgi.com> <20080402062708.071715758@chook.melbourne.sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080402062708.071715758@chook.melbourne.sgi.com> User-Agent: Mutt/1.5.17 (2007-11-01) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html X-Barracuda-Connect: bombadil.infradead.org[18.85.46.34] X-Barracuda-Start-Time: 1207263357 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com 
X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46759 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 15197 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Wed, Apr 02, 2008 at 04:25:10PM +1000, Barry Naujok wrote: > + struct qstr *a, > + struct qstr *b) > +{ > + int result = xfs_dir_compname(XFS_I(dir->d_inode), a->name, a->len, > + b->name, b->len) == XFS_CMP_DIFFERENT; > + /* > + * result == 0 if a match is found, and if so, copy the name in "b" > + * to "a" to cope with negative dentries getting the correct name. > + */ > + if (result == 0) > + memcpy((unsigned char *)a->name, b->name, a->len); > + return result; qstr->name is marked const for a reason; please don't overwrite it after its creation. > +struct dentry_operations xfs_ci_dentry_operations = > +{ struct dentry_operations xfs_ci_dentry_operations = { > +static xfs_dahash_t > +xfs_ascii_ci_hashname( Is the use of STATIC now officially phased out for XFS? 
> + ((sbp)->sb_versionnum & XFS_SB_VERSION_OLDCIBIT); no need for the braces around sbp > From owner-xfs@oss.sgi.com Thu Apr 3 15:56:09 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 15:56:20 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: *** X-Spam-Status: No, score=3.0 required=5.0 tests=BAYES_50,HTML_MESSAGE autolearn=no version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33Mu7iA007762 for ; Thu, 3 Apr 2008 15:56:08 -0700 X-ASG-Debug-ID: 1207263404-71c5032d0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from yw-out-1718.google.com (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id ACB6C910D60 for ; Thu, 3 Apr 2008 15:56:44 -0700 (PDT) Received: from yw-out-1718.google.com (yw-out-1718.google.com [74.125.46.152]) by cuda.sgi.com with ESMTP id wknRGaFoMA6JnwrH for ; Thu, 03 Apr 2008 15:56:44 -0700 (PDT) Received: by yw-out-1718.google.com with SMTP id 6so1237866ywa.32 for ; Thu, 03 Apr 2008 15:56:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type; bh=3nzMfKx30J3X7t1hNZ9RCgxE+xMk/yODtD9NLsVenHo=; b=t7rJD3LPicu0uh/8yjxCcFHtke+juIKK/9xX5X+wRNOAwYMXsg12A21LuuLWIK+JsNpdJ5Dp8rMbiDRLLft88T2VpsWnnlEMgb/hcxtto1rO9XGWrku1yOABmtbexW6SPbqAY8qWct3uHoWTSGz6k7aH8QTpWpTQB2ui+adFmV0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:to:subject:mime-version:content-type; b=V/BUbh0Ye2rT+rAM4AEeUHkkbW+EriZHnK3qoAbva5W8q+awiD9+Z5lvYQ5dk/tXqwiwh0vLYCFNzc98VqG86HNTelUhaXoI2w2Dvkr7kNS6YBpCV8pEzYdLMTemSEdotSpj/HRzJMMyxSO29ZhjEOhF5gMBTdsIEPByD/YsB2A= Received: by 10.151.12.4 with SMTP id p4mr187943ybi.229.1207263403584; Thu, 03 Apr 2008 15:56:43 -0700 (PDT) Received: by 10.150.197.1 with HTTP; 
Thu, 3 Apr 2008 15:56:43 -0700 (PDT) Message-ID: <4f52331f0804031556n1f00e435g3273c516aacc5d95@mail.gmail.com> Date: Thu, 3 Apr 2008 15:56:43 -0700 From: "Fong Vang" To: xfs@oss.sgi.com X-ASG-Orig-Subj: xfs_check running out of memory Subject: xfs_check running out of memory MIME-Version: 1.0 X-Barracuda-Connect: yw-out-1718.google.com[74.125.46.152] X-Barracuda-Start-Time: 1207263404 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0016 1.0000 -2.0103 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -1.06 X-Barracuda-Spam-Status: No, SCORE=-1.06 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests=HTML_10_20, HTML_MESSAGE X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46758 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.00 HTML_MESSAGE BODY: HTML included in message 0.94 HTML_10_20 BODY: Message is 10% to 20% HTML X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Status: Clean Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 7bit Content-length: 321 X-archive-position: 15198 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sudoyang@gmail.com Precedence: bulk X-list: xfs xfs_check on one of my 64-bit servers keeps running out of memory while checking a 6.5TB file system with millions of files. xfsprogs is version 2.9.4. One of the man pages online mentions an xfs_check64 version, but I can't seem to find it anywhere. Where can I find this tool? Thank you. 
[[HTML alternate version deleted]] From owner-xfs@oss.sgi.com Thu Apr 3 16:00:03 2008 Received: with ECARTIS (v1.0.0; list xfs); Thu, 03 Apr 2008 16:00:11 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m33N02nv009527 for ; Thu, 3 Apr 2008 16:00:02 -0700 X-ASG-Debug-ID: 1207263638-71bc02ff0000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from mail2.shareable.org (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id EA5CC910E62 for ; Thu, 3 Apr 2008 16:00:38 -0700 (PDT) Received: from mail2.shareable.org (mail2.shareable.org [80.68.89.115]) by cuda.sgi.com with ESMTP id VKd1ByEt16FYACtZ for ; Thu, 03 Apr 2008 16:00:38 -0700 (PDT) Received: from jamie by mail2.shareable.org with local (Exim 4.63) (envelope-from ) id 1JhYPc-0000x8-9h; Fri, 04 Apr 2008 00:00:04 +0100 Date: Fri, 4 Apr 2008 00:00:04 +0100 From: Jamie Lokier To: David Chinner Cc: Christoph Hellwig , Barry Naujok , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org X-ASG-Orig-Subj: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Subject: Re: [PATCH 5/7] XFS: Unicode case-insensitive lookup implementation Message-ID: <20080403230003.GA3422@shareable.org> Mail-Followup-To: Davi