Received: with ECARTIS (v1.0.0; list xfs); Fri, 04 Apr 2008 02:08:58 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m3498LuC031292 for ; Fri, 4 Apr 2008 02:08:23 -0700 X-ASG-Debug-ID: 1207300133-49a702360000-NocioJ X-Barracuda-URL: http://cuda.sgi.com:80/cgi-bin/mark.cgi Received: from moutng.kundenserver.de (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 3B4A0106F71F for ; Fri, 4 Apr 2008 02:08:53 -0700 (PDT) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.188]) by cuda.sgi.com with ESMTP id zcz1JY8RiMyKgrMF for ; Fri, 04 Apr 2008 02:08:53 -0700 (PDT) Received: from lisa (p4FD34C56.dip.t-dialin.net [79.211.76.86]) by mrelayeu.kundenserver.de (node=mrelayeu8) with ESMTP (Nemesis) id 0ML31I-1Jhhuk0wR7-0003Zr; Fri, 04 Apr 2008 11:08:50 +0200 Received: from localhost (localhost [127.0.0.1]) by tyrex.lisa.loc (Postfix) with ESMTP id 05582F3A1C9A; Fri, 4 Apr 2008 11:08:50 +0200 (CEST) X-Virus-Scanned: ClamAV 0.91.2/6021/Wed Feb 27 15:55:48 2008 on oss.sgi.com X-Virus-Scanned: amavisd-new at lisa From: Hans-Peter Jansen To: David Chinner X-ASG-Orig-Subj: Re: [Patch] unique per-AG inode generation number initialisation Subject: Re: [Patch] unique per-AG inode generation number initialisation Date: Fri, 4 Apr 2008 11:08:42 +0200 User-Agent: KMail/1.9.9 Cc: xfs-dev , xfs-oss References: <20080401231815.GW103491721@sgi.com> In-Reply-To: <20080401231815.GW103491721@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200804041108.43442.hpj@urpla.net> X-Provags-ID: V01U2FsdGVkX18d1y7TTFgumYrpUrrTtdiPTHNdF4afpKHLxdE 34NH7Y8a8s07qXDBfikOF+ibX/e0JCLZw0ZNQPCyoFN7AXtoRT HlGcEb6Mz7j/1axHbvUXA== X-Barracuda-Connect: moutng.kundenserver.de[212.227.126.188] X-Barracuda-Start-Time: 1207300138 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Virus-Scanned: by cuda.sgi.com at sgi.com X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.1 tests= X-Barracuda-Spam-Report: Code version 3.1, rules version 3.1.46800 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Virus-Status: Clean X-archive-position: 15214 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hpj@urpla.net Precedence: bulk X-list: xfs Dear David, Am Mittwoch, 2. April 2008 schrieb David Chinner: > Don't initialise new inode generation numbers to zero > > When we allocation new inode chunks, we initialise the generation > numbers to zero. This works fine until we delete a chunk and then > reallocate it, resulting in the same inode numbers but with a > reset generation count. This can result in inode/generation > pairs of different inodes occurring relatively close together. > > Given that the inode/gen pair makes up the "unique" portion of > an NFS filehandle on XFS, this can result in file handles cached > on clients being seen on the wire from the server but refer to > a different file. This causes .... issues for NFS clients. > > Hence we need a unique generation number initialisation for > each inode to prevent reuse of a small portion of the generation > number space. Make this initialiser per-allocation group so > that it is not a single point of contention in the filesystem, > and increment it on every allocation within an AG to reduce the > chance that a generation number is reused for a given inode number > if the inode chunk is deleted and reallocated immediately > afterwards. > > It is safe to add the agi_newinogen field to the AGI without > using a feature bit. If an older kernel is used, it simply > will not update the field on allocation. If the kernel is > updated and the field has garbage in it, then it's like having a > random seed to the generation number.... Is this patch ready for production systems? Did it went through QA already? If it's ready, shouldn't it go to the Linux stable team then, also, since it sounds like a serious issue. My systems hit file servers with solely xfs via NFS3 rather heavily - hence I am concerned.. Thanks, Pete > Signed-off-by: Dave Chinner > --- > fs/xfs/xfs_ag.h | 4 +++- > fs/xfs/xfs_ialloc.c | 30 ++++++++++++++++++++++-------- > 2 files changed, 25 insertions(+), 9 deletions(-) > > Index: 2.6.x-xfs-new/fs/xfs/xfs_ag.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ag.h 2008-01-18 18:30:06.000000000 > +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_ag.h 2008-03-26 13:03:41.122918236 > +1100 @@ -121,6 +121,7 @@ typedef struct xfs_agi { > * still being referenced. > */ > __be32 agi_unlinked[XFS_AGI_UNLINKED_BUCKETS]; > + __be32 agi_newinogen; /* inode cluster generation */ > } xfs_agi_t; > > #define XFS_AGI_MAGICNUM 0x00000001 > @@ -134,7 +135,8 @@ typedef struct xfs_agi { > #define XFS_AGI_NEWINO 0x00000100 > #define XFS_AGI_DIRINO 0x00000200 > #define XFS_AGI_UNLINKED 0x00000400 > -#define XFS_AGI_NUM_BITS 11 > +#define XFS_AGI_NEWINOGEN 0x00000800 > +#define XFS_AGI_NUM_BITS 12 > #define XFS_AGI_ALL_BITS ((1 << XFS_AGI_NUM_BITS) - 1) > > /* disk block (xfs_daddr_t) in the AG */ > Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-03-25 15:41:27.000000000 > +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-03-26 14:29:47.998554368 > +1100 @@ -309,6 +309,8 @@ xfs_ialloc_ag_alloc( > free = XFS_MAKE_IPTR(args.mp, fbuf, i); > free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC); > free->di_core.di_version = version; > + free->di_core.di_gen = agi->agi_newinogen; > + be32_add_cpu(&agi->agi_newinogen, 1); > free->di_next_unlinked = cpu_to_be32(NULLAGINO); > xfs_ialloc_log_di(tp, fbuf, i, > XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED); > @@ -347,7 +349,8 @@ xfs_ialloc_ag_alloc( > * Log allocation group header fields > */ > xfs_ialloc_log_agi(tp, agbp, > - XFS_AGI_COUNT | XFS_AGI_FREECOUNT | XFS_AGI_NEWINO); > + XFS_AGI_COUNT | XFS_AGI_FREECOUNT | > + XFS_AGI_NEWINO | XFS_AGI_NEWINOGEN); > /* > * Modify/log superblock values for inode count and inode free count. > */ > @@ -896,11 +899,12 @@ nextag: > ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino + offset); > XFS_INOBT_CLR_FREE(&rec, offset); > rec.ir_freecount--; > + be32_add_cpu(&agi->agi_newinogen, 1); > if ((error = xfs_inobt_update(cur, rec.ir_startino, rec.ir_freecount, > rec.ir_free))) > goto error0; > be32_add(&agi->agi_freecount, -1); > - xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT); > + xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT | XFS_AGI_NEWINOGEN); > down_read(&mp->m_peraglock); > mp->m_perag[tagno].pagi_freecount--; > up_read(&mp->m_peraglock); > @@ -1320,6 +1324,11 @@ xfs_ialloc_compute_maxlevels( > > /* > * Log specified fields for the ag hdr (inode section) > + * > + * We don't log the unlinked inode fields through here; they > + * get logged directly to the buffer. Hence we have a discontinuity > + * in the fields we are logging and we need two calls to map all > + * the dirtied parts of the agi.... > */ > void > xfs_ialloc_log_agi( > @@ -1342,22 +1351,27 @@ xfs_ialloc_log_agi( > offsetof(xfs_agi_t, agi_newino), > offsetof(xfs_agi_t, agi_dirino), > offsetof(xfs_agi_t, agi_unlinked), > + offsetof(xfs_agi_t, agi_newinogen), > sizeof(xfs_agi_t) > }; > + int log_newino = fields & XFS_AGI_NEWINOGEN; > + > #ifdef DEBUG > xfs_agi_t *agi; /* allocation group header */ > > agi = XFS_BUF_TO_AGI(bp); > ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC); > #endif > - /* > - * Compute byte offsets for the first and last fields. > - */ > + fields &= ~XFS_AGI_NEWINOGEN; > + > + /* Compute byte offsets for the first and last fields. */ > xfs_btree_offsets(fields, offsets, XFS_AGI_NUM_BITS, &first, &last); > - /* > - * Log the allocation group inode header buffer. > - */ > xfs_trans_log_buf(tp, bp, first, last); > + if (log_newino) { > + xfs_btree_offsets(XFS_AGI_NEWINOGEN, offsets, XFS_AGI_NUM_BITS, > + &first, &last); > + xfs_trans_log_buf(tp, bp, first, last); > + } > } > > /*