Subject: 2.6.23 kdb in xfs_bmbt_get_block with unwritten extents
From: "Richard Troxell (rtroxell)"
Date: Thu, 21 Jan 2010 17:41:48 -0800

Hello All,

I am randomly dropping into kdb when creating preallocated files that are excessively 'holey' (e.g. a 500MB+ file with alternating 4K written and 4K unwritten extents). Creating such files is not my intention, and it is being addressed in the userspace writer. That said, I am still concerned about running into kdb.
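For anyone trying to reproduce this, the file layout described above can be produced roughly as follows. This is a minimal sketch, not the writer from the report: the report does not say how the files are preallocated, so it assumes fallocate(2) with mode 0 (on older XFS kernels the XFS-specific XFS_IOC_RESVSP64 ioctl is the usual route). The helper name make_holey_prealloc is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reproducer helper: preallocate `size` bytes, then write every
 * other `chunk`-sized block, leaving alternating written and unwritten
 * extents. Returns 0 on success, -1 on any failure. */
static int make_holey_prealloc(const char *path, off_t size, size_t chunk)
{
    assert(chunk > 0 && size >= 0);

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY,
                  S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if (fd < 0)
        return -1;

    /* Mode 0 allocates the blocks and extends the file size; on XFS the new
     * blocks stay unwritten extents until data is written into them. */
    if (fallocate(fd, 0, 0, size) != 0) {
        close(fd);
        return -1;
    }

    char *buf = malloc(chunk);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 0xab, chunk);

    /* Write chunk 0, skip chunk 1, write chunk 2, and so on. */
    for (off_t off = 0; off + (off_t)chunk <= size; off += 2 * (off_t)chunk) {
        if (pwrite(fd, buf, chunk, off) != (ssize_t)chunk) {
            free(buf);
            close(fd);
            return -1;
        }
    }
    free(buf);

    /* fsync forces writeback; unwritten-to-written conversion happens at I/O
     * completion, which is the xfs_end_bio_unwritten path in the trace. */
    int rc = fsync(fd);
    close(fd);
    return rc;
}
```

A 500MB+ file with 4K chunks, as in the report, would be `make_holey_prealloc(path, 500 * 1024 * 1024, 4096)`.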

I am currently running 2.6.23.9 and have dug through the changelogs, but can't find a matching fix. Also, 2.6.24 seems to have a massive rewrite in this area, which significantly limits the scope I can search.

The crash itself is a straightforward NULL dereference in xfs_bmap_btree.c:xfs_bmbt_get_block(), but I suspect the root cause is some complex condition that corrupts the cursor...

if (level < cur->bc_nlevels - 1) {
        *bpp = cur->bc_bufs[level];             /* <----- cur->bc_bufs[level] == NULL */
        rval = XFS_BUF_TO_BMBT_BLOCK(*bpp);     /* <----- BAM! NULL dereferenced */
}
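To make the failure mode concrete, here is a self-contained model of that lookup with the missing check added. The structs are hypothetical stand-ins for the XFS btree cursor and buffer, cut down to the fields the excerpt touches, and the `root` field stands in for the root that the bmap btree keeps in the inode fork rather than in a buffer (my reading of the code's else branch, not shown above). A check like this would only turn the crash into a reportable error; it does not explain how the slot became NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_LEVELS 8   /* hypothetical cap for this model */

/* Hypothetical stand-in for struct xfs_buf. */
struct buf { void *data; };

/* Hypothetical stand-in for the btree cursor. */
struct cursor {
    int         bc_nlevels;          /* levels in the tree, root included */
    struct buf *bc_bufs[MAX_LEVELS]; /* one buffer per on-disk level */
    void       *root;                /* models the in-inode root block */
};

/* Checked version of the excerpt's lookup: returns 0 and sets *blockp on
 * success, or -1 (instead of dereferencing NULL) when the cursor holds no
 * buffer for an on-disk level. */
static int get_block_checked(const struct cursor *cur, int level,
                             struct buf **bpp, void **blockp)
{
    assert(level >= 0 && level < cur->bc_nlevels && level < MAX_LEVELS);

    if (level < cur->bc_nlevels - 1) {
        *bpp = cur->bc_bufs[level];
        if (*bpp == NULL) {
            fprintf(stderr, "cursor has no buffer at level %d (nlevels %d)\n",
                    level, cur->bc_nlevels);
            *blockp = NULL;
            return -1;
        }
        *blockp = (*bpp)->data;
        return 0;
    }

    /* Top level: the root is not buffer-backed in this model. */
    *bpp = NULL;
    *blockp = cur->root;
    return 0;
}
```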

Scanning the source, I see numerous instances of this same unchecked dereference of bc_bufs, but so far I have only hit this one.

Here is the call trace...

 [<ffffffff8034fec0>] xfs_bmbt_increment+0xb0/0x2c0
 [<ffffffff80346c4b>] xfs_bmap_add_extent_unwritten_real+0x5eb/0xd50
 [<ffffffff80349c72>] xfs_bmap_add_extent+0x152/0x480
 [<ffffffff8038f8d2>] kmem_zone_zalloc+0x32/0x50
 [<ffffffff8034cd40>] xfs_bmapi+0xbe0/0x11f0
 [<ffffffff806ea404>] _spin_unlock+0x14/0x40
 [<ffffffff806ea4cd>] _spin_lock+0x1d/0x90
 [<ffffffff80375e63>] xfs_log_reserve+0xa3/0x100
 [<ffffffff806ea975>] _spin_unlock_irq+0x15/0x40
 [<ffffffff806e9fa6>] __down_write_nested+0x96/0xa0
 [<ffffffff80381ec9>] xfs_trans_reserve+0xa9/0x1f0
 [<ffffffff8037274a>] xfs_iomap_write_unwritten+0x14a/0x230
 [<ffffffff8037166e>] xfs_iomap+0x2fe/0x390
 [<ffffffff806ea3c6>] __lock_text_start+0x16/0x40
 [<ffffffff8038fb20>] xfs_end_bio_unwritten+0x0/0x50
 [<ffffffff8038fb51>] xfs_end_bio_unwritten+0x31/0x50
 [<ffffffff802429a3>] run_workqueue+0x73/0x130
 [<ffffffff80242afc>] worker_thread+0x9c/0xf0
 [<ffffffff80246cd0>] autoremove_wake_function+0x0/0x30
 [<ffffffff80246cd0>] autoremove_wake_function+0x0/0x30
 [<ffffffff80242a60>] worker_thread+0x0/0xf0
 [<ffffffff8024660c>] kthread+0x6c/0xa0
 [<ffffffff8020c9a8>] child_rip+0xa/0x12
 [<ffffffff802465a0>] kthread+0x0/0xa0
 [<ffffffff8020c99e>] child_rip+0x0/0x12
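Reading the trace bottom-up: I/O completion (xfs_end_bio_unwritten) converts the written range from unwritten to written, and xfs_bmap_add_extent_unwritten_real updates the extent map, moving the btree cursor with xfs_bmbt_increment. The toy model below illustrates why this pattern keeps growing the extent map: converting a range in the middle of a large unwritten extent splits it into up to three records. It is a hypothetical sketch (struct ext is not struct xfs_bmbt_irec) and it deliberately omits the neighbour-merging cases the real function handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extent record: file offset and length in blocks plus state. */
struct ext { uint64_t off, len; bool unwritten; };

/* Mark [off, off+len) written inside the single unwritten extent that
 * contains it, splitting that extent into up to three records. Returns the
 * new record count, or -1 if no unwritten extent contains the range or the
 * array (capacity `cap`) would overflow. Merging is not modelled. */
static int convert_unwritten(struct ext *v, int n, int cap,
                             uint64_t off, uint64_t len)
{
    for (int i = 0; i < n; i++) {
        struct ext e = v[i];
        if (off < e.off || off + len > e.off + e.len)
            continue;
        if (!e.unwritten)
            return -1;

        struct ext parts[3];
        int np = 0;
        if (off > e.off)                          /* unwritten head */
            parts[np++] = (struct ext){ e.off, off - e.off, true };
        parts[np++] = (struct ext){ off, len, false };   /* written middle */
        if (off + len < e.off + e.len)            /* unwritten tail */
            parts[np++] = (struct ext){ off + len,
                                        e.off + e.len - (off + len), true };

        if (n - 1 + np > cap)
            return -1;
        /* Shift later records right, then drop the pieces in at slot i. */
        memmove(&v[i + np], &v[i + 1], (size_t)(n - i - 1) * sizeof *v);
        memcpy(&v[i], parts, (size_t)np * sizeof *v);
        return n - 1 + np;
    }
    return -1;
}
```

With the alternating pattern, every write after the first lands in the middle of the remaining unwritten extent and adds two records, so the map (and the btree holding it) grows with each chunk.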

Given the trace, I assume that if I avoid all B+tree-managed unwritten extents (i.e. files with enough extents that the extent list is stored in B+tree format rather than directly in the inode), I can avoid the crash. However, avoiding such files completely seems a bit unrealistic, as I need to store files with a reasonable number of holes...
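A rough sizing calculation shows how far into B+tree territory the example file goes. The figures below are assumptions rather than facts from the report: 4 KiB filesystem blocks, a 24-byte long-format btree block header, and 16 bytes per entry (an extent record at the leaves, a key plus pointer at interior levels). They match my understanding of the on-disk bmbt layout of this era, but treat them as approximations.

```c
#include <assert.h>
#include <stdint.h>

#define FSBLOCK_BYTES     4096u  /* assumed filesystem block size */
#define BTREE_HDR_BYTES   24u    /* assumed long-format btree block header */
#define ENTRY_BYTES       16u    /* extent record, or key + pointer pair */
#define ENTRIES_PER_BLOCK ((FSBLOCK_BYTES - BTREE_HDR_BYTES) / ENTRY_BYTES)

static uint64_t ceil_div(uint64_t a, uint64_t b)
{
    assert(b != 0);
    return (a + b - 1) / b;
}

/* Alternating written/unwritten chunks never merge with their neighbours,
 * so every chunk is its own extent record. */
static uint64_t alternating_extents(uint64_t file_bytes, uint64_t chunk_bytes)
{
    return ceil_div(file_bytes, chunk_bytes);
}

/* Leaf blocks needed to hold `extents` records. */
static uint64_t leaf_blocks(uint64_t extents)
{
    return ceil_div(extents, ENTRIES_PER_BLOCK);
}
```

Under these assumptions a 500 MiB file gives 128,000 extent records in about 504 leaf blocks, which in turn need two interior blocks to index, far more leaf pointers than an in-inode root can hold. Every unwritten-to-written conversion on such a file therefore walks a multi-level cursor, i.e. the `level < cur->bc_nlevels - 1` branch of the excerpt.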

Thanks,
Richard
