Message-ID: <3D1D647F.3070302@mountainviewdata.com>
Date: Sat, 29 Jun 2002 15:40:47 +0800
From: Eric Mei
To: linux-xfs@oss.sgi.com, eric@mountainviewdata.com
Subject: Bug about XFS-1.0.1 on 2.4.5

Hi Team,

This is a long report. I know some of you may be busy with the kernel summit, but we just can't believe XFS has such a serious bug. For various reasons we must stay on 2.4.5-xfs-1.0.1.

We have a super-pcserver connected to 5 Mac clients through a Gigabit switch. All of the clients also have Gigabit Ethernet NICs. After about 2-3 hours of stress testing, the load average climbs as high as 21 and keeps increasing, and all clients lose their connections.

During the tests there are many error messages:

__alloc_pages: 0-order allocation failed
__alloc_pages: 0-order allocation failed
__alloc_pages: 0-order allocation failed
............

Bursts of these messages appear roughly every 10 minutes, and about 2 more hours later the system dies.

The server is a dual Xeon 2.0 GHz with 2 GB of Rambus RAM, ADTX 500 GB RAID storage, and a BCM Gigabit Ethernet card. Hyperthreading is on, the kernel is booted with the "noapic" option, and the kernel is 2.4.5-xfs-1.0.1, compiled *without* XFS_DEBUG. The partition size is 500 GB.

At the time of the hang, kdb shows the following zone information:

DMA (4096 pages): free 559, inactive_clean 0, inactive_dirty 119
    watermark is (128, 256, 384)
Normal (225280 pages): free 21332, inactive_clean 92, inactive_dirty 4495
    watermark is (255, 510, 765)
High (294784 pages): free 516, inactive_clean 189311, inactive_dirty 23466

System-wide, active pages total 204281 and inactive_dirty is 28080.

bt shows that kswapd is waiting on xfs_ilock, and atalkd's backtrace is:

fsync_dev
sys_sync
panic
kmem_zone_zalloc
xfs_btree_init_cursor
xfs_alloc_ag_vextend_near
xfs_alloc_ag_vextend
xfs_alloc_vextend
xfs_bmap_alloc
xfs_bmapi
pagebuf_delalloc_convert
pagebuf_write_full_page
linvfs_write_full_page_nounlock
_write_buffer
sync_buffers
fsync_dev
sys_sync
panic
kmem_zone_zalloc
xfs_btree_init_cursor
xfs_alloc_ag_vextend_near
xfs_alloc_ag_vextend
xfs_alloc_vextend
xfs_bmap_alloc
xfs_bmapi
xfs_strategy
linvfs_pb_bmap
pagebuf_delalloc_convert
pagebuf_write_full_page
linvfs_write_full_page_nounlock
try_to_free_buffers
page_launder
do_try_to_free_buffers
try_to_free_pages
__alloc_pages
__get_free_pages
__pollwait
datagram_poll
....
do_select
sys_select

At that time no other processes were in XFS or VM code.

During the test, I found that inactive_clean+inactive_dirty in the Normal zone keeps decreasing. Is that correct VM behaviour? Is XFS leaking pages?

I would appreciate anyone's help. If any further information is needed, I'd be glad to provide it.

Thanks,
Eric
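
The zone figures quoted from kdb can be cross-checked with a little arithmetic. What follows is only a minimal sketch, under two assumptions that the report does not state: that pages are 4 KiB (the usual i686 size) and that each watermark triple is ordered (min, low, high). The names `ZONES`, `total`, `pages_to_mib`, and `free_above_high` are made up for this illustration and are not kernel or kdb interfaces.

```python
# Assumption: 4 KiB pages, the usual page size on i686.
PAGE_SIZE = 4096

# Figures copied from the kdb zone dump in the report.
# Assumption: each watermark triple is ordered (min, low, high).
# No watermark was reported for the High zone, so it is left as None.
ZONES = {
    "DMA": {"pages": 4096, "free": 559, "inactive_clean": 0,
            "inactive_dirty": 119, "watermark": (128, 256, 384)},
    "Normal": {"pages": 225280, "free": 21332, "inactive_clean": 92,
               "inactive_dirty": 4495, "watermark": (255, 510, 765)},
    "High": {"pages": 294784, "free": 516, "inactive_clean": 189311,
             "inactive_dirty": 23466, "watermark": None},
}


def total(field):
    """Sum one per-zone counter across all zones."""
    return sum(zone[field] for zone in ZONES.values())


def pages_to_mib(pages):
    """Convert a page count to MiB under the PAGE_SIZE assumption."""
    return pages * PAGE_SIZE / (1024 * 1024)


def free_above_high(name):
    """True/False if free exceeds the assumed 'high' mark; None if no mark was reported."""
    zone = ZONES[name]
    if zone["watermark"] is None:
        return None
    return zone["free"] > zone["watermark"][2]


if __name__ == "__main__":
    print("total pages:", total("pages"), "=", pages_to_mib(total("pages")), "MiB")
    print("total inactive_dirty:", total("inactive_dirty"))
    for name in ZONES:
        print(name, "free above high watermark:", free_above_high(name))
```

Under those assumptions the three zones add up to 524160 pages, or 2047.5 MiB, which is consistent with the 2 GB of RAM, and the per-zone inactive_dirty counts sum to the reported 28080. Free pages in the DMA and Normal zones are both above the third watermark value, so the numbers alone do not show those zones as exhausted at the moment of the dump.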