Message-ID: <3D1AAB70.4060400@conet.cz>
Date: Thu, 27 Jun 2002 08:06:40 +0200
From: Libor Vanek
To: linux-xfs@oss.sgi.com
Subject: XFS corruption!

Hi,

we are selling Linux file servers and we wanted to use XFS. Our internal tests passed OK, but when we installed the first server at a customer site and migrated the data, an error occurred (usually after copying 60-100 GB). In /var/log/messages we saw these messages:

Jun 27 03:09:56 localhost kernel: xfs_btree_check_sblock: Not OK:
Jun 27 03:09:56 localhost kernel: magic 0x41425443 level 0 numrecs 394 leftsib -1 rightsib -129
Jun 27 03:09:56 localhost kernel: xfs_btree_check_sblock: Not OK:
Jun 27 03:09:56 localhost kernel: magic 0x41425443 level 0 numrecs 394 leftsib -1 rightsib -129
... (many more identical entries) ...
Jun 27 03:09:56 localhost kernel: xfs_btree_check_sblock: Not OK:
Jun 27 03:09:56 localhost kernel: magic 0x41425443 level 0 numrecs 394 leftsib -1 rightsib -129
Jun 27 03:10:30 localhost kernel: xfs_force_shutdown(md(9,0),0x8) called from line 1039 of file xfs_trans.c. Return address = 0xc01e816a
Jun 27 03:10:30 localhost kernel: Corruption of in-memory data detected. Shutting down filesystem: md(9,0)
Jun 27 03:10:30 localhost kernel: Please umount the filesystem, and rectify the problem(s)

We tried migrating 160 GB of data from the old server (RH7.0, ext2) using "cp -a" (over NFS), scp and rsync; all of them ended in this error. The system runs software RAID5 (10x60GB) on a 1 GHz Celeron with 128 MB RAM, installed as standard RH7.3 from the SGI XFS modified installation CD. After a reboot everything seems OK (nothing lost), but after copying a few more MB the same error occurs. We have built two identical machines from the same system image and both behave exactly the same, so I think it is a software failure.

I have stress tested the system by running "dd if=/dev/md0 of=/raid/tmp bs=10MB count=100" many times and by creating recursive directories (about 50 levels deep), and nothing similar occurred. It happens only when copying data over the network from the old system.

Thanks,
Libor
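For anyone triaging the kernel messages quoted above, the numeric fields can be decoded by hand. The following is only a sketch in Python, assuming the kernel printed leftsib and rightsib as signed 32-bit integers; the helper names (decode_magic, as_u32, differing_bits) are made up for illustration and are not part of any XFS tool. The magic value spells "ABTC", which matches the magic of the XFS free-space btree indexed by extent size, and the reported rightsib differs from the null sibling value (-1) in a single bit. The log alone does not say what caused that.

```python
import struct


def decode_magic(magic):
    """Render a 32-bit magic number as its four ASCII bytes (big-endian)."""
    return struct.pack(">I", magic).decode("ascii")


def as_u32(value):
    """Reinterpret a signed 32-bit value as an unsigned one."""
    return value & 0xFFFFFFFF


def differing_bits(a, b):
    """Return the bit positions at which two 32-bit values differ."""
    diff = as_u32(a) ^ as_u32(b)
    return [bit for bit in range(32) if (diff >> bit) & 1]


# Values copied from the quoted log line.
magic = 0x41425443
leftsib, rightsib = -1, -129

print(decode_magic(magic))                           # ABTC
print(hex(as_u32(leftsib)), hex(as_u32(rightsib)))   # 0xffffffff 0xffffff7f
print(differing_bits(leftsib, rightsib))             # [7]
```

The same helpers can be pointed at other "Not OK" lines to see whether the odd sibling value always differs from -1 in the same bit.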