Received: from oss.sgi.com (localhost [127.0.0.1])
    by oss.sgi.com (8.12.3/8.12.3) with ESMTP id g5RAt8nC013355
    for ; Thu, 27 Jun 2002 03:55:08 -0700
Received: (from majordomo@localhost)
    by oss.sgi.com (8.12.3/8.12.3/Submit) id g5RAt8ZK013354
    for linux-xfs-outgoing; Thu, 27 Jun 2002 03:55:08 -0700
X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-linux-xfs@oss.sgi.com using -f
Received: from CoNetUX (IDENT:Pwe7odLTmj9RDGg/6rxLgh9ypgV7K0PK@firewall.conet.cz [213.175.54.250])
    by oss.sgi.com (8.12.3/8.12.3) with SMTP id g5RAstnC013324
    for ; Thu, 27 Jun 2002 03:54:56 -0700
Received: from conet.cz (Libor [192.168.1.130])
    by CoNetUX (8.11.6/8.11.6) with ESMTP id g5RAwQE20207;
    Thu, 27 Jun 2002 12:58:26 +0200
Message-ID: <3D1AEFB0.3070609@conet.cz>
Date: Thu, 27 Jun 2002 12:57:52 +0200
From: Libor Vanek
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1a) Gecko/20020611
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Seth Mos
CC: linux-xfs@oss.sgi.com
Subject: Re: XFS corruption!
References: <4.3.2.7.2.20020627090504.03c4f4a0@pop.xs4all.nl>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, hits=1.3 required=5.0 tests=PLING,SIGNATURE_DELIM version=2.20
X-Spam-Level: *
Sender: owner-linux-xfs@oss.sgi.com
Precedence: bulk

>> Hi,
>> we are selling Linux file servers and we wanted to use XFS. Our
>> internal tests passed OK, but when we installed the first server at a
>> customer and migrated data, an error occurred (usually after copying
>> 60-100 GB). In /var/log/messages we saw these messages:
>
> One of the developers better comment on those messages.

I think so too; that's why I posted my message here.

>> We tried migrating 160 GB of data using "cp -a" (over NFS), scp and
>> rsync from the old server running RH7.0 (ext2) - all resulted in this.
>> The system is running software RAID5 (10x60GB), 1 GHz Celeron, 128 MB
>> RAM, standard RH7.3 with SGI XFS modified installation CD.
>> When we rebooted the system everything seemed OK (nothing lost), but after
>> copying a few more MB the same error occurred.
>> We have built two identical machines from the same system image and both
>> behave identically, so I think it's a software failure.
>
> It sounds like it. Did you build this filesystem with any special mkfs
> options?
> What IDE controllers are you using? Did you use the 2.4.18 kernel that
> came on the installer disk or is this a self-compiled version or even a
> CVS checkout?

I used the default 2.4.18-4-XFS-1.1 and also a custom build (same version) - no difference.

>> I have stress tested the system by running a lot of "dd if=/dev/md0
>> of=/raid/tmp bs=10MB count=100" and creating recursive directories (about 50
>> levels deep), and nothing similar occurred. Only when copying data over
>> the network from the old system.
>
> Weird. I frequently have to copy large amounts of data over the
> network and it works fine, so I suspect that something in your
> filesystem is not right and causing it to fail again as soon as you
> try to copy to it again.

Now I remember that I also tried this "dd" over NFS between the two identical machines, again without any corruption. Very strange.

> Can you check/repair the filesystem and see if it appears again?

I can - it does the same, but sooner (not after copying tens of GB, but after copying a few GB). As the system I'm copying from is a production system, my tests are very limited.

-- 
Regards,

Libor Vanek

Contact:
+-------------------------------------+
| Email: libor@conet.cz               |
| ICQ: 124529939                      |
| WWW: http://www.discobolos.net      |
| Tel/fax: 05/4122 5091, 6293, 6003   |
| Mobile: 0603 536 946                |
+-------------------------------------+