From: Adrian Head
To: Simon Matter, Adrian Head
Cc: linux-xfs@oss.sgi.com
Subject: RE: Problems with many processes copying large directories across an XFS volume.
Date: Mon, 10 Sep 2001 18:29:55 +1000

Thanks for your reply, Simon.

Yes, the softraid was fully synced before I started any test.

The XFS patch I used to obtain these errors was patch-2.4.9-xfs-2001-08-19, and the errors were:

Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device

When I used a later version of the XFS patch, I got more descriptive errors in /var/log/messages:

Sep 10 10:14:57 ATLAS kernel: I/O error in filesystem ("md(9,0)") meta-data dev 0x900 block 0x9802bdc
Sep 10 10:14:57 ATLAS kernel: ("xlog_iodone") error 5 buf count 32768
Sep 10 10:14:57 ATLAS kernel: xfs_force_shutdown(md(9,0),0x2) called from line 940 of file xfs_log.c. Return address = 0xd8cb66f8
Sep 10 10:14:57 ATLAS kernel: Log I/O Error Detected. Shutting down filesystem: md(9,0)
Sep 10 10:14:57 ATLAS kernel: Please umount the filesystem, and rectify the problem(s)
Sep 10 10:14:57 ATLAS kernel: xfs_force_shutdown(md(9,0),0x2) called from line 714 of file xfs_log.c. Return address = 0xd8cb65d3
Sep 10 10:14:57 ATLAS kernel: attempt to access beyond end of device
Sep 10 10:14:57 ATLAS kernel: 02:82: rw=0, want=1602235696, limit=4

At the time I thought XFS might be stomping all over the raid code, or the raid code all over XFS. I am not so sure now, though, as the 2.4.10-pre2-xfs-2001-09-02 patch never wrote out any errors at all (please see my 2nd post for more info).

Thanks for taking the time to test this on your own machine.

Adrian Head
Bytecomm P/L

> -----Original Message-----
> From: Simon Matter [SMTP:simon.matter@ch.sauter-bc.com]
> Sent: Monday, 10 September 2001 17:45
> To: adrian.head@bytecomm.com.au
> Cc: linux-xfs@oss.sgi.com
> Subject: Re: Problems with many processes copying large
>          directories across an XFS volume.
>
> Hi Adrian
>
> I did similar tests two months ago. I was having problems as well, but
> unfortunately I don't remember exactly what they were.
> First question: You created Softraid5; was the raid synced when you
> started the tests?
>
> > In the /var/log/messages log around the same time as the copy test I
> > get entries like:
> > Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
> > Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device
>
> This looks interesting. I don't know what this means exactly, but it
> looks to me like you managed to create a filesystem bigger than the
> raid volume? I got the very same error when I tried to restore data
> with xfsrestore from DAT (xfsrestore from DLT was fine). The issue is
> still open.
>
> I have a test system here with SoftRAID5 on 4 U160 SCSI disks. I'll
> try to kill it today with cp jobs.
>
> -Simon
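
For anyone comparing the "attempt to access beyond end of device" entries quoted in this thread, here is a minimal sketch in Python that pulls the device, want, and limit fields out of such lines and reports how far each request overshot the limit. The helper name parse_beyond_eod and the regular expression are illustrative assumptions based only on the log lines above, not part of any XFS or kernel tooling, and the sketch deliberately makes no assumption about the units of want and limit.

```python
import re

# Matches the "MM:mm: rw=..., want=..., limit=..." kernel message as it
# appears in the log excerpts in this thread (format inferred from those
# lines only; other kernel versions may print it differently).
BEYOND_EOD = re.compile(
    r"kernel: (?P<major>[0-9A-Fa-f]{2}):(?P<minor>[0-9A-Fa-f]{2}): "
    r"rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def parse_beyond_eod(line):
    """Return the parsed fields of one log line, or None if it does not match.

    Hypothetical helper: 'overshoot' is simply want - limit, in whatever
    unit the kernel used when printing the message.
    """
    m = BEYOND_EOD.search(line)
    if m is None:
        return None
    want = int(m.group("want"))
    limit = int(m.group("limit"))
    return {
        "device": m.group("major") + ":" + m.group("minor"),
        "rw": int(m.group("rw")),
        "want": want,
        "limit": limit,
        "overshoot": want - limit,
    }


if __name__ == "__main__":
    # The two entries reported in this message.
    samples = [
        "Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360",
        "Sep 10 10:14:57 ATLAS kernel: 02:82: rw=0, want=1602235696, limit=4",
    ]
    for line in samples:
        hit = parse_beyond_eod(line)
        if hit is not None:
            print(hit["device"], "want", hit["want"],
                  "limit", hit["limit"], "overshoot", hit["overshoot"])
```

Run against the two entries above, it only makes the mismatch easier to see: in both cases the requested position is far beyond the reported limit. It does not say which layer produced the bad request.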