Eric Sandeen wrote:
> > Hi all,
> > I've been having a problem that I thought might be related to Tridge's, but
> > maybe not:
> > When copying directories over NFS (v2) from OSF1 clients to a Linux server
> > with XFS, files and directories will mysteriously "vanish" after the cp
> > completes. However, enough of the file/dir remains (i.e., the name of the
> > file in the inode table) that an ls of the dir will yield this:
> > [root@sdssdp9 rawdata]# ls CANNOT_RM.51940/
> > ls: CANNOT_RM.51940/guider: No such file or directory
> Are you doing anything special to hit this? I move files around NFS all
Nothing terribly special. But I did lie about the kernel/XFS version,
accidentally: I upgraded to linux-2.4.8-xfs on Friday morning, and the
corruption was discovered Friday night, but the files were written back when
the machine was running linux-2.4.7pre6-xfs.
Which version of NFS are you using? v2? This might be the root of the
problem. Some of our systems are forcing vers=2 for problems we had, oh,
back around 2.0.36 with OSF1 not being able to autonegotiate correctly. I
think that's all fixed now, so I'm having that mount option removed. I'll
let you know if the problem goes away.
> the time, I have not seen this. Anything else you can tell us to try to
> recreate it? Does this only happen on RAID?
I'd test on a non-RAID system if I had one (how often do you hear that?
;-). I suppose I could try it on my desktop. I'll let you know how that
goes, too. We're moving *lots* of data around (a few hundred GB/week at
least). I'm trying to get the data analysts to use rcp instead of NFS, but
that'll take some time.
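
For checking a copied tree for the symptom described above (a name that shows up in the directory listing but cannot be stat'ed), something like the following might help. It is only a rough sketch: the function name find_dangling_entries is made up for illustration, and it assumes the failure shows up to userspace as ENOENT on lstat(), as the ls output suggests.

```python
import os
import stat


def find_dangling_entries(root):
    """Return paths that are listed in a directory but fail lstat() with ENOENT.

    Hypothetical helper: walks `root` iteratively and collects every entry
    whose name is returned by listdir() but which cannot be stat'ed.
    """
    dangling = []
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            names = os.listdir(directory)
        except OSError:
            # Directory could not be listed at all; skip it in this sketch.
            continue
        for name in names:
            path = os.path.join(directory, name)
            try:
                info = os.lstat(path)
            except FileNotFoundError:
                # The name is in the listing, but the object behind it is not
                # reachable: the same thing ls complains about.
                dangling.append(path)
                continue
            if stat.S_ISDIR(info.st_mode):
                stack.append(path)
    return dangling


if __name__ == "__main__":
    import sys

    for entry in find_dangling_entries(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(entry)
```

Run on the server against the destination directory after a copy completes; an empty result only means nothing was unreachable at that moment.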
> > Aug 18 00:14:10 sdssdp9 kernel: xfs_create looping, dir ino 0xa25e000, ino
> > 0x101000800, md(9,0)
> > Aug 18 00:14:10 sdssdp9 kernel:
> > Aug 18 00:14:10 sdssdp9 kernel: nfsd: non-standard errno: -990
> The -990 is EFSCORRUPTED, generated directly after the "xfs_create
> looping" message, which comes out of a trap commented like this:
> * xfs_create_broken is a trap routine to isolate the cause of a
> * loop condition reported in IRIX 6.4 by PV 522864. If no
> * occurrences of this error recur (that is, the trap code isn't hit), this
> * should be removed in future releases.
> so I guess we won't remove it just yet... ;-)
Hm... no. Not quite yet.
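
To see how often the trap fires, the syslog can be scanned for the "xfs_create looping" message. The sketch below assumes the message sits on one line in the actual log (it is wrapped in the mail above) and that the format is exactly as quoted; the function name and the dictionary keys are made up for illustration.

```python
import re

# Pattern built from the single log message quoted in this thread; other
# kernel versions may format it differently.
LOOP_RE = re.compile(
    r"xfs_create looping, dir ino (0x[0-9a-fA-F]+), ino\s+(0x[0-9a-fA-F]+), "
    r"(\w+\(\d+,\d+\))"
)


def parse_loop_message(line):
    """Return the inode numbers and device from a trap message, or None."""
    match = LOOP_RE.search(line)
    if match is None:
        return None
    return {
        "dir_ino": int(match.group(1), 16),  # directory inode number
        "ino": int(match.group(2), 16),      # inode number being created
        "device": match.group(3),            # e.g. "md(9,0)"
    }


if __name__ == "__main__":
    import sys

    # Read syslog lines from stdin and print every trap hit that is found.
    for log_line in sys.stdin:
        hit = parse_loop_message(log_line)
        if hit is not None:
            print(hex(hit["dir_ino"]), hex(hit["ino"]), hit["device"])
```

Feeding it /var/log/messages on stdin would list each directory/inode pair that hit the trap, which may help correlate the hits with the directories that later show missing entries.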
PS Tell Steve "Whatever gcc RH ships with 7.1 plus updates." I guess that's
gcc-2.96-85. You want me to try 2.96.66, per the Makefile?
Sloan Digital Sky Survey, Fermilab 630.840.6509
SDSS. Mapping the Universe.