Date: Wed, 12 May 2004 11:01:42 -0400
Message-Id: <1441-Wed12May2004110142-0400-Philip.Macias@NOAA.gov>
To: oar.gfdl.linux-workstations@noaa.gov, cattelan@xfs.org, linux-xfs@oss.sgi.com
In-reply-to: <3174-Wed04Feb2004130248-0500-Philip.Macias@NOAA.gov>
Subject: Re: xfsrestore fails to exit properly under 2.6.1
From: Phil Macias
References: <4229-Wed04Feb2004062354-0500-Philip.Macias@NOAA.gov> <1075916534.96681.5.camel@lupo.thebarn.com> <3174-Wed04Feb2004130248-0500-Philip.Macias@NOAA.gov>
X-original-sender: Philip.Macias@noaa.gov

All,

Finally back at this and have found the fix.  The problem was the
"xfsrestorehousekeepingdir" files:

root: ls -la /clone/xfsrestorehousekeepingdir
total 304k
drwx------    2 root     root         4.0k May 12 02:19 ./
drwxr-xr-x    3 root     root         4.0k May 12 02:19 ../
-rw-------    1 root     root          24k May 12 02:07 .nfs000067c200000002
-rw-------    1 root     root          39k May 12 02:07 .nfs000067c300000003
-rw-------    1 root     root         190k May 12 02:07 .nfs000067c400000004
-rw-------    1 root     root          46M May 12 02:06 .nfs000067c500000005
-rw-------    1 root     root          36k May 12 02:07 .nfs000067c600000001
-rw-------    1 root     root            0 May 12 02:06 .nfs0000685d00000006

On the 2.4.x systems we have, that directory could sit on the
NFS-mounted destination drives with no problem.  Evidently it caused a
problem under 2.6.x: the files in it stayed locked by some process even
after the hung xfsrestore process was killed.  (The .nfsXXXX names are
what the NFS client leaves behind when a file is deleted while it is
still open.)

Adding "rm -rf /tmp/xfsrestorehousekeepingdir/" and the "-a /tmp"
parameter to the cloning script fixed everything.

- Phil

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Phil Macias" ,
RSIS, Inc. * NOAA/GFDL * Princeton, NJ
 ___,___
 |_|_,     609 987 5059 office
 |         609 203 5874 cell
 >__,

| Date: 4 Feb 2004 13:02:49 -0500
| Bcc: root@pimdev.gfdl.noaa.gov
| Date: Wed, 4 Feb 2004 13:02:48 -0500
| CC: Philip.Macias@noaa.gov, linux-xfs@oss.sgi.com
| From: Phil Macias
|
| Here you go:
|
| root: ~# gdb --pid=7896
| GNU gdb Red Hat Linux (5.2-2)
| Copyright 2002 Free Software Foundation, Inc.
| GDB is free software, covered by the GNU General Public License, and you are
| welcome to change it and/or distribute copies of it under certain conditions.
| Type "show copying" to see the conditions.
| There is absolutely no warranty for GDB.  Type "show warranty" for details.
| This GDB was configured as "i386-redhat-linux".
| Attaching to process 7896
| Reading symbols from /usr/sbin/xfsrestore...done.
| Reading symbols from /usr/lib/libhandle.so.1...done.
| Loaded symbols for /usr/lib/libhandle.so.1
| Reading symbols from /usr/lib/libattr.so.1...done.
| Loaded symbols for /usr/lib/libattr.so.1
| Reading symbols from /lib/libc.so.6...done.
| Loaded symbols for /lib/libc.so.6
| Reading symbols from /usr/lib/libgcc_s.so.1...done.
| Loaded symbols for /usr/lib/libgcc_s.so.1
| Reading symbols from /lib/ld-linux.so.2...done.
| Loaded symbols for /lib/ld-linux.so.2
| 0x4010c566 in open64 () from /lib/libc.so.6
|
| (gdb) backtrace
| #0  0x4010c566 in open64 () from /lib/libc.so.6
| #1  0x400e4dd9 in opendir () from /lib/libc.so.6
| #2  0x08073a0b in wipepersstate () at content.c:3600
| #3  0x08072701 in content_complete () at content.c:2587
| #4  0x08062de7 in main (argc=4, argv=0xbffff514) at main.c:636
| #5  0x4004ed06 in __libc_start_main () from /lib/libc.so.6
|
| | From: Russell Cattelan
| | Cc: linux-xfs@oss.sgi.com
| | Date: Wed, 04 Feb 2004 11:42:14 -0600
| |
| | Are you able to attach gdb to the hung process and
| | get a backtrace?
| |
| | On Wed, 2004-02-04 at 05:23, Phil Macias wrote:
| | > Hello,
| | >
| | > We have been using XFS on Red Hat Linux for over two years (since I
| | > have been here).  I have been using a script to clone workstations from
| | > one another for over a year, whereby the host-to-be-cloned exports its
| | > XFS partitions over NFS, they are mounted by the donor host, and the
| | > contents are copied with:
| | >
| | >   xfsdump -v5 -l 0 - /dev/hda7 | xfsrestore -v5 - /clone-var
| | >
| | > This process worked reliably for over a year on the 2.4.x kernels and
| | > these versions of the XFS tools:
| | >
| | >   acl-2.1.1-gfdl-1-1
| | >   attr-2.1.1-gfdl-1-1
| | >   dmapi-2.0.5-gfdl-1-1
| | >   kernel-2.4.18-XFS-NFS-base-gfdl-2-1
| | >   xfsdump-2.2.4-gfdl-1-1
| | >   xfsprogs-2.3.6-gfdl-1-1
| | >
| | > PROBLEM: I am testing the 2.6.1 kernel and the following relevant
| | > packages:
| | >
| | >   libelf-0.8.2-2-gfdl-1-1
| | >   elfutils-libelf-0.89-2-gfdl-1-1
| | >   popt-1.8.1-0.31-gfdl-1-1
| | >   gcc-3.3-gfdl-1-1
| | >   kernel-2.6.0-complete-gfdl-1-1
| | >   glibc-2.3.2-gfdl-1-1
| | >   beecrypt-3.0.1-gfdl-1-1
| | >
| | >   acl-2.2.21-gfdl.tgz
| | >   attr-2.4.12-gfdl.tgz
| | >   binutils-2.14-gfdl.tgz
| | >   dmapi-2.1.0-gfdl.tgz
| | >   xfsprogs-2.6.0-gfdl.tgz
| | >
| | > Everything on the system works well except the remote
| | > xfsdump/xfsrestore.  Running:
| | >
| | >   xfsdump -v5 -l 0 - /dev/hda7 | xfsrestore -v5 - /clone-var
| | >
| | > on the 2.6.1 system does the dump/restore properly, but xfsrestore never
| | > exits.  I have to issue these commands to release xfsrestore:
| | >
| | >   kill %
| | >   fuser -k /clone-var/
| | >
| | > ...where /clone-var/ is the target partition.  Please note that
| | > the target host is still running the old (2.4.x) kernel.
| | >
| | > Here are the last lines of the "-v5" output:
| | >
| | > ...
| | > xfsrestore: read file hdr off 0 flags 0x0 ino 14680333 mode 0x0000a1ff
| | > xfsrestore: preemptchk( )
| | > xfsrestore: restoring lib/scrollkeeper/pt_BR (14680333 0)
| | > xfsrestore: restoring symbolic link ino 14680333 lib/scrollkeeper/pt_BR
| | > xfsrestore: drive_simple read( want 32 )
| | > xfsrestore: drive_simple return_read_buf( returning 32 )
| | > xfsrestore: xlate_extenthdr
| | > xfsrestore: read extent hdr size 32 offset 0 type 4 flags 00000000
| | > xfsrestore: drive_simple read( want 32 )
| | > xfsrestore: drive_simple return_read_buf( returning 32 )
| | > xfsrestore: drive_simple get_mark( )
| | > xfsrestore: drive_simple read( want 256 )
| | > xfsrestore: drive_simple return_read_buf( returning 256 )
| | > xfsrestore: xlate_bstat
| | > xfsrestore: xlate_bstat: pre-xlate
| | >       bs_ino          0
| | >       bs_mode         0
| | > xfsrestore: xlate_bstat: post-xlate
| | >       bs_ino          0
| | >       bs_mode         0
| | > xfsrestore: xlate_filehdr: pre-xlate
| | >       fh_offset       0
| | >       fh_flags        83886080
| | >       fh_checksum     13835040720794157312
| | > xfsrestore: xlate_filehdr: post-xlate
| | >       fh_offset       0
| | >       fh_flags        5
| | >       fh_checksum     13835040720794157312
| | > xfsrestore: read file hdr off 0 flags 0x5 ino 0 mode 0x00000000
| | > xfsrestore: preemptchk( )
| | > xfsrestore: Media_end: pos==3
| | > xfsrestore: drive_simple end_read( )
| | > xfsrestore: getting next media file for non-dir restore
| | > xfsrestore: Media_mfile_next: purp==2 pos==0
| | > xfsrestore: tree finalize
| | > xfsrestore: restore complete: 139 seconds elapsed
| | > -------------------------------------------------
| | >
| | > Syslog shows no errors or any info about xfsrestore.
| | >
| | > Any idea why xfsrestore fails to exit properly?
| | >
| | > Thanx,
| |
| | --
| | Russell Cattelan
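The cloning script itself is not shown anywhere in this thread, so the following is only a minimal sketch of how the fix Phil describes at the top (removing the stale housekeeping directory and passing "-a /tmp") might look, assuming the same source partition (/dev/hda7) and NFS-mounted target (/clone-var) used in his original report. Without -a, xfsrestore creates xfsrestorehousekeepingdir directly under the destination, which here is on NFS; -a moves it under a local directory instead. The variable HOUSEKEEPING_PARENT and the functions clean_housekeeping and clone_var are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Minimal sketch of the cloning step with the fix applied.
# Assumption: HOUSEKEEPING_PARENT, clean_housekeeping and clone_var are
# hypothetical names, not taken from the original GFDL script.

# Local directory under which xfsrestore will create xfsrestorehousekeepingdir.
# /tmp is the value used in the fix reported in this thread.
HOUSEKEEPING_PARENT=${HOUSEKEEPING_PARENT:-/tmp}

# Remove a housekeeping directory left behind by an earlier run.
# ${1:?} makes the shell stop with an error if no directory is given,
# so the path can never collapse to "/xfsrestorehousekeepingdir".
clean_housekeeping() {
    rm -rf -- "${1:?}/xfsrestorehousekeepingdir"
}

# Dump /dev/hda7 to stdout and restore it into the NFS-mounted /clone-var.
# -a keeps xfsrestore's housekeeping directory on local disk rather than
# inside the NFS destination.
clone_var() {
    clean_housekeeping "$HOUSEKEEPING_PARENT"
    xfsdump -v5 -l 0 - /dev/hda7 |
        xfsrestore -v5 -a "$HOUSEKEEPING_PARENT" - /clone-var
}
```

Sourcing this only defines the functions; clone_var would be run as root on the donor host once the target's partition is mounted at /clone-var. In POSIX sh the exit status of the pipeline is that of xfsrestore alone, so a failure in xfsdump is not reported by it.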