
Re: xfs_repair of root filesystem

To: Jeremy Jackson <jerj@xxxxxxxxxxxx>
Subject: Re: xfs_repair of root filesystem
From: Michael Sinz <msinz@xxxxxxxxx>
Date: Tue, 01 Apr 2003 06:48:33 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <1049150626.1258.58.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <1049150626.1258.58.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030314

Jeremy Jackson wrote:
> Hi,
>
> I'm wondering what the official word is about running xfs_repair on a
> read-only mounted fs.  The utilities complain for me, so I have to boot
> from a repair partition to fix XFS (a while back, when the
> shutdown-with-files-in-use bug was still a problem).  Ext2 has no
> problem with this.  I'd just like to know for future reference, so I
> know whether I need a spare root fs or not.

While I too would like to have a way to repair XFS read-only mounts,
there are other reasons to have a special recovery partition for
this.

What I have done is to make /boot its own partition at the start of
the disk.  This is where the kernel lives, along with the lilo files
(or grub's, if you use that).

Anyway, I have written a script that builds a mini boot system in
that partition, which then runs as the init process to fix up any
other filesystems.  To reduce the chance of /boot being corrupted, I
mount it read-only; and since nothing normally runs from it, if it
ever needs fixing I can just unmount it and repair it after a
regular boot.
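Since the whole scheme relies on /boot staying read-only during normal operation, it can be worth checking that it really is. The sketch below is not part of the attached script: is_mounted_ro is a hypothetical bash helper that reads /proc/mounts (the file the recovery system links to /etc/mtab), assuming its usual layout of device, mount point, type and options on each line.

```shell
# Hypothetical helper, not part of InstallRecoveryBoot: succeed (return 0)
# if the given mount point is currently mounted read-only.
# Assumes the /proc/mounts layout: device mountpoint fstype options dump pass
is_mounted_ro()
{
        local mnt="$1"
        local mounts="${2:-/proc/mounts}"  # optional 2nd arg: a saved copy, for testing
        local dev point fstype opts rest
        local result=1                     # 1 = not mounted, or mounted writable

        while read -r dev point fstype opts rest; do
                [ "$point" = "$mnt" ] || continue
                # Keep scanning: if something is mounted over the same
                # point later, the last entry is the one that is visible.
                case ",$opts," in
                        *,ro,*) result=0 ;;
                        *)      result=1 ;;
                esac
        done < "$mounts"
        return $result
}
```

For example, is_mounted_ro /boot || echo "warning: /boot is writable" could go in a login script.  The optional second argument only exists so the function can be tried against a saved copy of /proc/mounts.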

The install script (see attached) builds everything needed and even
tells you how much space it used in /boot to do its work.  You may need
to add lilo/grub entries that match your environment; also, as
currently written, it assumes a devfs kernel (does someone want to make
a different version?).
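For a non-devfs kernel, the most obviously devfs-specific piece is the probing loop's glob, /dev/*/host*/bus*/target*/lun*/*.  One possible replacement, sketched here under the assumption that classic /dev/hda1-style device nodes exist, is to take the device list from /proc/partitions, which (like the devfs glob) also lists whole disks.  The name list_block_devices is made up for illustration; the function sticks to bash built-ins because the recovery area in /boot does not include awk or sed.

```shell
# Hypothetical non-devfs replacement for the probing loop's devfs glob
#   /dev/*/host*/bus*/target*/lun*/*
# It prints /dev/NAME for every block device listed in /proc/partitions
# (whole disks as well as partitions), assuming classic /dev/hda1-style
# device nodes.  Only bash built-ins are used, because the recovery area
# in /boot does not contain awk or sed.
list_block_devices()
{
        local table="${1:-/proc/partitions}"  # optional arg, for testing
        local major minor blocks name rest
        while read -r major minor blocks name rest; do
                # Skip the "major minor #blocks name" header and blank lines;
                # "rest" absorbs any extra statistics columns.
                case "$major" in
                        ''|*[!0-9]*) continue ;;
                esac
                echo "/dev/$name"
        done < "$table"
}
```

In the init script the loop would then start with "for part in $(list_block_devices); do".  Other devfs assumptions, such as the /dev/root lookup and the device names in the lilo root= lines, may need attention too.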

It does support modular recovery kernels as well, but you do need to add
the linux.lastchance kernel yourself.  (I take a known good kernel, put
it there, and only update it once I again have a known good kernel.)
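Putting the known good kernel in place is just a copy, but because /boot is normally mounted read-only in this setup and LILO records where the kernel sits on disk, a few steps go with it.  The helper below, set_lastchance_kernel, is a hypothetical sketch rather than something the attached script provides; it copies to a temporary name and then renames, so an interrupted copy never leaves a truncated linux.lastchance behind.

```shell
# Hypothetical helper: install a known good kernel image as the recovery
# kernel (BOOTDIR/linux.lastchance, BOOTDIR defaulting to /boot).
# It copies to a temporary name in the same directory and then renames,
# so an interrupted copy never leaves a truncated linux.lastchance behind.
set_lastchance_kernel()
{
        local image="$1"
        local bootdir="${2:-/boot}"
        local target="$bootdir/linux.lastchance"

        if [ ! -f "$image" ]; then
                echo "No such kernel image: $image" >&2
                return 1
        fi

        cp "$image" "$target.new" || return 1
        chmod 600 "$target.new"             # owner (root) only, like the other recovery files
        mv -f "$target.new" "$target" || return 1
        echo "Installed $image as $target"
}

# Typical use (the image name is a placeholder):
#   mount -o rw,remount /boot
#   set_lastchance_kernel /boot/vmlinuz-KNOWN-GOOD
#   lilo                      # LILO must re-record where the kernel sits on disk
#   mount -o remount /boot    # back to the fstab setting (read-only)
```

The rename is atomic because the temporary file lives in the same directory, and therefore on the same filesystem, as the final linux.lastchance.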

Given that this is a recurring issue, I may make a web page for this.

--
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
        the master's ability to explain them to others.
#!/bin/bash
#
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
# This script installs a recovery boot feature onto the /boot partition.
# It also adds the matching entries to lilo.conf so that the feature
# can be used.
#
# NOTE - If you do not clean out the /lib/modules tree from various kernel
# builds, you
#
# Note that this script will destroy any recovery boot feature that it
# may have already installed in order to be able to ensure that the new
# one is complete and correct.
#
## Only root can run this
if [ `id -u` != 0 ]; then
        echo "Only root can run this script!"
        exit 1
fi

## Remount /boot as read-write...
mount -o rw,remount /boot
if [ $? != 0 ]; then
        echo "Unable to mount /boot as read/write.  Is /boot a partition?"
        exit 1
fi

echo "Installing recovery boot feature into /boot"

## Make sure that it is all just owned by root...
umask 077

## Clean up any old install
rm -rf /boot/bin /boot/lib /boot/etc /boot/sbin /boot/boot /boot/dev /boot/proc \
        /boot/var /boot/tmp

## Get the size before we install
before_size="`du -sb /boot`"

mkdir -p -m 700 /boot/bin /boot/lib /boot/etc /boot/sbin
mkdir -p -m 000 /boot/boot /boot/proc /boot/dev /boot/var /boot/tmp

## We need an "sh" processor too...
ln -s bash /boot/bin/sh

## Swapoff is just a softlink to swapon
ln -s swapon /boot/sbin/swapoff

## And, since we mount read-only, make /etc/mtab a link to /proc/mounts
ln -s /proc/mounts /boot/etc/mtab

## Make the special recovery fstab - this is needed to
## make sure that we can run in this mode
cat << 'boot-fstab' > /boot/etc/fstab
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
# This is the special recovery boot fstab
# We mount the recovery boot as read-only, just to be safe
# We also bind-mount it into /boot in case lilo.conf is needed.
# We also mount /var and /tmp as tmpfs mounts such that we can
# do some disk operations (everything else is read-only)
/dev/root       /       auto    ro,sync  0 0
/               /boot   none    bind     0 0
none            /proc   proc    defaults 0 0
none            /var    tmpfs   defaults 0 0
none            /tmp    tmpfs   defaults 0 0
#
### Any swap partitions found in the system's fstab go here:
boot-fstab

## Now, grab, from the system fstab any swap information...
grep "^/dev/.*swap.*swap" /etc/fstab >>/boot/etc/fstab

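## The common bit of code that starts the recovery system.
## It has no #! line because it is meant to be sourced
## (by /sbin/init-manual and /sbin/init-auto below).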
cat << 'common-init' >/boot/sbin/init
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
export PATH="/sbin:/bin"
export TERM="ansi"
mount -n -a

## Start a pre-probing shell on vc/4 just as a backup in case
## there is some problem that needs a shell.
export PS1="XFS pre-Recovery Shell\n\w # "
bash 0<>/dev/vc/4 1>&0 2>&0 &

LastWord()
{
        while [ "x$1" != "x" ]; do
                word="$1"
                shift
        done
        echo $word
}

bootdev="`ls -l /dev/root`"

## Export the boot device name (so we skip it)
export xfs_boot="/dev/`LastWord $bootdev`"

## Start building the list of XFS partitions
## (We assume that the boot device is XFS just
## so that we can display something useful there)
export xfs_parts=$xfs_boot

## Turn on swap, if we have it...
swapon -a -e

## Try and let some async boot items finish
## so let's wait for 5 seconds...
echo ""
echo -n "Waiting for the dust to settle ... "
usleep 5000000
echo "done"
echo -n "Probing disk(s) and partition(s) ... "

## Note that we look for all parts of a disk on
## any host/bus/target/lun  -  This includes
## "whole" disks which do not have partitions
## This should work for scsi and ide
##
## Argh! mount/XFS/kernel messages still hit the console even when redirected!
## So, we jump to vc/2 to do all of the work (and thus
## get all of the nasty details there) and then bounce
## back afterwards...
chvt 2
echo "Probing disk(s) and partition(s) ..." 0<>/dev/vc/2 1>&0 2>&0
for part in /dev/*/host*/bus*/target*/lun*/*; do
        mkdir -p /tmp/test
        echo "$part ... "
        if [ "$part" != "$xfs_boot" ]; then
                ## Note that using mount to test if the
                ## partition is XFS does two things for us:
                ## 1)  It forces XFS to replay anything that is
                ##     in the log for us
                ## 2)  It keeps us from trying to auto-fix any
                ##     partition that is so corrupted that
                ##     mount can not even replay the log
                mount -n -t xfs $part /tmp/test
                if [ $? = 0 ]; then
                        xfs_parts="$xfs_parts $part"
                fi
                umount /tmp/test
        fi
        echo ""
        rmdir /tmp/test
done 0<>/dev/vc/2 1>&0 2>&0
chvt 1
echo "done"

## Start the remaining recovery process
echo -e "\nXFS Recovery System  (details on vc/2)\n"

## Now for each XFS partition that is not the
## boot partition we run xfs_check to see if anything
## is even remotely wrong...
## Note that this exports the xfs_needs_repair variable
## which will contain all of the disks/partitions that
## have something wrong with them.
echo "XFS Partitions:"
echo -e "\n\nXFS Partitions: (details)" 0<>/dev/vc/2 1>&0 2>&0
export xfs_needs_repair=""
for part in $xfs_parts; do
        if [ "$part" != "$xfs_boot" ]; then
                echo -n "Checking $part ... "
                echo -e "\nChecking $part" 0<>/dev/vc/2 1>&0 2>&0
                xfs_check $part 0<>/dev/vc/2 1>&0 2>&0
                if [ $? = 0 ]; then
                        echo -e -n "\b\b\b\b- "
                        echo "OK"
                else
                        echo -e -n "\b\b\b\b- "
                        echo "needs repair"
                        xfs_needs_repair="$xfs_needs_repair $part"
                fi
        else
                echo "Skipping $part - boot partition"
        fi
done
echo ""

## Start a post-probe recovery shell on vc/3
## This is just in case you need more than 1 for some work
export PS1="XFS Recovery Shell\n\w # "
bash -c 'set ; echo -e "\nRun /sbin/repair to auto-repair\n" ; exec bash' \
        0<>/dev/vc/3 1>&0 2>&0 &

## Now start the shell or auto-repair script (depending)
export PS1="XFS Recovery Shell (exit to reboot) [auto-repair = /sbin/repair]\n\w # "
$AUTO_XFS_REPAIR

echo -n "rebooting..."
sync
swapoff -a
umount -a >/dev/null 2>&1
sync
echo -n "  please wait..."
reboot -f -d
common-init
chmod 500 /boot/sbin/init || exit 1

## Make our interactive recovery init script
cat << 'manual-init' >/boot/sbin/init-manual
#!/bin/bash
#
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
AUTO_XFS_REPAIR=bash
. /sbin/init
manual-init
chmod 500 /boot/sbin/init-manual || exit 1

## Make our autofix init script
cat << 'auto-init' >/boot/sbin/init-auto
#!/bin/bash
#
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
AUTO_XFS_REPAIR=/sbin/repair
. /sbin/init
auto-init
chmod 500 /boot/sbin/init-auto || exit 1

## Make the auto-repair script/command
cat << 'auto-repair' >/boot/sbin/repair
#!/bin/bash
#
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
## This script uses the exported xfs_needs_repair
## variable to do its work.  This variable should
## contain all of the devices that are XFS filesystems
## that did not pass xfs_check.
_xfs_needs_repair="x $xfs_needs_repair"
for part in $_xfs_needs_repair; do
        if [ "$part" != "x" ]; then
                echo "Repairing $part..."
                xfs_repair $part
                echo "Finished $part"
                echo ""
        fi
done
auto-repair
chmod 500 /boot/sbin/repair || exit 1

## Copy the fstab of the real system into a special file
## for easier reference...
cp /etc/fstab /boot/etc/fstab.system

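## A simple routine to copy in the shared libraries that are needed.
## It is called with one line of ldd output split into words, e.g.
##   libc.so.6 => /lib/libc.so.6 (0x40000000)
## so $1 is the library name and $3 the file to copy.  Lines such
## as "not a dynamic executable" start with "not" and are skipped.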
CheckLibs ()
{
        if [ "$1" != "not" ]; then
                libfile=$3
                libtarget="/boot/lib/`basename $1`"

                if [ ! -f $libtarget ]; then
                        echo "    requires $libtarget"
                        install -m 500 --strip "$libfile" "$libtarget" || exit 1
                fi
        fi
}

for file in \
        /bin/bash               \
        /bin/cat                \
        /bin/chmod              \
        /bin/chown              \
        /bin/cp                 \
        /bin/dd                 \
        /bin/df                 \
        /bin/dmesg              \
        /bin/echo               \
        /bin/grep               \
        /bin/ls                 \
        /bin/mkdir              \
        /bin/more               \
        /bin/mount              \
        /bin/mv                 \
        /bin/rm                 \
        /bin/rmdir              \
        /bin/sync               \
        /bin/umount             \
        /bin/usleep             \
        /bin/vi                 \
        /etc/lilo.conf          \
        /sbin/lilo              \
        /sbin/reboot            \
        /sbin/swapon            \
        /sbin/xfs_repair        \
        /usr/bin/chvt           \
        /usr/bin/du             \
        /usr/sbin/chroot        \
        /usr/sbin/xfs_check     \
        /usr/sbin/xfs_db        \
; do
        ## We don't have a "/usr" in the boot
        ## recovery area, so we strip a leading
        ## "/usr" from the path if it is there...
        target="/boot${file##/usr}"
        
        ## Copy the file
        echo "  installing $target"
        install -m 500 --strip "$file" "$target" 2>/dev/null || exit 1
        
        ## Now make sure that we also have the shared
        ## libraries it needs (ldd prints one per line,
        ## so split its output on newlines only)
        IFS=$'\n'
        for lib in `ldd $file 2>/dev/null`; do
                unset IFS
                CheckLibs $lib
        done
        unset IFS
done

## Just to be sure that we have the modules for
## whatever kernel we are using...  (stripped :-)
echo "  installing kernel modules..."
find /lib/modules -type d -exec mkdir -p /boot\{\} \;
find /lib/modules -type f -exec install -m 500 --strip \{\} /boot\{\} \; 2>/dev/null

## Check if our lilo.conf has the recovery option yet
if ! grep -q Recovery /etc/lilo.conf; then
        cat << 'lilo.conf' >>/etc/lilo.conf

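# Recovery entries
# (root= must name the /boot partition, where the recovery
#  system lives; adjust it to match your disk)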
image=/boot/linux.lastchance
        label=Recovery
        root=/dev/ide/host0/bus0/target0/lun0/part1
        append="init=/sbin/init-manual"

image=/boot/linux.lastchance
        label=Autofix
        root=/dev/ide/host0/bus0/target0/lun0/part1
        append="init=/sbin/init-auto"
lilo.conf
fi

## Run lilo, just to be sure...
echo "Running lilo..."
lilo

## Get the size after we install
after_size="`du -sb /boot`"

## Now, remount /boot based on its fstab settings...
sync
mount -o remount /boot || exit

echo "Done."
echo ""

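## du -sb prints "<bytes><TAB>/boot", so after word splitting
## $1 is the size before the install and $3 the size after it.
CalcUsage()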
{
        diff=$(( $3 - $1 ))
        echo "Recovery feature is using $diff bytes in /boot"
}

CalcUsage $before_size $after_size
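
A note on CheckLibs in the script above: it takes the third word of each ldd line, which works for lines of the form "libc.so.6 => /lib/libc.so.6 (0x40000000)".  Newer glibc versions also print lines without a "=>", such as the dynamic loader ("/lib/ld-linux.so.2 (0x...)") or "linux-vdso.so.1 (0x...)"; for those $3 is empty, install is called with an empty source, and the script exits.  A more tolerant parser might look like the sketch below (ldd_paths is a hypothetical name, not part of the script).  It prints only absolute paths, so "=> not found" entries and pseudo-libraries are skipped; a real installer should probably report missing libraries rather than skip them silently.

```shell
# Hypothetical, more tolerant alternative to the parsing in CheckLibs:
# read ldd output on stdin and print one library path per line.
# Handles both line shapes:
#   libc.so.6 => /lib/libc.so.6 (0x40000000)
#   /lib/ld-linux.so.2 (0x40000000)
# Anything that is not an absolute path (linux-vdso.so.1, "=> not found",
# "not a dynamic executable", "statically linked") is skipped.
ldd_paths()
{
        local line path
        while read -r line; do
                case "$line" in
                        *" => "*) path=${line#* => } ;;  # text after the arrow
                        *)        path=$line ;;
                esac
                path=${path%% (*}                         # drop the " (0x...)" load address
                case "$path" in
                        /*) echo "$path" ;;
                esac
        done
}
```

For example, ldd /bin/bash | ldd_paths lists the library files that would need to be copied into /boot/lib.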