Re: XFS and knfsd

Subject: Re: XFS and knfsd
From: "D. Stimits" <stimits@xxxxxxxxxx>
Date: Tue, 12 Jun 2001 13:42:29 -0600
Cc: linux-xfs@xxxxxxxxxxx
References: <3B265B70.A5BCF044@xxxxxxxxxxxxx>
Reply-to: stimits@xxxxxxxxxx
Sender: owner-linux-xfs@xxxxxxxxxxx
At one point I hit a NULL pointer dereference at the same address
while booting a new kernel that did not have the SGI patches. It looked
like an aic7xxx failure, but was not. I keep mentioning one patch that
is part of Alan Cox's 2.4.5-ac series but is not yet in the main kernel
source. Even if you cannot use this patch at all times, can you test
the following? In the Linux source, in fs/block_dev.c near line 596
(depending on kernel version), there is a function "ioctl_by_bdev". In
that function, add the line marked with "+" below (leave out the "+"
itself):

int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
{
        kdev_t rdev = to_kdev_t(bdev->bd_dev);
        struct inode inode_fake;
        int res;
        mm_segment_t old_fs = get_fs();

        if (!bdev->bd_op->ioctl)
                return -EINVAL;
        inode_fake.i_rdev=rdev;
+        inode_fake.i_bdev=bdev;
        init_waitqueue_head(&inode_fake.i_wait);
        set_fs(KERNEL_DS);
        res = bdev->bd_op->ioctl(&inode_fake, NULL, cmd, arg);
        set_fs(old_fs);
        return res;
}
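
To illustrate why that one line matters, here is a user-space model, not
kernel code. The struct and function names are made up for this sketch,
and the zero fill stands in for whatever stale data happens to sit on the
stack. The temporary inode is built on the stack and only the fields
assigned in the function are valid, so any ioctl handler that reads
i_bdev sees junk unless the added line sets it.

```c
/* User-space model only: these types and names are hypothetical and are
 * not the kernel's definitions. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct fake_bdev { int open_count; };

struct fake_inode {
    unsigned int rdev;          /* stands in for i_rdev */
    struct fake_bdev *bdev;     /* stands in for i_bdev */
};

/* Stand-in for a driver ioctl that expects inode->bdev to be valid.
 * A real handler would just dereference it and oops; this model
 * reports the bad pointer instead of crashing. */
static int model_driver_ioctl(const struct fake_inode *inode)
{
    if (inode->bdev == NULL)
        return -EFAULT;         /* the kernel would fault here */
    return inode->bdev->open_count;
}

/* Build the temporary inode the way the unpatched (apply_fix == 0)
 * or patched (apply_fix != 0) caller would. */
static int model_ioctl_by_bdev(struct fake_bdev *bdev, unsigned int rdev,
                               int apply_fix)
{
    struct fake_inode inode_fake;

    /* Zero fill models the case where the stale stack data is 0;
     * in the kernel the unset field holds arbitrary stack contents. */
    memset(&inode_fake, 0, sizeof(inode_fake));
    inode_fake.rdev = rdev;
    if (apply_fix)
        inode_fake.bdev = bdev; /* the added line */
    return model_driver_ioctl(&inode_fake);
}
```

Calling model_ioctl_by_bdev() with apply_fix set to 0 returns -EFAULT in
this model, and with apply_fix set to 1 it returns the device's
open_count.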


Please note that on my SMP system, several kernel versions or
combinations die at bootup while trying to work with filesystems.
Without this fix, loopback encrypted partitions are also very likely to
cause a hard lockup on this machine (about 90% of commands on loopback
encrypted partitions caused a lockup). I have added this line to every
kernel I have used since then, and have never seen the Oops again. Here
is my first Oops, without XFS (no ksymoops output because it was fatal
and the machine was unbootable):
Trying to unmount old root ... <1>Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
c01c5bda
*pde = 00000000
Oops: 0000
CPU:    1
EIP:    0010:[<c01c5bda>]
EFLAGS: 00010202
eax: 00000000   ebx: 00000000   ecx: 00001261   edx: c1479d98
esi: 00000000   edi: c1479e2c   ebp: ffffffff   esp: c1479d68
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 1, stackpage=c1479000)
Stack: c1478000 cfe97f60 c1479e2c c013b0fb c1479d98 00000000 00001261 00000000
       cfeac560 00000000 fffffffe cfe97f60 cff06ea4 cff06e10 c0346340 0001def6
       cfef0001 c01ea7b2 cff06e00 00000082 00000202 c14e1cb8 cfef8200 cff07e60
Call Trace: [<c013b0fb>] [<c01ea7b2>] [<c0112912>] [<c01bd021>] [<c01c5b69>] [<c01bdd44>] [<c01374cd>]
       [<c012dcef>] [<c0124bc9>] [<c0147e3f>] [<c014961b>] [<c013b3d4>] [<c0139027>] [<c013825c>] [<c0106efb>]
       [<c01051d8>] [<c010522a>] [<c01056fb>]

Code: 8b 40 10 83 f8 02 7e 0e b8 f0 ff ff ff eb 7e 8d b4 26 00 00
Kernel panic: Attempted to kill init!
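
For what it is worth, the first bytes on the Code: line can be decoded by
hand. Assuming the dump starts at the faulting EIP, "8b 40 10" is a mov
that loads from the address in eax plus 0x10, and with eax = 00000000
that gives exactly the reported address 00000010. The sketch below (my
own helper names, handling only this one instruction form) shows the
arithmetic:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode "8b /r" (mov r32, r/m32) when the operand is a register base
 * plus an 8-bit displacement (ModRM mod == 01, rm != 100).
 * On success returns 0 and stores the base register number
 * (0 = eax, 1 = ecx, 2 = edx, 3 = ebx, ...) and the displacement.
 * Returns -1 for any other encoding. */
static int decode_mov_disp8(const uint8_t *code, size_t len,
                            int *base_reg, int8_t *disp)
{
    if (len < 3 || code[0] != 0x8b)
        return -1;

    uint8_t modrm = code[1];
    uint8_t mod = modrm >> 6;
    uint8_t rm = modrm & 7;

    if (mod != 1 || rm == 4)    /* rm == 4 means a SIB byte follows */
        return -1;

    *base_reg = rm;
    *disp = (int8_t)code[2];
    return 0;
}

/* Address the load touches, given the base register's value at the
 * time of the fault. */
static uint32_t effective_address(uint32_t base_value, int8_t disp)
{
    return base_value + (uint32_t)(int32_t)disp;
}
```

Feeding it the bytes 8b 40 10 gives base register 0 (eax) and
displacement 0x10, so a NULL pointer in eax means a read at offset 0x10
into whatever structure that pointer was supposed to reference.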


While it is quite possible that your Oops and mine are not the same
bug, leaving this fix out is still fatal in some filesystem cases. It
is a general bug, with or without XFS.

D. Stimits, stimits@xxxxxxxxxx

PS: I'm in the Boulder area too.

Kirk Thoning wrote:
> 
> Has this issue been resolved?  I am getting the same problem on a Redhat
> 7.1 system using the SGI kernel-2.4.2-SGI_XFS_1.0.i686.rpm.  Almost all
> clients are Redhat 6.2 (~15 clients) with a couple 7.0 and 3 HP-UX
> 10.20.  My impression is that the load wasn't that high, since it takes
> 1-2 weeks for this to occur.
> 
> Here's my output:
> 
> Jun  6 09:05:40 ccgg kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000010
> Jun  6 09:05:40 ccgg kernel:  printing eip:
> Jun  6 09:05:40 ccgg kernel: c88e7e83
> Jun  6 09:05:40 ccgg kernel: pgd entry c793b000: 0000000000000000
> Jun  6 09:05:40 ccgg kernel: pmd entry c793b000: 0000000000000000
> Jun  6 09:05:40 ccgg kernel: ... pmd not present!
> Jun  6 09:05:40 ccgg kernel: Oops: 0000
> Jun  6 09:05:40 ccgg kernel: CPU:    0
> Jun  6 09:05:41 ccgg kernel: EIP:    0010:[ipchains:__insmod_ipchains_S.bss_L1076+564547/16321505]
> Jun  6 09:05:41 ccgg kernel: EIP:    0010:[<c88e7e83>]
> Jun  6 09:05:41 ccgg kernel: EFLAGS: 00010246
> Jun  6 09:05:41 ccgg kernel: eax: 00000000   ebx: 00000000   ecx: c7fdadb0   edx: 00000010
> Jun  6 09:05:41 ccgg kernel: esi: c31db5a0   edi: c31dbaa0   ebp: c31dbaa0   esp: c23f7edc
> Jun  6 09:05:41 ccgg kernel: ds: 0018   es: 0018   ss: 0018
> Jun  6 09:05:41 ccgg kernel: Process nfsd (pid: 1005, stackpage=c23f7000)
> Jun  6 09:05:41 ccgg kernel: Stack: 00000003 0b01e79b c88e8286 c31dbaa0 00000003 c2338410 46000000 c2338400
> Jun  6 09:05:41 ccgg kernel:        c23f7f4c c7f56be0 c02fa000 ffffff8c 00000000 c88e8614 c7f56a00 0b01e79b
> Jun  6 09:05:41 ccgg kernel:        00000000 00000000 00000001 c2338400 c2338290 c2338490 c2338000 c24eb800
> Jun  6 09:05:41 ccgg kernel: Call Trace:
>   [ipchains:__insmod_ipchains_S.bss_L1076+565574/16320478]
>   [ipchains:__insmod_ipchains_S.bss_L1076+566484/16319568]
>   [ipchains:__insmod_ipchains_S.bss_L1076+559621/16326431]
>   [ipchains:__insmod_ipchains_S.bss_L1076+624832/16261220]
>   [ipchains:__insmod_ipchains_S.bss_L1076+558131/16327921]
>   [ipchains:__insmod_ipchains_S.bss_L1076+624832/16261220]
>   [ipchains:__insmod_ipchains_S.bss_L1076+372168/16513884]
> Jun  6 09:05:41 ccgg kernel: Call Trace: [<c88e8286>] [<c88e8614>] [<c88e6b45>] [<c88f6a00>] [<c88e6573>] [<c88f6a00>] [<c88b8f08>]
> Jun  6 09:05:41 ccgg kernel:
>   [ipchains:__insmod_ipchains_S.bss_L1076+625984/16260068]
>   [ipchains:__insmod_ipchains_S.bss_L1076+624648/16261404]
>   [ipchains:__insmod_ipchains_S.bss_L1076+557553/16328499]
>   [kernel_thread+35/48]
> Jun  6 09:05:41 ccgg kernel:        [<c88f6e80>] [<c88f6948>] [<c88e6331>] [<c010752f>]
> Jun  6 09:05:41 ccgg kernel:
> Jun  6 09:05:41 ccgg kernel: Code: 8b 40 10 39 d0 74 21 8d 58 c8 39 f3 75 06 8b 5a 04 83 c3 c8
> 
> >
> > Are there any tricks to getting knfsd working with XFS? Our server which
> > serves an XFS partition gets an "oops" after about 15 hours of extremely
> > heavy use. I'm using the latest distro from CVS (2.4.3-XFS).  Here's the
> > kdb output of the oops, just in case someone has any ideas on how to
> > debug this:
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000000
> >  printing eip:
> > c0145933
> > *pde = 00000000
> >
> > Entering kdb (current=0xceffc000, pid 625) on processor 0 Oops: Oops due to oops @ 0xc0145933
> > eax = 0xcff9ae70 ebx = 0xffffffe8 ecx = 0x0000000f edx = 0xcff80000
> > esi = 0x00000000 edi = 0xceffdeb4 esp = 0xceffde48 eip = 0xc0145933
> > ebp = 0xceffde68 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010207
> > xds = 0xcff80018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xceffde14
> > [0]kdb> bt
> >     EBP       EIP         Function(args)
> > 0xceffde68 0xc0145933 d_lookup+0x67 (0xc83e70c0, 0xceffdeb4)
> >                                kernel .text 0xc0100000 0xc01458cc 0xc01459e8
> > 0xceffde7c 0xc013c911 cached_lookup+0x11 (0xc83e70c0, 0xceffdeb4, 0x0)
> >                                kernel .text 0xc0100000 0xc013c900 0xc013c954
> > 0xceffdea0 0xc013d862 lookup_hash+0x52 (0xceffdeb4, 0xc83e70c0)
> >                                kernel .text 0xc0100000 0xc013d810 0xc013d914
> > 0xceffdec0 0xc013d969 lookup_one+0x55 (0xc4b860e0, 0xc83e70c0)
> >                                kernel .text 0xc0100000 0xc013d914 0xc013d97c
> > 0xceffdf04 0xc016d882 nfsd_lookup+0x3b2 (0xcf05ac00, 0xcf05aa00, 0xc4b860e0, 0x6, 0xcf05a800)
> >                                kernel .text 0xc0100000 0xc016d4d0 0xc016d9cc
> > 0xceffdf2c 0xc016b50b nfsd_proc_lookup+0x87 (0xcf05ac00, 0xcf05aa00, 0xcf05a800)
> >                                kernel .text 0xc0100000 0xc016b484 0xc016b520
> > 0xceffdf4c 0xc016adf9 nfsd_dispatch+0xc5 (0xcf05ac00, 0xceff8014)
> >                                kernel .text 0xc0100000 0xc016ad34 0xc016ae90
> > 0xceffdfa8 0xc0293bca svc_process+0x2ca (0xcfef5b20, 0xcf05ac00)
> >                                kernel .text 0xc0100000 0xc0293900 0xc0293e30
> > 0xceffdfec 0xc016abba nfsd+0x1a2
> >                                kernel .text 0xc0100000 0xc016aa18 0xc016ad34
> >            0xc0105547 kernel_thread+0x23
> >                                kernel .text 0xc0100000 0xc0105524 0xc010555c
> >
> >
> > Any help is appreciated.
> >
> > Ajay
> 
> --
> ************************************************************
> * Kirk Thoning                         Phone: 303 497-6078 *
> * NOAA/CMDL                              Fax: 303 497-6290 *
> * R/CMDL1                  e-mail: Kirk.W.Thoning@xxxxxxxx *
> * 325 Broadway                                             *
> * Boulder, Colorado 80303                                  *
> ************************************************************
