Re: NFS crash ... xfsdump?

To: "P.Dixon" <P.Dixon@xxxxxxxxx>
Subject: Re: NFS crash ... xfsdump?
From: Martin Josefsson <gandalf@xxxxxxxxxxxxxx>
Date: Mon, 11 Jun 2001 14:21:23 +0200 (CEST)
Cc: <linux-xfs@xxxxxxxxxxx>
In-reply-to: <Pine.LNX.4.33.0106111237310.2882-100000@xxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Mon, 11 Jun 2001, P.Dixon wrote:

> Hi,
>
> Whilst running xfsdump on our server, NFS crashed and couldn't be
> restarted. The output from /var/log/messages is shown below. I've read
> that ext2dump shouldn't be used with 2.4 kernels - does this apply to
> xfsdump?
>
> Any time I see a NULL pointer being de-referenced, I get worried...
>
> I am running kernel-smp-2.4.2-SGI_XFS_1.0 from the Red Hat 7.1 SGI XFS
> install CD.

To me this looks like a SCSI-related Oops. The SCSI one comes first, and
once one Oops has occurred the kernel is left in a very unstable state, so
anything can crash after that even though there are no bugs in that code.
See below.

You should really consider a kernel upgrade; a lot has happened with both
the main kernel and the XFS code since that release.
So I suggest that you check out the latest version from CVS and compile
that. I think the kernel in CVS is 2.4.6-pre1 or -pre2 with the XFS code.


> Jun 11 12:12:31 hepserv kernel: Unable to handle kernel NULL pointer 
> dereference at virtual address 00000000
> Jun 11 12:12:31 hepserv kernel:  printing eip:
> Jun 11 12:12:31 hepserv kernel: 00000000
> Jun 11 12:12:31 hepserv kernel: pgd entry cb13a000: 0000000000000000
> Jun 11 12:12:31 hepserv kernel: pmd entry cb13a000: 0000000000000000
> Jun 11 12:12:31 hepserv kernel: ... pmd not present!
> Jun 11 12:12:31 hepserv kernel: Oops: 0000
> Jun 11 12:12:31 hepserv kernel: CPU:    1
> Jun 11 12:12:31 hepserv kernel: EIP:    0010:[<00000000>]
> Jun 11 12:12:31 hepserv kernel: EFLAGS: 00010282
> Jun 11 12:12:31 hepserv kernel: eax: 00000000   ebx: cc458dc0   ecx: 00000000 
>   edx: c0374620
> Jun 11 12:12:31 hepserv kernel: esi: cc458e40   edi: cc458dc0   ebp: cc458dc0 
>   esp: cd941ec4
> Jun 11 12:12:31 hepserv kernel: ds: 0018   es: 0018   ss: 0018
> Jun 11 12:12:31 hepserv kernel: Process nfsd (pid: 1950, stackpage=cd941000)
> Jun 11 12:12:31 hepserv kernel: Stack: d0901f74 c9dac060 cc458e40 00000000 
> 05c0f573 d09023f6 cc458dc0 00000000
> Jun 11 12:12:31 hepserv kernel:        cf9f1214 11270000 cf9f1204 00000001 
> cf475fe0 cd941f24 ffffff8c 00000000
> Jun 11 12:12:31 hepserv kernel:        d09027a4 cf475e00 05c0f573 00000004 
> 00000000 00000001 cf9f1204 cf9f1090
> Jun 11 12:12:31 hepserv kernel: Call Trace: 
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+958468/127681248] 
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+959622/127680094] 
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+960564/127679152]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+986009/127653707] 
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+1021008/127618708] 
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+951891/127687825] 
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+1021008/127618708]
> Jun 11 12:12:31 hepserv kernel: Call Trace: [<d0901f74>] [<d09023f6>] 
> [<d09027a4>] [<d0908b09>] [<d09113c0>] [<d09005c3>] [<d09113c0>]
> Jun 11 12:12:31 hepserv kernel:        
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+698248/127941468] 
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+1020880/127618836] 
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+1019560/127620156] 
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+951289/127688427]
> [kernel_thread+35/48]
> Jun 11 12:12:31 hepserv kernel:        [<d08c26f8>] [<d0911340>] [<d0910e18>] 
> [<d0900369>] [<c01075e3>]
> Jun 11 12:12:31 hepserv kernel:
> Jun 11 12:12:31 hepserv kernel: Code:  Bad EIP value.

Here's the SCSI one.
The kernel is now in a very unstable state; this may corrupt other things
in the kernel, and the SCSI subsystem is probably not working anymore.

> Jun 11 12:13:56 hepserv kernel: xfs_iget_core: ambiguous vns: vp/0xc6734970, 
> invp/0xc58acab0
> Jun 11 12:13:56 hepserv kernel: Unable to handle kernel NULL pointer 
> dereference at virtual address 00000008
> Jun 11 12:13:56 hepserv kernel:  printing eip:
> Jun 11 12:13:56 hepserv kernel: c01e7892
> Jun 11 12:13:56 hepserv kernel: pgd entry c4a51000: 0000000000000000
> Jun 11 12:13:56 hepserv kernel: pmd entry c4a51000: 0000000000000000
> Jun 11 12:13:56 hepserv kernel: ... pmd not present!
> Jun 11 12:13:56 hepserv kernel: Oops: 0000
> Jun 11 12:13:56 hepserv kernel: CPU:    1
> Jun 11 12:13:56 hepserv kernel: EIP:    0010:[vn_revalidate+34/232]
> Jun 11 12:13:56 hepserv kernel: EIP:    0010:[<c01e7892>]
> Jun 11 12:13:56 hepserv kernel: EFLAGS: 00010282
> Jun 11 12:13:56 hepserv kernel: eax: 00000084   ebx: c58acab0   ecx: cf4f3000 
>   edx: 00000000
> Jun 11 12:13:56 hepserv kernel: esi: c58acab0   edi: 00000084   ebp: c58acab0 
>   esp: c4a53a24
> Jun 11 12:13:56 hepserv kernel: ds: 0018   es: 0018   ss: 0018
> Jun 11 12:13:56 hepserv kernel: Process xfsdump (pid: 2319, 
> stackpage=c4a53000)
> Jun 11 12:13:57 hepserv kernel: Stack: c58acab0 c58acab0 c174eee0 00000001 
> 14003fff 00000000 00000001 cf4f3000
> Jun 11 12:13:57 hepserv kernel:        00000000 c6333dbc 00000514 c6333dd4 
> 00000008 00000000 0079bc68 00000000
> Jun 11 12:13:57 hepserv kernel:        c17ca1cc 00000002 00000000 107afea0 
> 00000000 00000000 00000000 ffffffff
> Jun 11 12:13:57 hepserv kernel: Call Trace: [xfs_bmbt_get_state+51/60] 
> [xfs_iget_core+1916/1956] [xfs_getattr+64/636] [xfs_vn_iget+52/60] 
> [vn_initialize+213/344] [linvfs_read_inode+30/80] [get_new_inode+227/376]
> Jun 11 12:13:57 hepserv kernel: Call Trace: [<c01a161b>] [<c01bd320>] 
> [<c01d5db8>] [<c01bd3b8>] [<c01e7649>] [<c01e6b86>] [<c01505f7>]
> Jun 11 12:13:57 hepserv kernel:        [iget4+221/232] 
> [xfs_open_by_handle+275/796] [xlog_state_clean_log+163/212] 
> [xfs_ioctl+3001/3836] [xlog_state_clean_log+163/212] 
> [xlog_state_clean_log+163/212] [xfs_size_fn+0/20] [_xfs_imap_to_bmap+43/768]
> Jun 11 12:13:57 hepserv kernel:        [<c0150945>] [<c01dfa07>] [<c01c586b>] 
> [<c01e0d3d>] [<c01c586b>] [<c01c586b>] [<c01c2100>] [<c01e4163>]
> Jun 11 12:13:57 hepserv kernel:        [xfs_size_fn+0/20] 
> [xfs_bmbt_get_state+51/60] [xfs_bmap_do_search_extents+736/960] 
> [xfs_bmap_search_extents+77/84] [xfs_bmapi+835/4840] [<e2800920>]
> [eepro100:__insmod_eepro100_O/lib/modules/2.4.2-SGI_XFS_1.0smp/kernel+-375818/96]
>  
> [eepro100:__insmod_eepro100_O/lib/modules/2.4.2-SGI_XFS_1.0smp/kernel+-535039/96]
> Jun 11 12:13:57 hepserv kernel:        [<c01c2100>] [<c01a161b>] [<c019834c>] 
> [<c0198479>] [<c019992f>] [<e2800920>] [<d08273f6>] [<d0800601>]
> Jun 11 12:13:57 hepserv kernel:        [do_generic_file_read+1606/1620] 
> [generic_file_read+101/128] [xfs_inactive_free_eofblocks+240/720] 
> [xfs_iunlock+67/104] [xfs_inactive_free_eofblocks+257/720] 
> [xfs_release+198/228] [xfs_iunlock+67/104]
> [xfs_release+218/228]
> Jun 11 12:13:57 hepserv kernel:        [<c012a62e>] 
> [<c012a7a9>] [<c01d76ec>] [<c01bd8ff>] [<c01d76fd>] [<c01d7f96>] [<c01bd8ff>] 
> [<c01d7faa>]
> Jun 11 12:13:57 hepserv kernel:        [linvfs_ioctl+47/60] 
> [xlog_state_clean_log+163/212] [linvfs_ioctl+0/60] 
> [xlog_state_clean_log+163/212] [sys_ioctl+619/708] 
> [xlog_state_clean_log+163/212] [system_call+51/56] 
> [xlog_state_clean_log+163/212]
> Jun 11 12:13:57 hepserv kernel:        [<c01df523>] [<c01c586b>] [<c01df4f4>] 
> [<c01c586b>] [<c014a1c7>] [<c01c586b>] [<c01090cb>] [<c01c586b>]
> Jun 11 12:13:57 hepserv kernel:        [stext+43/203]
> Jun 11 12:13:57 hepserv kernel:        [<c010002b>]
> Jun 11 12:13:57 hepserv kernel:
> Jun 11 12:13:57 hepserv kernel: Code: 8b 4a 08 6a 00 25 80 00 00 00 50 8d 44 
> 24 18 50 52 8b 41 14

And here XFS blows up, probably because the first Oops left the
kernel in an unstable state.
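
If you want to check the ordering without reading the whole log by eye,
something like the following rough sketch could list each Oops with its
timestamp and the process that triggered it. Only the first entry is
worth debugging; the later ones are suspect. The function name
find_oopses and the regular expressions are my own assumptions, written
against the log format quoted in this mail, and may need adjusting for
other syslog layouts.

```python
import re
import sys

# Start of an Oops report as it appears in the quoted /var/log/messages.
OOPS_START = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+Unable to handle kernel"
)
# The "Process <name> (pid: <n>, ..." line inside an Oops report.
PROCESS = re.compile(r"kernel:\s+Process\s+(?P<name>\S+)\s+\(pid:\s*(?P<pid>\d+)")


def find_oopses(lines):
    """Return (timestamp, process, pid) for each Oops, in log order."""
    oopses = []
    for line in lines:
        # Tolerate mail-quoted logs by dropping a leading "> ".
        line = line.lstrip("> ").rstrip("\n")
        start = OOPS_START.search(line)
        if start:
            oopses.append([start.group("stamp"), None, None])
            continue
        proc = PROCESS.search(line)
        # Attach the process to the most recent Oops that has none yet.
        if proc and oopses and oopses[-1][1] is None:
            oopses[-1][1] = proc.group("name")
            oopses[-1][2] = int(proc.group("pid"))
    return [tuple(o) for o in oopses]


if __name__ == "__main__":
    # Usage: python find_oopses.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for index, (stamp, name, pid) in enumerate(find_oopses(log)):
            note = "first, debug this one" if index == 0 else "later, suspect"
            print(f"{stamp}  {name} (pid {pid})  [{note}]")
```

Run against the log quoted above, this should list the nfsd Oops at
12:12:31 ahead of the xfsdump one at 12:13:56.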

> Jun 11 12:14:52 hepserv named[1810]: lame server on 'elo-relay.elotecnico.pt' 
> (in 'elotecnico.pt'?): 194.65.3.21#53
> Jun 11 12:15:00 hepserv login(pam_unix)[2183]: session opened for user root 
> by LOGIN(uid=0)
>
>

/Martin

-- 
Linux hackers are funny people: They count the time in patchlevels.

