On Wed, Dec 08, 2010 at 01:39:10AM -0800, blacknred wrote:
> >You've done a forced module load. No guarantee your kernel is in any
> >sane shape if you've done that....
> Agree, but I'm reasonably convinced that module isn't the issue, because it
> works fine with my other servers......
> >Strange failure. Hmmm - i386 arch and fedora - are you running with
> >4k stacks? If so, maybe it blew the stack...
> i386 arch, rhel 5.0
Yup, 4k stacks. This is definitely smelling like a stack blowout.
XFS on 4k stacks is a ticking timebomb - it will explode and you've
got no idea of when it will go boom. Recompile your kernel with 8k
stacks or move to x86_64.
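If you want to confirm the stack size rather than take my word for it,
the kernel config should say so. A minimal sketch, assuming your distro
installs the config as /boot/config-<release> and that the option is
named CONFIG_4KSTACKS, as it is on i386 kernels of this vintage. The
helper name uses_4k_stacks is made up for illustration, and
os.uname().release needs a reasonably recent Python 3:

```python
import os

def uses_4k_stacks(config_text):
    """Return True if the kernel config text enables CONFIG_4KSTACKS."""
    for line in config_text.splitlines():
        # A disabled option appears as "# CONFIG_4KSTACKS is not set".
        if line.strip() == "CONFIG_4KSTACKS=y":
            return True
    return False

if __name__ == "__main__":
    # Assumed location; adjust if your distro keeps the config elsewhere.
    path = "/boot/config-" + os.uname().release
    try:
        with open(path) as f:
            print("4k stacks:", uses_4k_stacks(f.read()))
    except OSError as err:
        print("could not read", path, "-", err)
```
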
> ># dd if=<device> bs=512 count=1 | od -c
> This is what i get now, but now server's been rebooted and running OK, what
> should i be expecting or rather what are we looking for in this output at
> point of failure?
Well, what you see here:
> 0000000 X F S B \0 \0 020 \0 \0 \0 \0 \0 025 324 304 \0
Is a valid XFS superblock magic number.
If you are getting this error:
> >> XFS: bad magic number
> >> XFS: SB validate failed
Then I'd expect to see anything other than "XFSB" as the magic
number. Of course, if you smashed the stack during mount, then there
will most likely be nothing wrong with the value on disk...
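To make the check concrete, here is a rough sketch that decodes the
start of the superblock the same way you would read the od output by
eye. It relies on the first on-disk fields being big-endian
sb_magicnum, sb_blocksize and sb_dblocks. The function name
parse_sb_head is hypothetical, and the sample bytes are simply the 16
bytes from the od line quoted above:

```python
import struct
import sys

XFS_SB_MAGIC = b"XFSB"

def parse_sb_head(buf):
    """Decode magic, block size and data block count from the first 16 bytes."""
    # On-disk XFS superblock fields are big-endian:
    # sb_magicnum (4 bytes), sb_blocksize (u32), sb_dblocks (u64).
    magic, blocksize, dblocks = struct.unpack(">4sIQ", buf[:16])
    return magic, blocksize, dblocks

# The 16 bytes from the od output quoted above (octal escapes as printed).
sample = b"XFSB\0\0\020\0\0\0\0\0\025\324\304\0"

if __name__ == "__main__":
    data = sample
    if len(sys.argv) > 1:
        # Optionally read the first sector from a device or image file.
        with open(sys.argv[1], "rb") as dev:
            data = dev.read(512)
    magic, blocksize, dblocks = parse_sb_head(data)
    print("magic ok:", magic == XFS_SB_MAGIC)
    print("block size:", blocksize, "data blocks:", dblocks)
```

Run it with the device path as the only argument to check the on-disk
copy. A clean result here alongside a "bad magic number" at mount time
points at in-memory corruption rather than at the disk.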
> >why did I flash the controller
> I was on 5.22 fw version which has a known 'lockup' issue which is fixed in
> 7.x ver.
> This is a critical fix.
Is the version 7.x firmware certified with such an old kernel? It's
not uncommon for different firmware versions to only be supported on
specific releases/kernel versions.