System freezes with high kernel CPU usage

To: linux-xfs@xxxxxxxxxxx
Subject: System freezes with high kernel CPU usage
From: Russell Howe <rhowe@xxxxxxxxxx>
Date: Tue, 20 Aug 2002 16:33:38 +0100
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mutt/1.4i

Hi,

XFS has been performing great for me so far on my desktop machine (a
dual CPU system), although lately I've noticed some odd behaviour. The
system will lock up for several seconds, respond for a split second,
then lock up again for another 5-10 seconds or so, and so on. This goes
on for a minute or two. The system isn't under any load at the time: X
is running and noatun is open playing MP3s (although without the rest of KDE).
Just light normal use.

I was working on the Apache configuration and the above even happened as
I was starting apache. I now have a terminal window that looks like
this:
# /etc/init.d/apache start
<nothing>

Trying to enter anything into this window does nothing, no characters
echoed, nothing. Moving the window off-screen and back on again causes
the window to redraw itself, so I don't think the aterm process is
locked. There is no trace of the /etc/init.d/apache script running in
the ps output from what I can see.

This has happened a few times lately, all with this kernel. In the past
the system used to lock up solid and I just power-cycled it, blaming it
on the motherboard (there are some tweaks you can make to the board to
improve its stability). I also used to have the machine overclocked,
although this is no longer the case (was trying to get to the bottom of
the lockups).

A screenshot of gkrellm shortly afterwards is here:
http://rhowe.sfarc.net/gkrellm.png

Admittedly, it's not exactly a scientific analysis but it gives an idea
of what went on. The reason I thought it may be related to the XFS
kernel is that the first time the machine unfreezes for a split second,
there is HDD activity, and gkrellm shows drive activity too.

This kernel is a CVS checkout of 2.4.19-rc1 from the XFS CVS tree, with
ALSA 0.9 from Debian unstable patched into it, plus lirc and lm-sensors.

The relevant part of the kernel log:
Aug 20 15:40:46 xiao kernel: SysRq : Emergency Sync
Aug 20 15:40:46 xiao kernel: SysRq : Emergency Sync
Aug 20 15:40:46 xiao kernel: SysRq : HELP : loglevel0-8 reBoot tErm kIll saK showMem showPc unRaw Sync showTasks Unmount
Aug 20 15:40:46 xiao last message repeated 5 times
Aug 20 15:40:46 xiao kernel: SysRq : Emergency Sync
Aug 20 15:40:46 xiao kernel: SysRq : HELP : loglevel0-8 reBoot tErm kIll saK showMem showPc unRaw Sync showTasks Unmount
Aug 20 15:40:46 xiao last message repeated 3 times
Aug 20 15:40:46 xiao kernel: SysRq : Emergency Sync
Aug 20 15:40:46 xiao kernel: SysRq : Emergency Sync
Aug 20 15:40:46 xiao kernel: SysRq : HELP : loglevel0-8 reBoot tErm kIll saK showMem showPc unRaw Sync showTasks Unmount
Aug 20 15:40:46 xiao kernel: SysRq : Emergency Sync
Aug 20 15:44:16 xiao kernel: Syncing device 03:01 ... OK
Aug 20 15:44:16 xiao kernel: Syncing device 16:42 ... OK
Aug 20 15:44:16 xiao kernel: Syncing device 16:43 ... OK
Aug 20 15:44:16 xiao kernel: Syncing device 16:45 ... OK
Aug 20 15:44:16 xiao kernel: Syncing device 16:01 ... OK
Aug 20 15:44:16 xiao kernel: Done.

(I tried SysRq-S to see if I could kick off some fs activity after
seeing this loosen a lockup in someone else's case). I guess the system
unfroze at 15:44:16. Also, I don't think I did 6 SysRq-S within the
space of a second.
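
As a rough check on how long the stall lasted, the gap between the first
"Emergency Sync" line and the final "Done." line can be worked out from
the two timestamps in the log above. This is only a sketch: it assumes
both entries are from the same day and that syslog's timestamps reflect
when the events really occurred, which may not hold on a frozen system.
The variable names are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the kernel log above. Both are on Aug 20,
# so only the time of day is parsed (assumption: no midnight rollover).
first_sysrq = datetime.strptime("15:40:46", "%H:%M:%S")
sync_done = datetime.strptime("15:44:16", "%H:%M:%S")

# Elapsed time between the first SysRq-S being logged and the sync finishing.
stall = sync_done - first_sysrq
stall_seconds = int(stall.total_seconds())

print(f"{stall_seconds} s ({stall_seconds // 60} min {stall_seconds % 60} s)")
```

So roughly three and a half minutes passed between the SysRq-S being
logged and the sync completing.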

One of my filesystems is rather full...

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1             4.0G  2.6G  1.4G  65% /
/dev/hdd2             4.9G  1.5G  3.5G  29% /usr/local
/dev/hdd3             4.0G  969M  3.0G  25% /var
/dev/hdd5             2.0G  1.3G  765M  62% /home
/dev/hdc1              75G   75G  201M 100% /usr/local/media
tmpfs                 284M     0  284M   0% /dev/shm

In case all this information is useless, what can I do to prepare for
the next time this happens? I have KDB in the kernel, although it's
currently disabled, since I keep hitting the Break key by accident. I'm
thinking of removing the key from the keyboard :)

-- 
Russell Howe     | Why be just another cog in the machine,
rhowe@xxxxxxxxxx | when you can be the spanner in the works?

