Can you summarize all the data that we gathered over this thread into one
Yes - hope it helps.
- which kernels does it happen on? Seems like 3.0 and 3.1 hit it easily,
2.6.38 sometimes, 2.6.32 is fine. Did you test anything between
2.6.32 and 2.6.38?
Hits very easily: 3.0.4 and 3.1-rc5
Very rare: 2.6.38 - as it happened only a few times, I cannot guarantee
100% that it is really the same issue
No issues at all: 2.6.32
I've not tested anything between 2.6.32 and 2.6.38, as I cannot reproduce
it on demand under 2.6.38 at all - seen once a week out of 500.
I've seen this only on multi-core CPUs with > 2.8 GHz and fast SAS RAID
10 or SSD. I cannot say if it's the CPU or the fast disks - as our low
cost systems have only small CPUs and the high end ones have big CPUs
with fast disks.
- what hardware hits it often/sometimes/never?
What exactly do you mean? I've seen this on 1TB and 160GB SSD devices
with totally different disk layouts.
- what is the fs geometry?
- what is the hardware?
- is this a 32 or 64-bit kernel, or do you run both?
I'm pretty sure most got posted somewhere, but let's get a summary
as things were a bit confusing at times.
I'm willing to do nearly anything to solve this. What can I do to help?
My last hope from today was to get some code lines with kgdb - sadly it
does not happen at all when kgdb is attached ;-(
Note that 2.6.38 moved the whole log grant code to a lockless algorithm,
so this might be a likely culprit if you're managing to hit race windows
no one else does, i.e. this really is a timing issue.
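To illustrate why a lockless scheme can be timing sensitive, here is a
minimal userspace sketch of reserving space with a compare-and-swap loop
instead of a lock. This is not the XFS log grant code; the names
grant_head, grant_try_reserve and grant_release are made up for the
example, and it uses C11 atomics rather than the kernel's atomic64 API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a grant head: bytes reserved so far. */
struct grant_head {
    _Atomic int64_t reserved;
    int64_t capacity;
};

/*
 * Try to reserve `bytes` without taking a lock.
 * Returns true on success, false if the space is not available.
 */
static bool grant_try_reserve(struct grant_head *gh, int64_t bytes)
{
    assert(bytes >= 0);

    int64_t old = atomic_load(&gh->reserved);
    for (;;) {
        if (old + bytes > gh->capacity)
            return false;
        /*
         * Another CPU may have changed `reserved` between the load
         * and this point. In that case the CAS fails, `old` is
         * refreshed with the current value, and we retry.
         */
        if (atomic_compare_exchange_weak(&gh->reserved, &old, old + bytes))
            return true;
    }
}

/* Give back previously reserved space. */
static void grant_release(struct grant_head *gh, int64_t bytes)
{
    atomic_fetch_sub(&gh->reserved, bytes);
}
```

The window between reading the old value and the compare-and-swap is
where concurrent updates interleave. Any bug in such a scheme tends to
show up only when several fast CPUs hit that window at the same time,
which would fit the "only on fast hardware" pattern.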