On Thu, Nov 22, 2007 at 01:06:11PM +0100, Andi Kleen wrote:
> > FWIW from a "real time" database POV this seems to make sense to me...
> > in fact, we probably rely on filesystem metadata way too much
> > (historically it's just "worked".... although we do seem to get issues
> > on ext3).
> For that case you really would need priority inheritance: any metadata
> IO on behalf or blocking a process needs to use the process' block IO
How do you do that when the processes are blocking on semaphores,
mutexes or rw-semaphores in the fileysystem three layers removed from
the I/O in progress?
e.g. a low priority process transaction is holding the AGF buffer
locked but the transaction is blocked waiting for some other
metadata I/O it has issued needed in the transaction. That metadata
I/O is being held out by a higher priority process doing lots of
Another process at the same priority creates a file, requiring
inodes to be allocated so it locks the directory into the
transaction and later blocks on the AGF buffer semaphore trying to
allocate space for the new inode.
A very high priority process now comes along and tries to read the
directory locked in the create transaction, and blocks on the
directory inode ilock because it's already held in write mode.
That's three processes all blocked on locks unrelated to the I/O
that is being held out, and there is no direct connection that can
be used to pass the priority down to the blocked I/O that is causing
all the problems.....
It's a Bad Idea.
SGI Australian Software Group