
Re: RAID6 r-m-w, op-journaled fs, SSDs

To: linux-xfs@xxxxxxxxxxx
Subject: Re: RAID6 r-m-w, op-journaled fs, SSDs
From: David Brown <david.brown@xxxxxxxxxxxx>
Date: Sun, 01 May 2011 20:32:22 +0200
Cc: linux-raid@xxxxxxxxxxxxxxx
In-reply-to: <19901.31958.368144.832086@xxxxxxxxxxxxxxxxxx>
References: <19900.10868.583555.849181@xxxxxxxxxxxxxxxxxx> <20110501082717.5116e575@xxxxxxxxxxxxxx> <19901.31958.368144.832086@xxxxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20110428 Fedora/3.1.10-1.fc14 Thunderbird/3.1.10
On 01/05/11 17:31, Peter Grandi wrote:
[ ... ]

* Can Linux MD do "abbreviated" read-modify-write RAID6
updates like for RAID5? [ ... ]

No. (patches welcome).

Ahhhm, but let me dig a bit deeper, even if it may be implied in
the answer: would it be *possible*?

That is, is the double parity scheme used in MD such that it is
possible to "subtract" the old content of a page and "add" the
new content of that page to both parity pages?

If I've understood the maths correctly, then yes it would be possible. But it would involve more calculations, and it is difficult to see where the best balance lies between cpu demands and IO demands. In general, calculating the Q parity block for raid6 is processor-intensive - there's a fair amount of optimisation done in the normal calculations to keep it reasonable.

Basically, the first parity P is a simple calculation:

P = D_0 + D_1 + ... + D_(n-1)

But Q is more difficult:

Q = D_0 + g.D_1 + g^2.D_2 + ... + g^(n-1).D_(n-1)

where "plus" is xor, "times" is a weird function calculated over the GF(2^8) field, and g is a generator for that field.
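
As a rough illustration of the P and Q formulas, here is a minimal Python sketch. It assumes a field with reduction polynomial 0x11d and generator g = 2; neither value is stated in this mail, and the function names are invented for the example. Q is evaluated with Horner's rule, so each step needs only a multiply by g and an xor.

```python
def gf_mul2(x):
    """Multiply one byte by g = 2 in GF(2^8) (assumed polynomial 0x11d)."""
    x <<= 1
    if x & 0x100:          # overflow out of 8 bits: reduce modulo the polynomial
        x ^= 0x11d
    return x


def compute_pq(blocks):
    """Return (P, Q) for a list of equal-length data blocks D_0 .. D_(n-1)."""
    size = len(blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    # Horner's rule: Q = D_0 + g.(D_1 + g.(D_2 + ...)), so start from the
    # highest-numbered block and work down to D_0.
    for block in reversed(blocks):
        for k in range(size):
            p[k] ^= block[k]
            q[k] = gf_mul2(q[k]) ^ block[k]
    return bytes(p), bytes(q)
```

This is byte-at-a-time and far slower than real vectorised code; it is only meant to show the structure of the calculation.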

If you want to replace D_i, then you can calculate:

P(new) = P(old) + D_i(old) + D_i(new)

Q(new) = Q(old) + g^i.(D_i(old) + D_i(new))

This means multiplying by g^i for whichever block i is being replaced.
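
The update formulas can be sketched in Python as follows. This is only an illustration under assumptions not stated in this mail: reduction polynomial 0x11d, generator g = 2, and made-up function names (`gf_mul`, `gf_pow_g`, `rmw_update`). It is not how md implements anything.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-xor (assumed polynomial 0x11d)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:      # reduce when a overflows 8 bits
            a ^= 0x11d
        b >>= 1
    return r


def gf_pow_g(i):
    """Return g^i for the assumed generator g = 2."""
    r = 1
    for _ in range(i):
        r = gf_mul(r, 2)
    return r


def rmw_update(p_old, q_old, i, d_old, d_new):
    """Return (P(new), Q(new)) after data block i changes from d_old to d_new."""
    coeff = gf_pow_g(i)
    # D_i(old) + D_i(new), where "plus" is xor
    delta = bytes(a ^ b for a, b in zip(d_old, d_new))
    p_new = bytes(p ^ d for p, d in zip(p_old, delta))
    q_new = bytes(q ^ gf_mul(coeff, d) for q, d in zip(q_old, delta))
    return p_new, q_new
```

Only the old data block, the old P and the old Q need to be read; the other data blocks in the stripe are not touched.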

The generator and multiply operation are picked to make it relatively fast and easy to multiply by g, especially if you've got a processor that has vector operations (as most powerful cpus do). This means that the original Q calculation is fairly efficient. But to do general multiplications by g^i is more effort, and will typically involve cache-killing lookup tables or multiple steps.
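
To make the contrast concrete, here is a hedged Python sketch of the lookup-table approach: multiplying by g is one shift and a conditional xor, while multiplying by g^i uses a 256-entry table per value of i. The polynomial 0x11d, the generator g = 2 and the names `make_table` and `mul_block` are assumptions made for this example only.

```python
def gf_mul2(x):
    """Multiply one byte by g = 2: a shift plus a conditional xor."""
    x <<= 1
    return x ^ 0x11d if x & 0x100 else x


def make_table(i):
    """Build a 256-byte table mapping each byte b to g^i . b."""
    table = bytearray(range(256))
    for _ in range(i):                 # apply "multiply by g" i times
        for b in range(256):
            table[b] = gf_mul2(table[b])
    return bytes(table)


def mul_block(block, table):
    """Multiply every byte of a block by the constant the table was built for."""
    return block.translate(table)
```

One such table is needed for each data disk position, which is where the pressure on the cache comes from once the array has many disks.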

It is probably reasonable to say that when md raid first implemented raid6, it made little sense to do these abbreviated parity calculations. But as processors have got faster (and wider, with more cores) while disk throughput has made slower progress, it's maybe a different balance. So it's probably both possible and practical to do these calculations. All it needs is someone to spend the time writing the code - and lots of people willing to test it.
