Sunday, July 13, 2008

A LESSER-KNOWN FACT ABOUT RAID-4

Given a tight budget and the requirement that we maximize usable capacity, I would choose RAID-4 over RAID-1 and RAID-5.

The Internet abounds with articles about the different RAID levels. Most recommend RAID-5, although a fair number suggest RAID-1 or RAID-10.

There’s no point in duplicating information that is readily found in these articles. I did notice, however, that I have yet to encounter an article that mentions one lesser-known fact about RAID-4.

RAID-4’s lesser-known feature is that additional drives can be added to the array without disrupting the array’s operation. This is a huge advantage over the more popular and more frequently recommended RAID-5. Adding capacity to a RAID-5 array is a lengthy, tedious, risky, and disruptive process:
  • First, the contents of the entire array must be copied to some external storage space.
  • Second, the additional drive(s) are added.
  • Third, the array must be re-initialized to operate as a RAID-5 array.
  • Finally, the contents must be copied back onto the new RAID-5 array.
Contrast this with RAID-4. Depending upon the controller hardware, you may be able to just physically add the extra drive to the array without even powering it down. If the controller doesn’t allow this, the array must first be powered down, the extra drive added, and then the array powered up again. That’s it! This works because RAID-4 keeps all parity on a dedicated disk: a new drive that reads as all zeroes leaves the existing parity unchanged, so it can simply be declared part of the array.

RAID-4 requires at least three drives (or disks; the terms are used interchangeably here). In a three-drive array, two disks hold your data (the data disks) and the third stores parity data (the parity disk).

Parity data is information that can be used to re-create the original data from which it was computed. Why is this useful? If either data disk fails, its contents can be rebuilt from the surviving data disk with the assistance of the parity disk. Refer to the title diagram. If the parity disk fails, it should be replaced as soon as possible, since the RAID array is unprotected without it. All this time, the contents of your data disks remain uncorrupted.

CAPACITY

As mentioned earlier, in a 3-drive RAID-4 array, one drive stores the parity information for the two data disks, so the usable capacity is 67%. In a 4-drive RAID-4 array, the usable capacity is 75%. In a 5-drive array, it's 80%, and so forth.
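The pattern is simple: with one dedicated parity drive, usable capacity is (N−1)/N of the array. A quick Python sketch:

```python
def raid4_usable_fraction(total_drives: int) -> float:
    """Usable capacity fraction of a RAID-4 array:
    one dedicated parity drive, the rest hold data."""
    if total_drives < 3:
        raise ValueError("RAID-4 needs at least 3 drives")
    return (total_drives - 1) / total_drives

for n in (3, 4, 5):
    print(f"{n} drives: {raid4_usable_fraction(n):.0%} usable")
# 3 drives: 67% usable
# 4 drives: 75% usable
# 5 drives: 80% usable
```
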

MY PERSONAL EXPERIENCE

This dates back to 2000, before RAID-6 and all its proprietary variants (like NetApp's so-called RAID-DP) were invented. Like everyone else, we treated RAID-5 as the default configuration. There was no question about it. That is, until our largest client, a hospital no less (!), experienced a RAID-5 array failure. Theory is one thing but reality is another. We could not restore the client's data. We tried e-v-e-r-y-t-h-i-n-g. We replaced the controller, twice, with known good ones. We reinstalled the array software. We physically transferred the array to another server. Like I said, we tried e-v-e-r-y-t-h-i-n-g. But we were unable to restore the data on the drive that failed.

After that sorry experience, we discussed it with peers and the IT community at large, and we learned that our experience was not unique. We also found out that many VARs (Value Added Resellers; the archaic term for IT consulting firms) had more success with RAID-1 and RAID-4 arrays. RAID-4 provides a similar level of protection to RAID-5, albeit with parity on a dedicated disk rather than distributed across all drives. And that's why I devoted this time to extolling the virtues of RAID-4.

PARITY

Reproduced below is an excellent explanation of parity data from StorageReview.com.
You may have heard the term "parity" before, used in the context of system memory error detection; in fact, the parity used in RAID is very similar in concept to parity RAM. The principle behind parity is simple: take "N" pieces of data, and from them, compute an extra piece of data. Take the "N+1" pieces of data and store them on "N+1" drives. If you lose any one of the "N+1" pieces of data, you can recreate it from the "N" that remain, regardless of which piece is lost. Parity protection is used with striping, and the "N" pieces of data are typically the blocks or bytes distributed across the drives in the array. The parity information can either be stored on a separate, dedicated drive, or be mixed with the data across all the drives in the array.

The parity calculation is typically performed using a logical operation called "exclusive OR" or "XOR". As you may know, the "OR" logical operator is "true" (1) if either of its operands is true, and false (0) if neither is true. The exclusive OR operator is "true" if and only if one of its operands is true; it differs from "OR" in that if both operands are true, "XOR" is false. This truth table for the two operators will illustrate:

    A   B   A OR B   A XOR B
    0   0     0         0
    0   1     1         1
    1   0     1         1
    1   1     1         0

Uh huh. So what, right? Well, the interesting thing about "XOR" is that it is a logical operation that if performed twice in a row, "undoes itself." If you calculate "A XOR B" and then take that result and do another "XOR B" on it, you get back A, the value you started with. That is to say, "A XOR B XOR B = A." This property is exploited for parity calculation under RAID. If we have four data elements, D1, D2, D3 and D4, we can calculate the parity data, "DP" as "D1 XOR D2 XOR D3 XOR D4." Then, if we know any four of D1, D2, D3, D4 and DP, we can XOR those four together and it will yield the missing element.
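The self-undoing property of XOR is easy to verify in a few lines of Python (the values below are arbitrary illustrations, not taken from the article):

```python
A = 0b1011
B = 0b0110

# XOR performed twice "undoes itself": A XOR B XOR B = A
assert (A ^ B) ^ B == A

# Parity over four data elements: DP = D1 XOR D2 XOR D3 XOR D4
D = [0b1010, 0b0111, 0b1100, 0b0001]
DP = 0
for d in D:
    DP ^= d

# XOR-ing any four of the five values yields the missing fifth
for i in range(4):
    survivors = DP
    for j, d in enumerate(D):
        if j != i:
            survivors ^= d
    assert survivors == D[i]  # the "lost" element is recovered

print("all checks passed")
```
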

Let's take an example to show how this works; you can do this yourself easily on a sheet of paper. Suppose we have the following four bytes of data: D1=10100101, D2=11110000, D3=00111100, and D4=10111001. We can "XOR" them together as follows, one step at a time:

D1 XOR D2 XOR D3 XOR D4
= ( (D1 XOR D2) XOR D3) XOR D4
= ( (10100101 XOR 11110000) XOR 00111100) XOR 10111001
= (01010101 XOR 00111100) XOR 10111001
= 01101001 XOR 10111001
= 11010000
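The same arithmetic, step by step, in Python, using the four bytes from the example above:

```python
D1 = 0b10100101
D2 = 0b11110000
D3 = 0b00111100
D4 = 0b10111001

step1 = D1 ^ D2        # 01010101
step2 = step1 ^ D3     # 01101001
DP    = step2 ^ D4     # the parity byte

print(format(DP, "08b"))  # prints 11010000
```
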

So "11010000" becomes the parity byte, DP. Now let's say we store these five values on five hard disks, and hard disk #3, containing value "00111100", goes el-muncho. We can retrieve the missing byte simply by XOR-ing together the other three original data pieces, and the parity byte we calculated earlier, as so:

D1 XOR D2 XOR D4 XOR DP
= ( (D1 XOR D2) XOR D4) XOR DP
= ( (10100101 XOR 11110000) XOR 10111001) XOR 11010000
= (01010101 XOR 10111001) XOR 11010000
= 11101100 XOR 11010000
= 00111100

Which is D3, the missing value. Pretty neat, huh? This operation can be done on any number of bits, incidentally; I just used eight bits for simplicity. It's also a very simple binary calculation—which is a good thing, because it has to be done for every bit stored in a parity-enabled RAID array.
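And the recovery of the failed disk's byte, again in Python:

```python
D1, D2, D4 = 0b10100101, 0b11110000, 0b10111001
DP = 0b11010000  # the parity byte computed earlier

# XOR the three surviving data bytes with the parity byte
D3 = D1 ^ D2 ^ D4 ^ DP

print(format(D3, "08b"))  # prints 00111100, the missing value
```
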

Compared to mirroring, parity (used with striping) has some advantages and disadvantages. The most obvious advantage is that parity protects data against any single drive in the array failing without requiring the 50% "waste" of mirroring; only one of the "N+1" drives contains redundancy information. (The overhead of parity is equal to (100/N)% where N is the total number of drives in the array.) Striping with parity also allows you to take advantage of the performance advantages of striping.

The chief disadvantages of striping with parity relate to complexity: all those parity bytes have to be computed—millions of them per second—and that takes computing power. This means a hardware controller that performs these calculations is required for high performance—if you do software RAID with striping and parity the system CPU will be dragged down doing all these computations. Also, while you can recover from a lost drive under parity, the missing data all has to be rebuilt, which has its own complications; recovering from a lost mirrored drive is comparatively simple.

All RAID levels from RAID-3 to RAID-7 use parity; the most popular of these today is RAID-5. Incidentally, RAID-2 uses a similar but not identical concept of parity.

