Sunday, July 13, 2008

A LESSER-KNOWN FACT ABOUT RAID-4

Given a tight budget and the requirement that we maximize usable capacity, I would choose RAID-4 over RAID-1 and RAID-5.

The Internet abounds in articles about the different RAID levels. Most recommend RAID-5, although a fair number suggest RAID-1 or RAID-10.

There’s no point in duplicating information that is readily found in these articles. I have yet to encounter one, however, that mentions a lesser-known fact about RAID-4.

That lesser-known fact is that RAID-4 lets you add drives to the array without disrupting the array’s operation. This is a huge advantage over the more popular and more widely recommended RAID-5. Adding capacity to a RAID-5 array is a lengthy, tedious, risky, and disruptive process:
  • First, the contents of the entire array must be copied to some external storage space.
  • Second, the additional drive or drives are added.
  • Third, the array must be configured again to operate as a RAID-5 array.
  • Finally, the contents must be copied back into the new RAID-5 array.
Contrast this with RAID-4. Depending upon the controller hardware, you may be able to just physically add the extra drive to the array without even powering it down. If the controller doesn’t allow this, the array must first be powered down, the extra drive added, and then the array powered up again. That’s it!
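
One way to see why a dedicated-parity array can grow this easily is sketched below. It rests on two assumptions that are mine, not something every controller guarantees: parity is the plain XOR of the data disks (as described in the PARITY section), and the new drive is zero-filled before it joins the array. The helper name xor_parity and the sample blocks are purely illustrative.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Two hypothetical data disks, one 4-byte block each.
data_disks = [bytes([0xA5, 0xF0, 0x3C, 0xB9]), bytes([0x11, 0x22, 0x33, 0x44])]
parity_before = xor_parity(data_disks)

# Assumption: the added drive is zero-filled before it joins the array.
new_disk = bytes(4)
parity_after = xor_parity(data_disks + [new_disk])

# XOR with zero changes nothing, so the existing parity disk is still valid.
print(parity_before == parity_after)
```

Under those assumptions nothing on the parity disk has to be rewritten. With RAID-5 the parity is spread across all the drives, so adding a drive changes the layout of the stripes themselves, which is at least part of why expansion is harder there.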

RAID-4 requires at least three drives (or disks; the terms are used interchangeably here). Two of them are data disks and hold the usable data; the third is the parity disk and stores parity data.

Parity data is information that can be used to re-create the actual data from which it was computed. Why is this useful? If either data disk fails, the surviving data disk can re-create the contents of its failed partner with the assistance of the parity disk. Refer to the title diagram. If the parity disk fails, it should be replaced as soon as possible, since the RAID array is unprotected without it. Throughout, the contents of your data disks remain uncorrupted.
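
The rebuild described above can be sketched in a few lines. This is only an illustration, assuming the parity disk holds the byte-wise XOR of the two data disks; the function name xor_blocks and the disk contents are made up for the example.

```python
def xor_blocks(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical contents of the two data disks (8 bytes each).
disk1 = b"patient "
disk2 = b"records!"

# The parity disk stores the XOR of the two data disks.
parity = xor_blocks(disk1, disk2)

# Simulate losing disk1: rebuild it from the survivor and the parity disk.
rebuilt_disk1 = xor_blocks(disk2, parity)
print(rebuilt_disk1 == disk1)
```

The same call with disk1 and parity would rebuild disk2, which is why it does not matter which of the data disks fails.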

CAPACITY

As mentioned earlier, in a 3-drive RAID-4 array, one drive is used to store the parity information of the two data disks. The usable capacity is 67% of the three disks. In a 4-drive RAID-4 array, the usable capacity is 75% of the four disks. In a 5-drive array, it’s 80%, and so forth.
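
The pattern behind those percentages is (N - 1) / N for an array of N drives. Here is a small sketch of that arithmetic, assuming all drives are the same size; usable_fraction is a name I made up for the example.

```python
def usable_fraction(total_drives):
    """Fraction of raw capacity left for data when one drive holds parity."""
    if total_drives < 3:
        raise ValueError("RAID-4 needs at least three drives")
    return (total_drives - 1) / total_drives

for n in (3, 4, 5):
    # Prints 67%, 75% and 80% for 3, 4 and 5 drives.
    print(n, f"{usable_fraction(n):.0%}")
```

Each added drive raises the usable share, because the cost of the single parity drive is spread over more data drives.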

MY PERSONAL EXPERIENCE

This dates back to 2000, before RAID-6 and all its proprietary variants (like NetApp's so-called RAID-DP) were invented. Like everyone else, we used RAID-5 as the default configuration. There was no question about it. That is, until our largest client, a hospital no less (!), experienced a RAID-5 array failure. Theory is one thing but reality is another. We could not restore the client's data. We tried e-v-e-r-y-t-h-i-n-g. We replaced the controller. Twice, and with known good ones. We reinstalled the array software. We physically transferred the array to another server. Like I said, we tried e-v-e-r-y-t-h-i-n-g. But we weren't able to restore the data of the drive that failed. After that sorry experience, we discussed it with peers and the IT community at large. And we learned that our experience was not unique. Along the way, we also found out that many VARs (Value Added Resellers; the archaic term for IT consulting firms) had more success with RAID-1 and RAID-4 arrays. RAID-4 provides a level of protection similar to RAID-5, albeit with the parity kept on a dedicated disk rather than distributed across all the drives. And that's why I devoted this time to extolling the virtues of RAID-4.

PARITY

Reproduced below is an excellent explanation of parity data from StorageReview.com.
You may have heard the term "parity" before, used in the context of system memory error detection; in fact, the parity used in RAID is very similar in concept to parity RAM. The principle behind parity is simple: take "N" pieces of data, and from them, compute an extra piece of data. Take the "N+1" pieces of data and store them on "N+1" drives. If you lose any one of the "N+1" pieces of data, you can recreate it from the "N" that remain, regardless of which piece is lost. Parity protection is used with striping, and the "N" pieces of data are typically the blocks or bytes distributed across the drives in the array. The parity information can either be stored on a separate, dedicated drive, or be mixed with the data across all the drives in the array.

The parity calculation is typically performed using a logical operation called "exclusive OR" or "XOR". As you may know, the "OR" logical operator is "true" (1) if either of its operands is true, and false (0) if neither is true. The exclusive OR operator is "true" if and only if one of its operands is true; it differs from "OR" in that if both operands are true, "XOR" is false. This truth table for the two operators will illustrate:

  A  B  |  A OR B  |  A XOR B
  0  0  |    0     |     0
  0  1  |    1     |     1
  1  0  |    1     |     1
  1  1  |    1     |     0

Uh huh. So what, right? Well, the interesting thing about "XOR" is that it is a logical operation that if performed twice in a row, "undoes itself." If you calculate "A XOR B" and then take that result and do another "XOR B" on it, you get back A, the value you started with. That is to say, "A XOR B XOR B = A." This property is exploited for parity calculation under RAID. If we have four data elements, D1, D2, D3 and D4, we can calculate the parity data, "DP" as "D1 XOR D2 XOR D3 XOR D4." Then, if we know any four of D1, D2, D3, D4 and DP, we can XOR those four together and it will yield the missing element.

Let's take an example to show how this works; you can do this yourself easily on a sheet of paper. Suppose we have the following four bytes of data: D1=10100101, D2=11110000, D3=00111100, and D4=10111001. We can "XOR" them together as follows, one step at a time:

D1 XOR D2 XOR D3 XOR D4
= ( (D1 XOR D2) XOR D3) XOR D4
= ( (10100101 XOR 11110000) XOR 00111100) XOR 10111001
= (01010101 XOR 00111100) XOR 10111001
= 01101001 XOR 10111001
= 11010000

So "11010000" becomes the parity byte, DP. Now let's say we store these five values on five hard disks, and hard disk #3, containing value "00111100", goes el-muncho. We can retrieve the missing byte simply by XOR-ing together the other three original data pieces, and the parity byte we calculated earlier, as so:

D1 XOR D2 XOR D4 XOR DP
= ( (D1 XOR D2) XOR D4) XOR DP
= ( (10100101 XOR 11110000) XOR 10111001) XOR 11010000
= (01010101 XOR 10111001) XOR 11010000
= 11101100 XOR 11010000
= 00111100

Which is D3, the missing value. Pretty neat, huh? This operation can be done on any number of bits, incidentally; I just used eight bits for simplicity. It's also a very simple binary calculation—which is a good thing, because it has to be done for every bit stored in a parity-enabled RAID array.

Compared to mirroring, parity (used with striping) has some advantages and disadvantages. The most obvious advantage is that parity protects data against any single drive in the array failing without requiring the 50% "waste" of mirroring; only one of the "N+1" drives contains redundancy information. (The overhead of parity is equal to (100/N)% where N is the total number of drives in the array.) Striping with parity also allows you to take advantage of the performance advantages of striping.

The chief disadvantages of striping with parity relate to complexity: all those parity bytes have to be computed—millions of them per second—and that takes computing power. This means a hardware controller that performs these calculations is required for high performance—if you do software RAID with striping and parity the system CPU will be dragged down doing all these computations. Also, while you can recover from a lost drive under parity, the missing data all has to be rebuilt, which has its own complications; recovering from a lost mirrored drive is comparatively simple.

All RAID levels from RAID-3 to RAID-7 use parity; the most popular of these today is RAID-5. Incidentally, RAID-2 uses a similar but not identical concept of parity.
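
As a quick check of the byte-level example quoted above (this snippet is my own addition, not part of the StorageReview text), the same four bytes can be fed to Python's ^ operator, which performs a bitwise XOR on integers. The variable names simply mirror D1 to D4 and DP from the quote.

```python
# The four data bytes from the quoted example.
d1, d2, d3, d4 = 0b10100101, 0b11110000, 0b00111100, 0b10111001

# Parity byte: XOR of all four data bytes.
dp = d1 ^ d2 ^ d3 ^ d4
print(f"{dp:08b}")         # 11010000

# Disk #3 is lost: XOR the three surviving data bytes with the parity byte.
recovered = d1 ^ d2 ^ d4 ^ dp
print(f"{recovered:08b}")  # 00111100
```

Both results match the hand calculation: the parity byte is 11010000 and the recovered byte is D3.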



Wednesday, June 11, 2008

HOW TO CHOOSE YOUR IT PRIORITIES WISELY

I recently ran across photos of co-workers from my early days in IT. We got in touch, and over a cold one we reminisced about how the IT function has changed. The IT discipline has matured. Fifteen years ago IT’s primary function was to keep systems up and running. Today, that’s taken for granted. Today, IT performance is judged by what it contributes toward the parent organization's goals. The measure of an effective IT manager is how s/he uses resources to satisfy the organization’s goals.

FIRST, PUT THE HOUSE IN ORDER

Among the first things an incoming IT manager must learn is what the parent business expects of IT. Armed with that knowledge, he must assess the IT organization's capabilities and identify any gaps between what is expected and what can be delivered. In short, he must perform a gap analysis.

Next, the results of the analysis should be prioritized. The priorities come from the business managers and reflect the same expectations he learned earlier. The outcome of this step is a prioritized list of the processes and activities that need to be improved.

KEEP THE CORPORATE CULTURE IN MIND

Any good marketing management book will explain at least three ways that companies differentiate themselves in the marketplace. They can position themselves as the best in:
  • Customer responsiveness
They excel at staying close to their customers. They can anticipate their customers’ needs more quickly.
  • Product or service innovation
They provide the latest and greatest. Their products or services speak for themselves.
  • Operational efficiency
They have the most efficient operations. Their efficiency means that they can sell you their goods and services at the lowest possible cost.
A former employer, Shared Medical Systems (SMS), is a good example of a company whose corporate focus was on exceptional customer responsiveness, the first item above. This was SMS’s competitive strength. That strength was emphasized at the cost of product innovation, though. SMS was never known for devising the most efficient solutions. Their solutions tended to be evolutionary improvements of existing products and, sometimes, that wasn’t enough. In my team’s area of responsibility (the Great Lakes region and Central Canada), I know we lost several accounts because our competitors were able to introduce superior solutions. This was SMS’s corporate culture, though: conservative but steady and sure.

This mindset had served them well. Founded in 1969 by three IBM salesmen, SMS provided time-sharing services to hospitals. Three decades later, SMS had grown to become the largest IT service provider for the healthcare industry. Siemens, the German conglomerate, acquired SMS in 2000. It turned SMS into the core component of its medical division, and renamed it Siemens Medical.

It was a wrenching change. Our regional VP—the highest-ranking officer at our regional office—had his title changed to… Senior Manager.

At any rate, the digression was meant to illustrate the importance of knowing the corporate culture of the parent organization in planning the IT functions.

THE RELATIONSHIP BETWEEN BUSINESS PROCESSES AND IT SYSTEMS

Business processes are supported by IT systems. Let’s understand that first. Consider this example of a business process:

In any modern supermarket, when it’s time to check out, you select a line and then place the items you’re buying on the conveyor belt. Once the items reach the cashier, she scans each one, and the item's description and price appear on the display terminal. That’s a business process. Behind it is information technology, and clearly the process is enabled by, and can only work through, IT. The scanned bar code is converted into the data that is the item’s description and price. This data is pushed back to the cashier’s terminal, which then displays it for both customer and cashier to see and verify.

SIX SIGMA OR NARROW CRITICAL GAPS?

A fast-growing company doesn’t have much time to focus on improving its business processes. That task can’t be ignored in the long term, however. In a normal process transformation, the focus would be on the process side while IT adapts to the changes. In hospitals (the industry I’m most familiar with), processes have become more tightly integrated with IT. Consequently, process improvements require simultaneous improvements in IT systems. This holds true twice over for hospitals. The aging baby boomers have put healthcare on the fast track, and hospitals are incorporating technology into their operations just to keep up.

In my opinion, this was the reality that our corporate office did not, or chose not to, see. Our customers (at least the dynamic ones) were fast-tracking changes. For example, Six Sigma was in vogue with management during those days, but Six Sigma is a change methodology that is very detail-oriented. Changes occur incrementally and, therefore, slowly. That’s fine if you’re fine-tuning a process. Six Sigma, however, is impractical to the point of actually being detrimental if you're making major changes to a process.

Our more dynamic customers adopted a different approach that I thought was more effective. They would work on the top three process capability gaps. Hospital administrators define these gaps differently. A growing hospital will choose gaps that hamper the business from scaling up (i.e., expanding). Narrowing those gaps delivers more immediate and observable results.

A typical hospital has a very uneven workflow pattern. A bottleneck occurs almost daily in the Emergency Room. Apart from being dangerous to the patients, it exposes the hospital to greater legal risk and it threatens the critical revenue-generating process. Clearly, this is a problem area.

A results-driven organization demands tangible contributions from IT (or any other business function) in short-term cycles. For example, if this year’s goal is to streamline the uneven workflow, hospital administrators may want to see tangible improvements every 90 days.

This kind of pressure makes communication and the task of change management very important. The IT manager must be able to explain his roadmap as well as progress and challenges regularly.

Generally speaking, any system that closes the care-giving loop between the patient and doctor or coordinates the different clinical departments better is desirable. And if the system gives the hospital a competitive edge then it becomes all that much better and more important to implement.

