RAID is NOT a backup solution!
RAID is a Redundant Array of Inexpensive Disks: multiple disk drives combined into a single array. Why? To improve performance? Yes. To act as a backup? No. Yet many resellers still mistakenly believe a RAID solution sufficiently protects their data, and neglect to back up their mission-critical data remotely or to tape. As a rule of thumb, the Mean Time Between Failures (MTBF) of a RAID array, meaning the expected time until some drive in it fails, is the MTBF of an individual drive divided by the number of drives in the array. You’re thinking, “Well, wait a second. This means that the MTBF becomes lower, not higher. How does that help?” Keep reading.
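To put numbers on that rule of thumb, here is a minimal Python sketch. It assumes identical drives that fail independently at a constant rate, and the 1,000,000-hour drive rating is a made-up figure for illustration, not a vendor specification.

```python
def array_mtbf_hours(drive_mtbf_hours: float, drive_count: int) -> float:
    """Expected time until the first drive in the array fails.

    Rule of thumb: single-drive MTBF divided by the number of drives.
    Assumes identical drives failing independently at a constant rate.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return drive_mtbf_hours / drive_count


# Hypothetical drive rating, used only for illustration.
DRIVE_MTBF_HOURS = 1_000_000

for drives in (1, 2, 4, 8):
    hours = array_mtbf_hours(DRIVE_MTBF_HOURS, drives)
    print(f"{drives} drive(s): first failure expected after about {hours:,.0f} hours")
```

The more drives you add, the sooner you should expect the first one to fail. Whether that failure costs you data depends on the RAID level.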
Disk arrays are designed to provide fault tolerance by redundantly storing information in a variety of methods.
RAID-0
RAID-0 is a striping solution. In level 0, data is split across the drives, resulting in higher data throughput. Performance is enhanced, but the failure of any disk in the array results in data loss. For improved performance in RAID0 solutions, synchronized spindles are recommended, especially when allocating small stripes. RAID0 solutions provide NO redundancy.
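As a rough illustration of striping, the Python sketch below deals fixed-size chunks of data round-robin across a set of "drives" (plain lists here). The function names and the tiny stripe size are assumptions made for the example; real controllers work on disk blocks, not Python bytes.

```python
def stripe(data: bytes, drive_count: int, stripe_size: int) -> list:
    """Split data into stripe_size chunks and deal them round-robin to the drives."""
    if drive_count < 1 or stripe_size < 1:
        raise ValueError("drive_count and stripe_size must be at least 1")
    drives = [[] for _ in range(drive_count)]
    for offset in range(0, len(data), stripe_size):
        chunk_index = offset // stripe_size
        drives[chunk_index % drive_count].append(data[offset:offset + stripe_size])
    return drives


def unstripe(drives: list) -> bytes:
    """Reassemble the original data by reading one chunk per drive, row by row."""
    chunks = []
    rows = max(len(drive) for drive in drives)
    for row in range(rows):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


drives = stripe(b"ABCDEFGHIJ", drive_count=3, stripe_size=2)
print(drives)
print(unstripe(drives))
```

Every drive holds a slice of every large file, and no drive holds a copy of another's chunks. That is why losing a single drive leaves nothing to rebuild from.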
I would recommend RAID-0 only if the data there is transient, as it WILL eventually be lost. Here especially, maintain remote offsite backups because of the increased risk.
RAID-1
RAID Level 1, on the other hand, does provide redundancy by writing the same data to two or more drives. Reads tend to be faster and writes slower than on a single drive, but if one drive fails, no data is lost. This is commonly called mirroring and requires only two drives.
If you have a failure of a single drive in a RAID1 array (either software or hardware), all you would have to do is put a new drive in and tell the controller (or the software drivers) to rebuild the array. This is considered replacing a failed drive of an existing RAID array.
RAID1 is not economical past four hard drives. RAID1 OS disks are well worth their expense.
RAID-2
RAID Level 2 is intended for use with drives that don’t have built-in error detection. SCSI drives, however, already have built-in error detection, so RAID-2 adds little for them. Not a good mix.
RAID-3
RAID Level 3 stripes data at a byte level across several drives, with parity stored on one of the drives.
RAID-4
RAID Level 4 stripes data at a block level across several drives, with parity stored on one dedicated drive. Parity facilitates recovery from any single failed drive. Read times are the same as RAID0, and writes, even though relatively fast, require the parity data to be updated each time.
RAID-5
The difference between RAID-4 and RAID-5 is that in RAID-5 parity is spread across all drives in the array. Parity is no longer a bottleneck, but reads are slower than RAID-4. You win some, you lose some.
As the disk count increases in a RAID-5 array, so does the storage efficiency. This is because there is always exactly one disk’s worth of redundancy (parity) per array, however many disks it has. For example, a 3-disk RAID-5 has one disk’s worth of parity and two disks’ worth of usable space, so the efficiency is 67%, i.e., 67% of the total disk space is available for user data.
Efficiency = (DiskCount-1) / DiskCount
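The formula is easy to check for a few array sizes. This is a small Python sketch; the function name and the list of disk counts are chosen only for illustration.

```python
def raid5_efficiency(disk_count: int) -> float:
    """Fraction of raw capacity available for user data in a RAID-5 array."""
    if disk_count < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (disk_count - 1) / disk_count


for disks in (3, 4, 6, 10):
    print(f"{disks} disks: {raid5_efficiency(disks):.1%} usable")
```

Keep in mind that better efficiency is not free: a bigger array still has only one disk's worth of parity protecting more disks.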
A degraded RAID-5 is an array with a failed disk. If the user tries to read a block on the failed disk the RAID software will have to access all the other disks in the array to reconstruct that missing data. However if the user tries to read a block on one of the remaining good disks then nothing special happens. The data is simply read from the disk.
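Here is a minimal Python sketch of that degraded read. It assumes parity is the byte-wise XOR of the data blocks in a stripe, which is the usual approach but is my assumption, not something stated above. The block contents and the helper name are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk array: three data blocks plus parity.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# The disk holding data_blocks[1] has failed. Reading that block means
# touching every surviving disk in the stripe, parity included.
surviving = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(surviving)
print(rebuilt == data_blocks[1])

# A block on a healthy disk needs no reconstruction at all.
print(data_blocks[0])
```

This also shows why a degraded array is slow and fragile: every read aimed at the dead disk turns into a read on all the others, and a second failure leaves nothing to XOR against.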
RAID-10
RAID10 is a combination of mirroring and striping. Each disk block is completely duplicated on its drive’s mirror. If a drive in the RAID10 array dies, data is returned from its mirror drive in a single read with only a minor performance reduction. What happens, though, when you lose the mirror drive during recovery? Ouch!
Still, most hard drive failures are related to manufacturing defects, so one proactive approach is to mirror each drive with one from a different manufacturer’s lot number. I’m still reading a thread in one forum about massive simultaneous Seagate 1.5TB drive failures. Multiple simultaneous drive failures in a RAID array are not as uncommon as you may think. Think about this. Most companies buy the hard drives they install in servers from preferred vendors, and buy in volume to get discount pricing. If there’s a manufacturing defect in that lot of hard drives, the MTBF of each of those drives is very similar. When one drive fails, does it put a heavier load on the remaining drives in the array? Hard drives have moving parts, and thus will eventually wear out. RAID cards do fail as well, but that’s very rare.
RAID arrays provide a buffer to swap drives without powering down, but it’s still very necessary to maintain offsite remote backups in case your server completely crashes. Years ago, I had a client bring in a server that had lightning damage: charred black components, DOA. Even with the RAID array gone (in this case, the whole server was fried, LOL), you can still recover from backups. Downtime is the persuasive consideration, as your customers will notice, thus increasing the likelihood of churn. If your site gets hacked or you accidentally delete half your root partition, RAID will provide no protection.
The common (minimum) configurations are 2 drives in RAID-1 and 4 drives in RAID-10, as those are the most economical setups that give you an array benefit. RAID-5 can be provisioned with as few as 3 drives, giving you two drives’ worth of striped data and one drive’s worth of parity.
Hardware versus Software RAID Solutions
Software RAID solutions occupy their host’s system memory and CPU resources (system dependent), degrading overall server performance. Hardware RAID solutions allow the host server to execute user applications while the array adapter’s processor simultaneously executes the array functions.
What about fault tolerance?
Software-based solutions generally require a separate boot drive, which is NOT included in the array. If the boot drive is in the array and it fails, the software array will not boot, as the array software must itself be read from the disk and executed from resident memory.
Hardware arrays are highly fault tolerant since their array logic is based in hardware, eliminating the need to boot from software.
Horror stories of multiple simultaneous drive failures in RAID arrays
I’ve seen threads pop up in forums, a little more frequently, about multiple simultaneous drive failures in RAID arrays. I recall an episode related to Seagate hard drives. Seagate’s SD1A firmware update, meant to fix problems with its Barracuda 7200.11 models, only managed to make things worse, bricking the drives of those who bothered to install it. They pulled their update pending validation. Barracuda owners who flashed their disks with the firmware found that after they rebooted, they’d receive a system disk failure error message. Backups, if they were stored on the same drive that was flashed, also became unavailable. Wait a moment! Who does backups on the same drive? I saw one analogy that went like this. It’s like installing seat belts in a car, but not allowing you to buckle them until you’ve been thrown through the windshield.
I’ve seen threads from quite a few furious OPs in various forums flaming their hosts because their mission-critical data was lost forever after multiple simultaneous drive failures in a RAID array on their server. When they picked their host, they were on the same forums asking for FREE this and FREE that, lowest cost, yet the data they intended to entrust to that host was mission-critical to their business. This whole concept slays me. Your data is your business.
My recommendation
My preference is hot-swappable hard drives. Always have a hot spare, and if possible a second hot spare. Be sure to back up your data remotely and on tape. One often-quoted statistic is that over 80% of companies that have lost their data go out of business within one year. Don’t allow yourself to be part of that statistic. Don’t rely solely on RAID array solutions to protect your data. Make them one part of a disaster recovery and business continuity plan. Redundant solutions (remote and tape backups) should be a vital component of that plan.