Archive for the ‘Disaster Recovery’ Category

Disaster Recovery Strategies

June 7th, 2010 Steve No comments

Most business owners understand the need for IT disaster recovery, in some fashion. I still see businesses, mostly Mom and Pop shops, with a single UPS securing their entire network, and if they’re really on top of disaster recovery, they may have digital tape back-up, and store those tapes off site daily.

Let’s face it. Protecting your investment in IT is a 24/7/365 commitment. It doesn’t matter if a disaster is man-made or natural – lose your data and you’ll likely lose your company.

Protecting your data
Just this weekend, my daughter called to say her friend lost her computer and Wii to a lightning strike in central Illinois, even though they were on a surge protector. This brought back memories of my first week on the job as a bench tech in the Central West End when a client brought in his NT server. It was still smokin’ from a lightning strike, and obviously dead. I think I replaced the MOBO, hard drive, memory modules and video card, and reinstalled NT. Yipes! Just last month, my credit card terminal in the Salon went up in smoke to a power interruption. Fortunately, I had a replacement in my garage, standing by.

It doesn’t matter how small or how large your business is, you really need a definitive DR plan for your company. In the Marine Corps, we wrote SOPs, or Standard Operating Procedures, and Contingency Scenarios. In business, processes, policies and procedures are the crucial elements in an effective disaster recovery plan. Obviously, people are an important piece of the pie, but a substantial portion of any good DR plan addresses infrastructure – the facility that provides physical protection for the technology itself.

Infrastructure (data center) disaster recovery options

Cold

  • Least expensive
  • No equipment
  • Has electrical, environmental and telecommunications accommodations
  • Offers longer recovery time, but does give the client somewhere to go in case of a disaster

Warm

  • Is essentially a site that has all necessary IT equipment, ready to go live, but lacks live data
  • Requires a brief setup period

Hot

  • Fully equipped site ready to go at a moment’s notice
  • Normally contains continuously replicated data
  • Most expensive, but ready when needed
  • Essential for hospitals, financial institutions and ecommerce operations
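
The three options above can be summed up in a small lookup. What follows is only a sketch in Python: the `Site` fields and the `cheapest_site` helper are names I made up for illustration, and the cost ordering is simply the order of the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    has_equipment: bool   # IT hardware is already installed at the site
    has_live_data: bool   # data is continuously replicated to the site


# Ordered from least to most expensive, following the list above.
SITES = [
    Site("cold", has_equipment=False, has_live_data=False),
    Site("warm", has_equipment=True, has_live_data=False),
    Site("hot", has_equipment=True, has_live_data=True),
]


def cheapest_site(need_equipment: bool, need_live_data: bool) -> Site:
    """Return the least expensive site type that meets both requirements."""
    for site in SITES:
        equipment_ok = site.has_equipment or not need_equipment
        data_ok = site.has_live_data or not need_live_data
        if equipment_ok and data_ok:
            return site
    raise ValueError("no site type satisfies the requirements")
```

For example, `cheapest_site(need_equipment=True, need_live_data=False)` picks the warm site, while any requirement for live data lands on the hot site.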

An Important Note
Disaster recovery encompasses more than simply restoring or replicating data – it’s having people, processes and infrastructure in place to restore your business to full functionality in case of disaster.

Always couple DR with business continuity planning – how will your business continue to operate following a disaster, and what toll will it exact?

Categories: Disaster Recovery Tags:

Hacker causes widespread destruction for yet another provider

June 15th, 2009 Steve 3 comments

I recently read through a thread (about network outages) on WHT that contained 177 pages of posts, 2644 replies and attracted 152,980 views. It was a very powerful thread about the destruction and ensuing consequences of a few very popular web hosting providers. The hacker himself posted in the thread (although his post was deleted rather quickly), claiming it was the provider’s lax security in the assignment of passwords that enabled the attack.  This reinforces a question I routinely pose on this blog.

Is YOUR mission critical data backed up and protected?

A quick Google search for remote backup software returned 6,810,000 results. I’d say that’s significant.

I think everyone agrees that mission critical data needs to be backed up, but how is debatable. In the hundreds of businesses I’ve serviced over the years, most in-house IT departments used DAT tapes. Very few actually physically removed those tapes from their premises every day. Even fewer remotely backed up their data. So maybe the better question to ask would be, “To what degree is your mission critical data backed up and protected?”

As an ex-RMA Manager (for a local networking firm), I witnessed quite a few defective DAT drives doing hard time on my shelves. I’ve also seen my share of managers scrambling to recover lost data following “unscheduled events” like virus contamination or hacks. Do you think it can’t happen to you? Keeping your fingers crossed isn’t the wisest strategy to ensure your business’s continued success.

Disaster Recovery and Business Continuity Plans are Important

I always recommend incorporating comprehensive disaster recovery and business continuity plans, then periodically reviewing their effectiveness. One part of that plan should be remote offsite backups. Very often incorporating a remote backup is as easy as downloading a software client onto your network server or personal computer. Many have setup wizards to walk you through the steps of connecting to the backup server, setting up your backup sets, creating a backup schedule and setting a secret encryption key. Typically, backup sets can be configured to run in a variety of ways – backing up data files at the end of the week or your My Documents folder multiple times per day.

Remote backups traveling across the Internet need to be encrypted so that you and only you have the ability to decrypt your data. I recommend programs that use strong algorithms such as AES or Twofish. Triple-DES and Blowfish are older but still show up in many backup clients. Single DES, with its 56-bit key, is too weak to rely on.

Measuring the success of the data transfer is important. Look for programs with email notification of successful backups or backups with warnings (with log files attached).
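
An email notification tells you what the backup program believes happened. If you can read both the original file and the backed-up copy, an independent check is to compare checksums. Here is a minimal sketch in Python using the standard `hashlib` module; the function names are my own, not those of any backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(source: Path, backup_copy: Path) -> bool:
    """True when the backup copy is byte-for-byte identical to the source."""
    return sha256_of(source) == sha256_of(backup_copy)
```

A mismatch means the copy is not what you sent, whatever the log file says.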

Once your data is remotely backed up

Ok, you’ve backed up your data, but now have a need for one file, or an entire volume of data from two months ago. Is this possible? Simply answered – Yes. There are programs that allow instant access to any version of your data files, from the initial backup to the last incremental backup and EVERY version in between.
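
To see how "every version in between" can work, here is a rough Python sketch. It assumes the first backup is a full copy and each later run stores only the files that changed, with `None` marking a deleted file. That layout is my assumption for illustration; real products store versions in their own formats.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, Optional[bytes]]  # path -> contents, None marks a deletion


def restore_version(backups: List[Snapshot], version: int) -> Dict[str, bytes]:
    """Rebuild the file set as it stood after backups[version].

    backups[0] is the initial full backup; each later entry is an
    incremental holding only the files that changed since the previous run.
    """
    if not 0 <= version < len(backups):
        raise IndexError("no such backup version")
    files: Dict[str, bytes] = {}
    for snapshot in backups[: version + 1]:
        for path, contents in snapshot.items():
            if contents is None:
                files.pop(path, None)   # file was deleted in this run
            else:
                files[path] = contents  # file was added or changed
    return files
```

Restoring version 0 gives you the initial backup, and restoring the last index gives you the state after the most recent incremental.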

Locking down clients

Locking down clients simply refers to implementing procedures that protect critical backup sets from being accidentally changed or deleted, while remaining flexible enough for administrators to view and change the settings that control the level of usage each client is offered.

When to back up?

Most organizations schedule backups in the evening, during lulls in their business operations. Some programs allow you to run in silent mode (in the background) without displaying any windows or taskbar icons – allowing you to run backups throughout the day.

What if my backup gets interrupted?

Let’s say you start a backup and you lose power. Will the remote server retain the ongoing transfer, or bite the bullet? Features like event managers allow you to resume interrupted backups.
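
One simple way to resume, sketched below in Python, is to check how much of the file already arrived and carry on from that offset. This is not how any particular product does it. The sketch assumes the partial copy is an intact prefix of the source, and `resume_copy` is a hypothetical helper name.

```python
import os


def resume_copy(source: str, destination: str, chunk_size: int = 64 * 1024) -> int:
    """Copy source to destination, continuing where a previous run stopped.

    Assumes whatever is already in destination is an intact prefix of source.
    Returns the number of bytes written by this call.
    """
    already_sent = os.path.getsize(destination) if os.path.exists(destination) else 0
    written = 0
    with open(source, "rb") as src, open(destination, "ab") as dst:
        src.seek(already_sent)  # skip what the destination already holds
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written
```

Running it a second time after a completed transfer writes nothing, which makes it safe to retry.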

Does remote backup software offer file filters?

Most do – file filters allow you to include or exclude files from the backup selection, mostly via file extensions.
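
An extension filter is easy to picture in code. The Python sketch below uses a made-up `select_files` helper. The rules it applies (an empty include list means everything, and exclude always wins) are my assumptions, not the behavior of any specific backup client.

```python
from pathlib import PurePath
from typing import Iterable, List


def select_files(paths: Iterable[str], include: Iterable[str] = (),
                 exclude: Iterable[str] = ()) -> List[str]:
    """Keep paths whose extension passes the include and exclude filters.

    An empty include list means "every extension"; exclude always wins.
    Extensions are compared case-insensitively, so ".DOC" equals ".doc".
    """
    wanted = {ext.lower() for ext in include}
    unwanted = {ext.lower() for ext in exclude}
    selected = []
    for path in paths:
        extension = PurePath(path).suffix.lower()
        if extension in unwanted:
            continue
        if wanted and extension not in wanted:
            continue
        selected.append(path)
    return selected
```

Calling `select_files(files, exclude=[".tmp", ".avi"])` keeps everything except temporary files and videos, which is a typical way to keep a backup set small.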

Just the tip of the iceberg

There are so many things that can and do go wrong in business every day. One thing is for sure. If you have hardware, particularly IT hardware, it will go down sooner or later. Power supplies fail, memory modules flake out, hard drives crash, DAT drives melt down – stuff happens. Some issues can be resolved in minutes or hours, but others may take days or weeks.

Backing up your mission critical data is an integral ingredient in averting disaster, but it is just the tip of the iceberg in developing and managing a comprehensive disaster recovery and business continuity plan that will ensure your business’s continued success. Step back and ask yourself, “What if?” What if a disgruntled employee, possibly a sys admin, corrupted your main servers, then disappeared? What if your building burnt to the ground? What if that DAT drive refuses to release last night’s tape – holding it hostage with a stranglehold on its recording heads? What if? What if?

Categories: Disaster Recovery, Security Tags:

Is your mission critical data backed up and protected?

May 15th, 2009 Steve No comments

Is your mission critical data secured by a RAID array on your server?

March 3rd, 2009 Steve No comments

RAID is NOT a back up solution!!

RAID is a Redundant Array of Inexpensive Disks, built by combining multiple disk drives into a single array. Why? To yield performance? Yes. To act as backup? No. Yet, many resellers still mistakenly believe a RAID solution sufficiently protects their data, and neglect to back up their mission critical data remotely or to tape. The Mean Time Between Failures (MTBF) of a RAID solution, meaning the expected time until some drive in the array fails, is roughly the MTBF of an individual drive divided by the number of drives in the array. You’re thinking, “Well, wait a second. This means that the MTBF becomes lower, not higher. How does that help?” Keep reading.
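
The MTBF arithmetic is a one-liner. Here it is as a small Python sketch; the 500,000-hour drive rating in the usage note is a made-up figure for illustration, not a quote from any data sheet.

```python
def array_mtbf_hours(drive_mtbf_hours: float, drive_count: int) -> float:
    """Expected time until the first drive in the array fails.

    This is the rule of thumb from the text: drive MTBF divided by the
    number of drives. It says nothing about whether that failure loses
    data, which depends on the redundancy of the RAID level.
    """
    if drive_count < 1:
        raise ValueError("an array needs at least one drive")
    return drive_mtbf_hours / drive_count
```

With a hypothetical 500,000-hour drive, `array_mtbf_hours(500_000, 4)` gives 125,000 hours for a four-drive array: more drives, more frequent failures.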

Disk arrays are designed to provide fault tolerance by redundantly storing information in a variety of methods.

RAID-0

RAID-0 is a striping solution. In level 0, data is split across the drives, resulting in higher data throughput.  Performance is enhanced, but the failure of any disk in the array results in data loss.  For improved performance in RAID0 solutions, synchronized spindles are recommended, especially when allocating small stripes. RAID0 solutions provide NO redundancy.

I would recommend RAID-0 only if the data there is transient, as it WILL eventually be lost. Here especially, maintain remote offsite backups because of the increased risk.
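
To picture striping, here is a toy Python sketch that deals fixed-size stripes across drives in round-robin order. The `stripe` function and its parameters are illustrative names of mine. A real controller works on disk blocks, not Python byte strings.

```python
from typing import List


def stripe(data: bytes, drive_count: int, stripe_size: int) -> List[bytes]:
    """Deal fixed-size stripes of data across the drives in round-robin order."""
    drives = [bytearray() for _ in range(drive_count)]
    for index, offset in enumerate(range(0, len(data), stripe_size)):
        drives[index % drive_count] += data[offset:offset + stripe_size]
    return [bytes(drive) for drive in drives]
```

Striping `b"ABCDEFGH"` over two drives with a stripe size of 2 puts `b"ABEF"` on the first drive and `b"CDGH"` on the second. Lose either drive and every other stripe of the file is gone, which is why RAID-0 offers no redundancy.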

RAID-1

RAID Level 1, on the other hand, does provide redundancy by writing the same data to two or more drives. Reads tend to be faster and writes slower than on a single drive, but if either drive fails, no data is lost. This is commonly called mirroring and requires only two drives.

If you have a failure of a single drive in a RAID-1 array (either software or hardware), all you would have to do is put a new drive in and tell the controller (or the software drivers) to rebuild the array. This is considered replacing a failed drive of an existing RAID array.

RAID1 is not economical past four hard drives. RAID1 OS disks are well worth their expense.

RAID-2

RAID Level 2 is intended for use with drives that don’t have built-in error detection. Unfortunately SCSI drives do support built-in error detection – not a good mix.

RAID-3

RAID Level 3 stripes data at a byte level across several drives, with parity stored on one of the drives.

RAID-4

RAID Level 4 stripes data (at a block level) across several drives, with parity stored on one drive. Parity facilitates recovery from any failed drive. Read times are the same as RAID-0, and writes, even though relatively fast, require parity data to be updated each time.

RAID-5

The difference between 4 and 5 is that parity is spread across all drives in the array. Parity is no longer a bottleneck, but reads are slower than RAID-4.  You win some – you lose some.

As the disk count increases in a RAID-5 array, so does the storage efficiency. This is because there is one disk’s worth of redundancy (parity) per array. For example, a 3-disk RAID-5 has one disk’s worth of parity and two disks’ worth of usable space, so the efficiency is 67%, i.e., 67% of the total disk space is available for user data.

Efficiency = (DiskCount-1) / DiskCount
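
Here is that formula as a short Python sketch. The three-disk minimum in the check comes from the fact that RAID-5 needs at least a stripe and a parity block, and the function name is my own.

```python
def raid5_efficiency(disk_count: int) -> float:
    """Fraction of total disk space available for user data in RAID-5."""
    if disk_count < 3:
        raise ValueError("RAID-5 needs at least three disks")
    return (disk_count - 1) / disk_count
```

Three disks give about 0.67, four give 0.75 and eight give 0.875, so each added disk improves efficiency by a little less than the one before.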

A degraded RAID-5 is an array with a failed disk. If the user tries to read a block on the failed disk the RAID software will have to access all the other disks in the array to reconstruct that missing data. However if the user tries to read a block on one of the remaining good disks then nothing special happens. The data is simply read from the disk.
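
The reconstruction works because RAID-5 parity is the XOR of the data blocks in a stripe, so XOR-ing every surviving block gives back the missing one. Below is a toy Python sketch with made-up four-byte blocks. A real array does this per stripe in the controller or driver.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus the parity block computed from them.
data_blocks = [b"ACCT", b"2009", b"DATA"]
parity = xor_blocks(data_blocks)

# The disk holding b"2009" fails: rebuild its block from every survivor.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Here `rebuilt` equals the lost block. Note that the rebuild had to read every surviving disk, which is why a degraded array is slower for blocks on the failed disk.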

RAID-10

RAID-10 is a combination of mirroring and striping. Each disk block is completely duplicated on its drive’s mirror. If a drive in the RAID-10 array dies, data is returned from its mirror drive in a single read with only minor performance reduction. What happens, though, when you lose the mirror drive during recovery? Ouch!

Still, most hard drive failures are related to manufacturing defects, so one pro-active approach is to mirror each drive with one from a different manufacturer’s lot number. I’m still reading a thread in one forum about massive simultaneous Seagate 1.5TB drive failures. Multiple simultaneous drive failures in any RAID array are not as uncommon as you may think. Think about this. Most companies buy the hard drives they install in servers from preferred vendors, and buy in volume to get discount pricing. If there’s a manufacturing defect in that lot of hard drives, the MTBF of each of those drives is very similar. When one drive fails, does it put a heavier load on the remaining drives in the array? Hard drives have moving parts, and thus will eventually wear out. RAID cards do fail as well, but that’s very rare.

RAID arrays provide a buffer to swap drives without powering down, but it’s still very necessary to maintain offsite remote backup in case your server completely crashes. Years ago, I had a client bring in a server that had lightning damage – charred black components – DOA.  Minus a RAID array (in this case, the server was fried – LOL), you can still recover from backups. Downtime is the persuasive consideration, as your customers will notice, thus increasing the likelihood of churn. If your site gets hacked or you accidentally delete half your root partition, RAID will provide no protection.

The common (minimum) configurations are 2 drives in RAID-1 and 4 drives in RAID-10 as that is the most economical setup to get an array benefit. RAID-5 can be provisioned with 3 drives to give you a stripe and a parity drive.

Hardware versus Software RAID Solutions

Software RAID solutions occupy their host’s system memory and CPU resources (system dependent) – degrading overall server performance. Hardware RAID solutions allow the host server to execute user applications while the array adapter’s processor simultaneously executes the array functions.

What about fault tolerance?

Software based solutions generally require a separate boot drive, which is NOT included in the array. If the boot drive is in the array and it fails, the software array will not boot, as it must be read from the disk and executed from resident memory.

Hardware arrays are highly fault tolerant since their array logic is based in hardware, eliminating the need to boot from software.

Horror stories of multiple simultaneous drive failures in RAID arrays

I’ve seen threads pop up in forums, a little more frequently, about multiple simultaneous drive failures in RAID arrays. I recall an episode related to Seagate hard drives. Seagate’s SD1A firmware update, meant to fix problems with its Barracuda 7200.11 models, only managed to make things worse – bricking the drives of those who bothered to install it. Seagate pulled the update pending validation. Barracuda owners who flashed their disks with the firmware found that after they rebooted, they’d receive a system disk failure error message. Backups, if they were stored on the same drive that was flashed, also became unavailable. Wait a moment! Who does backups on the same drive? I saw one analogy that went like this. It’s like installing seat belts in a car, but not allowing you to buckle them until you’ve been thrown through the windshield.

I’ve seen threads from quite a few furious OPs in various forums flaming their hosts because their mission critical data was lost forever because of multiple simultaneous drive failures in a RAID array on their server. When they picked their host, they were on the same forums asking for FREE this and FREE that – lowest cost – yet the data they intended to entrust to that host was mission critical to their business. This whole concept slays me. Your data is your business.

My recommendation

My preference is hot swappable hard drives – always have a hot spare, and if possible a second hot spare. Be sure to back up your data remotely and on tape. One often-quoted statistic is that over 80% of companies that lose their data go out of business within one year. Don’t allow yourself to be part of that statistic. Don’t rely solely on RAID array solutions to protect your data. Make them one part of a disaster recovery and business continuity plan. Redundant solutions (remote and tape backups) should be a vital component of that plan.

So are you prepared to lose your data?

February 19th, 2009 Steve No comments

Of course not, but I read threads every day from businesses (on various Internet forums) that have lost their data because their website violated the Terms of Service (TOS) of their host. Often their sites are taken down without notice. In some cases the client didn’t keep their security patches up-to-date and was then hacked. In others they were using a shared IP and that IP was blacklisted for spam violations – maybe not that specific IP, just one in that range.

So are you prepared to lose your data?

Seems like a ridiculous question, but many aren’t prepared because they have no plan beyond simply trusting that their web host will provide backups if necessary. I write about disaster recovery more than any other topic because of the severity of losing mission critical data. More often than not, if you lose your data, you lose your business – or it’s severely impacted.

When selecting a web host, read their Terms of Service carefully – they’re there to protect the host and you, spelling out legal expectations. Regardless, use due diligence to formulate a disaster recovery and business continuity plan that includes routinely scheduled remote offsite backup. Prepare for a worst case scenario.

I relate this to car or health insurance. I hate to pay that bill each month, but I know it’s for my own protection. If you’re the owner or president of your company, you owe it to your clients and employees to secure your business. Stuff happens. It can and does happen to businesses just like yours every day.

Multiple hard drives in a RAID array fail simultaneously (defective lot). You thought RAID was your backup solution, but turns out – it wasn’t.

Fire destroys your servers and DAT tape drive. You forgot to take that tape offsite last night.

Web host locks access to your server because your bookkeeper didn’t pay the bill. I see lots of posts related to this where the recommendation generally is – be nice to the host and maybe they’ll let you have access to your data.

Bottom line

Set aside some time to review and update your disaster recovery and business continuity plan if you have one. If you don’t have one – keep your fingers crossed and hope that Murphy’s Law passes you by and hits that business down the street first.

Categories: Disaster Recovery Tags: