Disasters Caused by Hardware Far Outpace Those Caused by Mother Nature

Natural disasters have certainly made headlines over the past several years. The fallout is always extreme: Among the many lives lost and homes crushed, businesses, too, lie in ruins. After Hurricanes Katrina and Rita, for example, 60% of Mississippi’s small businesses closed, according to the director of the Mississippi Small Business Development Center. But disasters for SMBs extend well beyond freak storms, hurricanes, earthquakes and tornadoes.

As Quorum, Inc.’s DR Report, Q1 13, will show, disasters caused by hardware failures far outpace those caused by Mother Nature, although they seldom make the evening news.

Regardless of a disaster’s origin, one thing is certain: The resulting downtime can bring a small to mid-sized business to the brink. In fact, a report from HP and SCORE estimates that 25% of small businesses do not reopen following a major disaster. And an oft-quoted Aberdeen Group report estimates that just an hour of downtime costs a mid-sized business $74,000; IDC puts it at an almost equally frightening $70,000 an hour.

Unfortunately, system downtime typically lasts for much longer than an hour. In fact, IT managers surveyed by Harris Interactive estimate that it typically takes 30 hours for recovery. (This may, by the way, come as a surprise to executives, who put the estimate closer to a mere 10 hours.)
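To put the hourly cost and recovery-time estimates above side by side, the following is a minimal sketch of the arithmetic. It assumes, purely for illustration, that cost scales linearly with the length of the outage; the function name and constants are not part of the report and simply restate the quoted figures.

```python
# Rough downtime-cost arithmetic: total cost = hourly cost x hours of downtime.
# Assumption: cost grows linearly with outage length (a simplification).

def downtime_cost(hourly_cost: float, hours_down: float) -> float:
    """Return the estimated cost of an outage in dollars."""
    if hourly_cost < 0 or hours_down < 0:
        raise ValueError("hourly_cost and hours_down must be non-negative")
    return hourly_cost * hours_down

# Figures quoted in the text above.
HOURLY_ESTIMATES = {"Aberdeen Group": 74_000, "IDC": 70_000}
RECOVERY_HOURS = {"IT managers": 30, "executives": 10}

for source, hourly in HOURLY_ESTIMATES.items():
    for group, hours in RECOVERY_HOURS.items():
        cost = downtime_cost(hourly, hours)
        print(f"{source} rate, {hours} h ({group}): ${cost:,.0f}")
```

Under that linear assumption, the gap between the 30-hour and 10-hour estimates is a threefold difference in expected loss, which is why the mismatch between IT managers and executives matters.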

While we cannot hope to stop disasters from occurring, we can stem their damage. DR technology that enables instant recovery of data, applications and systems – along with frequent and regular system testing – is the only way to safeguard a company’s revenue, customers and reputation.

In an effort to put into sharper focus the importance of DR technology as part of an overall business continuity plan, this report presents findings on the frequency of various disasters, from hardware failures to less frequent – although obviously still devastating – natural disasters. Real-world examples culled from customer feedback from Quorum’s IT support center bring home the point.

By using the report as a predictive tool, SMBs can heighten their awareness of the unpredictable nature of disasters. With that awareness, they can take protective measures using a two-pronged approach: installing a DR system that will ensure their business is operational again in minutes, rather than days; and performing regular tests of the system for added peace of mind.

Findings
The following infographic depicts the four most common causes of system downtime.

[Infographic: Quorum Disaster Recovery Report, the four most common causes of system downtime]

Hardware Failure
At 55%, hardware failure is the #1 cause of downtime for SMBs. With several levels of redundancy of various components – such as multiple power supplies, network controllers and hard drives – it may seem like your bases are covered. Still, as with any other disaster, no one can predict when the air conditioning will fail on a hot day, what unforeseen event will trigger a widespread power outage, or which cords the resident rodent will chew through.

SAN failures are among the hardware-failure disasters many SMBs experience. It’s common for these businesses to have a large SAN, and all storage servers virtualized onto that SAN. Unfortunately, this means that when the SAN dies, a company’s entire environment dies with it.

Randy Mateo, IT manager at California Bankers Association (CBA), is all too familiar with this type of hardware failure. In 2010, he noticed a multiple hard drive failure on CBA’s SAN. In quick succession, the company’s Citrix XEN servers began failing. But more bad news awaited: He found that the system didn’t fail over. An investigation determined that with the hard drive failure, the primary SAN server corrupted the company’s virtual servers, and all the data on them. The corruption then replicated onto the secondary SAN.

"Needless to say, it was a total disaster," said Mateo. "The VMs couldn’t be revived, so everything had to be rebuilt."

Thankfully, Mateo had manually backed up most of the virtual servers using a utility on the XEN servers. The Exchange server, however, was a different story.

"It was just too big," he said. "We would have had to bring down the server for at least an entire day to back it up. So since it wasn’t backed up, we had to rebuild it. We were down for three-and-a-half days doing that."

After this nightmare, Mateo deployed a hybrid cloud solution to ensure it never happened again.

Human Error
Of course, we can’t attribute all disasters to technical difficulties. According to our findings, 22% of disasters are caused by human error. This could include accidentally wiping out a file system on a server.

While something like this may be considered a ‘rookie move,’ it’s not necessarily relegated exclusively to rookies. An executive of a private healthcare center in Florida, for example, has (more than once) deleted her entire mailbox. Thanks to her choice to deploy a DR system that takes incremental snapshots of the center’s servers, this executive was able to recover valuable correspondence and avoid certain personal disaster.

Software Failure
Software failure ranks third in overall disasters at 18%, and it’s no wonder, given the number of patches routinely sent out (so numerous, in fact, that Microsoft dedicated a day of the week to sending them). The issue lies in the lack of attention to testing patches before they are sent out, resulting in corruption of applications that can bring down entire systems or make them otherwise unavailable.

OSs that have been limping along for some time and finally die also contribute greatly to software failure. And we cannot overlook the impact viruses and malware have, of course. In fact, in the first half of 2012, 36% of targeted attacks were unleashed on SMBs, which was double the number seen in a six-month period in 2011, according to the Symantec Internet Security Threat Report. These attacks can infect entire networks, effectively bringing a company to its knees.

Indeed, software failures come in all shapes and sizes. Brent Schlueter, IT director at Sente Mortgage in Austin, TX, recently recounted a software failure episode that caused significant downtime. During a routine software upgrade to the company’s loan origination software, Schlueter determined it was necessary to revert to a previous version. In doing this, he discovered that the system’s backed-up SQL data was corrupt and unusable. Ultimately, this was all attributed to an underlying file structure problem, but the result was at least four hours of system downtime.

"We ended up sending people home," said Schlueter – and later invested in a hybrid cloud DR system.

Natural Disasters
Since tornadoes, earthquakes and the like are often the first things that come to mind when we consider the word ‘disaster,’ it’s ironic that natural disasters account for a mere 5% of disasters. Still, their effects are obviously without parallel. For example, a report from HP and SCORE entitled Impact on U.S. Small Business of Natural and Man-Made Disasters indicates that 70% of small firms that experience a major data loss go out of business within a year.

With this in mind, talent recruiting firm 24 Seven, Inc. prepared early. Along with deploying a solid DR system, IT professionals at the company also tested the system regularly to ensure it would perform in a real disaster. So when Superstorm Sandy hit on October 29, 2012, the company was ready. Even though its New York headquarters office was forced to close during the storm, operations critical to the health of the company went along uninterrupted, thanks to foresight and due diligence.

The DR Solution Landscape
The DR market is laden with various solutions. The following describes three of the most common types.

Tape and Disk Backup
For many years, tape and disk backup dominated the DR industry. These solutions are still widely used, but their foothold as the preferred method is weakening as their drawbacks become ever clearer; specifically, their expense and complexity, and their inability to recover systems and applications in real time. Given the prohibitive cost of downtime for SMBs, hours- or days-long lags in restoration are devastating.

Furthermore, regular testing, so imperative to ensuring business continuity during a real disaster, is also particularly difficult and time-consuming with these solutions, and many products do not even offer testing as a functionality.

Cloud Backup
Cloud backup has more recently emerged as an alternative to tape and disk backup, leveraging virtualization and the cloud to make data backup more convenient.

Still, for SMBs, cloud backup alone can sometimes make recovery times worse because of the limited Internet bandwidth available to them. And if a large amount of data must be recovered, it still involves shipping physical media, which defeats the main purpose of the move from offsite tape to cloud.
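The bandwidth constraint described above comes down to simple arithmetic: restore time is roughly data volume divided by usable link speed. The sketch below illustrates this under stated assumptions. The function name, the 80% link-efficiency default, and the sample values (2 TB over a 50 Mbps connection) are hypothetical and do not come from the report.

```python
# Rough estimate of how long a full restore from cloud backup would take.
# Assumptions: decimal units (1 GB = 8,000 megabits) and a fixed fraction of
# the nominal link speed actually usable for the download.

def restore_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to download data_gb gigabytes over a link_mbps link."""
    if data_gb < 0:
        raise ValueError("data_gb must be non-negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    megabits = data_gb * 8_000                     # total data to move
    seconds = megabits / (link_mbps * efficiency)  # time at usable speed
    return seconds / 3600

# Hypothetical SMB: 2 TB of data to recover over a 50 Mbps connection.
hours = restore_hours(data_gb=2_000, link_mbps=50)
print(f"Estimated restore time: {hours:.1f} hours ({hours / 24:.1f} days)")
```

With these illustrative inputs the download alone takes roughly 111 hours, which shows why large restores can end up relying on shipped physical media.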

Hybrid Cloud
Hybrid cloud solutions present a reliable alternative to both tape and disk backup and cloud backup alone, as they deliver the advantages of virtualized data center replication without the high cost and complexity. These solutions function by maintaining up-to-date, ready-to-run VM clones of a company’s critical systems that can run locally or in the cloud. And because they transparently take over for failed servers within minutes, recovery is nearly instant and business-as-usual resumes with minimal impact. In addition, testing in environments that have deployed a hybrid cloud solution is made easy with automatic and on-demand capabilities.

In Conclusion
Results from the Quorum DR Report underscore the notion that a disaster has many definitions, and can occur at any time. Therefore, preparation is key. So, too, is the solution choice, as DR is only valuable if it helps SMBs avoid any length of downtime.

Regular testing plays a key role as well. IT professionals often avoid regular testing due to the complexities associated with the process, and the inordinate amount of time it takes. Therefore, it is imperative to deploy a system that enables automatic testing to shore up confidence that the solution will work as expected in an actual disaster.

Quorum DR Report Methodology
Quorum derived statistics from incoming calls in its IT support center, representing a cross-section of hundreds of customers. They are SMBs that span a variety of industries in the United States, EMEA, and Asia-Pacific.
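As an illustration of how category percentages like those in the Findings section could be tallied from support-call records, here is a minimal sketch. The function name, record format, and the eight sample entries are hypothetical and are not Quorum’s data or tooling.

```python
from collections import Counter

def cause_shares(causes):
    """Map each downtime cause to its percentage share of all recorded calls."""
    counts = Counter(causes)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders causes from most to least frequent.
    return {cause: 100 * n / total for cause, n in counts.most_common()}

# Hypothetical sample: one entry per support call, labeled by root cause.
sample_calls = [
    "hardware failure", "hardware failure", "human error", "software failure",
    "hardware failure", "natural disaster", "human error", "hardware failure",
]

for cause, share in cause_shares(sample_calls).items():
    print(f"{cause}: {share:.1f}%")
```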