JeffTP

Squandering a perfectly good opportunity to shut up and listen.

My Introduction to Disaster Recovery

Published: 2023-10-22 • Reading time: 6 min

#disaster recovery #history #story time

Have I ever told you the story about my introduction to disaster recovery?

Although IT platforms are notoriously complex and fragile, business leaders expect those IT platforms to run continuously without error or interruption, at all times, forever. There are many techniques for building an IT platform to be highly available, but I recommend against confusing high availability with disaster recovery. You will still need a plan to recover everything should disaster strike.

The disaster at the forefront of every business leader's mind today is probably ransomware. Ransomware is where a malicious adversary finds a way into your IT platform and then encrypts all the important data within that IT platform with a secret key. Once your data is encrypted, the adversary demands a payment in some form of cryptocurrency in exchange for the secret key. If you don't pay, they will delete the secret key and your data will be lost forever.

Ransomware is a relatively new form of disaster. Traditionally, natural disasters such as fires, hurricanes, and floods were the primary disaster recovery concern. As we have become completely reliant on our IT platforms to conduct business, the threat of a failure in our IT platforms has raised disaster recovery in priority for IT managers. For most organizations, a disaster recovery and business continuity plan is a requirement. Today, without a recovery plan, any disaster could permanently end your business.

When I was getting my start in IT in the early 1990s, it was common for organizations to maintain a means to fall back on good old paper and pen when the IT systems inevitably failed. Over time, the tolerance for such outages and failures vanished. The ability to fall back to paper and pen processes was forgotten. For a time, organizations would be paralyzed during an IT outage--unable to conduct any business due to complete dependence upon the IT systems.

The history of disaster recovery stretches back to the 1970s, but only the most computer-reliant and well-funded organizations could afford to keep copies of all their most critical business data in another data center. Through the 1980s and 1990s, disaster recovery remained firmly out of reach for most organizations. At best, you might have some tape backups stored off-site and a vague plan on how to recover your data in the event of a failure. But there were few organizations that could tolerate a complete loss of their primary data centers prior to the 21st century.

The terrorist attacks on the World Trade Center in New York City on September 11, 2001 were a turning point in the IT world regarding disaster recovery. That terrible morning, 2,977 people lost their lives, and whole businesses were wiped out as the World Trade Center buildings collapsed.

Perhaps it was just cynical marketing teams spreading fear, uncertainty, and doubt, but it was after 9/11 that disaster recovery and business continuity became the top topic for CTOs everywhere. All businesses began to evaluate in earnest how to survive a complete loss of their data centers and the increasingly important data stored within.

It was some time in 2002 that I first sat in a conference room discussing business continuity planning and disaster recovery. The first question I was asked was, "What would happen if the data center blew up?" At the time, my answer was, "We'd have to do everything on paper and pen until we could build a new data center and start over."

Believe it or not, in 2002 we still had processes to do most business operations without computers. Most of the staff I worked with at the time still viewed the computers as a necessary annoyance. Many still regarded five-copy carbonless forms in gold, pink, canary, green, and white as the greatest technology invented for getting work done.

Over the course of the next four years, we built a disaster recovery data center. We were only able to obtain 20% of the budget we requested, so unsurprisingly only 20% of the disaster recovery solution actually worked. Nonetheless, in 2006 we had plans to recover email, payroll, and CRM systems along with running a skeleton web site.

The first test of our disaster recovery plan came in 2008, in the form of Hurricane Ike. So far, this is the only hurricane I've "hunkered down" for in my house. It was a terrifying experience, with 100 mph (160 km/h) winds pounding against the creaking walls of my house for eight very dark hours in the middle of the night.

I went out in the middle of the storm to clear debris from the water drainage ditches that ran through the neighborhood. It is only through dumb luck that I wasn't seriously injured. The winds knocked me down multiple times. There were lightning strikes happening within the immediate area. Objects were being carried by the wind and shot through the air. And there I was, carrying a steel shovel (lightning rod, anyone?), standing in knee-deep water, clearing out the drains.

My mission was a success. I'm certain my neighbor's house would have flooded, and possibly my own, had I not gone on this insane mission.

In the aftermath of the storm, most of Houston had lost utility power. Cell service was limited to text messages. Those people who still had landline telephones said the lines were dead. Prior to the storm, there was a run on gasoline and diesel fuel, so there was none to be found within 100 miles of Houston. Roads were blocked due to flood waters or downed trees.

Does any disaster recovery plan matter if all the people needed to execute the plan are without power, without internet, barely able to communicate with one another, and potentially still in harm's way due to flood waters and downed trees?

Fortunately, we were able to get additional fuel for the generator which kept the data center running until utility power was restored. It took two weeks for everyone critical to the disaster recovery plan to report in. Had we not been able to keep the generators running, we would have missed a payroll cycle for 35,000 people who were all desperately in need of that next paycheck so they could start putting their lives back together after the storm.

That was my introduction to disaster recovery. I lived through the disaster, discovered for myself that no plan survives first contact with the enemy, and learned the most important element of a disaster recovery plan is not in the technology but in the people who will execute that disaster recovery plan.