Data Disaster Recovery with a Real-Life Example

As Wikipedia puts it:

Disaster recovery involves a set of policies, tools, and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.

This is just a getting-started post that explains what disaster recovery is and why it is important. It doesn't cover specific strategies. It's targeted at beginners.

Every cloud service or SaaS-based business should have a disaster recovery setup, so that even when the main site goes down for any reason, a backup site can keep the business and the website running without much disruption.

Real-Life Example

The following real-life example is based on my own experience; everyone's experience will be different.

My father works as a wedding photographer, and his main job is to capture the beautiful moments of a lifetime. He captures them on camera (both photos and videos) and later delivers them, with added effects, as an album and a short video. In India, the wedding function spans two full days. At the end of the second day, he and his team return to our studio with all the hardware equipment they used for the function.

Once they reach the studio, the first thing they do is copy the captured photos and videos from the cameras to the physical machine at our studio. Once that first transfer is done, the files are copied again to two different external hard disks. One hard disk is kept safely at our house, which is 1 km from the studio, and the other is taken to my uncle's studio, which is around 3 km from our studio. After all these transfers are done, the original files on the cameras are erased so the cameras can be used for the next event.
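The routine above can be sketched in a few lines of Python. This is only a minimal illustration, not the process we actually run: the function names (`sha256_of`, `back_up_and_clear`) and the folder layout are made up for this post, and every destination (studio machine, home disk, uncle's disk) is treated as just another folder. The one idea worth noticing is that the source file is deleted only after every copy has been verified against a checksum.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Sequence


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_and_clear(source_dir: Path, backup_dirs: Sequence[Path]) -> int:
    """Copy each file to every backup folder, verify it, then delete the source.

    Returns the number of source files that were backed up and removed.
    (Hypothetical helper for illustration only.)
    """
    cleared = 0
    # Build the full file list first, because files are deleted inside the loop.
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source in files:
        expected = sha256_of(source)
        relative = source.relative_to(source_dir)
        for backup_dir in backup_dirs:
            target = backup_dir / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copies contents and timestamps
            if sha256_of(target) != expected:
                raise IOError(f"Checksum mismatch for {target}")
        # Every copy matched, so the camera card can be freed for the next event.
        source.unlink()
        cleared += 1
    return cleared
```

A call such as `back_up_and_clear(Path("card"), [Path("studio"), Path("home_disk"), Path("uncle_disk")])` (all folder names assumed) would mirror the three copies described above.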

Goal of the Data Transfer

The main goal of copying the files so many times is to make sure they are safely preserved until the project is delivered. These moments are precious, and recreating them is nearly impossible.

There are several reasons for making so many copies:

The hard disk in the machine we work on can crash at any time.
The software we use can corrupt a file during import.
A power surge can damage the motherboard and stop the machine from working.

These moments must not be lost because of any failure on our side, so we personally take care of the captured photos and videos.

Data Disaster Recovery by SaaS companies

Similarly, think of the SaaS companies that host your code or the survey responses you have collected. Their main goal is to make sure the data you provide is safe and is not erased or lost due to any glitch. This code and these survey responses are almost as precious as once-in-a-lifetime moments; writing the code or collecting the responses is not an easy task.

Why SaaS Companies Need to Plan for Disaster Recovery

As in the example above, SaaS vendors also have plenty of reasons to plan for failure:

All the reasons mentioned above.
Software upgrades or hardware maintenance in the data center.
A sudden problem with the ISP connection at the data center.
Natural calamities such as heavy rain or an earthquake, which can take out a whole data center.

Suppose your SaaS vendor uses Amazon Web Services. Their main service runs in the AWS Asia Pacific (Mumbai) region, and their backup may be in Asia Pacific (Singapore). If any of the issues mentioned above occurs, all the traffic that would go to Mumbai can be redirected to the Singapore data center, so the system can be recovered from downtime within a few minutes (it's good to have 99.99% availability, i.e., less than 1 hour of downtime per year).
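To see where the "less than 1 hour per year" figure comes from, here is a small calculation. It assumes a 365-day year, and the function name `allowed_downtime_minutes` is just something I made up for this post.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common availability targets.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

For 99.99% this works out to roughly 52.6 minutes per year, which is why a failover that completes within a few minutes still fits the budget.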

Generally, this is achieved using Master <-> Master replication. While we are writing to one master (say M1) at one data center, the other master (say M2) listens and keeps a backup copy. When M1's data center goes down, M2 at the other data center takes over the writer role and starts accepting the data coming from clients. From then on, M2 accepts writes from end users, and M1, once it is back online, listens to those writes and keeps the backup.
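The role swap between M1 and M2 can be modelled with a toy in-memory example. This is a sketch under heavy assumptions: the `Master` and `ReplicatedPair` classes are hypothetical, replication here is a simple synchronous list append, and real databases also have to deal with replication lag, conflicts, and health checks, none of which is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """A toy stand-in for a database master in one data center."""
    name: str
    healthy: bool = True
    records: list = field(default_factory=list)


class ReplicatedPair:
    """Routes writes to one master while the other keeps a backup copy."""

    def __init__(self, writer: Master, standby: Master):
        self.writer = writer
        self.standby = standby

    def fail_over(self) -> None:
        """Swap roles so the standby becomes the writer."""
        if not self.standby.healthy:
            raise RuntimeError("both masters are down")
        self.writer, self.standby = self.standby, self.writer

    def write(self, record: str) -> str:
        """Store a record and return the name of the master that accepted it."""
        if not self.writer.healthy:
            self.fail_over()
        self.writer.records.append(record)
        if self.standby.healthy:
            # The listening master takes its backup of the write.
            self.standby.records.append(record)
        return self.writer.name

    def recover_standby(self) -> None:
        """Bring the failed master back and let it catch up on missed writes."""
        self.standby.records = list(self.writer.records)
        self.standby.healthy = True


m1, m2 = Master("M1"), Master("M2")
pair = ReplicatedPair(writer=m1, standby=m2)
print(pair.write("order-1"))  # M1 is the writer, M2 takes the backup
m1.healthy = False            # M1's data center goes down
print(pair.write("order-2"))  # M2 takes over the writer role
pair.recover_standby()        # M1 returns and copies what it missed
```

After the last line, M1 holds both records again and simply listens to whatever M2 accepts next.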

Share your thoughts on how you plan for disaster recovery.

Vishy tech notes