One topic that has come up a lot recently when comparing cloud-based software systems is the location of disaster recovery (DR) or secondary data centres.
Whilst the first step is helping the business to understand the need for a secondary hosting site, be it:
- a DR site activated in a major failure / disaster or
- secondary sites servicing active customers in parallel to the main site
the next step is ensuring the business isn’t then bamboozled with false promises from vendors.
Moving from a single point of failure architecture to one that can easily scale out and handle failure is no small task.
You would be surprised how many so-called “cloud” solutions are actually just old single point of failure architectures dropped into the “cloud” and then “sold” as a cloud service. Vendor sales teams skip over this detail with you and leave vital details out of contracts…
Anyway, more on that later…
For a secondary hosting site to be effective it must do two things:
- Deliver an actual DECREASE in the risk carried by the architecture when considering typical disaster situations
- Not compromise the ability of the architecture to scale to meet growing customer demand.
Unfortunately these objectives can conflict, especially in architectures that use latency-sensitive technologies, e.g. Oracle RAC or certain uses of Fibre Channel do not work well when there are great distances between hosting sites.
So, assuming you’re worried only about disaster recovery, what is the ideal distance between DR sites?
Sorry, it’s not so easy. No broadly agreed standard is available. What I can say is this:
The two sites should not carry the same risk profile. This is sometimes referred to as geographical diversity.
Say I have one data centre site located in the Sydney CBD and the DR site is close by, let’s say North Sydney or Macquarie Park.
These locations are close to each other, within about 10-15 km, and they are also located very close to the ocean.
The risk profile of the Sydney site would include:
- Power outage (with some mitigations typically used)
- General Fire risk
- Extreme weather events including possibly cyclone / tsunami, earthquake
- Large scale fire / bush fire
- Physical attack / civil commotion
The second site carries the same risks, and in fact the power outage, weather and bush fire risks could hit both sites simultaneously, rendering both useless at the same time.
The CBD site is probably more prone to events such as civil commotion (a physical attack), but that depends on many other factors (e.g. physical protection of the data centre).
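The comparison above can be sketched in a few lines of Python. This is only an illustration under assumptions: each risk is tagged with a made-up "zone" describing the area a single event would affect, and the zone names and the Melbourne entry are hypothetical, not an assessment of any real facility. Two sites share a correlated risk when the same risk in the same zone appears in both profiles.

```python
from typing import FrozenSet, List, Tuple

# A risk is (risk name, zone a single event would affect).
# Zone names are hypothetical labels chosen for illustration.
Risk = Tuple[str, str]

SYDNEY_CBD: FrozenSet[Risk] = frozenset({
    ("power outage", "sydney-grid"),
    ("general fire", "cbd-building"),
    ("extreme weather", "sydney-coast"),
    ("bush fire", "sydney-region"),
    ("civil commotion", "cbd"),
})

NORTH_SYDNEY: FrozenSet[Risk] = frozenset({
    ("power outage", "sydney-grid"),
    ("general fire", "north-sydney-building"),
    ("extreme weather", "sydney-coast"),
    ("bush fire", "sydney-region"),
    ("civil commotion", "north-sydney"),
})

MELBOURNE: FrozenSet[Risk] = frozenset({
    ("power outage", "melbourne-grid"),
    ("general fire", "melbourne-building"),
    ("extreme weather", "melbourne-coast"),
    ("bush fire", "melbourne-region"),
    ("civil commotion", "melbourne-cbd"),
})


def correlated_risks(a: FrozenSet[Risk], b: FrozenSet[Risk]) -> List[str]:
    """Names of risks where one event could take out both sites at once."""
    return sorted(name for name, _zone in a & b)


if __name__ == "__main__":
    print("CBD + North Sydney:", correlated_risks(SYDNEY_CBD, NORTH_SYDNEY))
    print("CBD + Melbourne:   ", correlated_risks(SYDNEY_CBD, MELBOURNE))
```

Both pairs face the same kinds of risk, but only the nearby pair shares zones, which is the difference between having a second site and having geographical diversity.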
In summary, this is not good DR planning, as both sites carry very similar and shared risks. It is only a little better than running a single data centre, and not much different to a single data centre with more thought put into localised risks such as fire, power failure or dependent infrastructure (including internet link) failures. Most well designed data centres already have redundancies at the hardware level, and some, such as Azure, can provide software level redundancies. There are also examples of data centre sites being planned at a town planning level to minimise environmental risks as part of the town itself (multiple power grids, elevated site, away from oceans, great emergency access, etc).
A slight improvement here is to locate one of the sites inland, and another is to ensure both do not feed off the same part of the power grid. (Yes, sites can also partly protect themselves from power outages by hooking up expensive generators, but that’s not to say the infrastructure linking the site to the rest of the world is similarly protected.)
Now, on the flip side of this is off-shoring the DR site, which maximises the distance between sites but introduces additional problems. Doing this may trip “trans-border” legislation, as the second site falls under the legal jurisdiction of a foreign government. That could put you at risk of breaching privacy legislation, not to mention the legal difficulties of navigating another country’s legal system in the event of a contract breach. Consider the implications carefully when off-shoring, and also think about the network latency and the extra risk that comes with thousands of additional kilometres of network infrastructure.
Closer to ideal is locating sites in different states, for example one in Sydney and one in Melbourne, which gives you some protection even from bush fire, flood and other broad scale events but does not put the backup data centre in foreign hands. This of course assumes you have technology that works over distances greater than 100 kilometres.
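To get a feel for what distance does to latency-sensitive technology, here is a rough Python sketch of the best-case round trip time between two sites. It assumes light travels through optical fibre at roughly 200 km per millisecond and uses approximate city-centre coordinates for Sydney and Melbourne. Real fibre routes are longer than the straight-line distance and add switching delays, so treat the result as a lower bound rather than a measurement.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: light in optical fibre covers roughly 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Straight-line surface distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip time over fibre: there and back, nothing else."""
    return 2 * distance_km / FIBRE_KM_PER_MS


# Approximate city-centre coordinates, used for illustration only.
SYDNEY = (-33.8688, 151.2093)
MELBOURNE = (-37.8136, 144.9631)

if __name__ == "__main__":
    km = great_circle_km(*SYDNEY, *MELBOURNE)
    print(f"Sydney to Melbourne: about {km:.0f} km")
    print(f"Best-case round trip: about {min_round_trip_ms(km):.1f} ms")
    print(f"Two sites 15 km apart: about {min_round_trip_ms(15):.2f} ms")
```

Every synchronous write between sites pays at least this round trip, which is why technology built for a single site can struggle once the sites are hundreds of kilometres apart.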
One last thing to keep in mind: cloud hosting and cloud-based solutions are not always better (or even typically better) than what your own organisation could support, unless of course you have no IT department! I’ve seen several terrible examples this year of vendors’ “cloud hosting” being considerably worse than what any of my recent clients could do with their existing infrastructure (hosting the vendor’s software using my client’s IT team), enough that I have decided to speak on the topic at an Enterprise Architecture Symposium this year.
This is usually because the vendor has no experience in hosting or is cutting costs where they think customers won’t see it, such as failing to provide archiving of data or having token secondary sites (probably picked on price rather than on function).
Many vendors want the long tail income stream from hosting but are not willing to spend the time learning how to leverage hosting to give customers value. This is not to say that a cloud first approach doesn’t offer benefits to businesses. It’s a new approach that, when used correctly, can provide great value and innovation, but you must never assume cloud is better.
On top of this there are, of course, degrees of quality available depending on cost. Once you’re dealing with options based on modern, well designed architectures, the choice becomes a business decision of cost vs risk.
Always do your due diligence and make sure you’re armed with the right information!