Patrick Allaire

IT Service Continuity Perspective: How Far Is Far Enough Between Datacenters?

Blog Post created by Patrick Allaire on May 10, 2017

In February I shared my blog post Lies, Damned Lies and Uptime Statistics, which covered some justifications for higher IT resiliency. Just a few days later, California floods were front and center in the news as heavy rains devastated many populated neighborhoods. A few miles from the HDS Santa Clara headquarters, San Jose experienced some of its worst flooding in a century, an ironic turnaround since California had been in a drought for the previous four years.

[Figure: San Jose flood]

 

It reminded me how vulnerable we are to Mother Nature's mood swings and led me to catch up on business continuity reading, such as the 2017 Forrester Research study on the State of Disaster Recovery Preparedness, published in the spring 2017 Disaster Recovery Journal.

 

I needed to create some sales and partner communication in support of Hitachi Storage Virtualization Operating System (SVOS) 7.1 and the Global-Active Device (GAD) enhancements released in March 2017. If you are not familiar with the Hitachi VSP family's active-active storage solution, here is a short description: GAD enables you to achieve a strict zero recovery time objective (RTO) and recovery point objective (RPO); it ensures continuous operations for key applications and nonstop, uninterrupted data access for critical SAN and NAS deployments.

 

The distance limit of the Hitachi GAD solution was recently extended from 100 km to 500 km. Why 500 km? Who cares about this distance improvement? Why does it matter? This blog post provides the insights I gathered from HDS solution engineering testing, product management (Tom Attanese) and market data.

 

One of the most effective ways to improve the resilience of IT infrastructure is to increase the geographical separation between primary and recovery data center sites. By keeping data protection solutions and failover servers in a different region from the primary IT infrastructure, companies reduce the chances that a single natural disaster or power grid failure will take out their backup along with the primary systems that the backup is supposed to protect. However, determining the appropriate level of geographical separation remains the subject of confusion and debate.

 

[Figure: US power grid]

The basic rule is that the datacenter sites should be far enough apart that they are not subject to most of the same risks.

 

Whether the threat is a winter storm, a power outage, or a terror attack, you need to make sure that a single event is highly unlikely to take down both sites. The two most common causes of declared disasters are power failures and floods, so, tactically, your backup datacenter needs to be connected to an alternate power grid.

 

From a United States perspective, here is a short guide to separation distances by the threat facing your business:

  • Hurricanes: 105 miles of distance (170 kilometers)
  • Volcanoes: more than 70 miles (112 kilometers)
  • Floods: more than 40 miles (64 kilometers)
  • Power grid failure: more than 20 miles (32 kilometers)
  • Tornadoes: more than 10 miles (16 kilometers)
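
To apply the guide above to a site with several threats at once, the largest per-threat distance governs. Below is a minimal Python sketch of that lookup. The dictionary keys and function names are hypothetical and chosen only for illustration; the kilometer values are copied from the list, and treating each figure as a minimum to meet or exceed is a simplifying assumption.

```python
# Per-threat separation guideline in kilometers, copied from the list above.
# The key names are illustrative assumptions, not an official taxonomy.
MIN_SEPARATION_KM = {
    "hurricane": 170,
    "volcano": 112,
    "flood": 64,
    "power_grid_failure": 32,
    "tornado": 16,
}


def required_separation_km(threats):
    """Return the largest guideline distance among the given threats."""
    unknown = [t for t in threats if t not in MIN_SEPARATION_KM]
    if unknown:
        raise ValueError(f"no guideline for: {unknown}")
    # An empty threat list yields 0: no guideline applies.
    return max((MIN_SEPARATION_KM[t] for t in threats), default=0)


def covers(distance_km, threats):
    """True if the site separation meets the guideline for every threat."""
    return distance_km >= required_separation_km(threats)


if __name__ == "__main__":
    site_threats = ["flood", "power_grid_failure"]
    print(required_separation_km(site_threats))
    print(covers(100, site_threats))
```

This is only a starting point for a risk assessment: a region exposed to hurricanes needs the hurricane figure regardless of how many smaller threats are also on the list.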

 

[Figure: Sperling research]

The Sperling research illustrates the risk and the area affected by each threat. While a datacenter distance of a hundred miles is never a problem in the U.S., for some European or Asian countries with a smaller geographical size the easiest solution is to position a backup site in a neighboring country with compatible laws and regulations. To mitigate most of the risks, industry consensus suggests you place a disaster recovery location somewhere between 30 miles (50 kilometers) and 100 miles (160 kilometers) away from your primary location. But again, please do your risk assessment first. By performing a risk assessment that includes threats to the physical location of operations, labor force availability and customer locations, the business will be able to make informed decisions on "how far is far enough".
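
As a rough first check before a full risk assessment, you can compute the straight-line distance between two candidate sites and compare it with the 50 to 160 kilometer range quoted above. The sketch below uses the standard haversine formula with a mean Earth radius of 6371 km. The function names and the sample coordinates are hypothetical, and great-circle distance is only an approximation of real separation (it ignores terrain, fiber routes and shared utilities).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_consensus_range(distance_km, low_km=50.0, high_km=160.0):
    """True if the distance falls inside the 50-160 km range from the text."""
    return low_km <= distance_km <= high_km


if __name__ == "__main__":
    # Hypothetical coordinates, one degree of latitude apart.
    primary = (37.0, -122.0)
    recovery = (38.0, -122.0)
    d = great_circle_km(*primary, *recovery)
    print(f"{d:.1f} km, within range: {within_consensus_range(d)}")
```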

 

So, is GAD distance support of 500 km overkill? Quite the opposite: many customer requests drove this enhancement. The demand came from customers who already own their datacenters or operate in co-location facilities, many of them located outside the U.S.

 

The IT industry already recognizes Hitachi's resiliency as the gold standard, but did you ever wonder why our engineering is so good at it? High availability requirements from our Japanese customers are some of the most demanding, due to the risk associated with doing business in one of the most densely populated areas in the world, which is exposed to nearly every possible natural disaster risk (see the extract below from the Lloyd's City Risk Index report).

[Figure: WW Cities Risk]

The Lloyd's City Risk Index quantifies exposure potential in 301 cities from both man-made and natural threats. Tokyo, which is vulnerable to a much wider combination of risks, both natural (e.g. tsunami, wind storm, earthquake, flooding) and man-made, sits near the top of Lloyd's list (after Taipei) with US$183 billion in gross domestic product (GDP) at risk. New York City is the highest-ranked U.S. city, with US$91 billion in GDP at risk. Tokyo businesses need active-active data center support extended to 500 km to meet their business continuity plans.

[Figure: WW Cities Risk Ranking]

Another Hitachi customer, from Switzerland, assisted us in the validation because they needed to go beyond a distance of 120 km (about 75 miles) to accommodate the location of their secondary data center; it was too cost-prohibitive to deploy a high-availability solution outside their existing datacenter locations.

 

What made an impression on me after interviewing Product Manager Tom Attanese was the set of Hitachi VSP F series test results from our solution engineering team, which demonstrated the low and consistent latency of the GAD 500 km extension. Keep in mind that read I/Os are executed locally. Only write I/Os incur a response time increase, of about 1 millisecond of round-trip delay per 100 km. See the following chart of the Hitachi VSP F series test results at local, 100 km, 200 km, 300 km and 500 km distances. Response time is pretty much flat until the workload saturates the system.

[Figure: GAD long distance test results]
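
To make the rule of thumb above concrete, here is a small Python sketch that estimates the added write latency at each tested distance. It assumes the 1 millisecond per 100 km figure scales linearly with site separation and that reads see no added latency because they are served locally. The function names and the 0.5 ms local baseline are hypothetical placeholders, not measured Hitachi results.

```python
# Rule of thumb quoted in the post: about 1 ms of added write response
# time per 100 km between sites. Linear scaling is an assumption.
ADDED_WRITE_MS_PER_100_KM = 1.0


def added_write_latency_ms(distance_km):
    """Estimated extra write response time for a given site separation."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    return distance_km / 100.0 * ADDED_WRITE_MS_PER_100_KM


def estimated_response_ms(local_ms, distance_km, is_write):
    """Reads are served locally; only writes pay the distance penalty."""
    penalty = added_write_latency_ms(distance_km) if is_write else 0.0
    return local_ms + penalty


if __name__ == "__main__":
    local_ms = 0.5  # hypothetical local response time, for illustration only
    for km in (0, 100, 200, 300, 500):
        write_ms = estimated_response_ms(local_ms, km, is_write=True)
        read_ms = estimated_response_ms(local_ms, km, is_write=False)
        print(f"{km:>3} km: read {read_ms:.1f} ms, write {write_ms:.1f} ms")
```

Under these assumptions the estimate covers only the distance penalty; it says nothing about the saturation point, which depends on the workload and the system.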

If you feel only partially prepared or totally unprepared to meet your business continuity goals, if you need to raise your SLAs to 24x7 to stay online and be competitive, or if you thought that a synchronous active-active storage solution was out of reach due to the distance between your datacenters, we are here to help with enterprise-wide high availability and disaster recovery solutions.

 

I am looking forward to your comments.

 

 

 

Patrick Allaire
