Rich Vining

How RPO Impacts RTO, part 1

Blog Post created by Rich Vining on Oct 29, 2014

 

This blog series examines the business costs associated with backup and recovery, with a focus on the costs of downtime caused by both backup and restore operations. The holy grail of data protection is to drive these costs to zero, by reducing or eliminating the time to perform a backup (defined by the backup window objective) and the time to restore operations following an outage (defined by the recovery time objective). This entry examines another data protection metric – the recovery point objective.

 

Recovery Point Objective, or RPO, measures the granularity of the previous points in time from which you want to be able to recover a particular data set. An RPO of 24 hours says that a single backup operation per day is “good enough”. Other ways to describe it include:

 

  • The frequency of backup operations
  • The amount of new data that you are willing to risk losing
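To put a number on the second bullet, here is a minimal sketch of the worst-case exposure for a given backup interval. The function name and the change rate of 5 GB per hour are assumptions made up for illustration, and the sketch assumes new data arrives at a steady rate.

```python
def worst_case_data_at_risk_gb(backup_interval_hours, change_rate_gb_per_hour):
    """New data written since the last backup if the failure hits
    just before the next backup would have run (steady change rate assumed)."""
    return backup_interval_hours * change_rate_gb_per_hour


# Hypothetical system that writes 5 GB of new data per hour.
daily_backup_risk = worst_case_data_at_risk_gb(24, 5)   # RPO of 24 hours
hourly_backup_risk = worst_case_data_at_risk_gb(1, 5)   # RPO of 1 hour

print(daily_backup_risk, hourly_backup_risk)
```

With these assumed numbers, tightening the RPO from 24 hours to 1 hour cuts the data at risk from 120 GB to 5 GB.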

 

RPO is entirely different from Recovery Time Objective (RTO). RTO is the goal for how long it should take to restore a system, an application, or access to a data set following an unplanned event, such as one caused by human error, hardware failure, or natural disaster. Some people include “software errors” in this description, but software doesn’t make mistakes – the programmer who wrote it did (just kidding, coding buddies).

 

RTO defines how much time, and therefore how much cost, risk or lost revenue opportunity, the organization is comfortable absorbing following an outage or disaster.

 

Since RPO and RTO are entirely different, does one impact the other? My first reaction to this question was “no”, but in reality the way you meet your RPOs can have a dramatic impact on your ability to meet your RTOs.

 

Let’s say you have a very large database that can only be backed up on long weekends. To meet an RPO of 24 hours, you have to back up the database journals or redo logs every night. This way, you can restore the last full database backup and then roll forward the database transactions using the journals or redo logs.

 

So will the time it takes to restore the last full backup and then all of the journals meet your established recovery time objective for this large database system? Unless the RTO is measured in weeks or months, probably not. The conclusion is that this methodology of database protection can be used to meet reasonable recovery point objectives, but will not support a reasonable recovery time objective.
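A rough back-of-the-envelope sketch shows why. Every number below (database size, restore throughput, days of accumulated logs, replay time per day of logs) is a made-up assumption, not a measurement, and the function name is hypothetical.

```python
def estimated_restore_hours(full_size_tb, restore_tb_per_hour,
                            days_of_logs, replay_hours_per_day_of_logs):
    """Time to restore the last full backup plus time to replay every
    nightly journal or redo log captured since that full backup."""
    full_restore = full_size_tb / restore_tb_per_hour
    log_replay = days_of_logs * replay_hours_per_day_of_logs
    return full_restore + log_replay


# Assumed: 200 TB database, 2 TB/hour restore rate, last full backup
# taken 60 days ago, 1.5 hours to replay each day's logs.
hours = estimated_restore_hours(
    full_size_tb=200,
    restore_tb_per_hour=2,
    days_of_logs=60,
    replay_hours_per_day_of_logs=1.5,
)
print(f"{hours} hours, about {hours / 24:.1f} days")
```

Under these assumptions the recovery takes 190 hours, or roughly eight days, even though the nightly log backups comfortably meet the 24-hour RPO.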

 

A similar situation exists with the traditional FULL + INCREMENTAL backup method for standard file systems. In this model, you typically (hopefully) complete a full backup each weekend, and then perform an incremental capture of each day’s new data during the week. If you suffer a failure on Monday and need to perform a full recovery, no problem – just restore the FULL backup from the weekend.

 

But if your failure happens on Friday, you need to first restore the FULL from the previous weekend and then each of the INCREMENTAL sets, sequentially, from Monday evening through Thursday evening. A recovery on Friday will therefore take significantly longer than one on Monday. Does your RTO take this into account? Also, the recovery on Friday is more fraught with risk, since it’s a multi-step process and you may be overwriting some of the restored files as many as four times.
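The difference between the Monday and Friday recoveries can be sketched as a restore chain. This is only an illustration of the schedule described above (one weekend FULL, one INCREMENTAL each weekday evening); the function and label names are hypothetical.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def restore_chain(failure_day):
    """Backup sets to restore, in order, for a failure during the given
    weekday, before that evening's incremental has run."""
    completed_days = WEEKDAYS[:WEEKDAYS.index(failure_day)]
    return ["FULL (weekend)"] + [f"INCREMENTAL ({day})" for day in completed_days]


monday_chain = restore_chain("Monday")
friday_chain = restore_chain("Friday")

print(len(monday_chain), monday_chain)
print(len(friday_chain), friday_chain)
```

A Monday failure needs one restore operation; a Friday failure needs five, applied strictly in order, which is where both the extra time and the extra risk come from.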

 

Clearly, as data volumes continue to increase in a compound fashion and the IT landscape becomes ever more complex, better approaches to meeting backup (RPO) and recovery (RTO) requirements are needed.

 

Shameless Plug

 

The goal of Hitachi’s data protection software solutions is to drive the costs of backup and recovery toward zero, by reducing or eliminating the time to perform a backup (defined by the backup window objective), the amount of data at risk of loss (defined by the recovery point objective), and the time to restore operations following an outage (defined by the recovery time objective).

 

Hitachi Data Systems offers a solution for protecting large databases and critical applications that both enables much better RPOs and meets tight RTOs. This solution consists of:

 

  • Storage-based snapshot and replication technologies that:
    • Eliminate the load of data protection operations from the database system
    • Eliminate the need for a backup window and its associated downtime
    • Enable much more frequent backup operations, reducing the amount of data at risk by 90% or more
  • Database- and application-aware snapshot and replication management software that:
    • Places the database or application into a backup-ready, or quiesced, state automatically
    • Executes the storage-based snapshot and then releases the database / application to continue normal operations
    • Enables fast, fully application-consistent operational recovery – in minutes, not weeks
    • Completely orchestrates the entire process, eliminating the need to create and manage large numbers of scripts
  • Assessment and implementation services to define and configure the optimal solution for your unique environment

 

Rich Vining is a senior product marketing manager at HDS and has more than 25 years in the storage industry. The contents of this blog are his own.

