[Updated on 03/03/2015]
Improvement to the Estimated Recovery Time (ERT) and Recovery Point Objective (RPO) for Basic, Standard and Premium database tiers.
This post continues our series on the Business Continuity/Disaster Recovery (BCDR) capabilities of Azure SQL Database and focuses on geo-restore. In my previous post on standard geo-replication I discussed how the various business continuity features map to service tiers. The table below is a quick refresher.
BCDR option | Basic tier | Standard tier | Premium tier |
Point In Time Restore | Any restore point within 7 days | Any restore point within 14 days | Any restore point within 35 days |
Geo-Restore | ERT* < 12h RPO† < 1h | ERT* < 12h RPO† < 1h | ERT* < 12h RPO† < 1h |
Standard Geo-Replication | Not included | ERT* < 30s RPO† < 5s | ERT* < 30s RPO† < 5s |
Active Geo-Replication | Not included | Not included | ERT* < 30s RPO† < 5s |
* Estimated Recovery Time (ERT) – The estimated duration for the database to be fully functional after a restore/failover request.
† Recovery Point Objective (RPO) – The amount of most recent data changes (time interval) the application could lose after recovery.
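As a quick way to reason about the table, the sketch below encodes its ERT and RPO upper bounds and lists the options that satisfy a given recovery target for a tier. The `Option` class and the `options_meeting` function are hypothetical names chosen for this illustration and are not part of any Azure API. Point In Time Restore is left out because the table gives it a retention window rather than an ERT or RPO.

```python
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass(frozen=True)
class Option:
    name: str
    ert_seconds: int      # upper bound on Estimated Recovery Time
    rpo_seconds: int      # upper bound on Recovery Point Objective
    tiers: frozenset      # service tiers that include this option


# Values transcribed from the table above (the table states strict "<" bounds).
OPTIONS = [
    Option("Geo-Restore", 12 * HOUR, 1 * HOUR,
           frozenset({"Basic", "Standard", "Premium"})),
    Option("Standard Geo-Replication", 30, 5,
           frozenset({"Standard", "Premium"})),
    Option("Active Geo-Replication", 30, 5,
           frozenset({"Premium"})),
]


def options_meeting(tier, max_ert_seconds, max_rpo_seconds):
    """Return the names of options available on `tier` whose ERT and RPO
    bounds do not exceed the requested targets."""
    return [
        o.name
        for o in OPTIONS
        if tier in o.tiers
        and o.ert_seconds <= max_ert_seconds
        and o.rpo_seconds <= max_rpo_seconds
    ]
```

For example, `options_meeting("Basic", 60, 10)` returns an empty list, which reflects that a Basic database has no option with sub-minute recovery.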
What is Geo-restore?
Geo-restore provides the ability to restore a database from a geo-redundant backup to create a new database. The database can be created on any server in any Azure region. Because it uses a geo-redundant backup as its source it can be used to recover a database even if the database is inaccessible due to an outage. Geo-restore is automatically enabled for all service tiers at no extra cost.
Geo-restore in detail
Geo-restore uses the same technology as point in time restore, with one important difference: it restores the database from a copy of the most recent daily backup in geo-replicated blob storage (RA-GRS). For each active database, the service maintains a backup chain that includes a weekly full backup, multiple daily differential backups, and transaction logs saved every 5 minutes. These blobs are geo-replicated, which guarantees that daily backups are available even after a massive failure in the primary region. Figure 1 illustrates this process.
Figure 1. Geo-replication of weekly and daily backups copied to the storage container(s).
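To make the idea of a backup chain concrete, here is a minimal sketch of how a restore sequence could be assembled from full, differential, and log backups. It assumes the conventional ordering (latest full backup, then the latest differential taken after it, then the later log backups) and recovers only up to the last backup taken at or before the target time. The function name and the tuple layout are hypothetical, and this is not a description of the service's internal implementation.

```python
from datetime import datetime


def restore_chain(backups, target):
    """Pick the backups to apply, in order, to reach `target`.

    `backups` is a list of (kind, taken_at) tuples, where kind is
    "full", "diff", or "log" and taken_at is a datetime.
    """
    # Only backups taken at or before the target time are usable here.
    usable = sorted((b for b in backups if b[1] <= target), key=lambda b: b[1])

    fulls = [b for b in usable if b[0] == "full"]
    if not fulls:
        raise ValueError("no full backup available before the target time")
    full = fulls[-1]
    chain = [full]
    base_time = full[1]

    # A differential is only valid on top of the full backup it follows.
    diffs = [b for b in usable if b[0] == "diff" and b[1] > base_time]
    if diffs:
        chain.append(diffs[-1])
        base_time = diffs[-1][1]

    # Log backups taken after the chosen base roll the database forward.
    chain.extend(b for b in usable if b[0] == "log" and b[1] > base_time)
    return chain
```

Given a weekly full backup, two daily differentials, and a few log backups, the function returns the full backup, the most recent differential, and only the logs taken after that differential.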
If a large-scale incident in a region makes your database application unavailable, you can use geo-restore to restore a database from the most recent backup to a server in any other region. There can be a delay between when a backup is taken and when it is geo-replicated to the Azure blob in a different region. As a result, a large-scale incident can cause up to 1 hour of data loss, i.e., an RPO of up to 1 hour. Figure 2 illustrates the recovery process.
Figure 2. Restore of the database from the last daily backup.
How to use Geo-restore
Geo-restore can be invoked from the Azure Management Portal using the Backups tab on an affected server. This tab lists the available backups for all databases on that server and shows the last backup time for each database. Once you have selected a backup to restore, you can provide a name for the new database and specify the target server, which may be in any region. Once confirmed, the restore request is placed in a queue for processing in the target region. The figures below illustrate these steps.
Figure 3. Select the degraded server that contained the database you want to recover
Figure 4. Select the database you want to recover from the list of available backups
Figure 5. Specify the server where the database will be restored
Figure 6. Monitor the status of the database being restored
Like standard and active geo-replication, geo-restore can be managed and invoked using a REST API or PowerShell as well as the Azure Management Portal. The Azure Management Portal is well suited to ad hoc geo-restore of small numbers of databases. The REST API or PowerShell can be used to script recovery of multiple databases or to integrate with custom management scripts or applications. You can read more about the Azure SQL Database REST API and the Azure SQL Database Recovery PowerShell API.
If you initiated a restore operation by mistake and would like to cancel it, you can do so by connecting to the master database on the target server and dropping the restored database. Note that the database record is displayed as soon as the restore process starts, so you can drop it without waiting for the restore to complete. That way you can avoid any billing impact.
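If you script this cleanup, the statement to run against the target server's master database is a plain `DROP DATABASE`. The sketch below only builds that statement with the database name safely quoted as a T-SQL bracketed identifier. The helper name is hypothetical, and opening the connection and executing the statement are left to whichever SQL client you already use.

```python
def drop_database_statement(database_name: str) -> str:
    """Build a T-SQL statement that drops the named database.

    The name is wrapped in square brackets, and any closing bracket
    inside it is doubled, which is how T-SQL escapes bracketed identifiers.
    """
    if not database_name or not database_name.strip():
        raise ValueError("database name must not be empty")
    escaped = database_name.replace("]", "]]")
    return f"DROP DATABASE [{escaped}];"
```

Calling `drop_database_statement("orders_restored")` yields `DROP DATABASE [orders_restored];`, where `orders_restored` is a made-up name for the database created by the mistaken restore.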
Factors affecting recovery time
Recovery time is affected by several factors: the size of the database, its performance level, and the number of concurrent restore requests being processed in the target region. If there is a prolonged outage in a region, other regions may be processing large numbers of geo-restore requests. If necessary, the service will limit the resources used by restore operations to ensure that existing workloads in the target region are not adversely impacted. A large number of requests may therefore increase the recovery time for databases restored in that region.
While geo-restore is available with all service tiers, it is the most basic of the DR solutions available in SQL Database, with the longest RPO and Estimated Recovery Time (ERT). For Basic databases, which have a maximum size of 2 GB, geo-restore provides a reasonable DR solution with an ERT of less than 12 hours. For larger Standard or Premium databases, if you need significantly shorter recovery times or want to reduce the likelihood of data loss, you should consider using standard or active geo-replication. Both geo-replication options offer a much lower RPO and ERT because they only require you to initiate a failover to a continuously replicated secondary.
DR Drills
It is often a compliance requirement to demonstrate that production databases are adequately protected from a disaster. As geo-restore can be used at any time you can conduct periodic DR drills to satisfy yourself that DR procedures based on this recovery option are effective.
Summary
The combination of geo-restore, standard geo-replication and active geo-replication provides you with a range of options to implement a business continuity solution that meets the needs of your application and business. In addition to the cost and SLO differences we discussed earlier, the option you choose also determines which business continuity scenarios are enabled. The following table summarizes these differences:
Scenario | Geo-restore | Standard Geo-replication | Active Geo-replication |
Regional disaster | Yes | Yes | Yes |
DR drill | Yes | Yes | Yes |
Online application upgrade | No | No | Yes |
Online application relocation | No | No | Yes |
Read load balancing | No | No | Yes |
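The same table can be kept as a small lookup in planning scripts. The snippet below is an illustrative transcription of the rows above; the `SUPPORT` mapping and the `options_for` helper are hypothetical names.

```python
# Column order follows the table: most basic option first.
OPTION_NAMES = ["Geo-restore", "Standard Geo-replication", "Active Geo-replication"]

# One row per scenario; each flag says whether the option in that column supports it.
SUPPORT = {
    "Regional disaster": (True, True, True),
    "DR drill": (True, True, True),
    "Online application upgrade": (False, False, True),
    "Online application relocation": (False, False, True),
    "Read load balancing": (False, False, True),
}


def options_for(scenario):
    """Return the options that enable the given scenario, per the table."""
    flags = SUPPORT[scenario]
    return [name for name, ok in zip(OPTION_NAMES, flags) if ok]
```

For instance, `options_for("Read load balancing")` returns only active geo-replication, which matches the last row of the table.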
We encourage you to try geo-restore to see where it might fit in your BCDR strategy. As always, we’re listening closely to feedback, so please tell us what you think.
You can read more about Azure SQL Database Backup and Restore in this article.