Geo-disaster recovery available in Event Grid
Posted on 29 May 2019
Event Grid now has built-in automatic geo-disaster recovery (GeoDR) of metadata, applicable to all existing domains, topics and event subscriptions, not just for new ones. This means that, in the event of an outage that takes out an entire Azure region, the Event Grid service will already have all your eventing infrastructure metadata synced to a paired region and your new events will begin to flow again with no intervention required, avoiding service interruption.
Disaster recovery is generally measured with two metrics:
- Recovery Point Objective (RPO): the minutes or hours of data that may be lost.
- Recovery Time Objective (RTO): the minutes or hours the service may be down.
Event Grid’s automatic failover has different RPOs and RTOs for your metadata (event subscriptions, etc.) and your data (events). If you need a different specification from the ones below, you can still implement your own client-side failover using the topic health APIs.
- Metadata RPO: Zero minutes. You read that right. Whenever a resource is created in Event Grid, it’s instantly replicated across regions. In the event of a failover, no metadata is lost.
- Metadata RTO: Within 60 minutes, Event Grid will begin to accept create/update/delete calls for topics and subscriptions, although this generally happens much more quickly.
- Data RPO: If your system is healthy and caught up on existing traffic at the time of regional failover, the RPO for events is about 5 minutes.
- Data RTO: Within 60 minutes, Event Grid will begin accepting new traffic after a regional failover. As with metadata, this generally happens much more quickly.
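If these targets are not tight enough for you, the client-side failover mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: you have created two topics in different regions yourself, the endpoint URLs are placeholders, and `publish_with_failover` and its `send` parameter are hypothetical names, not part of any Event Grid SDK. For simplicity, the sketch reacts to a failed publish rather than querying the topic health APIs.

```python
from typing import Callable, Sequence

# Placeholder endpoints (assumption): two topics you created in different regions.
PRIMARY = "https://primary-topic.example/api/events"
SECONDARY = "https://secondary-topic.example/api/events"


def publish_with_failover(
    event: dict,
    endpoints: Sequence[str],
    send: Callable[[str, dict], None],
) -> str:
    """Try each endpoint in order and return the one that accepted the event.

    `send` is whatever function actually posts the event (an HTTP client call,
    an SDK call, ...). It is expected to raise an exception on failure.
    """
    errors = []
    for endpoint in endpoints:
        try:
            send(endpoint, event)
            return endpoint
        except Exception as exc:  # real code should catch transport-specific errors
            errors.append((endpoint, repr(exc)))
    # Every endpoint rejected the event; surface all failures to the caller.
    raise RuntimeError(f"all endpoints failed: {errors}")
```

Passing the transport in as `send` keeps the failover logic independent of how events are posted, so the same function can be exercised with a stub in tests and with a real HTTP or SDK call in production.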