
Building resilient Azure ExpressRoute connectivity for business continuity and disaster recovery

Moving business-critical workloads to Azure demands a disaster recovery strategy not only for the frontend network connectivity to the workloads, but also for the backend network connectivity back to the organization. We share the best practices, architecture, and Azure Networking features that help maximize the availability of ExpressRoute connectivity and prepare it for disaster recovery. The architecture reuses an organization’s existing geo-redundant ExpressRoute circuits for disaster recovery as well.

As more and more organizations adopt Azure for their business-critical workloads, the connectivity between organizations’ on-premises networks and Microsoft becomes crucial. Azure ExpressRoute provides private connectivity between on-premises networks and Microsoft. By default, an ExpressRoute circuit provides redundant network connections to the Microsoft backbone network and is designed for carrier-grade high availability. However, the availability of a network connection is only as good as the weakest link in its end-to-end path. Therefore, it is imperative that the customer and service provider segments of ExpressRoute connectivity are also architected for high availability.
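The weakest-link point can be made concrete with some rough arithmetic. The sketch below is illustrative only: the per-segment availability figures are hypothetical, not published numbers, and the helper names are ours. It also assumes that segments fail independently, which real networks only approximate.

```python
from math import prod


def series_availability(segments):
    """Availability of a path in which every segment must be up."""
    return prod(segments)


def parallel_availability(paths):
    """Availability of redundant paths where any one path is enough.

    Assumes the paths fail independently of each other.
    """
    return 1 - prod(1 - a for a in paths)


# Hypothetical per-segment availabilities, for illustration only.
customer_edge = 0.999
service_provider = 0.9995
microsoft_edge = 0.9999

single_path = series_availability([customer_edge, service_provider, microsoft_edge])
redundant_paths = parallel_availability([single_path, single_path])

print(f"single end-to-end path:  {single_path:.6f}")
print(f"two independent paths:   {redundant_paths:.6f}")
```

The end-to-end figure for a single path is always lower than its weakest segment, which is why redundancy has to extend through the customer and service provider segments as well.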

The document “Designing for high availability with ExpressRoute” addresses these design considerations and explains how to architect robust end-to-end ExpressRoute connectivity between a customer’s on-premises network and the Microsoft network core. It covers how to maximize the availability of an ExpressRoute circuit in general, as well as components specific to private peering and to Microsoft peering.

Private peering high availability

Every component of ExpressRoute connectivity must be built for high availability, including the first mile from on-premises to the peering location, multiple circuits to the same virtual network (VNet), and the virtual network gateway within the VNet.

To improve the availability of the ExpressRoute virtual network gateway, Azure offers zone-redundant virtual network gateways that use availability zones. ExpressRoute also supports bidirectional forwarding detection (BFD), which speeds up link failure detection and thereby significantly improves mean time to recover (MTTR) following a link failure.
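To get a feel for why BFD matters, the rough arithmetic below compares failure detection with and without it. The timer values are assumptions chosen for illustration (a 300 ms BFD interval with a multiplier of 3, and a 180-second BGP hold time); check the values actually negotiated on your own devices.

```python
def bfd_detection_seconds(interval_ms: int, multiplier: int) -> float:
    """BFD declares the link down after `multiplier` consecutive missed packets."""
    return interval_ms * multiplier / 1000


# Assumed timer values, for illustration only.
BFD_INTERVAL_MS = 300
BFD_MULTIPLIER = 3
BGP_HOLD_TIME_S = 180  # without BFD, the failure is noticed when the hold timer expires

bfd_s = bfd_detection_seconds(BFD_INTERVAL_MS, BFD_MULTIPLIER)
print(f"BFD detection: {bfd_s:.1f} s, BGP hold timer: {BGP_HOLD_TIME_S} s")
```

Under these assumptions, detection drops from minutes to under a second, and that difference feeds directly into MTTR.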

Microsoft peering high availability

Further, where and how you implement network address translation (NAT) affects the MTTR of Microsoft PaaS services (including O365) consumed over Microsoft peering following a connection failure. Path selection between the Internet and ExpressRoute on Microsoft peering is equally important for a highly reliable and scalable architecture.

[Figure: expressroute]

ExpressRoute disaster recovery strategy

How about architecting ExpressRoute connectivity for disaster recovery and business continuity? Would it be possible to optimize ExpressRoute circuits in different regions both for local connectivity and to act as a backup when the ExpressRoute circuit of another region fails? In the following architecture, how do you ensure symmetrical cross-regional traffic flow, either via the Microsoft backbone or via the organization’s global connectivity (outside Microsoft)? The document “Designing for disaster recovery with ExpressRoute private peering” addresses these concerns and explains how to architect for disaster recovery using ExpressRoute private peering.

[Figure: expressroute2]
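One way to reason about which circuit carries traffic, and what happens when it fails, is a small route-preference model. The sketch below is a simplification and not Azure’s actual implementation: it assumes the most specific prefix wins, then the higher connection weight, then the shorter AS path. The `Route` type, the circuit labels, and the prefixes are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Route:
    prefix: str           # on-premises prefix learned over a circuit
    circuit: str          # hypothetical circuit label
    weight: int = 0       # connection weight, higher is preferred
    as_path_len: int = 1  # AS path length, shorter is preferred


def best_route(routes, destination):
    """Pick a route using the simplified preference order described above."""
    dest = ip_address(destination)
    candidates = [r for r in routes if dest in ip_network(r.prefix)]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ip_network(r.prefix).prefixlen, r.weight, -r.as_path_len),
    )


# The same on-premises prefix is learned over the local and the remote circuit.
routes = [
    Route("10.1.0.0/16", circuit="local-region", weight=100),
    Route("10.1.0.0/16", circuit="remote-region", weight=0),
]

print(best_route(routes, "10.1.2.3").circuit)       # local circuit is preferred
print(best_route(routes[1:], "10.1.2.3").circuit)   # after a local failure, the remote circuit takes over
```

Symmetry still depends on the on-premises side making the matching choice for traffic toward Azure; this model covers only one direction.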

Summary

To build a robust ExpressRoute circuit, architect the end-to-end ExpressRoute connectivity for high availability, maximizing redundancy and minimizing MTTR following a failure. A robust ExpressRoute circuit can withstand many single-point failures. However, to safeguard against disasters that impact an entire peering location, your disaster recovery plans should include geo-redundant ExpressRoute circuits. Failing over to geo-redundant ExpressRoute circuits brings its own challenges, including asymmetrical routing. The following documents help you architect a highly available ExpressRoute circuit and design for disaster recovery using geo-redundant ExpressRoute circuits.