Whether your organization is a one-person shop or a global enterprise, cloud computing makes it easier to do business with customers and partners around the world, and it’s disrupting traditional IT practices in the process. Cloud computing reduces costs and improves service quality. It empowers organizations to quickly respond to changing demands for new services and lets them focus on their core business rather than IT. Enterprises are moving on-premises servers, datacenters, and services to the cloud. Startup companies are building cloud-based businesses from the ground up. Both are offloading infrastructure concerns to cloud providers, and in return they’re getting nearly unlimited on-demand compute, storage, networking, and software-as-a-service capabilities from almost anywhere in the world.
Ideally, cloud services “are secure, compliant, and just work.” Although you may realize that there is a massive datacenter infrastructure behind them, you may not know that the quality and integrity of the service you get depends on robust and secure networks. No matter how good the underlying server infrastructure is, a slow or low-quality network connection at any point between you, or your customer, and the datacenter will degrade your experience.
At Microsoft, our goal is to offer cloud services that any customer, anywhere in the world, can securely use without worrying about capacity constraints or service quality. We want customers to be able to get to their resources from anywhere, at any scale, with no limitations, easily and securely. However, when we started developing cloud offerings, we quickly realized that connecting an enterprise-grade cloud infrastructure across the entire world would take new networking technologies and novel management strategies. Traditional networking approaches wouldn’t give us the speed, reliability, and security needed by customers. To meet these challenges, we’ve been innovating and heavily investing in network infrastructure.
Figure 1. The Microsoft global network
Software-Defined Networking innovations
Hardware takes time to rack, stack, and configure, but we wanted to let customers scale their services up and down with a click. Using the pioneering work of Microsoft Research in Software-Defined Networking, we built a scalable and flexible datacenter network. It uses a virtualized layer 3 overlay network that is independent of the physical network topology. In this design, multiple virtual networks run on the same physical network in the datacenter, just as multiple virtual machines run isolated from each other on the same physical server. Each customer has their own isolated virtual network. Customers get on-demand network services, with the network defined and managed in software, and they are not tied to specific hardware.
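To make the overlay idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how Azure implements virtual networks: the `OverlayDirectory` class, the virtual network IDs, and all addresses are hypothetical. It shows how a lookup keyed on both the virtual network and the customer address lets two tenants reuse the same private address on one physical network while staying isolated.

```python
from dataclasses import dataclass, field


@dataclass
class OverlayDirectory:
    """Hypothetical directory mapping (virtual network ID, customer IP) to a physical host."""

    # Key: (vnet_id, customer_ip); value: physical host IP. All values are invented.
    mappings: dict = field(default_factory=dict)

    def register(self, vnet_id, customer_ip, host_ip):
        self.mappings[(vnet_id, customer_ip)] = host_ip

    def encapsulate(self, vnet_id, src_ip, dst_ip, payload):
        """Wrap a tenant packet in an outer header addressed to the physical host."""
        key = (vnet_id, dst_ip)
        if key not in self.mappings:
            # The destination is not part of this virtual network, so the
            # packet is dropped. This is what keeps tenants isolated here.
            return None
        return {
            "outer_dst": self.mappings[key],
            "vnet_id": vnet_id,
            "inner": {"src": src_ip, "dst": dst_ip, "payload": payload},
        }


directory = OverlayDirectory()
# Two tenants use the same private address on different physical hosts.
directory.register(vnet_id=1001, customer_ip="10.0.0.4", host_ip="192.0.2.10")
directory.register(vnet_id=2002, customer_ip="10.0.0.4", host_ip="192.0.2.20")

packet_a = directory.encapsulate(1001, "10.0.0.5", "10.0.0.4", b"hello")
packet_b = directory.encapsulate(2002, "10.0.0.5", "10.0.0.4", b"hello")
unknown = directory.encapsulate(1001, "10.0.0.5", "10.0.0.99", b"hello")
```

In this sketch the physical network only ever sees the outer destination, so the tenant address plan can change without touching the physical topology.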
For our Azure datacenters, we use scalable software load balancing developed by Microsoft Research, which pushes networking intelligence into software. We eliminated hardware load balancers and replaced them with Azure Load Balancer running on standard compute servers. Now customers provision a load balancer with just a click. Although this approach is widely accepted now, it was novel in the industry when we first introduced it.
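As a rough illustration of load balancing done purely in software, the sketch below picks a backend by hashing a flow's 5-tuple. This post does not describe the distribution algorithm, so the hashing scheme is an assumption, and `pick_backend`, the backend pool, and the addresses are hypothetical names chosen for the example.

```python
import hashlib


def pick_backend(flow, backends):
    """Choose a backend for a flow by hashing its 5-tuple (an assumed scheme)."""
    if not backends:
        raise ValueError("backend pool is empty")
    # flow = (source IP, source port, destination IP, destination port, protocol)
    key = "|".join(str(part) for part in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer index into the pool.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["10.1.0.4", "10.1.0.5", "10.1.0.6"]
flow = ("203.0.113.7", 51514, "198.51.100.1", 443, "tcp")

first = pick_backend(flow, backends)
second = pick_backend(flow, backends)
```

Because the choice depends only on the flow and the pool, every packet of a connection reaches the same backend without any per-flow state. A simple modulo like this remaps many flows when the pool changes size, so a production design would likely need something more stable.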
Azure handles the most demanding networking workloads by providing each virtual machine with up to 25 Gbps of bandwidth and very low latency within each region. To achieve world-class performance, we optimized the network from an end-to-end perspective. Servers running in our datacenters have specialized network interface cards (NICs) that offload network processing to the hardware. We’ve also developed novel network acceleration techniques using Field Programmable Gate Array (FPGA) technology incorporated into our SmartNIC project introduced at SIGCOMM 2015. These optimizations free up the server CPU for application workloads, so both Linux and Windows virtual machines get better network performance while valuable CPU cycles are returned to the application. When our world-wide deployment completes in April, we’ll update our VM Sizes table so you can see the expected networking throughput performance numbers for our virtual machines.
Another area we tackled to improve performance was how we connect our regional datacenters. Worldwide, Microsoft has regions composed of multiple campuses, and each campus may have multiple datacenters. The sheer physical size and power consumption of the network gear needed to connect our datacenters within these campuses presented a design challenge. We took the lessons learned from designing and deploying flat, high-bandwidth networks inside the datacenter and applied them to inter-DC networks. We created a high-bandwidth regional interconnection architecture using networking optics that Microsoft co-developed. These optics will be available from third-party suppliers, thereby allowing other cloud providers to take advantage of our innovations in this area.
Global backbone and edge: Connecting from any client, anywhere
We wanted to optimize the network experience as customers connect to our cloud services from anywhere in the world. We built a backbone network that spans the globe, even laying undersea cables to Europe and Asia. All our datacenters connect to this global network that supports Azure, Bing, Dynamics 365, Office 365, OneDrive, Skype, Xbox, and soon LinkedIn. It’s one of the largest backbone networks in the world.
Our backbone network also connects to the Microsoft edge network, which in turn connects our peers to the Internet. We peer with thousands of networks through more than 4,500 connections globally. Our goal is that latency will be dictated only by the physics of the speed of light, not by the lack of a networking path or lack of sufficient bandwidth in a geography. Since network latency is a function of physical distance, we strategically locate our edge nodes close to customers. We continue to grow our network, which now has more than 130 edge nodes around the world. To further reduce latency, we allow customers to cache content at the edge nodes. We’ve developed Traffic Manager, a network service that automatically routes customer traffic to the closest datacenter and acts as a global cloud load balancer. Customers define a routing policy, and we implement it. In addition to performance, customers can define policies for disaster recovery and round-robin load sharing.
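The three kinds of routing policy mentioned above (performance, disaster recovery, and round-robin) can be sketched in a few lines of Python. This is a simplified model, not Traffic Manager’s API: the `Endpoint` and `Router` classes, the policy names, and the latency and priority values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    latency_ms: float  # latency seen from the client (invented value)
    priority: int      # lower number is preferred for failover
    healthy: bool = True


class Router:
    """Hypothetical router that applies one of three policies to healthy endpoints."""

    def __init__(self, endpoints):
        self.endpoints = endpoints
        self._next = 0  # position for round-robin rotation

    def route(self, policy):
        healthy = [e for e in self.endpoints if e.healthy]
        if not healthy:
            raise RuntimeError("no healthy endpoints")
        if policy == "performance":
            # Send the client to the endpoint with the lowest latency.
            return min(healthy, key=lambda e: e.latency_ms)
        if policy == "failover":
            # Disaster recovery: use the highest-priority endpoint still healthy.
            return min(healthy, key=lambda e: e.priority)
        if policy == "round_robin":
            choice = healthy[self._next % len(healthy)]
            self._next += 1
            return choice
        raise ValueError(f"unknown policy: {policy}")


router = Router([
    Endpoint("dublin", latency_ms=18.0, priority=2),
    Endpoint("amsterdam", latency_ms=25.0, priority=1),
    Endpoint("singapore", latency_ms=180.0, priority=3),
])
```

With these sample values, `router.route("performance")` returns the Dublin endpoint and `router.route("failover")` returns Amsterdam; marking Amsterdam unhealthy makes the failover policy fall back to Dublin.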
At selected edge locations, we also allow private network connectivity via a service called ExpressRoute. Customers can use their existing network carriers to bypass the Internet to reach our cloud services. Customers enter our network at select edge locations; from there, they reach any of our datacenters. For example, customers can get connectivity to a local ExpressRoute site in Dallas and access their virtual machines in Amsterdam, Busan, Dublin, Hong Kong, Osaka, Seoul, Singapore, Sydney, Tokyo, or any of our other datacenters, with the traffic safely staying on our global backbone network. We have 37 ExpressRoute sites, with one near each Azure datacenter as well as in other strategic locations. Every time we announce a new Azure region, as we recently did in Korea, you can expect that ExpressRoute will also be there.
Microsoft is a global software and services company. Our rich heritage, combined with years of operational experience running a global cloud infrastructure, permeates our perspective and approach. We’ve built a cloud-scale network using automation, and we’re moving intelligence from hardware to software. In future posts over the next few weeks, we’ll dive deeper into Microsoft networking technologies, detailing our journey as we continue to pioneer and transform the computing landscape in this exciting era of cloud disruption. We’ll cover topics such as our approach to open source networking, a deeper inspection of our global WAN, details on network security, and insights into how we manage a global network that supports some of the biggest services in the world. We hope you’ll join us for this insider’s tour of Microsoft networking.
To read more posts from this series, please visit: