Report from Open Networking Summit: Achieving Hyper-Scale with Software Defined Networking

Today, I am excited to deliver a keynote talk at the Open Networking Summit, where I’ll be talking about how Microsoft is leveraging software-defined networking to power one of the largest public clouds in the world – Microsoft Azure.

SDN is probably not a new term to you, so what is the hype really about? To answer that question, we need to take a step back and look at how the datacenter is evolving to meet the growing demand for scalability, flexibility and reliability in this mobile-first, cloud-first world. Cloud-native apps and services are placing an unprecedented demand for scale and automation on IT infrastructure. Across the industry, this is driving the move of control systems from hardware devices into software, a trend called the Software Defined Datacenter (SDDC), which empowers customers to virtualize servers, storage and networking to optimize resources and apps with a single click.

With 22 hyper-scale regions around the world, Azure storage and compute usage doubling every six months, and 90,000 new Azure subscriptions a month, Azure has experienced exponential growth. In this environment, we’ve had to learn how to run a software-defined datacenter within our own infrastructure to deliver Azure services to a growing user base. Since the inception of SDDC, we have applied the principles of virtualized, scale-out, partitioned cloud design and central control to everything from the Azure compute plane implementation to cloud storage, and of course, to networking.

Leveraging SDN for Industry-Leading Virtual Networks

We are investing in bringing a cloud design pattern to networking to deliver scalability and flexibility to our customers consuming cloud services both from Azure and within their datacenters. How exactly are we doing this? For starters, we are delivering industry-leading virtual networks (Vnets), which are critical for any public cloud customer. Vnets are built using overlay and Network Functions Virtualization (NFV) technologies implemented in software running on commodity servers, on top of a shared physical network.
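To make the overlay idea concrete, here is a minimal sketch in Python. It is an illustration only, not Azure's implementation: the names `OverlayDirectory`, `Encapsulated` and `encapsulate` are hypothetical, and the encapsulation format is an assumption, since this post does not describe one. It shows how two customers can reuse the same private address on a shared physical network when each packet is wrapped in an outer header that carries a Vnet identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str        # customer (virtual) source address
    dst: str        # customer (virtual) destination address
    payload: bytes


@dataclass(frozen=True)
class Encapsulated:
    outer_src: str  # physical address of the sending host
    outer_dst: str  # physical address of the receiving host
    vnet_id: int    # identifies which virtual network the inner packet belongs to
    inner: Packet


class OverlayDirectory:
    """Hypothetical mapping of (Vnet, customer address) to a physical host."""

    def __init__(self):
        self._hosts = {}

    def register(self, vnet_id, customer_ip, host_ip):
        self._hosts[(vnet_id, customer_ip)] = host_ip

    def lookup(self, vnet_id, customer_ip):
        return self._hosts[(vnet_id, customer_ip)]


def encapsulate(directory, vnet_id, local_host_ip, packet):
    """Wrap a customer packet so the shared physical network can deliver it."""
    remote_host_ip = directory.lookup(vnet_id, packet.dst)
    return Encapsulated(local_host_ip, remote_host_ip, vnet_id, packet)


if __name__ == "__main__":
    directory = OverlayDirectory()
    # Two Vnets use the same customer address on different physical hosts.
    directory.register(1, "10.0.0.5", "192.0.2.10")
    directory.register(2, "10.0.0.5", "192.0.2.20")
    packet = Packet("10.0.0.4", "10.0.0.5", b"hello")
    print(encapsulate(directory, 1, "192.0.2.1", packet).outer_dst)
    print(encapsulate(directory, 2, "192.0.2.1", packet).outer_dst)
```

Because the lookup key includes the Vnet identifier, overlapping customer address spaces never collide on the physical network.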

By abstracting the software from the hardware layer, we have developed Vnets that are not only scalable and agile, but also secure and reliable. Through segmentation of subnets and security groups, traffic flow control with User Defined Routes, and ExpressRoute for private enterprise grade connectivity, we are able to mimic the feel of a physical network with these Vnets.

Elastic Scale through Disaggregating the Network

With the demands on Azure, Vnets must be able to scale up for very large workloads and back down for small workloads. By separating the control plane from the data plane and centralizing the control plane, we enable networks that can be modified, scaled and programmed quickly. To give a concrete example of the kind of hyper-scale we can achieve in one region, we can scale the data plane to hundreds of thousands of servers by moving data plane processing onto the hosts themselves.

We use the Azure Virtual Filtering Platform (VFP) in the Hyper-V hosts to enable Azure’s data plane to act as a Hyper-V virtual network switch, enabling us to provide core SDN functionality for Azure networking services. VFP is a programmable switch that exposes an easy-to-program abstract interface to network agents that act on behalf of network controllers like the Vnet controller and our software load balancer controller. By leveraging host components and doing much of the packet processing on each host running in the datacenter, the Azure SDN data plane scales massively, both out across hosts and up per node, from 1 Gbps to 40 Gbps and growing.
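The split between a central controller and a programmable switch on every host can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not VFP's actual interface: `Rule`, `HostSwitch` and `Controller` are hypothetical names, and the priority-ordered match-action table is a generic SDN pattern chosen for the example. The controller decides policy once, and each host applies it locally to its own packets.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    priority: int  # lower numbers are evaluated first
    match: dict    # field -> required value; fields not listed are wildcards
    action: str    # "allow" or "drop"


class HostSwitch:
    """Hypothetical per-host data plane: a priority-ordered match-action table."""

    def __init__(self):
        self.rules = []

    def program(self, rule):
        # Called by a local agent on behalf of a controller.
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority)

    def process(self, packet):
        # The first matching rule wins; unmatched traffic is dropped.
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "drop"


class Controller:
    """Hypothetical central control plane that programs every host."""

    def __init__(self, switches):
        self.switches = switches

    def push(self, rule):
        for switch in self.switches:
            switch.program(rule)


if __name__ == "__main__":
    hosts = [HostSwitch() for _ in range(3)]
    controller = Controller(hosts)
    controller.push(Rule(100, {}, "drop"))
    controller.push(Rule(10, {"proto": "tcp", "dst_port": 443}, "allow"))
    print(hosts[0].process({"proto": "tcp", "dst_port": 443}))
    print(hosts[0].process({"proto": "tcp", "dst_port": 22}))
```

In this sketch, adding hosts adds packet-processing capacity without adding load to the controller's decision logic, which is the scaling property the paragraph above describes.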

Scaling up to 40 Gbps and beyond requires significant computation for packet processing. To help us scale up without consuming CPU cycles that can otherwise be made available for customer VMs, Microsoft is building network interface controller (NIC) offloads on Azure SmartNICs. With SmartNICs, Microsoft is bringing the flexibility and acceleration of Field Programmable Gate Arrays (FPGAs) into cloud servers. FPGAs have not yet been widely used as compute accelerators in servers, so Microsoft’s use of them to combine the programmability of SDN with the performance of dedicated hardware is unique in the industry.

Network Security and Reliability with Azure Innovation

Security and reliability are paramount for us. On Azure, one of the ways we ensure a reliable, secure network is through partitioning Vnets with Azure Controllers, which are organized as a set of inter-connected services. Each service is partitioned to scale and runs protocols on multiple instances for high availability. A partition manager service is responsible for partitioning the load among these services based on subscriptions, while a gateway manager service routes requests to the appropriate partition by utilizing the partition service.
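The relationship between the partition manager and the gateway manager can be illustrated with a short Python sketch. It is an assumption-laden example rather than a description of the Azure controllers: the class names, the `route` method and the hash-based assignment are all hypothetical, since this post says only that load is partitioned by subscription. The point is that every request for a given subscription lands on the same partition.

```python
import hashlib


class PartitionManager:
    """Hypothetical service that assigns each subscription to a partition."""

    def __init__(self, partition_count):
        if partition_count <= 0:
            raise ValueError("partition_count must be positive")
        self.partition_count = partition_count

    def partition_for(self, subscription_id):
        # A stable hash keeps the assignment consistent across processes.
        digest = hashlib.sha256(subscription_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.partition_count


class GatewayManager:
    """Hypothetical front door that forwards requests to the owning partition."""

    def __init__(self, partition_manager, handlers):
        if len(handlers) != partition_manager.partition_count:
            raise ValueError("one handler is required per partition")
        self.partition_manager = partition_manager
        self.handlers = handlers

    def route(self, subscription_id, request):
        index = self.partition_manager.partition_for(subscription_id)
        return self.handlers[index](request)


if __name__ == "__main__":
    manager = PartitionManager(4)
    # Each handler stands in for one partition of a controller service.
    handlers = [lambda request, i=i: (i, request) for i in range(4)]
    gateway = GatewayManager(manager, handlers)
    print(gateway.route("subscription-a", "create-vnet"))
    print(gateway.route("subscription-a", "delete-vnet"))
```

A fixed modulo is the simplest possible assignment; it reshuffles most subscriptions whenever the partition count changes, so a production design would likely need something more careful.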

Introduced at //Build, Azure Service Fabric is the platform we used to build our network controllers. With Service Fabric’s microservices-based architectural design, customers can update individual application components on a rolling basis without having to update the entire application, resulting in a more reliable service, faster updates and higher scalability for building mission-critical applications. Service Fabric powers a broad range of Microsoft hyper-scale services like Azure Data Factory, SQL Database, Bing Cortana, and Event Hubs.

Bringing Azure SDN Innovation to Our Customers’ Datacenters

Every day we learn from the hyper-scale deployments of Microsoft Azure. Those learnings enable us to bring new capabilities to your datacenter, functioning at a smaller scale to bring you cloud efficiency and reliability. Our strategy is to adapt the cloud design patterns, points of innovation and structural practices that make Azure a true enterprise grade offering. The capabilities for the on-premises components are the same, and they’re resident in technology currently in production in datacenters across the world.

We first released SDN technology, including network virtualization, in Windows Server 2012, and subsequently enhanced it with the release of Windows Server 2012 R2 and System Center 2012 R2. SDN capabilities in Windows Server derive from the foundational networking technologies that underlie Azure. Moving forward, we will continue to enhance SDN capabilities with the release of Windows Server 2016 and Microsoft Azure Stack. New features include a data plane and programmable network controller based on Azure, as well as a load balancer that is proven at Azure scale.

To see more of what’s going on at ONS, check out the recording here.