Both the cloud and the enterprise depend on high-speed, highly available networks to power their services. This makes it critical for network operators to be able to control their own destiny by rapidly adding the features they need to their networks while keeping out feature changes that increase risk and complexity.
At Microsoft, we believe there are many excellent switch hardware platforms available on the market, with healthy competition between many vendors driving innovation, speed increases, and cost reductions. However, what the cloud and enterprise networks find challenging is integrating the radically different software running on each different type of switch into a cloud-wide network management platform. Ideally, we would like all the benefits of the features we have implemented and the bugs we have fixed to stay with us, even as we ride the tide of newer switch hardware innovation.
The Azure Cloud Switch (ACS) is our foray into building our own software for running network devices like switches. It is a cross-platform modular operating system for data center networking built on Linux. ACS allows us to debug, fix, and test software bugs much faster. It also allows us the flexibility to scale down the software and develop features that are required for our datacenter and our networking needs.
ACS also allows us to share the same software stack across hardware from multiple switch vendors. This is done via the Switch Abstraction Interface (SAI) specification of the Open Compute Project (OCP), the first open-standard C API for programming network switching ASICs. Microsoft was a founding member of the SAI effort and remains a leading contributor to the project, as we view SAI as instrumental to making the ACS a success.
While the ACS respects and learns from years of experience with quality switch software stacks, it deviates in many respects from the conventional switch software stack to achieve some of the objectives just highlighted.
Traditional switch software is built for many customers, each with their own scenarios and feature requests. Since the ACS focuses on feature development based on Microsoft priorities, it has a Lean Stack. This thin software stack contains only the software needed for our datacenter networks and strives to fix, test, and remediate network device software bugs faster than the current run rate. The ACS is also a Modular Stack, as opposed to one monolithic image. The advantages of a lean and modular stack are many: it makes validation easier, lowers the probability of hidden, high-priority bugs, and reduces the time lag for new feature requests.
ACS strives for Easier Configuration and Management by integrating with Microsoft’s monitoring and diagnostics system. By deviating from the traditional enterprise model of interactive command line interfaces, it allows switches to be managed just as servers are, with weekly software rollouts and rollbacks, thus ensuring a mature configuration and deployment model.
ACS believes in the power of Open Networking. ACS, together with the open, standardized SAI, allows us to exploit new hardware faster and to ride the tide of ASIC innovation while operating on multiple platforms. Running on Linux, ACS is able to make use of its vibrant ecosystem. ACS allows us to use and extend open source, Microsoft, and third-party applications. The main functional blocks from top to the bottom of the ACS stack are shown in the figure below.
Applications: These include open source applications such as Quagga; Microsoft-specific applications, which could be an entire configuration management system like Autopilot or a single feature like SWAN; and third-party applications.
Switch State Service (SSS): The SSS holds the switch's subset of the global network state and helps drive the switch towards its goal state. It uses open source key-value stores such as Redis to manage all switch state requirements. Having a database layer that also serves as a SAI object management sublayer helps with sharing objects and managing dependencies among different applications. The database is modular and provides applications with a view of the switch state.
SAI: Before SAI, the underlying complexity of the hardware, with its strict coupling to the protocol stack software, denied us the freedom to choose the best combination of hardware and software for our networking needs. SAI allows software to program multiple switch chips without any changes, thus making the base router platform simple, consistent, and stable. A standardized API also allows network hardware vendors to develop innovative hardware architectures that achieve great speeds while keeping the programming interface consistent. SAI also enables open and easier software development of features, stacks, and applications. As of July 2015, SAI has been officially accepted into the Open Compute Project (OCP). Read more about it here.
Vendor-provided hardware and software: This comprises the actual ASICs, their drivers, and the software development kits (SDKs), which talk northbound to the SAI.
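To make the layering above a little more concrete, here is a minimal Python sketch of the idea, not ACS code. It assumes a plain dictionary in place of a key-value store such as Redis, and a made-up `SwitchApi` class in place of SAI (the real SAI is a C API, and none of the names below come from it). `RecordingBackend`, `reconcile`, and the route prefixes are likewise hypothetical, chosen only for illustration. The sketch shows one piece of state-service logic driving two different "vendor" backends towards the same goal state through one unchanged interface.

```python
from abc import ABC, abstractmethod


class SwitchApi(ABC):
    """Hypothetical ASIC-agnostic interface, standing in for SAI (not the real API)."""

    @abstractmethod
    def create_route(self, prefix: str, next_hop: str) -> None: ...

    @abstractmethod
    def remove_route(self, prefix: str) -> None: ...


class RecordingBackend(SwitchApi):
    """Hypothetical vendor layer: records calls a real SDK would turn into ASIC writes."""

    def __init__(self, vendor: str) -> None:
        self.vendor = vendor
        self.routes = {}   # prefix -> next hop, as "programmed in hardware"
        self.calls = []    # ordered log of (operation, prefix)

    def create_route(self, prefix: str, next_hop: str) -> None:
        self.routes[prefix] = next_hop
        self.calls.append(("create", prefix))

    def remove_route(self, prefix: str) -> None:
        del self.routes[prefix]
        self.calls.append(("remove", prefix))


def reconcile(goal: dict, programmed: dict, api: SwitchApi) -> None:
    """Drive the switch from its programmed state towards the goal state.

    `goal` and `programmed` are plain dicts standing in for entries
    held in a key-value store; `programmed` is updated in place.
    """
    # Withdraw routes that are no longer part of the goal state.
    for prefix in sorted(set(programmed) - set(goal)):
        api.remove_route(prefix)
        del programmed[prefix]

    # Add missing routes and replace routes whose next hop changed.
    for prefix, next_hop in sorted(goal.items()):
        if programmed.get(prefix) == next_hop:
            continue  # already in the goal state, nothing to do
        if prefix in programmed:
            api.remove_route(prefix)
        api.create_route(prefix, next_hop)
        programmed[prefix] = next_hop


# The same goal state is applied to two different backends without changes.
goal_state = {"10.0.0.0/24": "192.0.2.1", "10.0.1.0/24": "192.0.2.2"}
for backend in (RecordingBackend("vendor_a"), RecordingBackend("vendor_b")):
    reconcile(goal_state, {}, backend)
    print(backend.vendor, backend.routes)
```

The point of the design is that `reconcile` only ever sees `SwitchApi`, so swapping the hardware means swapping the backend class while the state logic and the applications above it stay the same. Running `reconcile` a second time with an unchanged goal state issues no further calls.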
The ACS with SAI was demonstrated at the SIGCOMM conference in August 2015. It showcased the ACS, four ASIC vendors (Mellanox, Broadcom, Cavium, and the Barefoot software switch), six implementations of SAI (Broadcom, Dell, Mellanox, Cavium, Barefoot, and Metaswitch), and three application stacks (Microsoft, Dell, and Metaswitch).
The demonstration highlighted the ACS’ lean and modular stack. It showed the power of a standardized, ASIC-agnostic SAI by having one software application talk to the various ASICs. The ACS also interworked with Dell’s and Metaswitch’s application stacks. The features were demonstrated on a real-world Clos topology that Microsoft uses in its datacenters, and ranged from basic Layer 3 router functionality to more complicated ones such as QoS.
We’re talking about ACS publicly because we believe this approach of disaggregating the switch software from the switch hardware will continue to be a growing trend in the networking industry, and we would like to contribute our insights and experiences from this journey, starting here.