SONiC, an open-source operating system for network devices, has grown rapidly over the last five years. The Gartner Market Guide for Data Center Switching, published earlier this year, predicts that “By 2025, 40 percent of organizations that operate large datacenter networks (more than 200 switches) will run SONiC in production environments.” It adds that “due to this rapidly expanding customer interest and commercial ecosystem, there is a strong possibility that, during the next three to six years, SONiC will become analogous to Linux as a server operating system, allowing enterprises to standardize on a NOS that is supported across hardware vendors.”
Over the past year, we have worked with many partners on innovations that extend SONiC to new scenarios. Let’s look at what was showcased at the OCP Global Summit this month and the opportunities SONiC enables.
Enable high-reliability dual ToR support with smart cable
High availability is a never-ending pursuit for network engineers. Delivering customers’ packets without any glitch is a simple ask but a hard promise to keep, because so many kinds of failure can occur along the path. Research underlines how critical the network infrastructure is: each switch has a 2 percent chance of failing within three months of deployment, with 32 percent of failures attributed to hardware faults and 27 percent to unplanned power outages. The classic way to improve a path’s reliability is to add redundancy, which reduces the impact of any single hardware failure.
This year the SONiC community developed an innovative way to provide dual ToR (Top of Rack) connectivity to customer VMs. This SONiC-based approach does not require adding NICs to existing servers, and it avoids the traditional MLAG (Multi-Chassis Link Aggregation) mechanism, which is prone to split-brain failures. The secret lies inside the cable. Instead of a conventional Y cable, the new smart cable contains a microcontroller and a hitless MUX. The intelligence sits in the SONiC ToR switches: they manage the MUX inside the smart cable, determine the traffic path for the server, and handle failover rapidly. Measurements show that this combination of a smart cable and SONiC switches delivers dual connectivity with a failover time of less than 1µs. This capability is available in the SONiC 20201230 release. Microsoft, Broadcom, Credo, and many other companies have contributed to it.
Figure 1: Dual ToR support through smart cable and SONiC switches.
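To make the failover idea more concrete, here is a minimal Python sketch of the decision the ToRs make for a single server port. It is not SONiC’s implementation: the class names, the heartbeat-based health check, and the three-miss threshold are hypothetical assumptions chosen for illustration. The sketch only captures the core idea that the MUX forwards to one ToR at a time and flips to the standby ToR when the active one stops responding.

```python
from dataclasses import dataclass, field
from enum import Enum


class MuxDirection(Enum):
    """Which ToR the smart cable's MUX currently connects the server to."""
    UPPER = "upper"
    LOWER = "lower"


@dataclass
class SmartCable:
    """Hypothetical model of the MUX inside the smart cable."""
    direction: MuxDirection = MuxDirection.UPPER

    def toggle_to(self, target: MuxDirection) -> None:
        # A real hitless MUX switches in hardware; here we only record the new direction.
        self.direction = target


@dataclass
class DualTorController:
    """Hypothetical failover logic for one server port (illustrative only).

    Assumption: each ToR's liveness is observed through periodic heartbeats,
    and the active ToR is declared failed after `miss_threshold` consecutive misses.
    """
    cable: SmartCable
    miss_threshold: int = 3
    missed: dict = field(default_factory=lambda: {d: 0 for d in MuxDirection})

    def on_heartbeat(self, tor: MuxDirection, ok: bool) -> MuxDirection:
        """Record one heartbeat result; fail over if the active ToR looks dead."""
        # A successful heartbeat resets the miss counter; a miss increments it.
        self.missed[tor] = 0 if ok else self.missed[tor] + 1
        active = self.cable.direction
        standby = MuxDirection.LOWER if active is MuxDirection.UPPER else MuxDirection.UPPER
        # Only move traffic if the standby ToR has no outstanding misses.
        if self.missed[active] >= self.miss_threshold and self.missed[standby] == 0:
            self.cable.toggle_to(standby)
        return self.cable.direction


if __name__ == "__main__":
    cable = SmartCable()
    tor_logic = DualTorController(cable)
    for _ in range(3):
        tor_logic.on_heartbeat(MuxDirection.UPPER, ok=False)  # upper ToR stops answering
    print(cable.direction)  # MuxDirection.LOWER
```

Requiring the standby ToR to be healthy before toggling avoids moving traffic onto a link that is also failing. A production design would additionally need to coordinate the two ToRs so that they never contend for control of the MUX.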
DASH enables limitless networking
The programmable hardware ecosystem (smart NICs, smart ToRs, smart appliances) has been booming for the last two years and will continue to grow. These devices offer outstanding performance and customizability. This year, the SONiC community launched a new workstream, DASH (Disaggregated APIs for SONiC Hosts), to capitalize on them for limitless networking. The initial goal is to improve the L4 performance and connection scale of Software Defined Networking operations by 10 to 100 times over software-based implementations. DASH leverages modern high-speed SmartNIC hardware to accelerate flow processing, changing the game for implementing the SDN data plane. The first set of overlay and underlay SAI APIs for VNET-to-VNET connectivity has been defined, and the test design is under active discussion. We foresee many applications benefiting from DASH, for example, encryption gateways with high-speed inline encryption and key management, load balancers, service tunneling, and more. The open-source nature of SONiC provides the flexibility to customize for individual use cases, while standardizing APIs through SAI (Switch Abstraction Interface) ensures interoperability across different programmable hardware. The solution also inherits SONiC’s comprehensive monitoring, diagnostics, and reliability features, such as hitless upgrades and container management, for free. Nvidia, Pensando, Intel, and many other partners are actively contributing to the program.
Figure 2: Seven initial DASH scenarios.
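Flow-processing speedups like the ones DASH targets typically come from caching a per-connection decision, so that only the first packet of a flow pays for a full policy lookup. The following Python sketch illustrates that generic idea for VNET-to-VNET traffic. It is not the DASH SAI API: `VnetToVnetPipeline`, `EncapAction`, the exact-match mapping table, and the addresses are hypothetical, and real hardware would implement these tables in the SmartNIC data plane.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical 5-tuple flow key: (src_ip, dst_ip, protocol, src_port, dst_port)
FlowKey = Tuple[str, str, str, int, int]


@dataclass(frozen=True)
class EncapAction:
    """Overlay-to-underlay rewrite for a packet (hypothetical fields)."""
    underlay_dst: str  # physical address of the host behind the destination VM
    vni: int           # VXLAN network identifier of the destination VNET


class VnetToVnetPipeline:
    """Toy flow-caching pipeline; an illustration, not the DASH API."""

    def __init__(self, mappings: Dict[str, EncapAction]) -> None:
        # Slow-path table: destination overlay IP -> encapsulation action.
        self.mappings = mappings
        # Fast-path table: cached decision per connection.
        self.flows: Dict[FlowKey, EncapAction] = {}
        self.slow_path_lookups = 0

    def process(self, key: FlowKey) -> Optional[EncapAction]:
        cached = self.flows.get(key)
        if cached is not None:
            return cached  # fast path: a single exact-match lookup
        self.slow_path_lookups += 1
        action = self.mappings.get(key[1])  # look up by destination overlay IP
        if action is not None:
            # Install the flow so later packets of this connection skip the slow path.
            # Unknown destinations are not cached and return None every time.
            self.flows[key] = action
        return action


if __name__ == "__main__":
    pipe = VnetToVnetPipeline({"10.1.0.5": EncapAction("192.168.7.20", 5001)})
    flow = ("10.0.0.4", "10.1.0.5", "tcp", 40000, 443)
    for _ in range(1000):
        pipe.process(flow)
    print(pipe.slow_path_lookups)  # 1
```

A thousand packets of the same connection trigger one slow-path lookup; offloading both tables to SmartNIC hardware is what lets this pattern scale to far more connections than a software implementation.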
PINS—extending SONiC with programmability
There are two ways to run a network, distributed or centralized, and each has its strengths. In the distributed model, each switch has the intelligence to discover its neighbors, build its routing table, and react to topology changes, so the network can scale and self-heal rapidly when a failure occurs. In the centralized model, a dedicated external control system builds a view of the topology and programs the switch nodes in the network. The network behavior is deterministic and easy to debug, and it enables optimal traffic engineering. Over the past year, the PINS (P4 Integrated Network Stack) community and the SONiC community have worked together to integrate SONiC with PINS. This enables customers to build a centrally controlled SDN (Software Defined Network) out of SONiC switches. The SDN controller programs the SONiC switches through P4Runtime, with a P4 program serving as the behavioral model, or contract, between controller and switch; this contract can be extended as needs evolve. This initiative gives SONiC users many more choices for building their networks on a rich hardware ecosystem. The minimum viable product will be in the SONiC 20211130 release, with more L2/L3 functionality coming in subsequent releases.
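The centralized model is easiest to picture in code. In this Python sketch, a hypothetical central controller that holds the full topology computes each switch’s next hop toward a prefix with a breadth-first search and emits simplified table entries. `TableEntry`, the `ipv4_lpm` table, and the `set_next_hop` action are illustrative assumptions, not PINS or SONiC names. A real controller would encode each entry as a P4Runtime `TableEntry` and send it to the switch in a `Write` RPC, against the tables defined by the switch’s P4 program.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TableEntry:
    """Simplified stand-in for a P4Runtime table entry (hypothetical names)."""
    switch: str
    table: str
    match_prefix: str
    action: str
    next_hop: str


def compute_entries(links: List[Tuple[str, str]],
                    prefix_owner: Dict[str, str]) -> List[TableEntry]:
    """Centralized route computation over the controller's global topology view."""
    adj: Dict[str, List[str]] = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    entries: List[TableEntry] = []
    for prefix, home in prefix_owner.items():
        # BFS outward from the switch that owns the prefix. Each switch's BFS
        # parent is one shortest-path next hop toward that prefix (no ECMP here).
        parent = {home: home}
        queue = deque([home])
        while queue:
            node = queue.popleft()
            for nbr in sorted(adj.get(node, [])):
                if nbr not in parent:
                    parent[nbr] = node
                    queue.append(nbr)
        for sw, nh in sorted(parent.items()):
            if sw != home:  # the owning switch delivers locally
                entries.append(TableEntry(sw, "ipv4_lpm", prefix, "set_next_hop", nh))
    return entries


if __name__ == "__main__":
    # Two leaves and two spines; prefix 10.2.0.0/24 sits behind leaf2.
    fabric = [("leaf1", "spine1"), ("leaf1", "spine2"),
              ("leaf2", "spine1"), ("leaf2", "spine2")]
    for entry in compute_entries(fabric, {"10.2.0.0/24": "leaf2"}):
        print(entry.switch, "->", entry.next_hop)
    # leaf1 -> spine1, spine1 -> leaf2, spine2 -> leaf2
```

In the distributed model, by contrast, each switch would derive these next hops itself by running a routing protocol with its neighbors rather than receiving them from a controller.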
The SONiC community has produced many other great creative works, including work on qualification, for example, SAI Challenger from PLVision, SRv6 for the telco segment by Intel and Alibaba, a SAI test framework for interoperability by Intel and Microsoft, and deployment automation by Broadcom. Check them out at the OCP Global Summit.
For more information on Microsoft’s role in the open-source hardware community and our showcase at OCP Global Summit 2021, check out the blog: Learn how Microsoft Azure is accelerating hardware innovations for a sustainable future.
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
Gartner, Market Guide for Data Center Switching, Andrew Lerner, Jonathan Forest, Evan Zeng, 8 March 2021