This guest post is from the Project Springfield team in Microsoft’s Artificial Intelligence and Research group. Project Springfield delivers pioneering artificial intelligence for finding security issues as a cloud service. Learn how the team used Azure to meet and exceed scaling challenges on a tight timeline.
The Project Springfield engineering team, led by William Blum, had built the first release of “Project Springfield,” which helped customers find “million-dollar” security bugs by combining pioneering “whitebox fuzzing” technology from Microsoft Research with the elasticity of the cloud. Customers could upload their software to Project Springfield, which created a fuzzing lab in Azure. Each fuzzing lab tested the software with a portfolio of methods, looked for crashes triggered by the test cases, and then picked the highest-value issues to report. The power of Azure enabled this compute-intensive process to scale up and down as customers’ demands changed, while simultaneously collecting data from every run to improve the service. To do this, Project Springfield had to dynamically create large numbers of virtual machines and network resources and manage them on behalf of the customer.
We had built the initial product on Azure, using the classic Azure management interface to dynamically provision virtual machines and networking resources. Now it was time to prepare for a new wave of customers, which meant scaling up the service by orders of magnitude. Scaling with the classic Azure management interface would be a challenge. For example, each Project Springfield fuzzing lab used its own cloud service, yet there was a limit of just 200 cloud services per subscription. That meant if the customers, in aggregate, ever needed to test more than 200 pieces of software at a time, Project Springfield would need to partition fuzzing labs across subscriptions, even if each subscription otherwise had enough virtual machines available to serve customers. There had to be a better way.
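To make the arithmetic behind that limit concrete, here is a minimal sketch in Python. It assumes only what the paragraph states: one cloud service per fuzzing lab and 200 cloud services per subscription. The function names are hypothetical and are not part of Project Springfield or any Azure SDK.

```python
import math

# Limit quoted in the post: cloud services per classic Azure subscription.
CLOUD_SERVICES_PER_SUBSCRIPTION = 200


def subscriptions_needed(concurrent_labs: int,
                         limit: int = CLOUD_SERVICES_PER_SUBSCRIPTION) -> int:
    """Return how many subscriptions are needed if each lab uses one cloud service."""
    if concurrent_labs < 0:
        raise ValueError("concurrent_labs must be non-negative")
    return math.ceil(concurrent_labs / limit)


def subscription_for_lab(lab_index: int,
                         limit: int = CLOUD_SERVICES_PER_SUBSCRIPTION) -> int:
    """Hypothetical partitioning rule: fill subscription 0, then 1, and so on."""
    if lab_index < 0:
        raise ValueError("lab_index must be non-negative")
    return lab_index // limit
```

Under these assumptions, 200 concurrent labs fit in one subscription, while the 201st forces a second subscription regardless of how many virtual machines remain available in the first.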
We found that better way by re-architecting the service with Azure Resource Manager, the newer Azure management interface, as well as Service Fabric, Microsoft's microservices-based application platform. With Azure Resource Manager, virtual machines, virtual networks, and load balancers are all treated as different resources. These resources can be combined in an Azure Resource Manager template, which is a JSON object defining what resources are needed and how they fit together. All the resources Project Springfield needs for a security testing lab are specified by a single template. When a customer needs a new lab, Azure reads the template and dynamically creates all the resources it describes. With Service Fabric, we could easily port our backend worker roles to microservices and dynamically scale backend resources up and down based on customer needs. The payoff was that instead of being locked into a single inflexible bundle, we could dynamically reshape the way resources were deployed.
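For readers who have not seen one, the general shape of such a template can be sketched as follows. This is an illustrative skeleton built in Python, not Project Springfield's actual template: the resource names, the `build_lab_template` helper, and the single `dependsOn` link are assumptions, the `apiVersion` values are left as placeholders, and the `properties` sections a real deployment requires are omitted, so the output is not deployable as is.

```python
import json

# Placeholder: a real template needs a concrete API version per resource type.
API_VERSION_PLACEHOLDER = "<api-version>"


def build_lab_template(lab_name: str) -> dict:
    """Build a skeleton template describing the resources for one lab (hypothetical)."""
    vnet_name = f"{lab_name}-vnet"

    def resource(resource_type: str, name: str, depends_on=None) -> dict:
        entry = {
            "type": resource_type,
            "name": name,
            "apiVersion": API_VERSION_PLACEHOLDER,
            "location": "[resourceGroup().location]",
        }
        if depends_on:
            # dependsOn is how a template expresses ordering between resources.
            entry["dependsOn"] = depends_on
        return entry

    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            resource("Microsoft.Network/virtualNetworks", vnet_name),
            resource("Microsoft.Network/loadBalancers", f"{lab_name}-lb"),
            resource(
                "Microsoft.Compute/virtualMachines",
                f"{lab_name}-vm",
                depends_on=[
                    f"[resourceId('Microsoft.Network/virtualNetworks', '{vnet_name}')]"
                ],
            ),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_lab_template("lab1"), indent=2))
```

The point of the sketch is the structure: every resource is an entry in one `resources` list, and relationships between them are declared in the same document, which is what lets a whole lab be created from a single template.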
Re-architecting the service around the new deployment concepts introduced by Azure Resource Manager (ARM) required some work. The work paid off as we found that Azure’s infrastructure-as-a-service capabilities gave us better control and finer granularity over the configuration of our network and compute resources. Once we had adjusted to the new way of thinking, we could see how to make Project Springfield even more efficient and deliver more value. For example, we realized that by using Azure Network Security Groups, we could enable each customer to set different IP address restrictions on who could access their Project Springfield resources, a key feature for enterprise users.
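As a hedged illustration of what per-customer IP restrictions could look like, the sketch below turns a list of customer address ranges into inbound allow rules in the shape used by Network Security Group security rules. The helper name `allow_rules_for`, the rule names, and the choice of TCP port 443 are assumptions for the example; the post does not say how Project Springfield configured its rules.

```python
import ipaddress

# Network Security Group rule priorities must fall within this range.
MIN_PRIORITY, MAX_PRIORITY = 100, 4096


def allow_rules_for(cidrs, port: int = 443, base_priority: int = MIN_PRIORITY) -> list:
    """Build one inbound Allow rule per customer address range (hypothetical helper)."""
    rules = []
    for offset, cidr in enumerate(cidrs):
        # Validates the range; raises ValueError on malformed input.
        network = ipaddress.ip_network(cidr)
        priority = base_priority + offset
        if not MIN_PRIORITY <= priority <= MAX_PRIORITY:
            raise ValueError(f"priority {priority} is outside the allowed range")
        rules.append({
            "name": f"allow-customer-range-{offset}",
            "properties": {
                "protocol": "Tcp",
                "sourcePortRange": "*",
                "destinationPortRange": str(port),
                "sourceAddressPrefix": str(network),
                "destinationAddressPrefix": "*",
                "access": "Allow",
                "priority": priority,
                "direction": "Inbound",
            },
        })
    return rules
```

For example, `allow_rules_for(["203.0.113.0/24", "198.51.100.7"])` yields two rules, one for the range and one for the single address. A list like this would sit under the `securityRules` property of a `Microsoft.Network/networkSecurityGroups` resource, so each customer's lab could carry its own restrictions.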
Even better, betting on Azure made us “future proof.” As Azure launched new features, such as support for Red Hat Linux, Windows Server Containers and more, we could see how they would let Project Springfield meet customer needs. With Azure Resource Manager, these are now just different kinds of resources in an Azure Resource Manager template. That gave the team a single, consistent way to manage fuzzing labs and laid the foundation for eventually offering different types of fuzzing labs for different customer needs.
By using the new capabilities of Azure, such as Azure Resource Manager, we achieved our scale goals in four months. That meant we could bring on customers and partners for trials of Project Springfield as fast as we could call them, without worrying that we would run out of capacity. What’s more, building on Azure set the team up for success as new capabilities came online. At Microsoft Ignite 2016, OSIsoft & Deschutes Brewery, EY, and Leviathan Security Group stood on stage and told the world about the value they saw in Project Springfield. Within a week, over a thousand people signed up for trials! That’s a win by any standard.