Microsoft Azure Security expands variant hunting capacity at a cloud tempo

Azure has been a leader in the development and implementation of variant hunting as a method for identifying and addressing potential security threats.

In the first blog in this series, we discussed our extensive investments in securing Microsoft Azure, including more than 8,500 security experts focused on securing our products and services, our industry-leading bug bounty program, our 20-year commitment to the Security Development Lifecycle (SDL), and our sponsorship of key open-source software security initiatives. We also introduced some of the updates we are making in response to the changing threat landscape, including improvements to our response processes, investments in Secure Multitenancy, and the expansion of our variant hunting efforts to include a global, dedicated team focused on Azure. In this blog, we’ll focus on variant hunting as part of our larger overall security program.

Variant hunting is an inductive learning technique, going from the specific to the general. Using newly discovered vulnerabilities as a jumping-off point, skilled security researchers look for additional and similar vulnerabilities, generalize the learnings into patterns, and then partner with engineering, governance, and policy teams to develop holistic and sustainable defenses. Variant hunting also looks at positive patterns, trying to learn from success as well as failure, but through the lens of real vulnerabilities and attacks, asking the question, “why did this attack fail here, when it succeeded there?”

In addition to detailed technical lessons, variant hunting also seeks to understand the frequency at which certain bugs occur, the contributing causes that permitted them to escape SDL controls, the architectural and design paradigms that mitigate or exacerbate them, and even the organizational dynamics and incentives that promote or inhibit them. It is popular to do root cause analysis, looking for the single thing that led to the vulnerability, but variant hunting seeks to find all of the contributing causes.

While rigorous compliance programs like the Microsoft SDL define an overarching scope and repeatable processes, variant hunting provides the agility to respond to changes in the environment more quickly. In the short term, variant hunting augments the SDL program by delivering proactive and reactive changes faster for cloud services, while in the long term, it provides a critical feedback loop necessary for continuous improvement.

Leveraging lessons to identify anti-patterns and enhance security

Starting with lessons from internal security findings, red team operations, penetration tests, incidents, and external MSRC reports, the variant hunting team tries to extract the anti-patterns that can lead to vulnerabilities. In order to be actionable, anti-patterns must be scoped at a level of abstraction more specific than, for example, “validate your input” but less specific than “there’s a bug on line 57.”

Having distilled an appropriate level of abstraction, variant hunting researchers look for instances of the anti-pattern and perform a deeper assessment of the service, called a “vertical” variant hunt. In parallel, the researcher investigates the anti-pattern’s prevalence across other products and services, conducting a “horizontal” variant hunt using a combination of static analysis tools, dynamic analysis tools, and skilled review.
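
To make the “horizontal” step more concrete, here is a minimal sketch in Python. It is not the tooling used in Azure. It assumes a hypothetical anti-pattern (a call to urlopen whose URL argument is not a literal) and a hypothetical set of services, represented here as in-memory source strings, and it reports where the anti-pattern occurs in each one.

```python
import ast

def _call_name(node: ast.Call) -> str:
    """Return the bare name of the function being called, or '' if unknown."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""

def find_dynamic_fetches(source: str) -> list[int]:
    """Line numbers of urlopen(...) calls whose first argument is not a literal.

    This is an illustrative anti-pattern definition, not a real CodeQL query.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and _call_name(node) == "urlopen" and node.args:
            if not isinstance(node.args[0], ast.Constant):
                hits.append(node.lineno)
    return sorted(hits)

def horizontal_hunt(services: dict[str, str]) -> dict[str, list[int]]:
    """Run the same detector across every service and collect the hits."""
    return {name: find_dynamic_fetches(src) for name, src in services.items()}

# Hypothetical services; the sources are only parsed, never executed.
SERVICES = {
    "service_a": (
        "from urllib.request import urlopen\n"
        "def fetch(url):\n"
        "    return urlopen(url)\n"
    ),
    "service_b": (
        "from urllib.request import urlopen\n"
        "PING = urlopen('https://example.com/health')\n"
    ),
}

print(horizontal_hunt(SERVICES))
```

A detector this simple only narrows the search. Each hit would still need the skilled review described above to decide whether it is a real vulnerability.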

Insights derived from vertical and horizontal variant hunting inform architecture and product updates needed to eliminate the anti-pattern broadly. Results include improvements to processes and procedures, changes to security tooling, architectural changes, and, ultimately, improvements to SDL standards where the lessons rapidly become part of the routine engineering system.

For example, one of the static analysis tools used in Azure is CodeQL. When a newly identified vulnerability does not have a corresponding CodeQL query, the variant hunting team works with other stakeholders to create one. New “specimens” (custom-built code samples that purposely exhibit the vulnerability) are produced and incorporated into a durable test corpus so that learnings are preserved even after the immediate investigation has ended. These improvements provide a stronger security safety net, helping to identify security risks earlier in the process and reducing the re-introduction of known anti-patterns into our products and services.
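
The idea of a specimen corpus can be sketched as a small regression harness. The Python below is an illustration under stated assumptions, not CodeQL and not the corpus used in Azure: the Specimen type, the detector, and the sample anti-pattern (a call that passes verify=False) are all hypothetical. The corpus pairs vulnerable specimens with safe look-alikes, so a detector change that stops flagging a known bug, or starts flagging safe code, shows up as a failure.

```python
import ast
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Specimen:
    name: str
    source: str        # code sample; it is parsed, never executed
    should_flag: bool  # True for vulnerable specimens, False for safe look-alikes

def flags_disabled_tls_verification(source: str) -> bool:
    """Hypothetical detector: True if any call passes the keyword verify=False."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (kw.arg == "verify"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is False):
                    return True
    return False

# Hypothetical corpus entries, kept alongside the detector they exercise.
CORPUS = [
    Specimen("tls_verify_off",
             "import requests\nrequests.get('https://example.com', verify=False)\n",
             True),
    Specimen("tls_verify_default",
             "import requests\nrequests.get('https://example.com')\n",
             False),
    Specimen("tls_verify_explicit",
             "import requests\nrequests.get('https://example.com', verify=True)\n",
             False),
]

def check_corpus(detector: Callable[[str], bool], corpus: list[Specimen]) -> list[str]:
    """Return the names of specimens the detector classifies incorrectly."""
    return [s.name for s in corpus if detector(s.source) != s.should_flag]

print(check_corpus(flags_disabled_tls_verification, CORPUS))
```

Running check_corpus in continuous integration is one way to keep a lesson from being lost when a detector is later refactored.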

Diagram showing Security Research Findings, Penetration Testing and Deep Security Reviews, and Threat Modeling as inputs into the Variant Hunting process. The outcomes from Variant Hunting are Long-term Controls and Systemic Improvements, Short-term mitigations and controls, and Standardized Controls in SDL.

Azure Security’s layered approach to protecting against server-side threats

Earlier in this series, we highlighted security improvements in Azure Automation, Azure Data Factory, and Azure Open Management Infrastructure that arose from our variant hunting efforts. We would call those efforts “vertical” variant hunting.

Our work on Server-Side Request Forgery (SSRF) is an example of “horizontal” variant hunting. The impact and prevalence of SSRF bugs have been increasing across the industry for some time. In 2021, OWASP added SSRF to its Top 10 list based on feedback from the Top 10 community survey, where it was the top requested item to include. Around the same time, we launched a number of initiatives, including:

Externally, Azure Security recognized the importance of identifying and hardening against SSRF vulnerabilities and ran the Azure SSRF Research Challenge in the fall of 2021.
Internally, we ran a multi-team, multi-division effort to better address SSRF vulnerabilities using a layered approach.
Findings from the Azure SSRF Research Challenge were incorporated into new CodeQL rules that detect more SSRF bugs.
Internal research drove investment in new libraries for parsing URLs to prevent SSRF bugs and new dynamic analysis tools to help validate suspected SSRF vulnerabilities.
New training has been created to enhance prevention of SSRF vulnerabilities from the start.
Targeted investments by product engineering and security research contributed to the creation of new Azure SDK libraries for Azure Key Vault that will help prevent SSRF vulnerabilities in applications that accept user-provided URIs for a customer-owned Azure Key Vault or Azure Managed HSM.
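
As an illustration of the kind of check that helps when an application accepts a user-provided Key Vault URI, here is a minimal Python sketch. It is not the Azure SDK implementation. The function name is hypothetical, and the allowed DNS suffixes are an assumption that covers only the Azure public cloud. The sketch parses the URI with a real URL parser instead of matching substrings, then requires HTTPS, no embedded credentials, the default port, and a single vault-name label in front of an allowed suffix.

```python
import re
from urllib.parse import urlsplit

# Assumption: public-cloud suffixes only; other clouds use different domains.
ALLOWED_SUFFIXES = (".vault.azure.net", ".managedhsm.azure.net")

# One DNS label made of letters, digits, and interior hyphens.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")

def is_allowed_vault_uri(uri: str) -> bool:
    """Hypothetical helper: accept only https URIs on an allowed vault host."""
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return False
    if parts.scheme != "https" or port not in (None, 443):
        return False
    # Reject "user@host" forms, which can hide the real destination.
    if parts.username is not None or parts.password is not None:
        return False
    host = parts.hostname or ""  # already lowercased by urlsplit
    for suffix in ALLOWED_SUFFIXES:
        if host.endswith(suffix):
            return _LABEL.fullmatch(host[: -len(suffix)]) is not None
    return False

print(is_allowed_vault_uri("https://myvault.vault.azure.net/secrets/db-password"))
print(is_allowed_vault_uri("https://169.254.169.254/metadata/instance"))
```

A check like this is only one layer. It does not address redirects or DNS-level tricks, which is why the approach described above also includes static detection, dynamic validation, and training.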

This investment in new technology to reduce the prevalence of SSRF vulnerabilities helps ensure the security of Azure applications for our customers. By identifying and addressing these vulnerabilities, we provide a more secure platform on which our customers can build and run their applications.

In summary, Azure has been a leader in the development and implementation of variant hunting as a method for identifying and addressing potential security threats. We have hired and deployed a global team focused exclusively on variant hunting, working closely with the rest of the security experts at Microsoft. This work has resulted in more than 800 distinct security improvements to Azure services since July 2022. We encourage security organizations all over the world to adopt or expand variant hunting as part of their continuous learning efforts to further improve security.

Learn more about Azure security and variant hunting

Read the first blog in this series to learn about Azure’s security approach, which focuses on defense in depth, with layers of protection throughout all phases of design, development, and deployment of our platforms and technologies.
Learn more about the out-of-the-box security capabilities embedded in our cloud platforms.
Register today for Microsoft Secure on March 28 to view our session covering built-in security across the Microsoft Cloud.