Reducing security alert fatigue using machine learning in Azure Sentinel

Published March 19, 2019

Data Cowboy, Cloud + AI Security

Last week we launched Azure Sentinel, a cloud-native SIEM tool. Machine learning (ML) has been built into Azure Sentinel from the beginning. We have thoughtfully designed the system with ML innovations aimed at making security analysts, security data scientists, and engineers more productive. The focus is to reduce alert fatigue and offer ML toolkits tailored to the security community. The three ML pillars in Azure Sentinel are Fusion, built-in ML, and build-your-own ML.

Fusion

Alert fatigue is real. Security analysts face a heavy triage burden: they not only have to sift through a sea of alerts but also have to correlate alerts from different products, either manually or with a traditional correlation engine.

Our Fusion technology, currently in public preview, uses state-of-the-art scalable learning algorithms to correlate millions of lower-fidelity anomalous activities into tens of high-fidelity cases. Azure Sentinel integrates with Microsoft 365 solutions and correlates millions of signals from different products such as Azure Identity Protection and Microsoft Cloud App Security, and soon Azure Advanced Threat Protection, Windows Advanced Threat Protection, O365 Advanced Threat Protection, Intune, and Azure Information Protection. You can learn how to turn Fusion on by visiting our documentation, “Enable Fusion.”

Screenshot of a Fusion case and two composite alerts

Fusion combines yellow alerts, which may not be actionable on their own, into high-fidelity red cases of genuine security interest. By looking across disparate products to produce actionable incidents, Fusion reduces the false positive rate. Based on measurements with external customers and internal evaluation, we see a median 90 percent reduction in alert fatigue. This is possible because Fusion can detect complex, multi-stage attacks and differs from traditional correlation engines in the following ways:
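To make the idea of combining yellow alerts into a red case concrete, here is a minimal sketch that uses a noisy-OR, a standard gate from probabilistic graphical models. It is not Fusion's actual model, which this post does not describe. It assumes that each alert carries a probability that it reflects a real attack, that different products make errors independently, and that only the strongest alert per product should count; the function names, product labels, and the 0.7/0.3 thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# (product name, probability that this alert reflects a real attack)
Alert = Tuple[str, float]


def fuse(alerts: Iterable[Alert]) -> float:
    """Noisy-OR fusion of alert probabilities across products (sketch)."""
    # Keep only the strongest alert per product, so a burst of near-duplicate
    # alerts from one product is not counted as independent evidence.
    strongest = defaultdict(float)
    for product, p in alerts:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        strongest[product] = max(strongest[product], p)

    # Noisy-OR: the case is benign only if every product's alert is a false
    # positive, assuming the products make errors independently.
    p_all_false = 1.0
    for p in strongest.values():
        p_all_false *= 1.0 - p
    return 1.0 - p_all_false


def classify(alerts: Iterable[Alert], red: float = 0.7, yellow: float = 0.3) -> str:
    """Map a fused score to a hypothetical red/yellow/ignore triage label."""
    score = fuse(alerts)
    if score >= red:
        return "red"
    if score >= yellow:
        return "yellow"
    return "ignore"
```

Under these assumptions, two 0.5 alerts from different products fuse to 0.75 and cross the red threshold, while two 0.5 alerts from the same product stay at 0.5 and remain yellow, which is one way to see why corroboration across disparate products matters.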

  • Iterative attack simulation. Traditional correlation engines assume that the attacker takes only one path to attain their goal. Fusion encodes uncertainty about paths and stages by simulating different attack paths with iterative Markov chain Monte Carlo simulations.
  • Probabilistic cloud kill chain. Traditional correlation engines assume that the attacker follows a static kill chain as the attack path is executed. Fusion constantly updates the probability of moving to the next step in the kill chain through a custom-defined prior probability function.
  • Advances in graphical methods. Traditional correlation engines assume that all the information needed to catch the attacker is present in the logs. Fusion encodes uncertainty in the completeness and connectivity of information in the kill chain, which helps it detect novel attacks.
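The probabilistic kill chain and iterative attack simulation described above can be illustrated with a small sketch. Everything here is an assumption rather than Fusion's published design: each step of a hypothetical kill chain has an unknown probability that the attacker advances, our belief about that probability is a Beta distribution (standing in for the custom-defined prior probability function), observed outcomes update it by conjugate Beta-Binomial counting, and simulating many attack paths estimates how likely an attacker at a given step is to reach the end of the chain. Note that this is ordinary Monte Carlo sampling, a simpler relative of the Markov chain Monte Carlo methods the post mentions.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class StepBelief:
    """Beta(alpha, beta) belief about P(attacker advances past one kill-chain step)."""
    alpha: float  # prior pseudo-count of advances
    beta: float   # prior pseudo-count of stops

    def update(self, advanced: int, stopped: int) -> None:
        # Conjugate Beta-Binomial update: add the observed counts.
        self.alpha += advanced
        self.beta += stopped

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def reach_probability(chain: List[StepBelief], start: int,
                      runs: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(reaching the end of the chain | attacker at `start`).

    chain[i] describes the move from step i to step i + 1, so a chain of N
    beliefs has N + 1 steps and the final step has index N.
    """
    rng = random.Random(seed)
    reached = 0
    for _ in range(runs):
        step = start
        while step < len(chain):
            # Sample a transition probability from the current belief, then
            # decide whether this simulated attacker advances or stops.
            p_advance = rng.betavariate(chain[step].alpha, chain[step].beta)
            if rng.random() >= p_advance:
                break
            step += 1
        else:
            reached += 1  # the loop finished without the attacker stopping
    return reached / runs
```

Because every run draws a fresh transition probability at each step, this particular estimate converges to the product of the Beta means (0.25 for two uniform Beta(1, 1) steps). Simulation becomes more useful than that closed form once beliefs at later steps depend on what happened at earlier ones.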

The screenshot above shows the Fusion case and the two composite alerts that went into it.

Organizations currently use Fusion in the following scenarios, which compound anomalies from the Identity Protection and Microsoft Cloud App Security products:

  • Anomalous login leading to O365 mailbox exfiltration
  • Anomalous login leading to suspicious cloud app administrative activity
  • Anomalous login leading to mass file deletion
  • Anomalous login leading to mass file download
  • Anomalous login leading to O365 impersonation
  • Anomalous login leading to mass file sharing
  • Anomalous login leading to ransomware in cloud app
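Each scenario above pairs an anomalous login with later suspicious activity on the same account. Here is a minimal sketch of that pattern as a deterministic, time-windowed sequence match. The `Alert` record, the alert-kind labels, and the 24-hour window are hypothetical choices for illustration; Fusion itself is described above as probabilistic, so this is a simplification, not its actual logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Alert:
    user: str
    kind: str       # hypothetical labels such as "anomalous_login"
    time: datetime


# Follow-on activities mirroring the scenario list (labels are hypothetical).
FOLLOW_ON_KINDS = {
    "mailbox_exfiltration",
    "suspicious_admin_activity",
    "mass_file_deletion",
    "mass_file_download",
    "impersonation",
    "mass_file_sharing",
    "ransomware",
}


def find_chains(alerts: Iterable[Alert],
                window: timedelta = timedelta(hours=24)) -> List[Tuple[Alert, Alert]]:
    """Pair each follow-on alert with the user's most recent anomalous login."""
    last_login: Dict[str, Alert] = {}
    chains: List[Tuple[Alert, Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.time):
        if alert.kind == "anomalous_login":
            last_login[alert.user] = alert
        elif alert.kind in FOLLOW_ON_KINDS:
            login = last_login.get(alert.user)
            # Only chain activity that follows the login within the window.
            if login is not None and alert.time - login.time <= window:
                chains.append((login, alert))
    return chains
```

Alerts are sorted by time first, so the function does not depend on the order in which products deliver them.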

Built-in ML

Machine learning is now an essential toolkit in security analytics for detecting novel types of attacks that escape traditional rule-based systems. However, a scarce ML talent pool makes it difficult for security organizations to hire applied security data scientists. To democratize an ML toolkit tailored to the needs of the security community, we are introducing built-in ML, which is currently in limited public preview.

Built-in ML is designed so that security analysts and engineers with no prior ML knowledge can reuse ML systems designed by Microsoft’s fleet of security machine learning engineers. With built-in ML systems, organizations don’t have to worry about traditional investments such as ML training, cross-validation, or deployment, and they can quickly identify threats that wouldn’t be found with a traditional approach.

Under the hood, built-in ML uses principles of model compression and elements of transfer learning to make the models developed by Microsoft’s ML engineers ready to use for any organization’s needs. Our models are trained on diverse datasets and periodically retrained to account for concept drift.
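The post does not detail how built-in models are adapted to each organization. As a loosely related illustration of reusing a model trained elsewhere, here is Bayesian shrinkage (additive smoothing toward a prior): a globally trained categorical distribution, such as how often logins come from each country, is blended with one tenant's observed counts. This is much simpler than transfer learning in the deep-learning sense, and the function name, the categories, and the `prior_strength` default are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def adapt_to_tenant(global_probs: Dict[str, float],
                    tenant_events: Iterable[str],
                    prior_strength: float = 50.0) -> Dict[str, float]:
    """Blend a globally trained categorical model with one tenant's data.

    `prior_strength` is the number of pseudo-observations the global model is
    worth: with little tenant data the global model dominates, and as the
    tenant's own counts grow they gradually take over.
    """
    if prior_strength <= 0:
        raise ValueError("prior_strength must be positive")
    counts = Counter(tenant_events)
    n = sum(counts.values())
    categories = set(global_probs) | set(counts)
    return {
        c: (counts[c] + prior_strength * global_probs.get(c, 0.0)) / (n + prior_strength)
        for c in categories
    }
```

If the global probabilities sum to 1, the blended ones do too. Recomputing the blend over a recent window of tenant events is one simple way to respond to concept drift, in the spirit of the periodic retraining mentioned above.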

We are opening our flagship geo login anomaly model to any security analyst for detecting unusual logins in SSH logs. No ML expertise is necessary: customers bring their logs into Azure Sentinel and use built-in ML systems to gain insights instantly.
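The post does not describe how the geo login anomaly model works. A widely used heuristic with the same goal is "impossible travel": flag consecutive logins by one account whose implied travel speed exceeds what a person could manage. The sketch below assumes that the user and source IP have already been extracted from the SSH logs and that the IP has been geolocated to latitude and longitude upstream; the `Login` record and the 900 km/h and 100 km thresholds are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Login:
    user: str
    time: datetime
    lat: float  # degrees, assumed to come from an IP geolocation lookup
    lon: float  # degrees


def haversine_km(a: Login, b: Login) -> float:
    """Great-circle distance between two login locations."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def impossible_travel(logins: Iterable[Login], max_speed_kmh: float = 900.0,
                      min_distance_km: float = 100.0) -> List[Tuple[Login, Login]]:
    """Return (previous, current) login pairs whose implied speed is implausible."""
    last: Dict[str, Login] = {}
    flagged: List[Tuple[Login, Login]] = []
    for login in sorted(logins, key=lambda x: x.time):
        prev = last.get(login.user)
        if prev is not None:
            distance = haversine_km(prev, login)
            hours = (login.time - prev.time).total_seconds() / 3600
            # Ignore short hops, which are often just geolocation noise.
            if distance >= min_distance_km and (hours <= 0 or distance / hours > max_speed_kmh):
                flagged.append((prev, login))
        last[login.user] = login
    return flagged
```

For example, a Seattle login followed one hour later by a London login on the same account implies a speed of several thousand km/h and is flagged, while the same pair a day apart is not.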

Build-your-own ML

We recognize that organizations have different levels of investment in machine learning for security use cases. Some organizations may have data scientists who need to go deeper and customize the analysis further. For these organizations, we offer Build-your-own ML for authoring security analytics.

Azure Sentinel will offer Databricks, Spark, and Jupyter Notebook environments for authoring detections. These environments take care of data plumbing and provide ML algorithms in templates and code snippets for model training and scheduling; we will soon introduce seamless model management, model deployment, a workflow scheduler, data versioning capabilities, and specialized security analytics libraries. This will free security data scientists from tedious pipeline and platform work so they can focus on productive analytics on a hyperscale ML security platform.
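As an example of the kind of detection a security data scientist might author in one of these notebook environments, here is a per-user baseline that flags unusually high daily activity counts with a z-score. It uses only the Python standard library rather than Spark or Databricks APIs, and the class name, the seven-day minimum history, and the threshold of 3 are hypothetical. The fit/score split mirrors the train-then-schedule pattern that the templates are described as supporting.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DailyCountBaseline:
    """Per-user baseline of daily event counts, such as file downloads."""
    min_history: int = 7      # days of history required before scoring a user
    z_threshold: float = 3.0  # z-score at or above which a day is flagged
    _stats: Dict[str, Tuple[float, float]] = field(default_factory=dict, init=False, repr=False)

    def fit(self, history: Dict[str, List[int]]) -> "DailyCountBaseline":
        """Learn each user's mean and spread from past daily counts."""
        self._stats.clear()
        for user, counts in history.items():
            if len(counts) >= self.min_history:
                mean = statistics.fmean(counts)
                # Floor the spread at 1 so a perfectly flat history cannot cause
                # division by zero or flag every tiny change.
                spread = max(statistics.pstdev(counts), 1.0)
                self._stats[user] = (mean, spread)
        return self

    def score(self, today: Dict[str, int]) -> Dict[str, float]:
        """Z-score of today's count for every user with a fitted baseline."""
        return {
            user: (count - self._stats[user][0]) / self._stats[user][1]
            for user, count in today.items()
            if user in self._stats
        }

    def anomalies(self, today: Dict[str, int]) -> List[str]:
        """Users whose count today is at least `z_threshold` spreads above normal."""
        return sorted(u for u, z in self.score(today).items() if z >= self.z_threshold)
```

In a scheduled job, `fit` would run on a trailing window of history and `anomalies` on each new day's counts; users with too little history are simply not scored.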

Additional resources

We will be updating this space with the technical details behind these innovations! If you have questions about turning on built-in ML or using build-your-own ML infrastructure, please reach out to askepd@microsoft.com. We also strongly recommend that customers enable Fusion when they use Azure Sentinel. You can learn how to turn Fusion on by visiting our documentation, “Enable Fusion.”