Always-on, real-time threat protection with Azure Cosmos DB - part two

Posted on July 23, 2019

Program Manager, Azure Cosmos DB

This two-part blog post is part of a series about how organizations are using Azure Cosmos DB to meet real-world needs, and the difference it’s making to them. In part one, we explored the challenges that led the Microsoft Azure Advanced Threat Protection team to adopt Azure Cosmos DB and how they’re using it. In part two, we’ll examine the outcomes resulting from the team’s efforts.

Built-in scalability, performance, availability, and more

The Azure Advanced Threat Protection team’s decision to use Azure Cosmos DB for its cloud-based security service has enabled the team to meet all key requirements, including zero database maintenance, uncompromised real-time performance, elastic scalability, high availability, and strong security and compliance. “Azure Cosmos DB gives us everything we need to deliver an enterprise-grade security service that’s capable of supporting the largest companies in the world, including Microsoft itself,” says Yaron Hagai, Principal Group Engineering Manager for Advanced Threat Analytics at Microsoft.

Zero maintenance

A managed database service has saved Hagai’s team immense maintenance effort, allowing Azure Advanced Threat Protection to stay up and running with only a handful of service engineers. “Azure Cosmos DB saves us from having to patch and upgrade servers, worry about compliance, and so on,” says Hagai. “We also get capabilities like encryption at rest without any work on our part, which further enables us to direct our resources to improving the service instead of keeping it up and running.”

Scaling to support customer growth is just as hands-free. “We use Azure CLI scripts to provision and deprovision clusters in multiple Azure regions—it’s all done automatically, so new clusters for new customers can be deployed easily and when needed,” says Hagai. “Scaling is also automatic. Throughput-based splitting has been especially helpful because it lets our databases scale to support customer growth with zero maintenance from the team.”

Real-time performance

Azure Cosmos DB is delivering the performance needed for an important security service like Azure Advanced Threat Protection. “Since we protect organizations after they have been breached, speed of detection is essential to minimizing the damage that might be done,” explains Hagai. “A high-throughput, super-scalable database lets us support lots of complex queries in real time, which is what allows us to go from breach to alerting in seconds. The performance provided by Azure Cosmos DB is one more thing that makes it the most production-grade document DB in the market, which is another reason we chose it.”

The following graph shows sustained high throughput for the service’s largest tenant, with a heavy bias towards writes, which happen every 10 minutes as Azure Advanced Threat Protection persists in-memory caches of profiles to Azure Cosmos DB.

Graph showing sustained high throughput for the service’s largest tenant
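
The write pattern described above, where in-memory profile caches are persisted on a fixed interval, can be sketched roughly as follows. This is a minimal illustration and not the team's implementation: `ProfileCache`, `update`, `maybe_flush`, and the `sink` callable are hypothetical names, and `sink` stands in for whatever database client actually performs the writes.

```python
from typing import Callable, Dict, List

Profile = Dict[str, object]


class ProfileCache:
    """Hypothetical in-memory cache that persists changed profiles in batches."""

    def __init__(
        self,
        sink: Callable[[List[Profile]], None],
        flush_interval_s: float = 600.0,  # 10 minutes, as in the post
        start: float = 0.0,
    ) -> None:
        self._sink = sink                  # assumed: writes a batch to the database
        self._interval = flush_interval_s
        self._last_flush = start
        self._dirty: Dict[str, Profile] = {}

    def update(self, profile_id: str, fields: Profile) -> None:
        """Merge new fields into the cached profile; nothing is written yet."""
        profile = self._dirty.setdefault(profile_id, {"id": profile_id})
        profile.update(fields)

    def maybe_flush(self, now: float) -> int:
        """Write all changed profiles if the interval has elapsed.

        Returns the number of profiles handed to the sink.
        """
        if now - self._last_flush < self._interval:
            return 0
        batch = list(self._dirty.values())
        self._dirty.clear()
        self._last_flush = now
        if batch:
            self._sink(batch)
        return len(batch)
```

A caller would invoke `update` as events arrive and call `maybe_flush(time.monotonic())` from a timer or main loop. Batching in this way concentrates writes into periodic bursts, which is consistent with the write-heavy profile the graph describes.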

Elastic scalability

Since Azure Advanced Threat Protection launched in March 2018, its usage has grown exponentially in terms of both users protected and paying organizations. “Azure Cosmos DB allows us to scale constantly, without any friction, which has helped us support a 600 percent growth in our customer base over the past year,” says Hagai. “That same scalability allows us to support larger customer installations than we could with Microsoft Advanced Threat Analytics, our on-premises solution. Microsoft’s own internal network is a prime example; it had grown too large to support with a single, on-premises server running MongoDB, but with Azure Cosmos DB, it’s no problem.”

Scaling up and down to support frequent fluctuations in traffic, as shown in the following graph, is just as painless. “The graph shows traffic for our largest tenant, with the spikes in throughput due to scheduled tasks that produce business telemetry,” he explains. “This is a great example of the auto-scaling benefits of Azure Cosmos DB and how they allow us to automatically scale up individual databases to support a short burst of throughput each day, then automatically scale back down after the telemetries are calculated to minimize our service delivery costs.”

Graph showing traffic for a large tenant with the spikes in throughput due to scheduled tasks that produce business telemetry

Strong security and compliance

Because Azure Advanced Threat Protection is built on Azure Cosmos DB and other Azure services, which themselves have high compliance certifications, it was easy to achieve the same for Azure Advanced Threat Protection. “The access control mechanisms in Azure Cosmos DB allow us to easily secure access and apply advanced JIT policies, helping us keep customer data secure,” says Hagai.

High availability

Although the availability SLA for Azure Cosmos DB is 99.999 percent for multi-region databases, the actual availability Hagai’s team has seen in production is even higher. “I had the Azure Cosmos DB team pull some historical availability numbers, and it turns out that the actual availability we’ve seen during April, May, and June of 2019 has been between 99.99995 and 99.99999 percent,” says Hagai. “To us, that’s essentially 100 percent uptime, and another thing we don’t need to worry about.”
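
To put those percentages in concrete terms, the short calculation below converts an availability figure into the downtime it allows. It is only an illustrative sketch: the function name `downtime_seconds` is made up for this example, and treating April, May, and June 2019 as a single 91-day window is an assumption, since the post does not say how the figures were aggregated.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_seconds(availability_percent: float, days: float) -> float:
    """Downtime, in seconds, implied by an availability percentage over a period."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * days * SECONDS_PER_DAY


# Assumption: April (30) + May (31) + June (30) treated as one 91-day window.
QUARTER_DAYS = 30 + 31 + 30

for label, percent in [
    ("SLA, multi-region", 99.999),
    ("observed, low end", 99.99995),
    ("observed, high end", 99.99999),
]:
    seconds = downtime_seconds(percent, QUARTER_DAYS)
    print(f"{label:>20}: {percent}% -> {seconds:.2f} s of downtime in {QUARTER_DAYS} days")
```

Under that assumption, the 99.999 percent SLA allows roughly 79 seconds of downtime across the quarter, while the observed range corresponds to roughly 1 to 4 seconds.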

Learn more about Azure Advanced Threat Protection and Azure Cosmos DB today.