Advancing reliability

Thought leadership July 2 9 min read

Meet Brain: The AI system behind Azure reliability

Learn how Microsoft is building a digital twin of Azure Service Health and why it changes how hyperscale operates.
Announcements July 1 7 min read

Proving application resilience on Azure with Chaos Studio

Azure Chaos Studio helps organizations validate application resilience by simulating outages, failovers, network disruptions, and infrastructure failures before they impact production.
Thought leadership February 17 6 min read

Azure reliability, resiliency, and recoverability: Build continuity by design

Modern cloud systems are expected to deliver more than uptime.
Announcements July 23, 2025 5 min read

Project Flash update: Advancing Azure Virtual Machine availability monitoring

Flash enables rapid detection of issues originating from the Azure platform, helping teams respond quickly to infrastructure-related disruptions.
Thought leadership March 6, 2025 5 min read

Optimizing incident management with AIOps using the Triangle System

In this blog, we’ll dive into how large language models, generative AI, and the Triangle System help us leverage automation and feedback loops for more efficient incident management.
Thought leadership September 19, 2024 4 min read

Achieve agility and scale in a dynamic cloud world

At Microsoft, we want to give you more choice, flexibility, and resiliency for your cloud solutions, and we encourage you to take advantage of these benefits by adopting a multi-region growth strategy.
Thought leadership August 29, 2024 6 min read

Advancing cloud platform operations and reliability with optimization algorithms

In today’s rapidly evolving digital landscape, we see a growing number of services and environments (in which those services run) our customers utilize on Azure.
Thought leadership April 16, 2024 4 min read

Microsoft Entra resilience update: Workload identity authentication

Today, we’ll build on our resilience blog post series by going further in sharing how workload identities gain resilience from the regionally isolated authentication endpoints as well as from the backup authentication system.
Announcements April 8, 2024 6 min read

Advancing memory leak detection with AIOps—introducing RESIN

We are introducing RESIN, an end-to-end memory leak detection service designed to holistically address memory leaks in large cloud infrastructure.
Thought leadership February 22, 2024 6 min read

Advancing Microsoft Azure resilience with Chaos Studio

The Microsoft Azure Chaos Studio solution helps you measure, understand, improve, and maintain the resilience of your application through hypothesis-driven chaos experiments.
Best practices January 29, 2024 8 min read

Advancing application reliability with performance testing in Azure

As organizations prepare for peak events and unforeseen challenges, performance testing stands as a beacon, guiding them toward reliable, high-performance systems that can weather the storm of user demands.
Thought leadership November 30, 2023 7 min read

Building resilience to your business requirements with Azure

In this blog post, we will discuss some of the design principles and characteristics that we see among the customer leaders we work with closely to enhance their critical workload availability according to their specific business needs.
