Get the tools and training that you need to design and operate mission-critical systems with confidence
Reliability is a shared responsibility
Achieve your organisation’s reliability goals for all of your workloads by starting with the resilient foundation of the Azure cloud platform. Design and operate your mission-critical applications with confidence, knowing that you can trust your cloud because Azure prioritises transparency – always keeping you informed and able to act quickly during service issues.
If you’re looking to optimise an existing application on Azure, get started with the Azure Well-Architected Framework, a set of guiding tenets across five core pillars: reliability, security, performance efficiency, cost optimisation and operational excellence.
Start with a reliable foundation on Azure infrastructure
Learn about ongoing Microsoft investments to maintain and improve cloud platform reliability in Azure CTO and Technical Fellow Mark Russinovich’s Advancing Reliability blog series, which focuses on these four key areas:
The Microsoft network connects more than 60 Azure regions, 220 Azure data centres and 170 edge sites over more than 165,000 miles of terrestrial and subsea fibre worldwide, and it links to the rest of the Internet at strategic global edge points of presence. Learn more about Microsoft network reliability in this two-part blog post.
Azure continuously delivers new services and security improvements by updating its infrastructure. Learn how Azure minimises the impact of these updates on customers by using a rigorous change automation process.
Learn about the no- and low-impact update technologies – including hot patching, memory-preserving maintenance and live migration – that Azure uses to maintain its infrastructure with little or no customer impact or downtime.
Choose the right Azure resilience capabilities for your needs
Find out which Azure high-availability, disaster recovery and backup capabilities to use with your apps. Also, learn how to select the compute, storage and geographic (local, zonal and regional) redundancy options that are right for you.
Enable built-in resilience
Take advantage of optional Azure services and features to achieve your specific reliability goals.
Run critical workloads across data centres with independent power, cooling and networking.
Achieve redundancy within a data centre by collocating or separating resources.
Implement automatic failover, optimise traffic and combine on-premises and cloud systems.
Replicate on-premises and Azure workloads from a primary site to a secondary location.
Back up data with a simple, secure and cost-effective recovery and restoration solution.
Create and store multiple copies of your data with redundancy options for any scenario.
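To get a feel for how redundancy affects a reliability goal, the short sketch below estimates the combined availability of several redundant copies of a workload. It assumes that failures are independent and that any single healthy copy can serve traffic, which real deployments only approximate. The function names and the 99 per cent figure are illustrative assumptions, not Azure SLA values or Azure APIs.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, assuming independent
    failures and that one healthy copy is enough to serve requests."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The system is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


def yearly_downtime_minutes(availability: float) -> float:
    """Expected downtime per 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    # Hypothetical figure for one copy; substitute your own measured or
    # contractual value.
    one_copy = 0.99
    for n in (1, 2, 3):
        a = combined_availability(one_copy, n)
        print(f"{n} copies: availability {a:.6f}, "
              f"about {yearly_downtime_minutes(a):.1f} minutes down per year")
```

Because the independence assumption rarely holds perfectly, treat the result as an optimistic upper bound when comparing local, zonal and regional redundancy options.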
Monitor your cloud so that it isn’t a black box
Ensure long-term reliability with monitoring tools that identify, diagnose and track anomalies, and that help you optimise performance.
Identify resource issues and resolve them using a customisable dashboard.
Collect, analyse and act on telemetry data from Azure and on-premises environments.
Get intelligent insights into app usage and diagnose anomalies.
Monitor, diagnose and gain insights into network performance and health.
Optimise apps and systems for reliability with recommendations based on usage telemetry.
Reliability trusted by organisations of all sizes
Keeping trains on time with intelligent maintenance
– Tatsunari Nishibuchi, Subgroup Manager, Systems Group, Social Infrastructure Platforms Promotion Project Group, Mitsubishi Electric Corporation
"The PaaS functions in Azure made it easier to achieve the scalability we wanted. Azure has a wealth of functions, from data collection and storage to data analysis and API management, plus sophisticated operational management and monitoring functions. These features make it possible to develop a multiuser system that has zero downtime."
Serbia’s largest airport soars with automated recovery
– Marko Marković, IT Department Director, AD Aerodrom Nikola Tesla Beograd
"We wanted a business continuity plan for recovery for the business systems we need to run the airport, but without the expense of commissioning and maintaining secondary infrastructure. We also wanted to ensure recovery is fast and automated in the event of any failure."
Clinical alerts delivered at any scale with Stat
– John McConnell, Supervisor of Solution Architecture and Development, University of Vermont Medical Center
"We need 100 percent reliability in mission-critical apps. That's what we get from Azure SignalR Service and the other Microsoft solutions that we used to create Stat. We're very pleased."
Stable insurance app delivers premium customer experience
– Pieter Van Soerland, IT Manager, Zilveren Kruis
"Before we went through the end-of-year period, the stability and performance of the calculator were my main concerns . . . but it ended up running the smoothest out of all the solutions in our application landscape. It's very stable, and web pages even load slightly faster now."
Kodak Alaris boosts productivity by improving ERP resilience
– Joseph Calabrese, IT Operations Manager, Kodak Alaris
"The one thing I don't want is my CIO coming to me because there's a problem with our ERP. The truth is, it never happens anymore—it's a real testament to our ERP's reliability in Azure."
Making SAP even more flexible and resilient with Azure
– Erick Cardenas, IT Administrator, KOT Insurance Company AG
"Cloud computing is easily one of the best IT achievements in the business world. When we decided to move to it with the help of Microsoft Azure, we got efficiency, reliability, flexibility, and speed as much as we wanted. We still think we received nothing but benefits."