Monday, November 16, 2020
In our two-part series on advancing global network reliability through intelligent software, we explain how we’ve approached our network design, and how we’re constantly working to improve both reliability and performance.
Monday, November 9, 2020
In our two-part series on advancing global network reliability through intelligent software, we explain how we’ve approached our network design, and how we’re constantly working to improve both reliability and performance.
Monday, August 31, 2020
Learn how Azure engineering teams have evolved their culture, processes, and frameworks to balance the pace of innovation with assurance of performance and quality.
Monday, August 17, 2020
As part of our Advancing Reliability blog series, we’re outlining the investments we’re making to continue improving the outage experience.
Monday, July 27, 2020
This post is designed to get you thinking about how best to validate typical failure conditions, including examples of how we at Microsoft validate our own systems.
Monday, June 29, 2020
As Mark mentioned when he introduced the Advancing Reliability blog series, building and operating a global cloud infrastructure at the scale of Azure is a complex task, with hundreds of ever-evolving service components spanning more than 160 datacenters across more than 60 regions.
Tuesday, June 16, 2020
Scale, resiliency, and performance do not happen overnight—it takes sustained and deliberate investment, day over day, and a performance-first mindset to build products that delight our users.
Tuesday, June 16, 2020
The global health pandemic continues to impact every organization, large or small, along with its employees and the customers it serves. Over the last several months, we have seen firsthand the role that…
Wednesday, February 5, 2020
When running IT systems on-premises, you might try to ensure perfect availability by buying gold-plated hardware, locking up the server room, and throwing away the key. Software-wise, IT would traditionally prevent as much change as possible, avoiding updates to the OS and/or applications because they were deemed too critical, and pushing back on change requests from users.
Friday, January 3, 2020
This post continues our reliability series kicked off by my July blog post highlighting several initiatives underway to keep improving platform availability, as part of our commitment to provide a trusted set of cloud services.