Update #2 on Microsoft cloud services continuity
Since last week’s update, the global health pandemic continues to impact every organization—large or small—their employees, and the customers they serve. Everyone is working tirelessly to support all our customers, especially critical health and safety organizations across the globe, with the cloud services needed to sustain their operations during this unprecedented time. Equally, we are hard at work providing services to support hundreds of millions of people who rely on Microsoft to stay connected and to work and play remotely.
As Satya Nadella shared, “It’s times like this that remind us that each of us has something to contribute and the importance of coming together as a community.” In these times of great societal disruption, we are steadfast in our commitment to help everyone get through this.
For this week’s update, we want to share common questions we’re hearing from customers and partners along with insights to address these important inquiries. If you have any immediate needs, please refer to the following resources.
- Azure Service Health – for tracking any issues impacting customer workloads and understanding Azure Service Health
- Microsoft 365 Service health and continuity – for tracking and understanding M365 Service health
- Xbox Live – for tracking game and service status
What have you observed over the last week?
In response to health authorities emphasizing the importance of social distancing, we’ve seen usage increases in services that support these scenarios—including Microsoft Teams, Windows Virtual Desktop, and Power BI.
- We have seen a 775 percent increase in Teams calling and meeting monthly users in a one-month period in Italy, where social distancing or shelter-in-place orders have been enforced.
- We have seen a very significant spike in Teams usage, and now have more than 44 million daily users. Those users generated over 900 million meeting and calling minutes on Teams daily in a single week. You can read more about Teams data here.
- Windows Virtual Desktop usage has grown more than 3x.
- Government use of public Power BI to share COVID-19 dashboards with citizens has surged by 42 percent in a week.
Have you made any changes to the prioritization criteria you outlined last week?
No. Our top priority remains support for critical health and safety organizations and ensuring remote workers stay up and running with the core functionality of Teams.
Specifically, we are providing the highest level of monitoring during this time for the following:
- First Responders (fire, EMS, and police dispatch systems)
- Emergency routing and reporting applications
- Medical supply management and delivery systems
- Applications to alert emergency response teams for accidents, fires, and other issues
- Healthbots, health screening applications, and websites
- Health management applications and record systems
Given your prioritization criteria, how will this impact other Azure customers?
We’re implementing a few temporary restrictions designed to balance the best possible experience for all of our customers. We have placed limits on free offers to prioritize capacity for existing customers. We also have limits on certain resources for new subscriptions. These are ‘soft’ quota limits, and customers can raise support requests to increase these limits. If requests cannot be met immediately, we recommend customers use alternative regions (of our 54 live regions) that may have less demand surge. To manage surges in demand, we will expedite the creation of new capacity in the appropriate region.
Have there been any service disruptions?
Despite the significant increase in demand, we have not had any significant service disruptions. As a result of the surge in use over the last week, we have experienced high demand in some regions (Europe North, Europe West, UK South, France Central, Asia East, India South, Brazil South) and are observing deployment success rates for some compute resource types in these regions drop below our typical 99.99 percent.
The majority of deployments still succeed, so we encourage any customers experiencing allocation failures to retry deployments. We also have a process in place to ensure that customers who encounter repeated issues receive relevant mitigation options. We treat these short-term allocation shortfalls as a service incident, and we send targeted updates and mitigation guidance to impacted customers via Azure Service Health, as per our standard process for any known platform issues.
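As an illustration only, retrying a deployment after an allocation failure might look like the following minimal sketch. It assumes the deployment is wrapped in a caller-supplied function; `AllocationError`, `retry_deployment`, and the `deploy` callable are hypothetical names used for this example and are not part of any Azure SDK. The delays are arbitrary defaults, not recommended values.

```python
import random
import time


class AllocationError(Exception):
    """Hypothetical error: the deployment could not be allocated capacity."""


def retry_deployment(deploy, max_attempts=5, base_delay=1.0,
                     max_delay=60.0, sleep=time.sleep):
    """Call deploy() until it succeeds or max_attempts is exhausted.

    deploy is any zero-argument callable (an assumption of this sketch)
    that raises AllocationError when capacity is unavailable.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return deploy()
        except AllocationError:
            if attempt == max_attempts:
                # Out of attempts: surface the failure to the caller.
                raise
            # Exponential backoff capped at max_delay, with random jitter
            # so that many clients do not retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

Spacing retries out with backoff and jitter avoids adding load to a region that is already under pressure. If failures persist after the final attempt, the error is re-raised so the caller can fall back to other mitigation options.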
When these service incidents happen, how do you communicate to customers and partners?
We have standard operating procedures for how we manage both mitigation and communication. Impacted customers and partners are notified through the Service Health experience in the Azure portal and/or in the Microsoft 365 admin center.
What actions are you taking to prevent capacity constraints?
We are expediting the addition of significant new capacity that will be available in the weeks ahead. Concurrently, we monitor support requests and, if needed, encourage customers to consider alternative regions or alternative resource types, depending on their timeline and requirements. If these efforts to alleviate demand are not sufficient, customers may experience intermittent deployment-related issues. When this does happen, impacted customers will be informed via Azure Service Health.
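For customers who can tolerate running in a different region, the fallback described above could be sketched as follows. This is a hedged example under stated assumptions: `deploy_to` is a caller-supplied function, `AllocationError` and `deploy_with_fallback` are hypothetical names, and the region strings are placeholders rather than a recommendation of which regions currently have spare capacity.

```python
class AllocationError(Exception):
    """Hypothetical error: the region could not allocate the resource."""


def deploy_with_fallback(deploy_to, regions):
    """Try each region in preference order.

    deploy_to is a caller-supplied callable taking a region name
    (an assumption of this sketch). Returns (region, result) for the
    first region that succeeds, or raises AllocationError listing
    every region that failed.
    """
    failures = {}
    for region in regions:
        try:
            return region, deploy_to(region)
        except AllocationError as exc:
            # Record why this region failed, then try the next one.
            failures[region] = str(exc)
    raise AllocationError(f"all regions failed: {failures}")


# Placeholder region names, listed in the caller's order of preference.
PREFERRED_REGIONS = ["primary-region", "alternate-region-1", "alternate-region-2"]
```

The order of the list encodes the customer's own preference, for example data residency or latency requirements, so the primary region is always tried first.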
Have you needed to make any changes to the Teams experience?
To best support our Teams customers worldwide and accommodate new growth and demand, we made a few temporary adjustments to select non-essential capabilities, such as how often we check for user presence, the interval at which we show when the other party is typing, and video resolution. These adjustments do not have a significant impact on our end users’ daily experiences.
Is Xbox Live putting a strain on overall Azure capacity?
We’re actively monitoring performance and usage trends to ensure we’re optimizing services for gamers worldwide. At the same time, we’re taking proactive steps to plan for high-usage periods, which includes taking prudent measures with our publishing partners to deliver higher-bandwidth activities like game updates during off-peak hours.
How does in-home broadband use impact service continuity and capacity? Any specific work being done with ISPs?
We’ve been in regular communication with ISPs across the globe and are actively working with them to augment capacity as needed. In particular, we’ve been in discussions with several ISPs that are taking measures to reduce bandwidth from video sources in order to enable their networks to be performant during the workday.
We’ll continue to provide regular updates on the Microsoft Azure blog.
This post was updated on March 30 to clarify the first bullet.