Thursday, November 10, 2022
“Earlier this year, we introduced Project Flash in the Advancing Reliability blog series, to reaffirm our commitment to empowering Azure customers in monitoring virtual machine (VM) availability in a…
Monday, October 3, 2022
We introduce AiDice, a novel anomaly detection algorithm developed jointly by Microsoft Research and Microsoft Azure. The algorithm detects anomalies in large volumes of multidimensional time series data. AiDice captures incidents quickly and gives engineers important context that helps them diagnose problems more effectively. This gives end customers the best possible user experience.
Monday, February 14, 2022
We are pleased to announce today the completion of the first two project milestones: the preview of VM availability data in Azure Resource Graph and the private preview of a VM availability metric in Azure Monitor.
Monday, November 22, 2021
The most important promise of our identity services is that every user can access the apps and services they need without interruption. We deliver on this promise through a multilayered approach, which has raised the authentication uptime of Azure Active Directory (Azure AD) to 99.99%.
Thursday, September 30, 2021
Microsoft’s cloud supply chain is essential to deliver the infrastructure—servers, storage, and networking gear—that enables cloud reliability and growth. Our vision is for cloud capacity to be available like a utility so that customers can seamlessly turn it on when and where they need it.
Monday, August 2, 2021
Now, in addition to getting a fast notification when a VM’s availability is impacted, customers can expect a root cause to be added at a later point once our automated Root Cause Analysis (RCA) system identifies the failing Azure platform component that led to the VM failure.
Monday, July 12, 2021
We created the Azure Well-Architected Framework to help improve the quality of your workloads, and reliability is one of its five core pillars. For the latest post in our series, I have asked Cloud Advocate David Blank-Edelman to run through how best to use the framework to guide your conversations and design decisions in this space.
Wednesday, July 7, 2021
All service engineering teams in Azure are already familiar with postmortems as a tool for better understanding what went wrong, how it went wrong, and the customer impact of the related outage. For today’s post in our Advancing Reliability blog series, we share insights into our journey as we work towards advancing our postmortem and resiliency threat modeling processes.
Wednesday, June 30, 2021
The continuous monitoring of health metrics is a fundamental part of this process, and this is where AIOps plays a critical role. In the post that follows, we introduce how AI and machine learning are used to empower DevOps engineers, monitor the Azure deployment process at scale, detect issues early, and make rollout or rollback decisions based on impact scope and severity.
Monday, June 7, 2021
There are many factors that can affect critical environment infrastructure availability—the reliability of the infrastructure building blocks, the controls during the datacenter construction stage, effective health monitoring and event detection schemes, a robust maintenance program, and operational excellence to ensure that every action is taken with careful consideration of related risk implications.