
Azure status history

June 2018

20.6

Azure Data Factory Services - Multiple Regions

Summary of impact: Between 12:25 and 15:00 UTC on 20 Jun 2018, customers using Data Factory, Data Factory V2, Data Movement, SSIS Integration Runtime, and Data Movement & Dispatch may have experienced errors, including but not limited to pipeline execution errors, copy activity and stored procedure activity errors, manual and scheduled SSIS activity failures, and data movement and dispatch failures. This incident is now mitigated. Customers still seeing issues with their self-hosted Integration Runtime (IR) needed to manually restart their IR service instances to mitigate these issues. These customers were contacted separately via the Azure Portal.
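
For customers who needed to manually restart the self-hosted Integration Runtime, a minimal sketch of that restart is shown below. It assumes the IR is installed as the default Windows service (the service name used here, DIAHostService, is an assumption and should be verified on the host) and that the script runs from an elevated prompt.

    # Minimal sketch: restart the self-hosted Integration Runtime Windows service.
    # The service name "DIAHostService" is an assumption (the default on most
    # installations); confirm it with `sc query` before running. Requires an
    # elevated (administrator) prompt.
    import subprocess

    SERVICE_NAME = "DIAHostService"  # assumed default self-hosted IR service name

    def restart_service(name: str) -> None:
        # "net stop"/"net start" are the standard Windows service commands;
        # check=True raises if either command fails.
        subprocess.run(["net", "stop", name], check=True)
        subprocess.run(["net", "start", name], check=True)

    if __name__ == "__main__":
        restart_service(SERVICE_NAME)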

Preliminary root cause: Engineers have determined that a backend service was not renewed correctly during a recent deployment, which caused the issues.

Mitigation: Engineers rolled back the deployment of the problematic component service to mitigate this incident.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences. 

19.6

RCA - Service availability issue in North Europe

Summary of impact: From 17:44 UTC on 19 Jun 2018 to 04:30 UTC on 20 Jun 2018, customers using Azure services in North Europe may have experienced connection failures when attempting to access resources hosted in the region. Customers using a subset of Azure services may have experienced residual impact for a sustained period after the underlying issue was mitigated.

Preliminary root cause: On 19 Jun 2018, Data Center Critical Environments systems in one of our datacenters in the North Europe region experienced an increase in outside air temperature. While maintaining inside operating temperatures within operational specifications, we experienced a control systems failure in a limited area of the datacenter that triggered an unexpected rise in humidity levels within two of our colocation rooms. This unexpected rise in humidity in the operational areas caused multiple Top of Rack (ToR) network devices and hard disk drives supporting two Storage scale units in the region to suffer hardware component failures. These failures caused significant latency and/or intra-scale-unit communication issues between the servers, which led to availability issues for customers with data hosted on the affected scale units.

Mitigation: The control systems failure and humidity rise were quickly resolved by Data Center engineers, restoring inside operational conditions; however, the server node reboots and hard drive failures required Storage, Compute, and downstream services to perform additional structured recovery operations to fully restore service. Once network communication to the affected Storage scale units was restored, Storage availability was restored for most customers. A limited subset of customers experienced an extended recovery while engineers worked to restore failed disk drives.

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):

 - Engineers continue to analyze detailed event data to determine if additional environmental systems modifications are required.

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey:



14.6

Azure Bot Services - Mitigated

Summary of impact: Between 07:15 and 10:06 UTC on 14 Jun 2018, a subset of customers using Azure Bot Service may have experienced difficulties while trying to connect to bot resources.

Preliminary root cause: Engineers identified a code defect with a recent deployment task as the potential root cause.

Mitigation: Engineers performed a rollback of the recent deployment task to mitigate the issue.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.

13.6

RCA - Multiple Azure Services availability issues and Service Management issues for a subset of Classic Azure resources - South Central US

Summary of impact: Between 15:57 and 19:00 UTC on 13 Jun 2018, a subset of customers in South Central US may have experienced difficulties connecting to resources hosted in this region. Engineers determined that this was caused by an underlying Storage availability issue. Other services that leverage Storage in this region may also have experienced impact related to this, including: Virtual Machines, App Service, Visual Studio Team Services, Logic Apps, Azure Backup, Application Insights, Service Bus, Event Hub, Site Recovery, Azure Search, and Media Services. In addition, customers may have experienced failures when performing service management operations on their resources. Communications for the service management operations issue were published to the Azure Portal.

Root cause and mitigation: One storage scale unit in the South Central US region experienced a significant increase in load, which caused increased resource utilization on the backend servers in the scale unit. The increased resource utilization caused several backend roles to become temporarily unresponsive, resulting in timeouts and other errors, which in turn impacted VMs and other storage-dependent services. The Storage service uses automatic load balancing to help mitigate this type of incident automatically; however, in this case automatic load balancing was not sufficient and engineer intervention was required. To stabilize the scale unit and backend roles, engineers rebalanced the load, and impacted services recovered shortly thereafter. Engineers are continuing to investigate the cause of the increase in load and to implement additional steps to help prevent a recurrence.
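
The rebalancing step is conceptually similar to the greedy sketch below. This is an illustration only, not the Storage service's actual algorithm; the load model, server names, and spread threshold are assumptions.

    # Illustrative sketch of load rebalancing across backend servers: repeatedly
    # move a partition (represented here simply by its load value) from the most
    # loaded server to the least loaded one while doing so improves the balance.
    def rebalance(load_by_server: dict[str, list[int]], max_spread: int = 10) -> None:
        while True:
            totals = {server: sum(parts) for server, parts in load_by_server.items()}
            hottest = max(totals, key=totals.get)
            coldest = min(totals, key=totals.get)
            spread = totals[hottest] - totals[coldest]
            if spread <= max_spread or not load_by_server[hottest]:
                return
            partition = min(load_by_server[hottest])
            if partition >= spread:
                return  # no single move improves the balance further
            # A real system would also weigh data locality, move cost, and in-flight traffic.
            load_by_server[hottest].remove(partition)
            load_by_server[coldest].append(partition)

    # Example: rebalance({"srv1": [50, 40, 30], "srv2": [5], "srv3": [10, 5]})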

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
1. Improve the load balancing system to better manage load across servers - pending
2. Improve resource usage management to better handle low-resource situations - pending
3. Improve server restart time after an incident that results in degraded Storage availability - in progress
4. Improve the failover strategy to help ensure that impacted services can recover more quickly - in progress

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey

12.6

Visual Studio App Center

Summary of impact: Between 03:37 and 08:06 UTC on 12 Jun 2018, a subset of Visual Studio App Center customers may have experienced impact to build operations, such as builds failing to start.

Preliminary root cause: Engineers determined that a recent deployment task impacted instances of a backend service which became unhealthy, preventing requests from completing.

Mitigation: Engineers rolled back the recent deployment task to mitigate the issue. This rollback also mitigated the issue for the limited subset of Visual Studio Team Services customers who may have experienced delays and failures of VSTS builds using Hosted Agent pools in Central US.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.

10.6

RCA - Multiple Azure Services impacted in West Europe

Summary of impact: Between approximately 02:24 and 06:20 UTC on 10 Jun 2018, a subset of customers in the West Europe region may have experienced difficulties connecting to their resources due to a storage issue in this region. Multiple Azure services with a dependency on Storage and/or Virtual Machines also experienced secondary impact to their resources for some customers. Impacted services included: Storage, Virtual Machines, SQL Database, Backup, Azure Site Recovery, Service Bus, Event Hub, App Service, Logic Apps, Automation, Data Factory, Log Analytics, Stream Analytics, Azure Maps, Azure Search, and Media Services.

Root cause and mitigation: Two storage scale units in the West Europe region were affected by an unexpected increase in inter-cluster network latency. This network latency caused geo-replication delays and triggered increased resource utilization on the storage scale units. The increased resource utilization caused several backend roles to become temporarily unresponsive, resulting in timeouts and other errors that impacted VMs and other storage-dependent services. To stabilize the backend roles, engineers rebalanced the load, which stabilized the scale units. The service typically handles this load balancing automatically, but in this case it was not sufficient and engineer involvement was required.

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
- Improve the load balancing system to better manage load across servers - in progress.
- Improve resource usage management to better handle low-resource situations - in progress.

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey

May 2018

31.5

RCA - App Service - Service Management Issues

Summary of impact: Between 21:14 and 22:50 UTC on 30 May 2018, and between 06:00 and 08:45 UTC on 31 May 2018, a subset of customers using App Service, Logic Apps, and Functions may have experienced intermittent errors viewing App Service resources, or may have seen the error message "This resource is unavailable" when attempting to access resources via the Azure portal. In addition, a subset of customers may have received failure notifications when performing service management operations - such as create, update, and delete - for resources. Existing runtime resources were not impacted.

Root cause and mitigation: Due to the removal of some resource types, the platform triggered a subscription syncing workflow in the Resource Manager. This caused an extremely high volume of update calls to be issued across multiple App Service subscriptions. The high call volume caused a buildup of subscription update operation objects in the backend database and resulted in sub-optimal performance of high-frequency queries. Database usage reached a critical point and led to resource starvation for other API calls in the request pipeline. As a result, some customer requests encountered long processing latencies or failures. Engineers stopped the workflow to mitigate the issue.

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):

1. Ensure that subscription update operation objects are cleaned up immediately [complete]

2. Better optimization implemented for high frequency queries [complete]

3. Additional monitoring to maintain effective throttling from resource manager [pending]

30.5

RCA - Virtual Machines - Service Management Issues - USGov Virginia

Summary of impact: Between 05:30 and 17:46 EST on 30 May 2018, a subset of customers in USGov Virginia may have been unable to manage some Virtual Machines hosted in the region. Restart attempts may have failed, or machines may have appeared to be stuck in a starting state. Other dependent Azure services experienced downstream impact; affected services included Media Services, Redis Cache, Log Analytics, and Virtual Networks.

Root cause and mitigation: Engineers determined that one of the platform services was consuming high CPU due to an unexpected scenario. Engineers were aware of the potential issue with this platform service and already had a fix in place; unfortunately, this incident occurred before the fix was rolled out to this environment. The issue caused the core platform service responsible for service management requests to become unhealthy, preventing requests from completing. Engineers used platform tools to disable the high-CPU-consuming service. In addition, engineers applied a service configuration change to expedite recovery and return the core platform service to a healthy state.

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
1. Apply a patch to the service that was consuming high CPU and complete its rollout to all environments [Completed].
2. Actively monitor resource utilization (CPU, memory, and hard drive) for all platform services that could cause core platform degradation [Completed].
3. Define strict resource utilization thresholds for platform services [In Progress].

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey:

25.5

SQL DB - West Europe - Mitigated

Summary of impact: Between 05:00 UTC on 25 May 2018 and 01:36 UTC on 26 May 2018, a subset of customers using SQL Database in West Europe may have experienced intermittent increases in latency, timeouts, and/or connectivity issues when accessing databases in this region. New connections to existing databases in this region may also have resulted in an error or timeout, and existing connections may have been terminated.

Preliminary root cause: Engineers identified that this issue was related to a recent deployment.

Mitigation: Engineers rolled back the deployment, performed a configuration change and redeployed to mitigate the issue.

23.5

RCA - App Service, Logic Apps, Functions - Service Management Issues

Summary of impact: Between 18:20 and 21:15 UTC on 23 May 2018, a subset of customers using App Service, Logic Apps, and Azure Functions may have experienced latency or timeouts when viewing resources in the Azure Portal. Some customers may also have seen errors when performing service management operations such as resource creation, update, delete, move and scaling within the Azure Portal or through programmatic methods. Application runtime availability was not impacted by this event.

Root cause and mitigation: The root cause of this issue was that some instances of a backend service in App Service, which are responsible for processing service management requests, experienced a CPU usage spike. The spike was due to a large number of data-intensive queries being processed simultaneously across multiple instances, causing high load in the backend service. These queries required processing very large data sets, which consumed higher than normal CPU and prevented other requests from completing.
The issue was automatically detected by our internal monitoring, and engineers deployed a hotfix to throttle the CPU-intensive queries and free up resources. The incident was fully mitigated by 21:15 UTC on 23 May 2018.
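
The hotfix described above throttled the expensive queries. As a rough illustration of that idea (not the actual App Service implementation; the concurrency cap and names here are assumptions), a concurrency limit on expensive queries might look like this:

    # Illustrative sketch only: cap how many CPU-intensive queries run at once so
    # that cheap requests are not starved. Not the actual App Service hotfix.
    import threading

    EXPENSIVE_QUERY_SLOTS = threading.BoundedSemaphore(value=4)  # assumed cap

    class QueryThrottled(Exception):
        """Raised when no slot is available for an expensive query."""

    def run_expensive_query(execute, *args, **kwargs):
        # Fail fast instead of queueing, so the backend CPU never saturates;
        # callers are expected to retry later.
        if not EXPENSIVE_QUERY_SLOTS.acquire(blocking=False):
            raise QueryThrottled("expensive query rejected; retry later")
        try:
            return execute(*args, **kwargs)
        finally:
            EXPENSIVE_QUERY_SLOTS.release()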

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
1. Implement permanent throttling of queries to prevent them from causing excessive system load - Completed.
2. Improve throughput and overall performance by adding processing optimization for CPU-intensive queries - In progress.
3. Enhance incident notification to provide quicker and more accurate customer communications - In progress.

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey:

22.5

Visual Studio Team Services - Availability and Performance Issues

Summary of impact: Between 14:55 and 16:55 UTC on 22 May 2018, a subset of customers using Visual Studio Team Services may have experienced degraded performance and latency when accessing accounts or navigating through workspaces. Additional information can be found on the VSTS blog.

Preliminary root cause: Engineers determined that the underlying root cause was a recent configuration settings change which conflicted with prior settings.

Mitigation: Engineers rolled back the configuration settings to mitigate the issue.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.

22.5

RCA - Azure Active Directory - Issues Authenticating to Azure Resources via Microsoft Account (MSA)

Summary of impact: Between 02:19 and 03:35 UTC on 22 May 2018, a subset of customers signing in with a Microsoft Account (MSA) through Azure Active Directory (AAD) may have experienced intermittent difficulties when attempting to authenticate to resources that depend on AAD. In addition, Visual Studio Team Services (VSTS) customers may have experienced errors when attempting to log in to the VSTS portal through MSA using https://<Accountname>.visualstudio.com. Other experiences that rely on Microsoft Account sign-in may also have seen intermittent errors.

Root cause and mitigation: Engineers determined that a recent maintenance task introduced a configuration error that impacted instances of a backend cache service. This resulted in some authentication requests failing to complete. The issue was immediately detected, and engineers initiated a manual failover of the service to healthy datacenters to mitigate the impact.

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
1. Increase isolation between service tiers to remove the risk of component-level destabilization affecting the service and to reduce the fault domain - in progress.
2. Automate failovers to mitigate customer impact scenarios faster - in progress.

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey:

17.5

RCA - Japan East - Service Management Operations Issues with Virtual Machines

Summary of impact: Between 23:06 UTC on 17 May 2018 and 01:30 UTC on 18 May 2018, a subset of customers using Virtual Machines in Japan East may have received failure notifications when performing service management operations - such as create, update, delete - for resources hosted in this region.

Root cause and mitigation: The Azure Load Balancing service was in the process of upgrading its operating system when we encountered a hardware issue on a small number of scale units in the region. These scale units were running older hardware with a compatibility problem: the network hardware would stop forwarding packets after a period of time and required a reimage of the OS to continue working. The time between failures was related to the load on the physical network. The load balancer was not directly serving customer traffic; instead, it was supporting a subset of the Azure management services, and as a result these services were unavailable, causing failures of service management operations that interacted with them. To mitigate the issue, the operating system was rolled back to the previous version.

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):

- Roll out a fix for the network hardware issue that caused Azure Load Balancing instances to become unavailable. In the interim, a temporary mitigation has been applied to prevent this issue from resurfacing in any other region.

15.5

Visual Studio App Center - Distribution Center

Summary of impact: Between 17:15 and 20:00 UTC on 15 May 2018, a subset of customers using Visual Studio App Center may have received error notifications when attempting to use the distribute feature in Distribution Center.

Mitigation: Engineers performed a manual restart of a backend service to mitigate the issue.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.

3.5

RCA - Multiple Services - West Central US

Summary of impact: Between 19:47 and 22:05 UTC on 03 May 2018, customers in West Central US may have experienced difficulties connecting to resources hosted in this region. The incident was caused by a configuration change deployed to update Management Access Control Lists (ACLs). The change was deployed to network switches in a subset of clusters in the West Central US Region. The configuration was rolled back to mitigate the incident.

Root cause and mitigation: Azure uses Management ACLs to limit access to the management plane of network switches to a small number of approved network management services, and must occasionally update these ACLs. Management ACLs should normally have no effect on the flow of customer traffic through the router. In this incident, the Management ACLs were incorrectly applied on some network switches due to differences in how ACLs are interpreted across network switch operating system versions. In the impacted switches, this blocked critical Border Gateway Protocol (BGP) routing traffic, leaving the switches unable to forward traffic into a subset of the storage clusters in this region. This loss of connectivity to a subset of storage accounts resulted in impact to VMs and services with a dependency on those storage accounts. Engineers responded to the alerts and mitigated the incident by rolling back the configuration changes.
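
As a simplified model of this root cause (an illustration only; the subnets, rule format, and evaluation logic are assumptions, not actual switch configuration), the interpretation difference can be thought of as follows: on one OS version the management ACL is scoped to the management plane, while on another it is also evaluated for routing/control traffic such as BGP, whose sessions then hit the ACL's implicit deny.

    # Simplified model of the ACL interpretation difference - not real switch config.
    MGMT_ACL = [
        {"action": "permit", "src": "10.0.0.0/24"},  # assumed management subnet
        # implicit "deny any" at the end of the ACL
    ]

    def permitted(acl, src_prefix: str) -> bool:
        # Toy exact-prefix match; real ACLs do wildcard/longest-prefix matching.
        for rule in acl:
            if rule["src"] == src_prefix:
                return rule["action"] == "permit"
        return False  # implicit deny

    def forwards_bgp(acl_applies_to_routing_traffic: bool, peer_prefix: str) -> bool:
        # On the affected OS version the management ACL was also evaluated for
        # routing/control traffic, so BGP peers outside the permitted subnet were dropped.
        if acl_applies_to_routing_traffic:
            return permitted(MGMT_ACL, peer_prefix)
        return True  # ACL scoped to the management plane only; BGP unaffected

    # forwards_bgp(False, "192.0.2.0/24") -> True
    # forwards_bgp(True, "192.0.2.0/24")  -> False (BGP session dropped, routes withdrawn)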

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
1. Enhancing configuration deployment testing / validation processes and automation to account for variances in network switch operating systems - in progress
2. Reviewing and enhancing the deployment methods, procedures, and automation by incorporating additional network health signals - in progress
3. Monitoring and alerting improvements for Border Gateway Protocol (BGP) - in progress

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey:

April 2018

26.4

RCA - App Service - Canada Central

Summary of impact: Between 02:45 and 03:28 UTC and between 11:53 and 13:30 UTC on 26 Apr 2018, customers using App Service in Canada Central may have intermittently received HTTP 500-level response codes, or experienced timeouts or high latencies, when accessing App Service deployments hosted in this region.

Root cause and mitigation: The root cause for the issue was that there was a significant increase in HTTP traffic to certain sites deployed to this region. The rate of requests was so much higher than usual that it exceeded the capacity of the load balancers in that region. Load balancer throttling rules were applied for mitigation initially. However, after a certain threshold, existing throttling rules were unable to keep up with the continued increase in request rate. A secondary mitigation was applied to load balancer instances to further throttle the incoming requests. This fully mitigated the issue.
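
A common way to express this kind of request-rate throttling at a front end is a token bucket, sketched below. This is an illustration only; the actual App Service load balancer rules are not described in this notice, and the rate and burst values are assumptions.

    # Illustrative token-bucket rate limiter: requests beyond the sustained rate
    # (plus a burst allowance) are rejected, e.g. with HTTP 429.
    import time

    class TokenBucket:
        def __init__(self, rate_per_sec: float, burst: int):
            self.rate = rate_per_sec          # sustained requests per second
            self.capacity = float(burst)      # short-term burst allowance
            self.tokens = float(burst)
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False  # over the limit: reject/throttle this request

    # Usage: bucket = TokenBucket(rate_per_sec=100, burst=200)
    #        if not bucket.allow(): reject_with_http_429()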

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
1. Adding aggressive automated throttling to handle unusual increases in request rate
2. Adding network layer protection to prevent malicious spikes in traffic

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey

26.4

Traffic Manager - Connectivity Issues

Summary of impact: Between 12:46 and 18:01 UTC on 26 Apr 2018, a subset of customers using Traffic Manager may have encountered sub-optimal traffic routing or may have received alerts relating to degraded endpoints. Customers were provided a workaround during the incident.

Preliminary root cause: A configuration issue with a backend network route prevented Traffic Manager health probes from reaching customer endpoints when checking endpoint health status, which led to those endpoints being marked as unhealthy and traffic being routed away from them.
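
The behavior follows from how probe results drive routing: if the probes cannot reach an endpoint, the endpoint is treated as degraded even though it is actually healthy. A simplified model (not Traffic Manager's actual probing logic; the failure threshold is an assumption) is sketched below.

    # Simplified model of probe-driven endpoint health - not Traffic Manager's
    # actual implementation. An endpoint is marked degraded after a few consecutive
    # probe failures, and traffic is only routed to endpoints still marked healthy.
    FAILURE_THRESHOLD = 3  # assumed probe tolerance

    class Endpoint:
        def __init__(self, name: str):
            self.name = name
            self.consecutive_failures = 0
            self.healthy = True

        def record_probe(self, succeeded: bool) -> None:
            if succeeded:
                self.consecutive_failures = 0
                self.healthy = True
            else:
                self.consecutive_failures += 1
                if self.consecutive_failures >= FAILURE_THRESHOLD:
                    self.healthy = False  # traffic is routed away from this endpoint

    def routable(endpoints):
        # If probes are blocked upstream, every endpoint ends up excluded here,
        # even though the endpoints themselves are serving traffic correctly.
        return [e for e in endpoints if e.healthy]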

Mitigation: Engineers made mapping updates to the network route which mitigated the issue.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.

19.4

Service Bus - West Europe

Summary of impact: Between approximately 12:00 and 14:44 UTC on 19 Apr 2018, a subset of customers using Service Bus in West Europe may have experienced intermittent timeouts or errors when connecting to Service Bus queues and topics in this region.

Preliminary root cause: This issue is related to a similar issue that occurred on the 18th of April in the same region. Engineers determined that the underlying root cause was a backend service that had become unhealthy on a single scale unit, causing intermittent accessibility issues to Service Bus resources.

Mitigation: While the original incident self-healed, engineers have additionally performed a change to the service configuration to reroute traffic from the affected scale unit to mitigate the issue. In addition, a manual backend scale out was performed.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.

17.4

Content Delivery Network Connectivity

Summary of impact: Between approximately 18:30 and 20:50 UTC on 17 Apr 2018, a subset of customers using Verizon CDN may have experienced difficulties connecting to resources within the European region. Additional Azure services, utilizing Azure CDN, may have seen downstream impact.

Preliminary root cause: Engineers determined that a network configuration change was made to Verizon CDN, causing resource connectivity issues.

Mitigation: Verizon engineers mitigated the issue by rerouting traffic to another IP.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.

15.4

RCA - Issues Performing Service Management Operations - Australia East/Southeast

Summary of impact: Between 21:00 UTC on 15 Apr 2018 and 03:20 UTC on 16 Apr 2018, customers in Australia Southeast may have been unable to view resources managed by Azure Resource Manager (ARM) via the Azure Portal or programmatically, and may have been unable to perform service management operations. After further investigation, engineers determined that customers using ARM in Australia East were not impacted by this issue. Service availability for those resources was not affected.

Customer impact: Customers' ability to view their existing resources was impacted.

Root cause and mitigation: Customers in Australia Southeast were not able to view resources managed by Azure Resource Manager (ARM), either through the Azure Portal or programmatically, due to a storage account bug that impacted only ARM service availability. A storage infrastructure configuration change, made as part of a new deployment, resulted in an authentication failure. The ARM system did not recognize the failed calls to the storage account, and therefore automatic failover was not executed. Engineers rolled back the configuration change in the deployment to restore successful request processing; this negated the need for a manual failover of the ARM service.

Next steps: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to): 
1. Apply the mitigation steps to all the scale units [completed]
2. Release the fix to address the storage bug [completed]
3. Update alerts and processes to detect failed storage accounts [pending] 

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey:

9.4

Azure Active Directory B2C - Multiple Regions

Summary of impact: Between 19:57 and 22:05 UTC on 09 Apr 2018, customers using Azure Active Directory B2C in multiple regions may have experienced client-side authorization request failures when connecting to resources. Customers attempting to access services may have received a client-side error - "HTTP Error 503. The service is unavailable" - when attempting to log in.

Preliminary root cause: Engineers have identified a recent configuration update as the preliminary root cause for the issue.

Mitigation: Engineers rolled back the recent configuration update to mitigate the issue. Some service instances had become unresponsive, and were manually rebooted so that they could pick up the change and the issue could be fully mitigated.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.

6.4

RCA - Azure Active Directory - Authentication Errors

Summary: Between 08:18 and 11:25 UTC on 06 Apr 2018, a subset of customers may have experienced difficulties when attempting to authenticate to resources with Azure Active Directory (AAD) dependencies, with the primary impact on resources located in the Asia, Oceania, and European regions. This stemmed from incorrect data mappings in two scale units, which degraded the authentication service for approximately 2.5% of tenants. Downstream impact was reported by some Azure services during the impact period. Customers may have experienced the following impact for these services:

- Backup: Failures for the registration of new containers and backup/restore operations
- StorSimple: New device registration failures and StorSimple management/communication failures
- Azure Bot Service: Bots reporting as unresponsive
- Visual Studio Team Services: Higher execution times and failures while getting AAD tokens in multiple regions
- Media Services: Authentication failures
- Azure Site Recovery: New registrations and VM replications may also have failed
- Virtual Machines: Failures when starting VMs. Existing VMs were not impacted

We are aware that other Microsoft services, outside of Azure, were impacted. Those services will communicate to customers via their appropriate channels.

Root cause and mitigation: Due to a regression introduced by a recent update to our data storage service, which was applied to a subset of our replicated data stores, data objects were moved to an incorrect location in a single replicated data store in each of the two impacted scale units. These changes were then replicated to all the replicas in each of the two scale units. After the changes replicated, Azure AD frontend services were no longer able to access the moved objects, causing authentication and provisioning requests to fail. Only a subset of Azure AD scale units were impacted due to the nature of the defect and the phased update rollout of the data storage service. During the impact period, authentication and provisioning failures were contained to the impacted scale units; as a result, approximately 2.5% of tenants experienced authentication failures.

Timeline:
08:18 UTC - Authentication failures when authenticating to Azure Active Directory detected across a subset of tenants in Asia-Pacific and Oceania.
08:38 UTC - Automated alerts notified engineers about the incident in the APAC and Oceania regions.
09:11 UTC - Authentication failures when authenticating to Azure Active Directory detected across a subset of tenants in Europe.
09:22 UTC - Automated alerts notified engineers about the incident in Europe; engineers were already investigating as a result of the earlier alerts.
10:45 UTC - Underlying issue was identified and engineers started evaluating mitigation steps.
11:21 UTC - Mitigation steps applied to impacted scale units.
11:25 UTC - Mitigation and service recovery confirmed.

Next steps: We understand the impact this incident caused our customers; we apologize, and we are committed to making the necessary improvements to the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
1. Isolate and deprecate replicas running the updated version of the data store service [Complete]
2. A fix to eliminate the regression is being developed and will be deployed soon [In Progress]
3. Improve telemetry to detect unexpected movement of data objects to an incorrect location [In Progress]
4. Improve resiliency by updating the data storage service to prevent impact should similar changes to data object location occur [In Progress]

Provide feedback: Please help us improve the Azure customer communications experience by taking our survey