Azure Site Recovery, Best practices, Management and Governance

Monitoring enhancements for VMware and physical workloads protected with Azure Site Recovery

By Mayuri Gupta, Program Manager II, R&D Compute MDR IDC (Hyd)

Posted on May 1, 2019

Azure Site Recovery has enhanced health monitoring for your workloads by introducing several health signals on the replication component, the Process Server. In a hybrid disaster recovery (DR) scenario, the Process Server (PS) is a vital component of data replication: it handles replication caching, data compression, and data transfer. Once workloads are protected, issues can arise from multiple factors, including a high data change rate (churn) at the source, network connectivity, available bandwidth, an under-provisioned Process Server, or a large number of workloads protected by a single Process Server. These can leave the PS in a bad state and have a cascading effect on the replication of VMs.

Troubleshooting these issues is now easier with additional health signals from the Process Server. You can quickly identify which Process Server a virtual machine is using and relate the health of the two. Notifications are raised on multiple PS parameters: free space utilization, memory usage, CPU utilization, and achieved throughput. Both warning and critical alerts are raised so that action can be taken at the right time. This helps users avoid large-scale issues that may impact multiple machines connected to a PS.

View of the PS blade

Warning and critical events are raised according to the thresholds below, which are set by Azure Site Recovery. Supplemental alerts cover issues related to PS services and the PS heartbeat. In the portal, all these health events are collated on the PS blade for deep-dive monitoring, with up to 72 hours of data points in the events table. Note that throughput is measured in terms of achievable RPO.

Parameter	Warning threshold	Critical threshold
CPU utilization	> 80%	> 95%
Memory usage	> 80%	> 95%
Free space	< 30%	< 25%
Achievable RPO	> 30 mins	> 45 mins
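
To make the thresholds concrete, here is a minimal Python sketch of how a single reading could be mapped to a health level. It is an illustration only, not Azure Site Recovery's implementation: the names `Health`, `THRESHOLDS`, and `classify` are hypothetical, and it assumes strict comparisons at the boundaries and that free space is the one metric where a lower value is worse.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical health levels, ordered so that a higher value is worse."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


# (warning, critical, higher_is_worse) per parameter, copied from the table.
# Free space is assumed to be the only metric where a LOWER value is worse.
THRESHOLDS = {
    "cpu_percent": (80.0, 95.0, True),
    "memory_percent": (80.0, 95.0, True),
    "free_space_percent": (30.0, 25.0, False),
    "achievable_rpo_minutes": (30.0, 45.0, True),
}


def classify(parameter: str, value: float) -> Health:
    """Map one reading to a health level (strict comparison is an assumption)."""
    warning, critical, higher_is_worse = THRESHOLDS[parameter]
    if higher_is_worse:
        if value > critical:
            return Health.CRITICAL
        if value > warning:
            return Health.WARNING
    else:
        if value < critical:
            return Health.CRITICAL
        if value < warning:
            return Health.WARNING
    return Health.HEALTHY


if __name__ == "__main__":
    readings = {
        "cpu_percent": 85.0,
        "memory_percent": 60.0,
        "free_space_percent": 20.0,
        "achievable_rpo_minutes": 35.0,
    }
    for name, value in readings.items():
        print(f"{name}: {classify(name, value).name}")
```

With the sample readings above, CPU and achievable RPO fall in the warning band, memory is healthy, and free space is critical.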

The replicated item blade establishes a clear relationship between the PS and its replicated items. This helps identify and resolve issues with ongoing replication faster.

A view of the replicated item blade.

All these health signals roll up to a consolidated Process Server health status. This visible status helps in choosing a PS when new machines need to be protected, or when load balancing between existing Process Servers is required. During Process Server selection, a warning health status discourages the choice by raising a warning, while a critical health status blocks selection of that PS entirely. The signals become more valuable as the scale of the workloads grows. This guidance helps ensure that an appropriate number of virtual machines is connected to each Process Server, so that related issues can be avoided.
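
The roll-up and selection behavior can be sketched in a few lines of Python. This is a hedged illustration rather than the product's logic: `Health`, `roll_up`, and `check_selection` are hypothetical names, and taking the worst individual signal as the consolidated health is an assumption, since the post does not spell out the aggregation rule.

```python
from enum import IntEnum
from typing import Mapping, Tuple


class Health(IntEnum):
    """Hypothetical health levels, ordered so that a higher value is worse."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def roll_up(signals: Mapping[str, Health]) -> Health:
    """Consolidate per-signal health; worst-of is an assumed aggregation rule."""
    return max(signals.values(), default=Health.HEALTHY)


def check_selection(signals: Mapping[str, Health]) -> Tuple[bool, str]:
    """Return (allowed, message) for picking this Process Server.

    Mirrors the behavior described in the text: a warning state allows the
    choice but flags it, while a critical state blocks the selection.
    """
    overall = roll_up(signals)
    if overall is Health.CRITICAL:
        return False, "blocked: Process Server health is critical"
    if overall is Health.WARNING:
        return True, "allowed with warning: Process Server health is degraded"
    return True, "allowed: Process Server is healthy"


if __name__ == "__main__":
    candidate = {
        "cpu": Health.WARNING,
        "memory": Health.HEALTHY,
        "free_space": Health.HEALTHY,
    }
    print(check_selection(candidate))
```

Ordering the levels in an `IntEnum` lets the roll-up be a single `max`, which keeps the selection check independent of how many signals are tracked.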

Enable Replication Workflow with Healthy Process Server (Left) and with Critical Process Server (Right)

Process Server health signals for CPU utilization, memory usage, and free space are available from version 9.24 onward. Throughput-related alerts will be available in subsequent releases.
