Monitoring enhancements for VMware and physical workloads protected with Azure Site Recovery

By Mayuri Gupta, Program Manager II, R&D Compute MDR IDC (Hyd)

Posted on May 1, 2019

Azure Site Recovery has enhanced the health monitoring of your workloads by introducing various health signals on the replication component, the Process Server. In a hybrid DR scenario, the Process Server (PS) is a vital component of data replication: it handles replication caching, data compression, and data transfer. Once workloads are protected, issues can arise from multiple factors. These include a high data change rate (churn) at the source, network connectivity, available bandwidth, an under-provisioned Process Server, or protecting a large number of workloads with a single Process Server. These factors can push the PS into an unhealthy state, which has a cascading effect on the replication of the VMs it serves.

Troubleshooting these issues is now easier with additional health signals from the Process Server. You can quickly identify which Process Server a virtual machine is using and relate the health of the two. Notifications are raised on multiple PS parameters: free space utilization, memory usage, CPU utilization, and achieved throughput. Both warning and critical alerts are raised so that action can be taken at the right time. This helps users avoid large-scale issues that could impact multiple machines connected to a PS.

View of the PS blade

Warning and critical events are raised according to the thresholds below, which are set by Azure Site Recovery. Supplemental alerts cover issues related to PS services and the PS heartbeat. In the portal, all these health events are collated on the PS blade for deep-dive monitoring, with up to 72 hours of data points in the events table. Note that throughput is measured in terms of achievable RPO.

Parameter	Warning threshold	Critical threshold
CPU utilization	>80%	>95%
Memory usage	>80%	>95%
Free space	<30%	<25%
Achievable RPO	>30 mins	>45 mins
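To make the table concrete, here is a minimal Python sketch that maps a single Process Server reading to a health level. The metric names (`cpu_percent`, `free_space_percent`, and so on), the `classify` helper, and the use of strict comparisons at each boundary are assumptions for illustration. Azure Site Recovery evaluates these thresholds itself, and this is not its implementation.

```python
from enum import IntEnum


class Health(IntEnum):
    """Health levels, ordered so that a larger value means a worse state."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


# Thresholds from the table, keyed by hypothetical metric names.
# Each entry is (warning, critical, higher_is_worse). Strict comparisons
# at the boundaries are an assumption made for this sketch.
THRESHOLDS = {
    "cpu_percent": (80.0, 95.0, True),
    "memory_percent": (80.0, 95.0, True),
    "free_space_percent": (30.0, 25.0, False),  # low free space is bad
    "achievable_rpo_minutes": (30.0, 45.0, True),
}


def classify(metric: str, value: float) -> Health:
    """Map one Process Server reading to a health level (illustrative only)."""
    warning, critical, higher_is_worse = THRESHOLDS[metric]
    if higher_is_worse:
        if value > critical:
            return Health.CRITICAL
        if value > warning:
            return Health.WARNING
    else:
        if value < critical:
            return Health.CRITICAL
        if value < warning:
            return Health.WARNING
    return Health.HEALTHY


if __name__ == "__main__":
    # Example: a PS with a busy CPU and low free space.
    readings = {"cpu_percent": 86.0, "memory_percent": 40.0,
                "free_space_percent": 22.0, "achievable_rpo_minutes": 12.0}
    for metric, value in readings.items():
        print(metric, classify(metric, value).name)
```

Keeping a `higher_is_worse` flag per metric handles free space, where the alert fires as the value drops rather than rises.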

The replicated item blade establishes a clear relation between the PS and its replicated items, which speeds up issue identification and resolution for ongoing replication.

A view of the replicated item blade.
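The PS-to-replicated-item relation is what lets one unhealthy Process Server be traced to every machine it affects. The following is a minimal sketch, assuming you already have a mapping from replicated item to Process Server (for example, one you noted down from the portal). The helpers `items_by_process_server` and `affected_items` are hypothetical, and they group items by PS and list the items behind a PS in warning or critical health. The item and server names are illustrative.

```python
from collections import defaultdict

# Health labels treated as unhealthy; plain strings keep the sketch self-contained.
UNHEALTHY = ("Warning", "Critical")


def items_by_process_server(assignment: dict[str, str]) -> dict[str, list[str]]:
    """Group replicated items by the Process Server that replicates them."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item, ps in assignment.items():
        grouped[ps].append(item)
    return {ps: sorted(items) for ps, items in grouped.items()}


def affected_items(assignment: dict[str, str], ps_health: dict[str, str]) -> list[str]:
    """List replicated items whose Process Server reports warning or critical health."""
    return sorted(
        item for item, ps in assignment.items() if ps_health.get(ps) in UNHEALTHY
    )
```

For example, if `vm-app-01` and `vm-db-01` both replicate through a critical `ps-east`, `affected_items` returns both machines. This reflects the cascading impact of a single unhealthy PS.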

All these health signals roll up into a consolidated Process Server health status. This status helps you choose a PS when new machines need to be protected, or when load needs to be balanced across existing Process Servers. During Process Server selection, a PS in warning health raises a warning that discourages choosing it, while a PS in critical health cannot be selected at all. These signals become more valuable as the scale of the workloads grows. This guidance helps ensure that an appropriate number of virtual machines is connected to each Process Server, so that related issues can be avoided.

Enable Replication Workflow with Healthy Process Server (Left) and with Critical Process Server (Right)
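The roll-up and selection behaviour can be sketched in a few lines. The post does not state the exact roll-up rule, so this sketch assumes the consolidated status is simply the worst individual signal. `rollup` and `check_selection` are hypothetical names that mirror the described behaviour (warning discourages selection, critical blocks it). They are not an Azure API.

```python
from enum import IntEnum


class Health(IntEnum):
    """Health levels, ordered so that a larger value means a worse state."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def rollup(signal_health: dict[str, Health]) -> Health:
    """Consolidated PS health, assumed here to be the worst individual signal."""
    return max(signal_health.values(), default=Health.HEALTHY)


def check_selection(ps_name: str, signal_health: dict[str, Health]) -> tuple[bool, str]:
    """Return (selectable, message) following the behaviour described in the post:
    a warning PS can still be chosen but raises a warning; a critical PS is blocked."""
    overall = rollup(signal_health)
    if overall == Health.CRITICAL:
        return False, f"{ps_name} is in critical health and cannot be selected."
    if overall == Health.WARNING:
        return True, f"{ps_name} is in warning health; consider another Process Server."
    return True, ""
```

A worst-of roll-up is the conservative choice: one critical signal, such as low free space, is enough to block a PS even if CPU and memory look fine.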

Process Server health signals for CPU utilization, memory usage, and free space are available from version 9.24 onwards. Throughput-related alerts will be available in subsequent releases.
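If you track Process Server versions yourself, you could use a minimal check like this one to see whether a given version includes these signals. The `supports_ps_health_signals` helper and the sample version strings are hypothetical. The only fact taken from the post is the 9.24 cut-off. Versions are compared as integer tuples because a plain string comparison would wrongly rank "9.3" above "9.24".

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "9.24.5211.1" into integers."""
    return tuple(int(part) for part in version.split("."))


def supports_ps_health_signals(version: str) -> bool:
    """True if the CPU, memory and free-space signals are expected (9.24 onwards)."""
    # Only the major.minor pair matters for the 9.24 cut-off.
    return parse_version(version)[:2] >= (9, 24)
```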
