
With a move to the cloud, IT operations team redesigns the infrastructure monitoring model

See how taking a decentralized approach to monitoring empowers both IT and business app teams to bring more value to their roles.

The challenge: Adapting to a cloud operating model

Back when the Manageability Platforms team, part of Microsoft Core Services Engineering and Operations (CSEO), was managing infrastructure in a centralized monitoring and alerting environment, the model was clear: they provided a self-contained service that the business app teams consumed. Then Microsoft moved to the cloud, and the model no longer worked.

As the company shifted internal systems and apps to Azure, business app teams created their own virtual machines (VMs) and wanted full control over them, but the Manageability Platforms team was still accountable for monitoring and managing them. Attempts to divide up responsibilities didn't solve the problem. The Manageability Platforms team realized that with the move to the cloud, the best way to fulfill their mission of enabling the development teams was to get out of the day-to-day monitoring business and champion a new, decentralized monitoring model.

"The move to Azure redefined the relationship between business app teams and us … It literally broke how we worked, including our accountability model, and we had to redesign our support services to accommodate the new self-service cloud model."

Dana Baxter, Principal Service Engineer, Manageability Platforms

Changing the culture of control

Initially, the Manageability Platforms team tried creating its own pool of Azure subscriptions for the business app teams to use. They quickly found that they were still a bottleneck for teams that wanted to administer things on their own. With automation and self-service capabilities becoming available in Azure Monitor, the Manageability Platforms team saw the opportunity to replace their centralized model and hand over responsibility for monitoring to the business app teams. At the same time, they knew they'd have to drive a challenging cultural shift to overcome resistance.

But first, they had to make sure that the operations part of the new DevOps model was in order. They pared a mess of about 100 old alerts down to 15 and then created a toolkit on GitHub to help the business app teams monitor their own infrastructure. The toolkit established guardrails that eased the Manageability Platforms team's own discomfort with relinquishing control over something they'd owned for years. Their final push involved a major, multifaceted communication and training effort across the organization.

"Our KPIs used to be all about alerts, trouble tickets, time to resolution, and so on. Today they're around things like inventory, security patching, compliance, and other components of enterprise manageability."

Dana Baxter, Principal Service Engineer, Manageability Platforms

Decentralized monitoring empowers both teams

The transition to a decentralized, self-service approach to enterprise monitoring and reporting wasn't easy, but it was worth the effort. Now, reporting and dashboard tools that are enabled by Azure Monitor and Power BI make it easy for business app teams to monitor any part of their environment. With the ability to quickly tailor their own dashboards and alerts to align with how they build and manage their apps, they configure the monitoring environment that best meets their needs. Instead of providing a day-to-day monitoring service that the development teams consume, today the Manageability Platforms team members have become valued consultants in their partnership with development. Most importantly, they're free to focus on more strategic, forward-looking projects—such as security patching, inventory, and compliance—that bring more value to the business.

Take a closer look at the journey the team took to get to a cloud operating model.