This post was authored by Matías Quaranta, Azure MVP, Autocosmos.
In this blog post, I’m going to show you how we migrated from the ELK Stack to Azure Log Analytics, lowering our operating costs by more than ninety percent and reducing our maintenance time.
Background
The need for logging is probably as old as computers themselves, and its importance has grown hand in hand with the complexity of distributed architectures.
It is common nowadays for applications and platforms to span multiple servers, service instances, languages, and even technologies, so keeping track of the status and logs of every part becomes a challenge.
At Autocosmos we work entirely on Azure; our whole architecture runs on a myriad of different Azure services, ranging from the most common ones, like Azure App Service, Azure Redis Cache, and Azure SQL Database, to Azure Search, Azure DocumentDB, and Microsoft Cognitive Services. We are a technology team focused on creating and deploying the best products and platforms for automotive and car enthusiasts in Latin America by leveraging Azure’s SaaS and PaaS offerings.
A few years ago, when we decided to implement a centralized logging platform, there weren’t many options that could handle our diverse log output and structure, so we ended up implementing an ELK (Elasticsearch-Logstash-Kibana) Stack running on Azure Virtual Machines. I have to admit it worked quite well: we were able to ingest JSON logs through Logstash and visualize them through Kibana.
Pain points
As I mentioned, we are a focused technology team of developers and engineers, and even though the ELK Stack worked, we were forced to maintain the Linux virtual environments, watch Elasticsearch cluster health, and handle a lot of other tasks that we, as developers, really didn’t care for. Maintaining our logging architecture was becoming a time sink without truly adding value to our products.
On top of that, Azure VMs are an IaaS offering, which makes any kind of scaling and load balancing a tedious and complex task for a developer. And all of that effort went into nothing more than our logging architecture.
Who would have thought that a videogame would open the door for a solution to our problems?
Unexpected opportunities
After reading how Halo 5 managed its logging pipeline, we got in touch with the amazing team behind Azure Log Analytics and shared with them our current scenario.
For those not familiar with Azure Log Analytics, it’s a service that is part of Microsoft Operations Management Suite but is priced separately (including a free tier). It collects, stores, and analyzes log data from multiple sources, including Windows and Linux environments, whether on-premises or in the cloud.
To our surprise, they were already working on a feature that would allow custom log ingestion that didn’t have to come from a declared agent or source: an HTTP API that accepts JSON objects.
This was exactly the solution we needed! We were already using JSON for our ELK Stack, so it was as easy as redirecting our log flow to the Azure Log Analytics HTTP Data Collector API. Our whole architecture started directing its logs there, taking advantage of the freedom of HTTP and JSON and the fact that Azure Log Analytics parses and understands each JSON attribute separately. We could send our front-end logs, our DocumentDB logs, and our Cognitive Services logs; we could even send logs from inside Azure Functions. Even though they were different sources with different information, it just worked!
It became the backbone of all our operational insights.
Implementing JSON logging by HTTP
Every service and application that is part of our solution runs on ASP.NET Core, so we created a .NET wrapper for the Azure Log Analytics HTTP Data Collector API in the form of a NuGet package, open to contributions in a public GitHub repository. This wrapper lets us send JSON payloads from any part of our architecture and even supports object serialization.
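To give a sense of what the wrapper hides, here is a minimal sketch of a direct call to the HTTP Data Collector API, based on its public documentation. The class and method names are illustrative, and the workspace ID and shared key are placeholders you would replace with your own workspace credentials:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public static class LogAnalyticsSketch
{
    // Placeholders: use your own workspace ID and primary key.
    private const string WorkspaceId = "<workspace-id>";
    private const string SharedKey = "<base64-primary-key>";

    private static readonly HttpClient Http = new HttpClient();

    public static async Task SendAsync(string logType, string jsonPayload)
    {
        var date = DateTime.UtcNow.ToString("r"); // RFC1123 timestamp required by the API
        var body = Encoding.UTF8.GetBytes(jsonPayload);

        // String-to-sign as described in the Data Collector API documentation,
        // signed with HMAC-SHA256 using the base64-decoded workspace key.
        var stringToSign = $"POST\n{body.Length}\napplication/json\nx-ms-date:{date}\n/api/logs";
        string signature;
        using (var hmac = new HMACSHA256(Convert.FromBase64String(SharedKey)))
        {
            signature = Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));
        }

        using (var request = new HttpRequestMessage(
            HttpMethod.Post,
            $"https://{WorkspaceId}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01"))
        {
            request.Headers.Authorization =
                new AuthenticationHeaderValue("SharedKey", $"{WorkspaceId}:{signature}");
            request.Headers.Add("Log-Type", logType); // the custom record type for this payload
            request.Headers.Add("x-ms-date", date);
            request.Content = new ByteArrayContent(body);
            request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/json");

            var response = await Http.SendAsync(request);
            response.EnsureSuccessStatusCode();
        }
    }
}
```

In Log Analytics, the Log-Type value becomes the record type with a _CL suffix (for example, ApiError_CL), and each JSON attribute becomes its own searchable field.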
We use logging for post-mortem analysis: when an internal call in one of our APIs or WebJob functions fails, we send the exception directly and then use the search experience in Azure Log Analytics to find the root of the problem.
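As a rough illustration of the pattern (the service, method, and field names here are hypothetical, Json.NET is assumed for serialization, and it reuses the SendAsync helper from the sketch above), it boils down to catching the exception, serializing its details, and shipping them as a custom record:

```csharp
using System;
using System.Threading.Tasks;
using Newtonsoft.Json;

public class OrdersService // hypothetical service, for illustration only
{
    public async Task ProcessAsync(string orderId)
    {
        try
        {
            await ProcessOrderAsync(orderId); // hypothetical internal call that may fail
        }
        catch (Exception ex)
        {
            // Serialize the exception details into a JSON record.
            var payload = JsonConvert.SerializeObject(new
            {
                Source = "Orders.Api",
                Operation = nameof(ProcessOrderAsync),
                ExceptionType = ex.GetType().Name,
                ex.Message,
                ex.StackTrace
            });

            // Ship it via the SendAsync helper from the previous sketch;
            // the record shows up in Azure Log Analytics as the ApiError_CL type.
            await LogAnalyticsSketch.SendAsync("ApiError", payload);
            throw;
        }
    }

    private Task ProcessOrderAsync(string orderId) => Task.CompletedTask; // stub
}
```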
Performance can also be measured with logs; we track the DocumentDB Request Units consumed by each call (using the exposed response headers) to measure how our hourly quota is being used and to alert us if we are reaching a point where we need to scale.
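A hedged sketch of that idea is below; the wrapper class and log type name are hypothetical, and RequestCharge is the DocumentDB .NET SDK property that surfaces the x-ms-request-charge response header:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Newtonsoft.Json;

public class DocumentDbMetrics // hypothetical wrapper, for illustration only
{
    private readonly DocumentClient _client;

    public DocumentDbMetrics(DocumentClient client)
    {
        _client = client;
    }

    public async Task<Document> ReadWithTrackingAsync(string documentLink)
    {
        var response = await _client.ReadDocumentAsync(documentLink);

        // RequestCharge exposes the x-ms-request-charge header:
        // the Request Units consumed by this single call.
        var payload = JsonConvert.SerializeObject(new
        {
            Operation = "ReadDocument",
            DocumentLink = documentLink,
            RequestUnits = response.RequestCharge
        });

        await LogAnalyticsSketch.SendAsync("DocDbUsage", payload);
        return response.Resource;
    }
}
```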
Search experience
Azure Log Analytics is not just log storage; it includes a powerful Search feature that lets you delve deep into your log data with a well-documented query syntax and shows each JSON attribute parsed and filterable.
You can even create custom dashboards with the widgets and tiles of your choice and build the visualizations best suited to your scenario.
Finally, you can configure Alerts based on searches and time frames and tie them to webhooks, Azure Automation runbooks, or email notifications.
Conclusion
Implementing Azure Log Analytics meant that we not only reduced our maintenance time but also lowered our operating costs (no more VMs!) by more than ninety percent. All of our time is now devoted to what matters most for our business: creating and building the best products and platforms for car enthusiasts in Latin America.