343 Industries, the studio responsible for the Halo franchise, is passionate about delivering phenomenal gaming experiences to its players. To ensure an optimal game experience, Halo 5: Guardians uses extensive telemetry from both servers and consoles to track service quality in areas such as matchmaking quality and network latency.
As you can imagine, these types of events can generate massive amounts of data. For a highly scalable publish-subscribe (pub-sub) service, 343 Industries chose Azure Event Hubs, which could scale to meet the sizeable throughput generated by tens of billions of telemetry events each day.
With a telemetry pipeline in place using Azure Event Hubs, the Halo team needed a way to unlock insights from their telemetry through dashboard visualizations and real-time log search. Getting the data is only part of the problem; being able to search through billions of logs to determine the root cause of a reported game session problem, or to visually gauge the extent of a service interruption, is another.
Enter Log Analytics (part of Microsoft Operations Management Suite)
Xbox consoles generate all kinds of telemetry about games, sessions, etc. Halo 5’s dedicated servers and backend services also generate substantial data. 343 Industries needed a way to correlate events across all of these sources (dedicated servers, consoles and backend services) to get a complete view of what was taking place during a game, in order to troubleshoot any reported issues. Log Analytics, a component of Microsoft Operations Management Suite, natively supports log search and data visualizations across various data sources, but didn’t have the ability to ingest from Azure Event Hubs at the time.
The Microsoft Log Analytics team met with 343 Industries, and Halo 5’s requirements were as follows:
- Near-real-time log search. 343 Industries wanted to limit the time from when an event is generated to when it becomes searchable in Microsoft Log Analytics to under a minute.
- Scale to support the tens of billions of telemetry events that Halo 5 generates daily.
- Support for Azure Event Hubs.
- Support for the Microsoft Bond-based event message format. Bond is a framework that supports serialization and deserialization for high-scale services.
To meet these requirements and take advantage of the Azure Event Hubs pub-sub model, Microsoft Log Analytics uses the Event Hub Ingestion component.
Overview of Microsoft Log Analytics Event Hub Ingestion service
The following diagram illustrates the data flow of events from their source location to Microsoft Log Analytics.
- Bond-based telemetry event data from consoles and game servers is sent to Azure Event Hub(s).
- The Event Hub Ingestion service subscribes to the Azure Event Hub’s consumer groups and listens for incoming event messages.
- The ingestion service pulls the event message schema from Halo 5’s schema store in Azure Blob storage. The schema is kept there rather than included in each event message, which keeps messages as compact as possible and conserves client bandwidth.
- The event message data is then posted to and ingested by the Microsoft Log Analytics Search cluster. If for any reason the Event Hub Ingestion service is unable to post successfully, the event message data is sent to a Retry Queue. The service then reattempts to post the queued event messages to the Search cluster.
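The last two steps above (schema lookup, then post with a retry queue) can be sketched roughly as follows. This is only an illustrative sketch, not the actual ingestion service: `SchemaCache`, `Ingestor`, the event dictionary layout, the attempt limit and the `dead` list are all hypothetical names and assumptions, and no Azure SDK or Bond calls are used. The `fetch` and `post` callables stand in for the blob schema store and the Search cluster.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

# Hypothetical types for illustration; none of this is an Azure or Bond API.
Event = Dict[str, Any]
Record = Dict[str, Any]


class SchemaCache:
    """Fetches each schema once from an external store and caches it by id."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch          # stand-in for a read from the blob schema store
        self._cache: Dict[str, dict] = {}

    def get(self, schema_id: str) -> dict:
        if schema_id not in self._cache:
            self._cache[schema_id] = self._fetch(schema_id)
        return self._cache[schema_id]


class Ingestor:
    """Posts events to a search backend; failed posts go to a retry queue."""

    def __init__(self, schemas: SchemaCache, post: Callable[[Record], None],
                 max_attempts: int = 3) -> None:
        self.schemas = schemas
        self.post = post             # stand-in for the post to the Search cluster
        self.max_attempts = max_attempts   # assumption: the source gives no limit
        self.retry_queue: Deque[Tuple[Record, int]] = deque()
        self.dead: List[Record] = []       # assumption: records that exhausted retries

    def ingest(self, event: Event) -> None:
        # The event carries only a schema id, keeping the message compact.
        schema = self.schemas.get(event["schema_id"])
        # Real Bond deserialization is omitted; the payload is passed through as-is.
        record = {"fields": schema["fields"], "payload": event["payload"]}
        self._try_post(record, attempt=1)

    def _try_post(self, record: Record, attempt: int) -> None:
        try:
            self.post(record)
        except Exception:
            if attempt < self.max_attempts:
                self.retry_queue.append((record, attempt + 1))
            else:
                self.dead.append(record)

    def drain_retries(self) -> None:
        # Process only what is queued now; new failures wait for the next pass.
        for _ in range(len(self.retry_queue)):
            record, attempt = self.retry_queue.popleft()
            self._try_post(record, attempt)
```

In this sketch, a caller would invoke `ingest` for each message received from a consumer group and call `drain_retries` periodically. Keeping the retry path separate from the main path means a temporary failure to post does not block newly arriving events, which matters when the goal is to keep events searchable in under a minute.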
This is pretty cool, why should it matter to me?
We wrote this post for a couple of reasons. First, we wanted to showcase how Azure-based PaaS and SaaS services work in tandem to deliver the end-to-end operational intelligence solution needed to support one of THE most popular game franchises.
Second, we wanted to illustrate that Microsoft Log Analytics can meet your log search, statistical metrics aggregation and data analysis needs by allowing you to visually plot and track any service-impacting issues at massive scale.
Finally, while the Microsoft Log Analytics Event Hub Ingestion service was built specifically to support the Halo 5: Guardians release, we are releasing a more generalized solution so that Microsoft Log Analytics customers can ingest all types of data programmatically. Follow me on Twitter @jochan_msft for more updates.