Best Practices for Designing Large-Scale Services on Windows Azure

Editor’s Note: Today’s post comes from Jason Roth, Principal Programming Writer. He provides an overview of a new white paper from our Customer Advisory Team, covering best practices for designing large-scale services on Windows Azure.

We recently released a new white paper: Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services. This paper is a compilation of design patterns and guidelines that are based on actual customer engagements. It pulls together the best strategies and design patterns that have consistently proven successful for real-world Windows Azure applications.

First Understand the Platform

As you read through the paper, you’ll notice that there are three main sections:

  • Design Concepts
  • Exploring Windows Azure
  • Best Practices

You might be tempted to skip directly to the best practices, but those best practices derive from the information in the first two sections. Every application is unique, so it is important to first understand the Windows Azure platform and its general design principles. That understanding helps you both select the right optimizations and implement them correctly.

Good Design — Worth the Effort

Any large-scale application design takes careful thought, planning, and potentially complex implementation. For Windows Azure, one of the most fundamental design principles is scale-out. Rather than investing in increasingly powerful (and expensive) hardware, a scale-out strategy responds to increasing demand by adding more machines or service instances.

Many of the best practices involve achieving scale-out for each Windows Azure service. For example, in Windows Azure, it is not possible to scale up the server that is running your SQL Database. Instead, you have to design your application so that it can make use of additional SQL Database instances. This requires some type of partitioning strategy for your data.

Of course, the challenge is to pick the right partitioning strategy and to coordinate work between partitions successfully. This paper attempts to give you both a technical understanding of the choices you’re making and practical suggestions that have worked in past customer scenarios.
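To make the idea of a partitioning strategy concrete, here is a minimal sketch of one common approach, hash-based partitioning, in Python. It is an illustration only and is not taken from the white paper: the database names, the `shard_for` and `database_for` helpers, and the choice of customer ID as the partition key are all assumptions.

```python
import hashlib

# Hypothetical database names, one per partition (assumed for illustration).
SHARDS = ["orders-db-0", "orders-db-1", "orders-db-2", "orders-db-3"]


def shard_for(key: str, shard_count: int) -> int:
    """Map a partition key to a shard index in the range [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Use a stable hash so every process and every restart agrees on the
    # mapping (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def database_for(customer_id: str) -> str:
    """Return the database that holds all rows for the given customer."""
    return SHARDS[shard_for(customer_id, len(SHARDS))]


if __name__ == "__main__":
    for customer in ("alice", "bob", "carol"):
        print(customer, "->", database_for(customer))
```

The trade-offs are the ones the paragraph above alludes to. A simple modulo mapping like this one moves most keys to a different partition when the number of databases changes, and any query that spans customers has to be fanned out and merged by the application.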

Note that SQL Database is just one very obvious example where partitioning improves scalability. To maximize the strengths of the platform, other roles and services must scale out in a similar way. For example, storage accounts have an upper bound on the rate of transactions, and virtual machines have an upper bound on CPU and memory. Maximum scale is achieved by designing for the use of multiple storage accounts and for services whose components scale out across virtual machines of set sizes.
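The arithmetic behind this kind of planning is simple enough to sketch. The following Python snippet estimates how many units (storage accounts, virtual machines, and so on) are needed for a given peak load when each unit has a fixed ceiling. The `units_needed` helper and its `headroom` parameter are hypothetical, and the numbers in the usage line are made up for illustration; they are not actual Windows Azure limits.

```python
import math


def units_needed(peak_demand: float, per_unit_limit: float,
                 headroom: float = 0.2) -> int:
    """Estimate how many fixed-capacity units are needed for a peak load.

    peak_demand    -- expected peak, in the same units as per_unit_limit
    per_unit_limit -- the ceiling of one unit (one account, one VM, ...)
    headroom       -- fraction of each unit's capacity kept in reserve
    """
    if per_unit_limit <= 0:
        raise ValueError("per_unit_limit must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    usable = per_unit_limit * (1 - headroom)
    # Always plan for at least one unit, even when demand is zero.
    return max(1, math.ceil(peak_demand / usable))


if __name__ == "__main__":
    # Made-up figures: 45,000 requests/s at peak, 20,000 requests/s per
    # unit, with 25% of each unit held in reserve.
    print(units_needed(45_000, 20_000, headroom=0.25))
```

With these made-up figures, each unit has 15,000 requests per second of usable capacity, so the estimate is three units. The estimate only holds if the load is actually spread evenly across the units, which is again a partitioning question.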

Although scalability is a driving force behind design, there are other critically important design considerations. The paper stresses that you must plan for telemetry and diagnostic data collection, which becomes increasingly important as your solution becomes more componentized and partitioned. Availability and business continuity are two other major areas of focus throughout the paper. Scalability is irrelevant when your service goes down or irretrievably loses data.

Best Practices & Platform Evolution

Windows Azure is constantly evolving, improving, and adding new services. Recent releases have introduced new features, such as Windows Azure Virtual Network and Infrastructure as a Service (IaaS). These new capabilities provide even more options for large-scale applications. However, this paper focuses on version 1.6 and does not cover some of the latest additions to the platform.

To understand the reason for this decision, you have to re-examine the goal for this work. This paper intends to provide design guidance that has succeeded in real customer implementations. As these engagements can take months to plan, test, and iterate, it will take some time before the paper can be updated with some of the newer services and capabilities. But all of the design principles in the paper are still applicable, and the same type of thinking can be applied to any of the new capabilities of Windows Azure.

Going forward, we are working on additional papers, code examples, and samples that demonstrate how to practically implement some of these best practices.

Not a Checklist

Everyone loves checklists. The thought is: if you can check all of the boxes, then you know that you will be successful. With Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services, try not to see the information as a checklist. Your application is unique. Perhaps, at this moment, your application is at a “medium-high” scale. Once you understand the platform and the best practices, you may find that only some of the recommendations are critical for you in the short term. But look ahead, and plan for the possibility that you might require some or all of the other design strategies in the future.

Check out the white paper and use the comments section below to share your feedback.