Performance in the new Azure SQL Database Service Tiers

在 五月 19, 2014 上貼文

Principal Group Program Manager, US-DS-PM
On April 24th we introduced new service tier previews, Basic and Standard, for SQL Database as well as new business continuity features. In this blog post we will take a closer look at the new performance approach in SQL Database. Let’s start with the need for change. The focus on performance, and specifically predictable performance in the new service tiers, is driven by strong customer feedback on SQL Database performance in Web and Business. The performance in Web and Business tiers has historically been unpredictable and opportunistic, which creates problems for customers running business-critical applications. We understand from customers that predictable performance is critical. Customers tell us it is too hard to build a system that can reliably execute on a platform if you cannot rely on the platform’s performance over time. As a response to this feedback, we introduced Basic and Standard to join Premium, which has been in preview since July 2013 and primarily focused on delivering predictable performance. Customer feedback on Premium performance has been overwhelmingly positive; Basic and Standard extend this benefit to customers at additional price points. Customers now have performance as a lever they can control and this aligns with the cloud computing industry. Giving customers this lever is the primary driver to move from the Web & Business service model of “price/storage” to the new model of “price/performance plus features.” As you test your applications in the new service tiers please keep in mind that we are currently in preview, which includes continuous fine-tuning of the new performance experience. We use previews to listen closely to your feedback and we will use this feedback to polish the new service tiers. If you run performance tests please plan to re-run these periodically to make sure you have the best understanding of the various performance levels available across the new service tiers. We will do our best to post incremental performance blogs as we progress through the previews. In fact, based on customer feedback we have just made improvements to the throughput in our Basic and Standard tiers which is coming online during the week of Monday May 19th:
  • Basic: 1 DTU –> 5 DTU
  • S1: 5 DTU –> 15 DTU
  • S2: 25 DTU –> 50 DTU

What do we mean by Performance?

Database performance is a complex subject where a large number of metrics are commonly used to describe the state of affairs. From an application perspective what users really care about are:
  • Response time
    • The time elapsed between the start and end of a specific query execution.
  • Throughput
    • The total work or execution rate that the system can run at any point in time. A throughput rate is typically only regarded as valid if the individual executions are finishing within a specified response time.
There are many factors that affect response time and throughput but it is mainly split into two categories, application design and hardware resources. Below are some example questions that are often asked when performance troubleshooting a database application:
  • Application design
    • Have the right set of indices been created to provide the best possible platform for queries to execute?
    • Is the right transaction isolation level used, is the system unnecessarily waiting for locked resources?
    • Are round-trips minimized by for example using multi-statement batches or stored procedures?
  • Hardware Resources
    • Are there enough resources available to execute the workload?
    • How much resources do I need for different types of workloads that happen over time?
With regards to performance, the new SQL Database service tiers can help resolve problems related to hardware resources by allowing customers to upgrade to a performance level that provides more hardware resources. If you are using a scale-out design pattern (sharding) you can use the new performance levels to vary the resources provided to different shards depending on your workload distribution (for example assign a higher performance level to a shard containing a highly active customer). We are actively working on detailed sharding guidance to help you with this—stay tuned! In addition, a cloud-design best practice is to continue optimizing the application design. These investments can help reduce the need for using a higher performance level and reduce the cost to run the database workload.

What about Hardware Specs?

The resources required to run a database workload boil down to various types of hardware resources including CPU, disk IO operations (reads/writes), and memory. Within each of these resource dimensions there are tremendous differences. For example different types of cores have different clock cycles and cache sizes, IOs are performed in different sizes depending on what the database system is doing. Trying to estimate how much of each of these resources your application actually needs as well as comparing your utilization over time can be a difficult and time-consuming task. In an on-premises world this is an accepted consequence of the control you have over the hardware on which the database is run. The attraction of moving to the cloud, and particularly using SQL Database, is that you can focus on building great applications without needing to manage hardware or patch and maintain database software. Customers have told us they don’t want the hassle, they just want throughput when they need it. This is the vision with our new focus on performance. With the new service tiers, we have applied a new design principle which helps ensure predictable performance. Each performance level (Basic, S1, S2, P1, P2, P3) corresponds to the allocation of a set of resources (e.g., CPU, memory, IO). The maximum amount of resources a database can use within each level is defined. The design principle is that the performance of a database should be approximately what it would achieve if it were running on a dedicated computer with the same set of resources as provided for its performance level. This design principle is fundamental to delivering predictable performance. To help understand the difference in resources across levels, we have introduced the concept of a Database Throughput Unit (DTU). The DTU is a blended measure of the resources allocated at each performance level. For example, P1 has a DTU rating of 100 and S1 has a DTU rating of 15, i.e., S1 has ~1/7th of the resources of a P1. Following the design principle described above, you can expect that a database running on S1 will have a level of performance that is similar to a what it would get if it were running on a dedicated computer with a CPU that is ~1/7th as fast as a P1 with ~1/7th as much memory and ~1/7th as much IO capacity. 1 From a user and troubleshooting perspective, we expose the percent of the current performance level that your database workload is using in each resource dimension, removing the need to understand things such as different types of IOs, disk queue lengths etc. This makes it easy to understand your resource consumption relative to your current performance level. We expose telemetry for your usage across CPU, reads & writes in the sys.resource_stats view in the master database, memory will be added later in the preview. This view contains 5 minute averages (total consumption / 5 minutes) for each dimension and is helpful to gauge your resource requirements over time. This telemetry is also available through the portal. For more details on querying sys.resource_stats as well as estimating DTUs used you can read this MSDN article which provides information regarding the new service tiers. We know providing a great telemetry story for performance troubleshooting is important and you can expect us to make further enhancements to this during the preview. clip_image002_thumb.png

Database Workload Lifecycle

The act of “using a database” is not a single action, it is comprised of many different types of actions. This may sound obvious but it is important when you calculate which performance tier is right for your database and consequently what you spend to run your database workload. The answer may not be as simple as “Standard S2 @ $200/month” (using GA prices). One of the great things that the cloud and especially SQL Database brings us is the ability for elastic scale. Consider the following, “Use Basic, Mon-Thu and Sat & Sun for sparse queries, Fridays use S2 to manage running larger batch jobs within a specified time window and one day a month use P1 to quickly perform a large ETL operation:
  • (Basic price / 30 * 25) = $4.16
  • (S2 price / 30 * 4) = $26.7
  • (P1 price / 30 * 1) = $31 $61.86/month
Like the example above the workload life cycle for many databases will contain different sub-workloads with varying requirements, for example:
  • Initial load of the database (for example, when you make the move to SQL Database)
  • Normal usage (for example, user activity of a single department using the app during normal working days)
  • Peak usage (for example, once a month handling reporting for all employees in the entire company)
  • Large Extraction Transformation & Load (ETL or Import/Export) operations
  • Physical database maintenance operations (such as creating indexes)
If you categorize your workload into different requirements you may be able to significantly reduce your bill by not always running in the performance level required for your peak loads. Your application code can change the performance level when needed through a simple Update Database API call specifying a new service objective. The change itself is an asynchronous operation and you can use the Get Database API to monitor the status of the change. Your database will be online and available throughout the change.

Web & Business vs. Basic, Standard & Premium

A common question we get is how performance in the new service tiers compares to Web & Business. This is hard to answer in a straightforward way. The challenge is that the Web & Business and the new service tiers are fundamentally different. In Web & Business the service model is only based on the amount of storage used and does not take any other hardware resources into account. The system is optimized to guarantee the availability of storage and not any other resources used to execute a database workload which is fundamentally flawed. The performance level and availability of resources in Web & Business depends on various factors like other customers’ workloads on the same machine as well as the system protecting itself from over-utilization. It is a gamble whether you will be able to utilize these resources or not and this problem would become more pronounced over time if we weren’t addressing it with the new tiers. Like I said in the beginning, customers have clearly told us that this unpredictable and varying performance experience is not what they want. The new service tiers Basic, Standard and Premium, focus on providing a high level of predictability across multiple price points. This means that the system is optimized to provide not only storage but also the other hardware resources required to execute the database workload. The fact that these service tiers run on fundamentally different design principles makes the comparison one of apples vs. oranges. To help get some insight into Web & Business resource consumption, we provide the percent of resources used in sys.resource_stats (mentioned above). The reference point in the telemetry for the resource percentage calculation for Web & Business is set to ½ of a Standard S2 as this is a common price-point for W/B databases today. Please note, this is telemetry reference point only and not an indication of the amount of resources available in Web & Business which, as mentioned above, is undefined. However, to get a precise understanding of which performance level your databases needs you should upgrade it to one of the new service tiers, run your workload and watch sys.resource_stats. The current limitation that Basic & Standard service tiers are only available in new logical servers will be removed over the course of preview and this will reduce the need to import your Web and Business databases to test the new service tiers.


We hope that this blog post has given some more insights into performance in the new service tiers. As mentioned in the introduction, we are evaluating feedback as well as the quality of our performance objectives throughout the preview and we will be making adjustments to the performance levels as we learn and gather feedback. Here is some additional reading: Thank you for using SQL Database and trying out the previews! /Tobias & the SQL Database team