Key causes of performance differences between SQL managed instance and SQL Server

Posted on May 29, 2019

Program Manager, Azure SQL Database

Migrating to a Microsoft Azure SQL Database managed instance provides a host of operational and financial benefits you can only get from a fully managed and intelligent cloud database service. Some of these benefits come from features that optimize or improve overall database performance. After migration many of our customers are eager to compare workload performance with what they experienced with on-premises SQL Server, and sometimes they're surprised by the results. In many cases, you might get better results on the on-premises SQL Server database because a SQL Database managed instance introduces some overhead for manageability and high availability. In other cases, you might get better results on a SQL Database managed instance because the latest version of the database engine has improved query processing and optimization features compared to older versions of SQL Server.

This article will help you understand the underlying factors that can cause performance differences and the steps you can take to make fair comparisons between SQL Server and SQL Database managed instance.

If you're surprised by the comparison results, it's important to understand what factors could influence your workload and how to configure your test environments to ensure you have a fair comparison. Some of the top reasons why you might experience lower performance on a SQL Database managed instance compared to SQL Server are listed below. You can mitigate some of these by increasing and pre-allocating file sizes or adding cores; however, the others are prerequisites for guaranteed high availability and are part of the PaaS service.

Simple or bulk-logged recovery model

The databases placed on a SQL Database managed instance use the full recovery model to provide high availability and guarantee no data loss. One of the most common reasons why you might get worse performance on a SQL Database managed instance is that your source database uses the simple or bulk-logged recovery model. The drawback of the full recovery model is that it generates more log data than the simple or bulk-logged recovery models (for example, bulk operations can't be minimally logged), so DML transaction processing in the full recovery model will be slower.

You can use the following query to determine what recovery model is used on your databases:

select name, recovery_model_desc from sys.databases

If you want to compare the workload running on SQL Server and SQL Database managed instances, for a fair comparison make sure the databases on both sides are using the full recovery model.
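If you collect the output of the query above from both environments, a small script can flag the databases that would skew the comparison. The following Python sketch assumes you have exported the rows yourself as (name, recovery_model_desc) pairs; it does not connect to any server, the helper names and sample database names are hypothetical, and you will probably want to filter out system databases (database_id 4 or lower) before passing the rows in.

```python
# Sketch: flag databases whose recovery model would make a benchmark unfair.
# Input rows are (name, recovery_model_desc) pairs, for example copied from
#   select name, recovery_model_desc from sys.databases
# on the SQL Server source and on the managed instance (assumption: you export
# the rows yourself; this helper does not connect to any server).


def databases_not_in_full(rows):
    """Return the sorted names of databases whose recovery model is not FULL."""
    return sorted(name for name, model in rows if model.upper() != "FULL")


def comparison_blockers(source_rows, target_rows):
    """Map each side to the databases that are not using the FULL recovery model."""
    return {
        "source": databases_not_in_full(source_rows),
        "managed_instance": databases_not_in_full(target_rows),
    }


if __name__ == "__main__":
    # Hypothetical sample rows; database names are illustrative only.
    source = [("SalesDb", "SIMPLE"), ("HrDb", "BULK_LOGGED"), ("OpsDb", "FULL")]
    target = [("SalesDb", "FULL"), ("HrDb", "FULL"), ("OpsDb", "FULL")]
    print(comparison_blockers(source, target))
    # {'source': ['HrDb', 'SalesDb'], 'managed_instance': []}
```

Any database listed under "source" would need to be switched to the full recovery model before its results are comparable with the managed instance.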

Resource governance and HA configuration

SQL Database managed instance has built-in resource governance that ensures 99.99% availability, and guarantees that management operations such as automated backups will be completed even under high workloads. If you don’t use similar constraints on your SQL Server, the built-in resource governance on SQL Database managed instance might limit your workload.

For example, there's an instance log throughput limit (up to 22 MB/s on the general purpose tier and up to 48 MB/s on the business critical tier) that ensures you can't load more data than the instance can back up. In this case, you might see higher INSTANCE_LOG_GOVERNOR wait statistics that don't exist on your SQL Server instance. These resource governance constraints might slow down operations such as bulk loads or index rebuilds, because these operations require higher log rates.
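A quick calculation shows how much the log throughput limit alone can stretch a log-heavy operation such as a bulk load or index rebuild. This Python sketch uses the limits quoted above as assumptions (check the current resource limits for your instance) and ignores every other bottleneck, so it gives a lower bound rather than a prediction; the function name and the 100 GB figure are hypothetical.

```python
# Log throughput limits quoted in this article, in MB/s (assumption: these may
# change over time; check the current resource limits for your instance).
LOG_RATE_LIMIT_MB_PER_S = {
    "general_purpose": 22,
    "business_critical": 48,
}


def min_duration_seconds(log_generated_gb, tier):
    """Lower bound on how long an operation takes if it must write
    log_generated_gb of transaction log at the instance log rate limit."""
    log_generated_mb = log_generated_gb * 1024
    return log_generated_mb / LOG_RATE_LIMIT_MB_PER_S[tier]


if __name__ == "__main__":
    # Hypothetical example: an index rebuild that generates 100 GB of log.
    for tier in LOG_RATE_LIMIT_MB_PER_S:
        minutes = min_duration_seconds(100, tier) / 60
        print(f"{tier}: at least {minutes:.1f} minutes")
```

If the same operation finishes well under this bound on your SQL Server, high INSTANCE_LOG_GOVERNOR waits on the managed instance are a likely explanation for the gap.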

In addition, the secondary replicas in business critical tier instances might slow down the primary database if they can't catch up with the changes and apply them, so you might see additional HADR_DATABASE_FLOW_CONTROL or HADR_THROTTLE_LOG_RATE_SEND_RECV wait statistics.

If you're comparing your SQL Server workload running on local SSD storage to the business critical tier, note that a business critical instance is an Always On availability group cluster with three secondary replicas. Make sure that your source SQL Server has a similar HA implementation using Always On availability groups with at least one synchronous-commit replica. Comparing the business critical tier with a single SQL Server instance writing to a local disk would be unrealistic, because the source instance has no HA. If you use asynchronous-commit Always On replicas, you get HA with better performance, but you are trading the possibility of data loss for performance, and you will get better results on the SQL Server instance.

Automated backup schedule

One of the main reasons to choose a SQL Database managed instance is that it guarantees you will always have backups of your databases, even under heavy workloads. The databases in a SQL Database managed instance have scheduled full, differential, and log backups. Full backups are taken every seven days, differential backups every twelve hours, and log backups every five to ten minutes. If you have multiple databases on the instance, it is likely that at least one backup is running at any given time.
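To see why overlap is likely, the following Python sketch turns the schedule above into counts. It is an approximation that treats each backup as a fixed-interval event spread evenly over time; the real scheduler may stagger or combine backups, and the helper names and the 20-database example are hypothetical.

```python
MINUTES_PER_WEEK = 7 * 24 * 60


def backups_per_database_per_week(log_interval_minutes):
    """Approximate number of automated backups per database per week."""
    return {
        "full": 1,                                        # every seven days
        "differential": (7 * 24) // 12,                   # every twelve hours
        "log": MINUTES_PER_WEEK // log_interval_minutes,  # every 5-10 minutes
    }


def average_seconds_between_log_backups(database_count, log_interval_minutes):
    """Average gap between log backups across all databases on the instance,
    assuming the backups are spread evenly over time."""
    return log_interval_minutes * 60 / database_count


if __name__ == "__main__":
    for interval in (5, 10):
        print(interval, backups_per_database_per_week(interval))
    # Hypothetical instance with 20 databases and 5-minute log backups.
    print(average_seconds_between_log_backups(20, 5))
```

Under these assumptions, an instance with 20 databases starts a log backup roughly every 15 seconds on average, so a benchmark of any length will almost certainly overlap with backup activity.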

Since the backup operations are using some instance resources (CPU, disk, network), they can affect workload performance. Make sure the databases on the system that you compare with the managed instance have similar backup schedules. Otherwise, you might need to accept that you're getting better results on your SQL Server instance because you're making a trade-off between database recovery and performance, which is not possible on a SQL Database managed instance.

If you're seeing unexpected performance differences, use the following query to check whether a backup is running on either the SQL Database managed instance or the SQL Server instance, which could affect the performance of the currently running workload:

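-- Lists running backups; estimated_completion_time is reported in milliseconds
SELECT r.session_id, r.command, query = a.text, r.start_time, r.percent_complete,
      eta = DATEADD(second, r.estimated_completion_time / 1000, GETDATE())
FROM sys.dm_exec_requests AS r
    OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS a
WHERE r.command LIKE 'BACKUP%';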

If you see a full or differential backup running during a short benchmark, you might pause your workload and resume it once the backup finishes.

Connection and App to Database proximity

The application that accesses the databases and executes the benchmark queries must be at a similar network distance from the SQL Database managed instance and from the SQL Server instance. If you place your application and SQL Server database in the same local environment (or run an app like HammerDB on the same machine where SQL Server is installed), you will get better results on SQL Server than on the SQL Database managed instance, which runs in a distributed cloud environment relative to the application. Make sure that in both cases you're running the benchmark application or queries on separate virtual machines in the same region as the SQL Database managed instance to get valid results. If you're comparing an on-premises environment with the equivalent cloud environment, measure the bandwidth and latency between the app and the database and try to ensure they are similar.
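One simple way to compare network proximity is to time TCP connection setup from the machine that runs the benchmark to each database endpoint. The Python sketch below measures only the TCP handshake (not TLS negotiation, login, or bandwidth), so treat it as a rough latency indicator; the function name is hypothetical, and the host names in the usage comment are placeholders you need to replace.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, samples=10, timeout=5.0):
    """Median time, in milliseconds, to open and close a TCP connection.

    Measures only the TCP handshake; it does not include TLS negotiation,
    SQL authentication, or query execution."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Usage (placeholders: replace with your own endpoints and run from the
# benchmark client machine; 1433 is the default SQL Server port):
# print(tcp_connect_latency_ms("sqlserver01.contoso.local", 1433))
# print(tcp_connect_latency_ms("<instance>.<dns-zone>.database.windows.net", 1433))
```

Using the median rather than the mean keeps a single slow DNS lookup or retransmission from dominating the result.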

A SQL Database managed instance is accessed via proxy gateway nodes that accept client requests and redirect them to the actual database engine nodes. To get results closer to your environment, set the ProxyOverride property of your instance to Redirect using the Set-AzSqlInstance PowerShell cmdlet; this enables direct access from the client to the nodes currently hosting your SQL Database managed instance.

In addition, due to compliance requirements, a SQL Database managed instance enforces SSL/TLS transport encryption, which is always enabled. Encryption can introduce overhead when there is a large number of queries. If your on-premises environment does not enforce SSL encryption, you will see additional network overhead on the SQL Database managed instance.

Transparent data encryption

The databases on a SQL Database managed instance are encrypted by default using Transparent Data Encryption. Transparent Data Encryption encrypts and decrypts every page that is exchanged with disk storage. This consumes additional CPU resources and introduces additional latency when fetching data pages from and saving them to disk storage. Make sure that the databases on the SQL Database managed instance and SQL Server either both have Transparent Data Encryption turned on or both have it turned off, and that any database encryption or decryption operations have completed before you start performance testing.

You can use the following query to determine whether the databases are encrypted:

select name, is_encrypted from sys.databases
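The is_encrypted flag does not tell you whether an encryption or decryption scan is still running. The sys.dm_database_encryption_keys DMV exposes an encryption_state column for that purpose, and a query such as select db_name(database_id), encryption_state from sys.dm_database_encryption_keys returns the rows this hypothetical Python helper expects. The helper is a sketch that lists the databases whose operation should finish before you start testing.

```python
# encryption_state values as documented for sys.dm_database_encryption_keys.
ENCRYPTION_STATES = {
    0: "No database encryption key present, no encryption",
    1: "Unencrypted",
    2: "Encryption in progress",
    3: "Encrypted",
    4: "Key change in progress",
    5: "Decryption in progress",
    6: "Protection change in progress",
}

# States that mean a background TDE scan or key operation has not finished yet.
IN_PROGRESS_STATES = {2, 4, 5, 6}


def databases_not_ready(rows):
    """rows: (database_name, encryption_state) pairs.
    Return (name, description) for databases whose TDE operation is still running."""
    return [
        (name, ENCRYPTION_STATES[state])
        for name, state in sorted(rows)
        if state in IN_PROGRESS_STATES
    ]


if __name__ == "__main__":
    # Hypothetical rows exported from the DMV query described above.
    rows = [("SalesDb", 3), ("HrDb", 5)]
    for name, description in databases_not_ready(rows):
        print(f"Wait before testing: {name} ({description})")
```

An empty result means every database listed in the DMV has reached a stable state (encrypted or unencrypted), so the comparison is not skewed by a background encryption scan.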

Another important factor that might affect your performance is encrypted TempDB. TempDB is encrypted if at least one database on your SQL Server or SQL Database managed instance is encrypted. As a result, you might compare two databases that are not encrypted, but because some other database on the SQL Database managed instance is encrypted (even though it's not involved in the workload), TempDB will also be encrypted. The unencrypted databases will still use the encrypted TempDB, and any query that creates temporary objects or spills to TempDB will be slower. Note that TempDB is decrypted only after all user databases on an instance are decrypted and the instance restarts. Scaling a SQL Database managed instance to a new pricing tier and back is one way to restart it.

Database engine settings

Make sure the database engine settings, such as database compatibility levels, trace flags, system configurations ('cost threshold for parallelism', 'max degree of parallelism'), database scoped configurations (LEGACY_CARDINALITY_ESTIMATOR, PARAMETER_SNIFFING, QUERY_OPTIMIZER_HOTFIXES, etc.), and database settings (AUTO_UPDATE_STATISTICS, DELAYED_DURABILITY), are the same on the SQL Server and SQL Database managed instance databases.

The following sample queries can help you identify the settings on SQL Server and Azure SQL Database managed instance:

select name, compatibility_level, snapshot_isolation_state_desc, is_read_committed_snapshot_on,
  is_auto_update_stats_on, is_auto_update_stats_async_on, delayed_durability_desc
from sys.databases;

select * from sys.database_scoped_configurations;

dbcc tracestatus;

select * from sys.configurations;

Compare the results of these queries on the SQL Database managed instance and SQL Server and try to align the differences if you identify some.
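When the result sets are long, comparing them by eye is error-prone. A minimal Python sketch, assuming you have already exported each side's settings into a dictionary keyed by setting name (for example, name to value_in_use from sys.configurations, or one entry per column of the sys.databases query), could report only the differences. The function name and the sample values are hypothetical.

```python
def diff_settings(source, target):
    """Return {setting: (source_value, target_value)} for settings whose values
    differ, including settings present on only one side (reported as None)."""
    differences = {}
    for name in sorted(source.keys() | target.keys()):
        source_value = source.get(name)
        target_value = target.get(name)
        if source_value != target_value:
            differences[name] = (source_value, target_value)
    return differences


if __name__ == "__main__":
    # Hypothetical exported values; replace with your own query results.
    sql_server = {
        "compatibility_level": 130,
        "is_auto_update_stats_async_on": False,
        "max degree of parallelism": 0,
    }
    managed_instance = {
        "compatibility_level": 150,
        "is_auto_update_stats_async_on": False,
        "max degree of parallelism": 8,
        "LEGACY_CARDINALITY_ESTIMATOR": 0,
    }
    for name, (src, tgt) in diff_settings(sql_server, managed_instance).items():
        print(f"{name}: SQL Server={src}, managed instance={tgt}")
```

Reporting settings that exist on only one side is deliberate, because a trace flag or scoped configuration that is missing from one environment can matter as much as a differing value.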

Note: The list of trace flags and configurations might be very long, so we recommend filtering them or looking only at the trace flags you've changed or know affect performance. Some trace flags are preconfigured on SQL Database managed instance as part of the PaaS configuration, and they do not affect performance.

You might also experiment with raising the compatibility level, turning on the legacy cardinality estimator, or enabling the automatic tuning feature on the SQL Database managed instance, any of which might give you better results than your SQL Server database.

Also note that a SQL Database managed instance might provide better performance even if you align all parameters, because it includes the latest improvements and fixes that are not bound to the compatibility level, as well as features such as forcing the last good plan that might improve your workload.

Hardware and environment specification

SQL Database managed instance runs on standardized hardware with pre-defined technical characteristics that are probably different than your environment. Some of the characteristics you might need to consider when comparing your environment with the environment where the SQL Database managed instance is running are:

  1. Number of cores should be the same on both SQL Server and the SQL Database managed instance. Note that a SQL Database managed instance uses 2.3-2.4 GHz processors, which might differ from your processor speed, so the same operation might consume more or less CPU. If possible, check whether hyperthreading is used in the SQL Server environment when comparing with the Gen4 and Gen5 hardware generations on a SQL Database managed instance: a vCore on Gen4 hardware is a physical core without hyperthreading, while a vCore on Gen5 is a hyperthreaded logical core. If you are comparing SQL Server running on a bare-metal machine with a SQL Database managed instance or SQL Server running on a virtual machine, you'll probably get better results on the bare-metal instance.
  2. Amount of memory, including the memory/core ratio (5.1 GB/core on Gen5, 7 GB/core on Gen4). A higher memory/core ratio provides a bigger buffer pool cache and increases the cache hit ratio. If your workload does not perform well on a managed instance with a memory/core ratio of 5.1 GB, then you probably need to choose a virtual machine with the appropriate memory/core ratio instead of a SQL Database managed instance.
  3. IO characteristics – You need to be aware that performance of the storage system might be very different compared to your on-premises environment. A SQL Database managed instance is a cloud database and relies on Azure cloud infrastructure.
    • The general purpose tier uses remote Azure Premium disks, where IO performance depends on the file size. If you reach the IO limit of a log file, which depends on its size, you might notice WRITE_LOG waits and fewer IOPS in the file statistics. This issue might occur on a SQL Database managed instance if the log files are small and not pre-allocated. You might need to increase the size of some files in the general purpose tier to get better performance (see the Tech Community article Storage performance best practices and considerations for Azure SQL Managed Instance General Purpose tier).
    • A SQL Database managed instance does not use instant file initialization, so you might see additional PREEMPTIVE_OS_WRITEFILEGATHER wait statistics, because the data files are filled with zero bytes during file growth.
  4. Local or remote storage types – Make sure you're considering local SSD versus remote storage while doing the comparison. The general purpose tier uses remote storage (Azure Premium Storage) that can't match your on-premises environment if it uses local SSD or a high-performance SAN. In this case you would need to use the business critical tier as a target. The general purpose tier can be compared with other cloud databases like SQL Server on Azure Virtual Machines that also use remote storage (Azure Premium Storage). In addition, beware that remote storage used by a general purpose instance is still different than remote storage used by a SQL Virtual Machine because:
    • The general purpose tier uses a dedicated IO resource for each database file that depends on the size of that file, while SQL Server on Azure Virtual Machines uses shared IO resources for all files, where IO characteristics depend on the size of the disk. If you have many small files, you will get better performance on a SQL Virtual Machine, while you can get better performance on a SQL Database managed instance if file usage can be parallelized, because there are no noisy neighbors sharing the same IO resources.
    • SQL Virtual Machines use a read-caching mechanism that improves read speed.
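The memory/core ratios in the list above translate into a concrete memory budget. The following Python sketch uses the ratios quoted in this article as assumptions and compares them with a hypothetical source server (16 cores, 128 GB of RAM); it ignores settings such as max server memory, so it only indicates whether the buffer pool is likely to be smaller on the managed instance. The function names are hypothetical.

```python
GB_PER_VCORE = {"Gen4": 7.0, "Gen5": 5.1}  # ratios quoted in this article


def managed_instance_memory_gb(vcores, generation):
    """Total memory implied by the per-vCore ratio for a hardware generation."""
    return vcores * GB_PER_VCORE[generation]


def memory_shortfall_gb(source_memory_gb, vcores, generation):
    """How much less memory the managed instance has than the source server
    (a negative value means the managed instance has more)."""
    return source_memory_gb - managed_instance_memory_gb(vcores, generation)


if __name__ == "__main__":
    # Hypothetical source: 16 cores and 128 GB of RAM (8 GB per core).
    for generation in GB_PER_VCORE:
        print(generation,
              round(managed_instance_memory_gb(16, generation), 1),
              round(memory_shortfall_gb(128, 16, generation), 1))
```

A large positive shortfall suggests that a workload tuned for the source server's buffer pool may see a lower cache hit ratio on the managed instance even with the same number of cores.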

If your hardware specs and resource allocation are different, expect different performance results, which you can address only by changing the service tier or increasing file sizes. If you are comparing a SQL Database managed instance with SQL Server on Azure Virtual Machines, make sure that you choose a virtual machine series with a memory/CPU ratio similar to the SQL Database managed instance, such as the DS series.

Azure SQL Database managed instance provides a powerful set of tools that can help you troubleshoot and improve performance of your databases, in addition to built-in intelligence that could automatically resolve potential issues. Learn more about monitoring and tuning capabilities of Azure SQL Database managed instance in the following article: