How cloud speed helps SQL Server DBAs

Data di pubblicazione: 12 dicembre, 2017

Partner Architect, SQL

A few years ago, the Microsoft SQL Server product team introduced a new cloud Platform-as-a-Service (PaaS), Azure SQL Database, which shares the SQL Server code base. Running a cloud-first service required significant changes to the legacy SQL Server engineering model which took years of investment in order to fully enable. With these engineering model changes came big benefits which positively impacted both Azure SQL Database and SQL Server.

Even if you are a SQL Server database administrator who isn’t using Azure SQL Database today, you’ll still be seeing benefits from Microsoft’s investments in the cloud. This blog post will review how engineering model transformations, driven by cloud requirements, resulted in several improvements in how we build, ship and service SQL Server. 

Features arrive faster

In the earlier days of SQL Server (2005 through 2012), SQL Server had roughly three-year long engineering cycles. For each planned release of SQL Server, a significant amount of planning would go into the up-front design, using a waterfall-like software development process coordinated across different teams. This included the generation of functional specification documentation by program managers, design specifications by developers and automated testing code developed by testers. 

Once SQL Server finally shipped, customers could take years to upgrade or adopt the associated new features. With this legacy engineering model and the extended periods between the original planning and actual customer adoption, it would take years overall to understand if the feature “landed” properly and met the needs of the originally intended scenario. For any new feature being discussed, the SQL Server engineering team had to think several years ahead of the market. 

The development of a Platform-as-a-Service, Azure SQL Database, shifted the SQL Server engineering team’s focus into working within significantly shorter time frames. The SQL Server engineering team made a few key changes in order to make this happen:

  • The build and test loop for SQL Server was automated using thousands of machines to run tests in parallel. Tests are run all the time. This took the build and test process from weeks in the legacy engineering model to hours for the average case in the new model.
  • The SQL Server engineering team realized that the original SQL Server code surface area was very large and thus difficult to deploy in its monolithic state. Therefore, the team looked for ways to break apart the architecture into overall smaller micro-services wherever possible. This change in architecture allowed separate deployments and servicing for each component.
  • Features are now required to be built more incrementally and delivered via monthly Community Technical Previews (CTPs).

These changes allowed us to shorten the release cycle and get features to Azure SQL Database and SQL Server faster than ever before. Most recently, the SQL Server engineering team managed to ship SQL Server 2017 along with new cross-platform support only 15 months after the release of SQL Server 2016. Contrast this with the legacy three to five-year SQL Server shipping cycles of the past. 

Features are tested sooner and require customer validation

Leading up to the general availability of SQL Server 2017, Microsoft provided monthly community technical previews (CTPs) that gave the public early and ongoing access for testing new features. The ability to provide production-quality builds was very much driven by the continuous release process used in Azure SQL Database today. This early access to new features also resulted in early customer feedback that in turn was immediately used by the SQL Server engineering team. The SQL Server engineering team requires testing of features by customers during the engineering cycle in order to measure whether the features are working correctly before they are declared complete.

Features ship when ready

In older versions of SQL Server, some improvements risked being “crammed” into a SQL Server release prior to shipment.  This was because if you missed a shipment window, the feature may not light up in the product for another three years.  Now with the ability of the SQL Server engineering team to frequently ship production-quality releases this is no longer a problem (or temptation). If a feature isn’t ready, it is held back until the issues are addressed and then the feature is included in a later monthly release. 

Feature development is iterative

Planning is still a critical part of the engineering process, but the SQL Server engineering team now does it a bit differently. For a new feature idea, architects, program managers and engineering managers on the team look at a problem space and then explore basic solutions that can be shipped within a reasonable timeframe. Any proposed feature must also have associated key customers to help justify the effort and engineering funding. The SQL Server engineering team then iterates over the customer needs and then releases versions until finished and ready to ship. 

An example of this is the automatic plan correction feature, which is a new automatic tuning feature in SQL Server 2017 that identifies plan regressions and fixes them by applying a previous good plan. This feature was first deployed to Azure SQL Database and had a significant amount of real-world testing by internal customers and opt-in private preview customers. This resulted in a high volume of feedback and several changes before it ultimately shipped in SQL Server 2017 in its current form.

Feedback from early versions of the Minimum Viable Product (MVP) are used to refine (or reset) what is built. Customer feedback is used throughout the engineering cycle and is used to justify new features and changes to existing features. With more agility, we can release both small and large features that provide incremental value and not just focus on “marketing big box” features.

Friction-free upgrades

Azure SQL Database is upgraded on an ongoing basis. Fixes are streamed over time across branches, and these upgrades can include bug fixes and new improvements. With this continual change, there was a requirement to make automatic upgrades as seamless and friction free as possible. As a result, the SQL Server engineering team stopped deprecating and removing most features, instead moving to a policy of maintaining backward compatibility to allow seamless and silent upgrades. This deprecation policy applies both to SQL Server and Azure SQL Database. 

The SQL Server engineering team seeks to maintain backwards compatibility as a strong goal so that the service can upgrade transparently. If there is a case where compatibility is broken (such as a change required to maintain security in the service), engineering will proactively reach out to impacted customers and find workarounds with them or use the compatibility level to enable customers to test their application sufficiently and opt-in to the new code when ready.

On the subject of compatibility level, any new query execution plan-affecting features and fixes accrue under the next database compatibility level. The intent is to minimize regression risk. For example, the new adaptive query processing feature family introduced in SQL Server 2017 requires compatibility level 140 or higher. Upgrading to SQL Server 2017 maintains your existing user database’s compatibility level until you explicitly decide to change it. If an engine feature isn’t plan-affecting (SQL Graph, for example), we don’t tie it to the compatibility level and it can surface automatically as an available feature without having to be explicitly enabled.

Additionally, features like automatic plan correction and Query Store can act as a backstop and insurance policy post-upgrade, enabling a quick and efficient way to handle regressions.

Telemetry drives quality

The SQL Server engineering team makes use of telemetry to:

  1. Identify candidate-scenarios for future improvements
  2. Measure feature adoption
  3. Improve the product quality by surfacing issues more quickly

All telemetry data is generalized and scrubbed to protect customer data and this telemetry is then used to manage the service at scale.  With millions of databases in Azure SQL database and zero operational staff or DBAs, ~600 TBs of telemetry is collected per day from Azure SQL Database, helping the SQL Server engineering team run automated alerting and SLA infrastructure. 

Additionally, the SQL Server engineering team watches for and investigates all crash dumps coming in across the millions of databases running in Azure SQL Database. 

The servicing learnings from Azure SQL Database directly accrue to SQL Server’s quality in the following ways:

  • The SQL Server engineering team proactively fixes and deploys such fixes into Azure SQL Database, in-market SQL Server versions, and future in-development versions of SQL Server.
  • Fixes get pushed into Cumulative Updates (CUs) of SQL Server at a much more aggressive rate than in the past.

The SQL Server engineering team encourages people to run on the latest CUs in SQL Server 2016 and higher because of these engineering model investments. Customers will now get fixes that customers would have, in the past, had to have requested as hotfixes (and then wait for the engineering team to scramble and fix).  Now such fixes happen proactively and without opening support tickets.

From a SQL Server DBA perspective, keeping up with the latest CUs can help prevent known issues from happening and also help with performance, availability and the overall health of the SQL Server environment.

Removing the need for SP1

Azure SQL Database receives most feature and bug fixes first, with code changes being applied incrementally across millions of databases. The SQL Server engineering team also leverages many parallel machines to run tests faster and more often. The end-result is that regressions and bugs are caught and fixed much earlier in the engineering cycle. These fixes are then rolled into SQL Server via upcoming cumulative updates (CUs).

The extensive testing and production-ready builds led to the announcement of changes described in the Modern Servicing Model blog post. Starting with SQL Server 2017, localized cumulative updates (CUs) are the primary delivery method for fixes, with delivery every month for the first 12 months after release of SQL Server 2017 and then every quarter for the remaining 4 years of the full 5-year mainstream cycle. Because cumulative updates are production-ready and as well-tested as past service pack releases, starting with SQL Server 2017, annual service packs are no longer being published. There is no need to wait until SP1 to upgrade to SQL Server 2017!

Two worlds, one engineering model

Running Azure SQL Database has significantly transformed the SQL Server engineering model and the evolution continues. As the build-and-ship process continues to be streamlined and improved, our goal is to provide continuous value and innovation to our customers both in Azure SQL Database and SQL Server. With cloud innovations enriching SQL Server, we believe SQL Server is the best place to build your next on-premises data tier and application.

If you have feedback to share regarding Microsoft’s engineering model or the improvements described here, we would like to hear from you. To contact the SQL Server engineering team with feedback or comments on this subject, please email: SQLDBArchitects@microsoft.com.