Automatic Plan Correction in SQL Server

How much time do you spend troubleshooting performance issues as a Database Administrator or Developer? Have you ever tracked it? As the average total percentage of your day, it may not look like a lot of time, but when the issue is serious, you can spend hours tracking it down and working through root cause analysis. Sometimes the problem goes away and you don’t know the true origin. And even worse? When you have to battle these issues in the middle of the night or on the weekend. Not only are you scrambling to solve a problem, but you’re losing your personal free time. How do we alleviate that? How do we take our time and effort out of the equation, and fix performance at the same time?

The Automatic Tuning feature in SQL Server 2017 Enterprise Edition and Azure SQL Database is the first step in reducing the time data professionals spend troubleshooting and solving performance problems. The feature includes Automatic Plan Correction and Automatic Index Management (only available in Azure SQL Database), which are enabled independently. In this post I want to focus on the Automatic Plan Correction feature. With Automatic Plan Correction, if SQL Server finds that a query has regressed significantly, it will force the last-known good plan for the query to stabilize performance. Essentially, rather than you, the DBA or Developer, getting called on the weekend about system performance, SQL Server will address it for you. Sounds too easy, right? Let’s take a look.

Under the Covers

First, it’s essential to understand that Automatic Plan Correction uses Query Store, so it must be enabled for the database. Second, Automatic Plan Correction is simply automatic plan forcing. While Query Store is a marketed as flight-recorder for your database that tracks query text, plans, runtime statistics and wait statistics, it also allows you to force a plan for a query to allow for consistent performance. Automatic Plan Correction is plan forcing without your intervention.

Enabling Automatic Plan Correction

As mentioned, Query Store must first be enabled for the user database. This can be done in SSMS, with T-SQL, and with REST API for Azure SQL DB. Note that Query Store is enabled by default for databases in Azure, and has been since 2016 Q4.

Enabling Query Store through SSMS

USE [master];
GO
ALTER DATABASE [WideWorldImporters] 
	SET QUERY_STORE = ON;
GO
ALTER DATABASE [WideWorldImporters] 
	SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO

Enabling Query Store using T-SQL

The above code is the default T-SQL from SSMS if you script it out. In Azure SQL Database you do not run the USE statement. If you want to change any of the default options, please read my Query Store Settings post on what those options are and considerations for alternate values.

Once Query Store is enabled, you can use the Azure Portal, T-SQL, or EST API to enable Automatic Plan Correction in Azure SQL Database ((and C# and PowerShell are in the works). It can only be enabled with T-SQL in SQL Server 2017.

Enabling Automatic Plan Correction in the Azure Portal

ALTER DATABASE [WideWorldImporters] 
	SET AUTOMATIC_TUNING (
		FORCE_LAST_GOOD_PLAN = ON
	);
GO

Enabling Automatic Plan Correction in the T-SQL

Note that Automatic Plan Correction will be enabled by default for new databases in Azure in the near future. Starting in January of 2018, Automatic Tuning is being enabled for Azure SQL Databases that did not already have it enabled, with notifications sent to administrators so the option can be disabled if desired.

How It Works

With Automatic Plan Correction enabled, SQL Server monitors query performance using Query Store data. It looks for a significant change* in CPU** performance with a 48-hour window***. Notice the asterisks in that sentence… those are on purpose:

*The threshold for what constitutes a significant change is not documented because Microsoft reserves the right to change it.
**The metric used to determine the change in performance (CPU) is not documented because Microsoft reserves the right to change it. Meaning, Microsoft could consider additional dimensions to look at performance if that would be better/perform better than CPU alone.
***The time period across which the query performance data is compared not documented for the same reason, Microsoft reserves the right to change it.
Note: while the aforementioned items not documented, I confirmed with the appropriate individuals at Microsoft that this information could be shared with breaking any NDA. It is extremely important to understand that the values are not fixed and can change, with the expectation that they would change to improve the reliability of the feature.

The lack of documentation and possibility of changes in threshold may be frustrating for some individuals, but here is what’s really important to remember:

Microsoft captures terabytes of operational telemetry data from SQL Azure Databases daily, and that data is critical to the automatic features that are being developed. This data includes things like query_id, query_plan_id, and query_hash, and Microsoft does NOT capture query_text or query_plan (they are not looking at your actual data). Microsoft is not simply archiving that operational telemetry or using it for troubleshooting, they are mining that data and using it to develop algorithms and models to allow SQL Server to make independent, intelligent decisions.

SQL Server can take advantage of the plethora of data in Query Store that details the performance of queries, and automatic plan correction starts with comparing current performance for a query against past performance to determine if there is a regression in performance. Has performance gone down, or gotten worse, and if so, is it by a significant amount?

If there has been a regression in query performance, then SQL Server will force the last known good plan for that query, which is of course pulled from Query Store. But it doesn’t stop there. SQL Server then continues to monitor performance – still using Query Store – to confirm that the forced plan is still a good plan for that query, meaning that the query with the forced plan performs better than the regressed version. If that query isn’t performing better, then it will un-force the plan. A plan can also be un-forced if there is a recompile, or if forcing fails.

This cycle continues; if a query has a forced plan, and then that plan is un-forced for one of the aforementioned reasons, that same plan can be forced again later, or there may be another plan forced for that query at a later time. This is a continual process that occurs as long as you have the Automatic Plan Correction option enabled for the database. Now, what’s interesting is that you can look at the same information that this feature captures and use it force plans manually. That is, in SQL Server 2017 Enterprise Edition and in Azure SQL Database, this data is being collected in the sys.dm_db_tuning_recommendations DMV even when the Automatic Plan Correction feature is not enabled, so you can examine that data and follow its recommendations to force plans for specific queries on your own. Note that if you force a plan using recommendations from sys.dm_db_tuning_recommendations, it will never be automatically un-forced. Further, if you have Automatic Plan Correction enabled, and you manually force a plan, it will never be automatically un-forced. Only plans that are forced with the Automatic Plan Correction feature will be un-forced automatically.

Am I really going to let SQL Server have control?

If you are skeptical and wondering if you can really trust SQL Server to make a plan-forcing decision, here’s what I would encourage you to remember:

This feature was developed with a staggering amount of data captured from almost two million Azure SQL Databases. It is a new feature in SQL Server 2017, but it made it into global availability in 2016 in Azure, so this feature has truly been available for well over a year, and it has been refined.
The engineers have made changes to the algorithm over time, as they have captured more data. It may not find every regression that occurs – because a regression may not be severe enough, but I would bet that many of you would rather have this feature force less often than too often.
On top of that, if a plan is forced but ends up causing an issue, SQL Server’s ability to recover from that and un-force the plan is extremely reliable and happens very quickly.

Summary

You may not be comfortable with the idea of SQL Server addressing performance issues for you. But is that discomfort because you think it will make a poor decision? Or are you concerned about automation taking over your job? Pretty direct question, I know. If it’s the former, then my recommendation is to look at the data captured in sys.dm_db_tuning_recommendations (without enabling Automatic Plan Correction) and see what SQL Server would want to do. Does it line up with what you would do? Does it find regressions that you might miss? If you don’t want to enable the feature because you’re afraid you suddenly won’t have enough to do, I encourage you to read Conor Cunningham’s recent post, How cloud speed helps SQL Server DBAs. Microsoft isn’t trying to code you out of a job. They’re simply trying to handle the low-hanging fruit so that you can focus on more important tasks.