The last several releases of SQL Server have introduced a slew of new features, as well as improvements in existing functionality. One area of the engine which is easy to overlook is statistics. After all, statistics still get created the same way, they still tell you about the distribution of data, they're still used by the Query Optimizer… what's different? The basic function of statistics remains the same – but how they're used by the Query Optimizer does change depending on the Cardinality Estimator you're using. There are also several noteworthy changes related to updating statistics and new functionality has been added around viewing statistics information. Altogether, these changes across the latest releases can cause a variation in SQL Server behavior that you weren't expecting.
Note: This post is most applicable to SQL Server 2012 and higher, but some detail for prior releases is included for reference (and fun).
SQL Server 7.0
- The number of steps in a histogram is limited to 300. In SQL Server 6.5 and earlier a histogram would have the number of steps that could fit on a 2K page, based on the size of the first column in the key.
- The 'automatically update statistics' database option is introduced; previously statistics were only updated manually.
SQL Server 2000
- The number of steps in the histogram is reduced from 300 to 200 (technically 201, if you include the step for NULL, assuming the first column in the key allows NULLs).
SQL Server 2005
- Updates to statistics that use FULLSCAN can run in parallel.
- Trace Flags 2389 and 2390 are introduced in SP1 to help with the Ascending Key issue, described in the post, Ascending Keys and Auto Quick Correct Statistics. A detailed example of this scenario is provided in my post Trace Flag 2389 and the new Cardinality Estimator.
- The instance option 'Automatically Update Statistics Asynchronously' is introduced. Note that in order for this to be in effect, the option 'Automatically Update Statistics' must also be enabled. If you're not clear on what this option does, review the documentation in ALTER DATABASE SET Options. This is a setting that Glenn recommends (as noted in his post referenced below), but I think it's important to be aware of potential issues, as noted in How Automatic Updates to Statistics Can Affect Query Performance.
Note: There is a memory leak related to this setting in SQL Server 2008 through SQL Server 2012; please see Glenn's post Important Hotfix for SQL Server 2008 for more details.
SQL Server 2008
- Filtered statistics are introduced, and these can be created separately from a filtered index. There are some limitations around filtered indexes with regard to the Query Optimizer (see Tim Chapman's post The Pains of Filtered Indexes and Paul White's post Optimizer Limitations with Filtered Indexes) post, and it's important to understand the behavior of the counter that tracks modifications (and thus can trigger automatic updates). See Kimberly's post Filtered indexes and filtered stats might become seriously out-of-date for more details, and I also recommend checking out her stored procedure that analyzes data skew and recommends where you can create filtered statistics to provide more information to the Query Optimizer. I've implemented this for several large customers that have VLTs and skewed distribution across columns frequently used in predicates.
- Two new catalog views, sys.stats and sys.stats_columns, are added to provide easier insight into statistics and included columns. Use these two views instead of sp_helpstats, which is deprecated and provides less information.
SQL Server 2008R2 SP1
- Trace Flag 2371 is made available, which can be used to reduce the number of modifications required for automatic updates to statistics to occur. As a reminder, I'm a fan of updating statistics on a regular basis through a scheduled job and leaving the auto update enabled as a safety.
SQL Server 2008R2 SP2
- The function sys.dm_db_stats_properties is included, which provides the same information found in the header of DBCC SHOW_STATISTICS, as well as a modification column that could be used to track changes and programmatically determine if an update was needed. Remember my preference for using a job to update stats? That job just got a lot smarter with this DMF…now I can look to see how much data has been modified and ONLY update statistics if a certain percentage of data has changed. Slick.
SQL Server 2012
- Updating statistics will not cause plans to be invalidated IF no rows have changed. This is one that surprises a lot of people, and Kimberly has a fun post, What caused that plan to go horribly wrong – should you update statistics?, that walks through her adventure in figuring this out.
SQL Server 2012 SP1
- DBCC SHOW_STATISTICS only requires the SELECT permission – previously it required a user to be a member of sysadmin, or a member of the db_owner or db_ddladmin database role. This can be globally disabled with trace flag 9485.
- Includes sys.dm_db_stats_properties (see 2008R2 SP2 note above)
SQL Server 2012 SP2
- Cumulative Update 1 introduces a fix related to ascending keys not being properly identified even with trace flags 2389 and 2390 in use. This requires trace flag 4139 in addition to CU1, as noted in KB 2952101.
SQL Server 2014
- The new Cardinality Estimator is introduced, implemented by setting the database compatibility mode to 120, or by using trace flag 2312. If you haven't read anything about the new CE I recommend starting with the Cardinality Estimation documentation and then reading Joe Sack's whitepaper, Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator, for in-depth details.
- The behavior from Trace Flags 2389 and 2390 for ascending keys is now implemented via the database compatibility mode. If your databases compatibility mode is set to 120 (or higher in later releases), you do not need to use Trace Flags 2389 and 2390 for SQL Server to identify statistics that have ascending keys.
- Incremental statistics are introduced for partitions, and can be viewed through the new DMF sys.dm_db_incremental_stats_properties. Incremental statistics provide a way to update statistics for a partition without updating them for the entire table. However, the additional statistics information from the incremental statistics is not used by the Query Optimizer, but it is folded into the main histogram for the table.
- CU2 includes the same fix mentioned above for SQL Server 2012 SP2 that also requires trace flag 4139.
SQL Server 2014 SP1
- Trace flag 7471 is back-ported to CU6, originally available in SQL Server 2016 as noted below.
SQL Server 2016
- Trace flag 2371 is no longer needed to reduce the threshold for automatic updates to statistics if the database compatibility mode is set to 130. See Controlling Autostat (AUTO_UPDATE_STATISTICS) behavior in SQL Server.
- Updates to statistics can run in parallel when using SAMPLE, not just FULLSCAN. This is tied to the compatibility mode (must be 130), but can potentially provide faster updates and thus decrease maintenance windows. Aaron Bertrand talks about this enhancement in his post, A potential improvement for statistics updates: MAXDOP.
- Trace flag 7471 is introduced in CU1 to allow multiple UPDATE STATISTICS commands to run concurrently for a single table and Jonathan provides some great examples of how this can be used in his post Improved Support for Parallel Statistics Rebuilds.
SQL Server 2016 SP1
- The query hint option ENABLE_HIST_AMENDMENT_FOR_ASC_KEYS is introduced, along with the FOR HINT argument, which is the equivalent of trace flag 4139.
- The DMF sys.dm_db_stats_histogram is exposed in CU2, which is an alternative to the histogram output from DBCC SHOW_STATISTICS. The information in both is the same, use what's easier for you or better fits the problem you need to solve.
- The option PERSIST_SAMPLE_PERCENT is introduced in CU4, which can be used to force the same sampling rate to be used every time a statistic is updated going forward, and examples of this behavior can be found in the post, Persisting statistics sampling rate.
SQL Server 2017
- The PERSIST_SAMPLE_PERCENT option is available in CU1 (see 2016 SP1 entry for additional info).
- The attributes StatsInfoType and OptimizerStatsUsageType are added to the Query Plan, which lists statistics used during query optimization. This is available in CU3 and is tied to the CE version (120 and higher). This is pretty cool! I haven't had a chance to play with this yet, but to get this information previously you had to use undocumented trace flags.
- CU3 also introduced the MAXDOP option for CREATE STATISTICS and UPDATE STATISTICS, which can be used to override the MAXDOP value for the instance or database.
- One more addition in CU3: the UdfCpuTime and UdfElapsedTime attributes can be found in the Query Plan, which represent execution statistics for scalar-valued, UDFs.
If you are looking to upgrade to a newer release, or if you've recently upgraded, take note as to how these changes impact your solution. We've had many clients contact us after upgrading from 2005/2008/2008R2 to 2014 or 2016, complaining of performance issues. In many cases, adequate testing was not completed prior to the upgrade.
This is something we really focus on when we're helping a client upgrade. Beyond the steps to move a production instance from one version to another with little downtime, we want to make sure that the day after the upgrade is a boring one for DBAs and developers.
We don't simply test the upgrade process, we test what the system looks like after the upgrade. Are the same trace flags from the old environment needed in the new one? What database settings need to be adjusted? Does query performance change – for better or worse? If you don't know the answers to those questions before you upgrade production, then you're setting yourself up for one to many days of fire-fighting, Sev 1 calls, meals at your desk, not enough sleep and who knows what else.
Take the time up front to understand the impact of the new features and changes in functionality listed above, plan the upgrade, and test as much as possible.