Benjamin Nevarez is an independent consultant based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of “SQL Server 2014 Query Tuning & Optimization” and “Inside the SQL Server Query Optimizer” and co-author of “SQL Server 2012 Internals”. With more than 20 years of experience in relational databases, Benjamin has also been a speaker at many SQL Server conferences, including the PASS Summit, SQL Server Connections and SQLBits. Benjamin’s blog can be found at http://www.benjaminnevarez.com and he can also be reached by e-mail at admin at benjaminnevarez dot com and on twitter at @BenjaminNevarez.
A major problem with updating statistics in large tables in SQL Server is that the entire table always has to be scanned, for example when using the
WITH FULLSCAN option, even if only recent data has changed. This is also true when using partitioning: even if only the newest partition had changed since the last time statistics were updated, updating statistics again required to scan the entire table including all the partitions that didn’t change. Incremental statistics, a new SQL Server 2014 feature, can help with this problem.
Using incremental statistics you can update only the partition or partitions that you need and the information on these partitions will be merged with the existing information to create the final statistics object. Another advantage of incremental statistics is that the percentage of data changes required to trigger an automatic update of statistics now works at the partition level which basically means that now only 20% of rows changed (changes on the leading statistics column) per partition are required. Unfortunately the histogram is still limited to 200 steps for the entire statistics object in this version of SQL Server.
Let us look at an example of how can you update statistics at a partition level to explore its behavior at least as of SQL Server 2014 CTP2. First we need to create a partitioned table using the AdventureWorks2012 database:
CREATE PARTITION FUNCTION TransactionRangePF1 (DATETIME) AS RANGE RIGHT FOR VALUES ( '20071001', '20071101', '20071201', '20080101', '20080201', '20080301', '20080401', '20080501', '20080601', '20080701', '20080801' ); GO CREATE PARTITION SCHEME TransactionsPS1 AS PARTITION TransactionRangePF1 TO ( [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY] ); GO CREATE TABLE dbo.TransactionHistory ( TransactionID INT NOT NULL, -- not bothering with IDENTITY here ProductID INT NOT NULL, ReferenceOrderID INT NOT NULL, ReferenceOrderLineID INT NOT NULL DEFAULT (0), TransactionDate DATETIME NOT NULL DEFAULT (GETDATE()), TransactionType NCHAR(1) NOT NULL, Quantity INT NOT NULL, ActualCost MONEY NOT NULL, ModifiedDate DATETIME NOT NULL DEFAULT (GETDATE()), CONSTRAINT CK_TransactionType CHECK (UPPER(TransactionType) IN (N'W', N'S', N'P')) ) ON TransactionsPS1 (TransactionDate); GO
Note: For details about partitioning and the
CREATE PARTITION FUNCTION / SCHEME statements please refer to Partitioned Tables and Indexes in Books Online.
We currently have data to populate 12 partitions. Let us start by first populating only 11.
INSERT INTO dbo.TransactionHistory SELECT * FROM Production.TransactionHistory WHERE TransactionDate < '2008-08-01';
If required, you can use the following statement to inspect the contents of the partitions:
SELECT * FROM sys.partitions WHERE object_id = OBJECT_ID('dbo.TransactionHistory');
Let us create an incremental statistics object using the
CREATE STATISTICS statement with the new
INCREMENTAL clause set to
OFF is the default):
CREATE STATISTICS incrstats ON dbo.TransactionHistory(TransactionDate) WITH FULLSCAN, INCREMENTAL = ON;
You can also create incremental statistics while creating an index using the new
STATISTICS_INCREMENTAL clause of the
CREATE INDEX statement.
You can inspect the created statistics object using
DBCC SHOW_STATISTICS('dbo.TransactionHistory', incrstats);
Among other things, you will notice that the histogram has 200 steps (only the last 3 shown here):
Initial DBCC results
So we already have the maximum of steps in a statistics object. What would happen if you add data to a new partition? Let us add data to partition 12:
INSERT INTO dbo.TransactionHistory SELECT * FROM Production.TransactionHistory WHERE TransactionDate >= '2008-08-01';
Now, we update the statistics object using the following statement:
UPDATE STATISTICS dbo.TransactionHistory(incrstats) WITH RESAMPLE ON PARTITIONS(12);
Note the new syntax specifying the partition, where you can specify multiple partitions, separated by comma. The
UPDATE STATISTICS statement reads the specified partitions and then merges their results with the existing statistic object to build the global statistics. Note the
RESAMPLE clause; this is required as partition statistics need to have the same sample rates to be merged to build the global statistics. Although only the specified partition was scanned, you can see that SQL Server has rearranged the histogram. The last three steps now show data for the added partition. You can also compare the original with the new histogram for other minor differences:
DBCC results after the incremental update
If for any reason you want to disable the incremental statistics you can use the following statement to go back to the original behavior (or optionally just drop the statistics object and create a new one).
UPDATE STATISTICS dbo.TransactionHistory(incrstats) WITH FULLSCAN, INCREMENTAL = OFF;
After disabling the incremental statistics trying to update a partition as shown previously will return the following error message:
UPDATE STATISTICS ON PARTITIONS syntax is not supported for non-incremental statistics.
Finally, you can also enable incremental statistics for your automatic statistics at the database level, if needed. This requires the
INCREMENTAL = ON clause in the
ALTER DATABASE statement and obviously also requires
AUTO_CREATE_STATISTICS set to