Benjamin Nevarez is an independent consultant based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of “SQL Server 2014 Query Tuning & Optimization” and “Inside the SQL Server Query Optimizer” and co-author of “SQL Server 2012 Internals”. With more than 20 years of experience in relational databases, Benjamin has also been a speaker at many SQL Server conferences, including the PASS Summit, SQL Server Connections and SQLBits. Benjamin’s blog can be found at http://www.benjaminnevarez.com and he can also be reached by e-mail at admin at benjaminnevarez dot com and on twitter at @BenjaminNevarez.
A major problem with updating statistics in large tables in SQL Server is that the entire table always has to be scanned, for example when using the WITH FULLSCAN
option, even if only recent data has changed. This is also true when using partitioning: even if only the newest partition had changed since the last time statistics were updated, updating statistics again required to scan the entire table including all the partitions that didn’t change. Incremental statistics, a new SQL Server 2014 feature, can help with this problem.
Using incremental statistics you can update only the partition or partitions that you need and the information on these partitions will be merged with the existing information to create the final statistics object. Another advantage of incremental statistics is that the percentage of data changes required to trigger an automatic update of statistics now works at the partition level which basically means that now only 20% of rows changed (changes on the leading statistics column) per partition are required. Unfortunately the histogram is still limited to 200 steps for the entire statistics object in this version of SQL Server.
Let us look at an example of how can you update statistics at a partition level to explore its behavior at least as of SQL Server 2014 CTP2. First we need to create a partitioned table using the AdventureWorks2012 database:
CREATE PARTITION FUNCTION TransactionRangePF1 (DATETIME)
AS RANGE RIGHT FOR VALUES
(
'20071001', '20071101', '20071201', '20080101',
'20080201', '20080301', '20080401', '20080501',
'20080601', '20080701', '20080801'
);
GO
CREATE PARTITION SCHEME TransactionsPS1 AS PARTITION TransactionRangePF1 TO
(
[PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY], [PRIMARY]
);
GO
CREATE TABLE dbo.TransactionHistory
(
TransactionID INT NOT NULL, -- not bothering with IDENTITY here
ProductID INT NOT NULL,
ReferenceOrderID INT NOT NULL,
ReferenceOrderLineID INT NOT NULL DEFAULT (0),
TransactionDate DATETIME NOT NULL DEFAULT (GETDATE()),
TransactionType NCHAR(1) NOT NULL,
Quantity INT NOT NULL,
ActualCost MONEY NOT NULL,
ModifiedDate DATETIME NOT NULL DEFAULT (GETDATE()),
CONSTRAINT CK_TransactionType
CHECK (UPPER(TransactionType) IN (N'W', N'S', N'P'))
)
ON TransactionsPS1 (TransactionDate);
GO
Note: For details about partitioning and the CREATE PARTITION FUNCTION / SCHEME
statements please refer to Partitioned Tables and Indexes in Books Online.
We currently have data to populate 12 partitions. Let us start by first populating only 11.
INSERT INTO dbo.TransactionHistory
SELECT * FROM Production.TransactionHistory
WHERE TransactionDate < '2008-08-01';
If required, you can use the following statement to inspect the contents of the partitions:
SELECT * FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.TransactionHistory');
Let us create an incremental statistics object using the CREATE STATISTICS
statement with the new INCREMENTAL
clause set to ON
(OFF
is the default):
CREATE STATISTICS incrstats ON dbo.TransactionHistory(TransactionDate)
WITH FULLSCAN, INCREMENTAL = ON;
You can also create incremental statistics while creating an index using the new STATISTICS_INCREMENTAL
clause of the CREATE INDEX
statement.
You can inspect the created statistics object using DBCC
:
DBCC SHOW_STATISTICS('dbo.TransactionHistory', incrstats);
Among other things, you will notice that the histogram has 200 steps (only the last 3 shown here):
RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | |
198 | 2008-07-25 00:00:00.000 | 187 | 100 | 2 |
199 | 2008-07-27 00:00:00.000 | 103 | 101 | 1 |
200 | 2008-07-31 00:00:00.000 | 281 | 131 | 3 |
Initial DBCC results
So we already have the maximum of steps in a statistics object. What would happen if you add data to a new partition? Let us add data to partition 12:
INSERT INTO dbo.TransactionHistory
SELECT * FROM Production.TransactionHistory
WHERE TransactionDate >= '2008-08-01';
Now, we update the statistics object using the following statement:
UPDATE STATISTICS dbo.TransactionHistory(incrstats)
WITH RESAMPLE ON PARTITIONS(12);
Note the new syntax specifying the partition, where you can specify multiple partitions, separated by comma. The UPDATE STATISTICS
statement reads the specified partitions and then merges their results with the existing statistic object to build the global statistics. Note the RESAMPLE
clause; this is required as partition statistics need to have the same sample rates to be merged to build the global statistics. Although only the specified partition was scanned, you can see that SQL Server has rearranged the histogram. The last three steps now show data for the added partition. You can also compare the original with the new histogram for other minor differences:
RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | |
197 | 2008-07-31 00:00:00.000 | 150 | 131 | 2 |
198 | 2008-08-12 00:00:00.000 | 300 | 36 | 9 |
199 | 2008-08-22 00:00:00.000 | 229 | 43 | 7 |
200 | 2008-09-03 00:00:00.000 | 363 | 37 | 11 |
DBCC results after the incremental update
If for any reason you want to disable the incremental statistics you can use the following statement to go back to the original behavior (or optionally just drop the statistics object and create a new one).
UPDATE STATISTICS dbo.TransactionHistory(incrstats)
WITH FULLSCAN, INCREMENTAL = OFF;
After disabling the incremental statistics trying to update a partition as shown previously will return the following error message:
UPDATE STATISTICS ON PARTITIONS syntax is not supported for non-incremental statistics.
Finally, you can also enable incremental statistics for your automatic statistics at the database level, if needed. This requires the INCREMENTAL = ON
clause in the ALTER DATABASE
statement and obviously also requires AUTO_CREATE_STATISTICS
set to ON
.
Thanks Benjamin for the useful your article :)
P.S: It seems strange for the RANGE_HI_KEY values after updating statistics. Same All ^^
Hi Jungsun,
Thanks for noticing that! Yes, somehow I copied the incorrect dates for the second histogram. The second histogram must show the last step from the first histogram (RANGE_HI_KEY 2008-07-31) plus three steps using dates from the new partition (RANGE_HI_KEY 2008-08-12, 2008-08-22 and 2008-09-03). Post has been fixed.
Regards,
Ben
Is this a SQL 2014 Enterprise edition only?
Since partitioning is Enterprise only then, by extension, yes.