Guest Posts

SQL Server 2014 Incremental Statistics

February 14, 2014 by in SQL Statistics | 6 Comments
SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Jonathan Kehayias is a Principal Consultant with SQLskills and the youngest MCM ever.

Jonathan’s Posts

Benjamin Nevarez is an independent consultant based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of “SQL Server 2014 Query Tuning & Optimization” and “Inside the SQL Server Query Optimizer” and co-author of “SQL Server 2012 Internals”. With more than 20 years of experience in relational databases, Benjamin has also been a speaker at many SQL Server conferences, including the PASS Summit, SQL Server Connections and SQLBits. Benjamin’s blog can be found at http://www.benjaminnevarez.com and he can also be reached by e-mail at admin at benjaminnevarez dot com and on twitter at @BenjaminNevarez.


A major problem with updating statistics in large tables in SQL Server is that the entire table always has to be scanned, for example when using the WITH FULLSCAN option, even if only recent data has changed. This is also true when using partitioning: even if only the newest partition had changed since the last time statistics were updated, updating statistics again required to scan the entire table including all the partitions that didn’t change. Incremental statistics, a new SQL Server 2014 feature, can help with this problem.

Using incremental statistics you can update only the partition or partitions that you need and the information on these partitions will be merged with the existing information to create the final statistics object. Another advantage of incremental statistics is that the percentage of data changes required to trigger an automatic update of statistics now works at the partition level which basically means that now only 20% of rows changed (changes on the leading statistics column) per partition are required. Unfortunately the histogram is still limited to 200 steps for the entire statistics object in this version of SQL Server.

Let us look at an example of how can you update statistics at a partition level to explore its behavior at least as of SQL Server 2014 CTP2. First we need to create a partitioned table using the AdventureWorks2012 database:

CREATE PARTITION FUNCTION TransactionRangePF1 (DATETIME)
AS RANGE RIGHT FOR VALUES 
(
   '20071001', '20071101', '20071201', '20080101', 
   '20080201', '20080301', '20080401', '20080501', 
   '20080601', '20080701', '20080801'
);
GO

CREATE PARTITION SCHEME TransactionsPS1 AS PARTITION TransactionRangePF1 TO 
(
  [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], 
  [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], 
  [PRIMARY], [PRIMARY], [PRIMARY]
);
GO

CREATE TABLE dbo.TransactionHistory 
(
  TransactionID        INT      NOT NULL, -- not bothering with IDENTITY here
  ProductID            INT      NOT NULL,
  ReferenceOrderID     INT      NOT NULL,
  ReferenceOrderLineID INT      NOT NULL DEFAULT (0),
  TransactionDate      DATETIME NOT NULL DEFAULT (GETDATE()),
  TransactionType      NCHAR(1) NOT NULL,
  Quantity             INT      NOT NULL,
  ActualCost           MONEY    NOT NULL,
  ModifiedDate         DATETIME NOT NULL DEFAULT (GETDATE()),
  CONSTRAINT CK_TransactionType 
    CHECK (UPPER(TransactionType) IN (N'W', N'S', N'P'))
) 
ON TransactionsPS1 (TransactionDate);
GO

Note: For details about partitioning and the CREATE PARTITION FUNCTION / SCHEME statements please refer to Partitioned Tables and Indexes in Books Online.

We currently have data to populate 12 partitions. Let us start by first populating only 11.

INSERT INTO dbo.TransactionHistory
SELECT * FROM Production.TransactionHistory
WHERE TransactionDate < '2008-08-01';

If required, you can use the following statement to inspect the contents of the partitions:

SELECT * FROM sys.partitions
  WHERE object_id = OBJECT_ID('dbo.TransactionHistory');

Let us create an incremental statistics object using the CREATE STATISTICS statement with the new INCREMENTAL clause set to ON (OFF is the default):

CREATE STATISTICS incrstats ON dbo.TransactionHistory(TransactionDate) 
  WITH FULLSCAN, INCREMENTAL = ON;

You can also create incremental statistics while creating an index using the new STATISTICS_INCREMENTAL clause of the CREATE INDEX statement.

You can inspect the created statistics object using DBCC:

DBCC SHOW_STATISTICS('dbo.TransactionHistory', incrstats);

Among other things, you will notice that the histogram has 200 steps (only the last 3 shown here):

RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS
198 2008-07-25 00:00:00.000 187 100 2
199 2008-07-27 00:00:00.000 103 101 1
200 2008-07-31 00:00:00.000 281 131 3

Initial DBCC results

So we already have the maximum of steps in a statistics object. What would happen if you add data to a new partition? Let us add data to partition 12:

INSERT INTO dbo.TransactionHistory 
SELECT * FROM Production.TransactionHistory 
WHERE TransactionDate >= '2008-08-01';

Now, we update the statistics object using the following statement:

UPDATE STATISTICS dbo.TransactionHistory(incrstats) 
  WITH RESAMPLE ON PARTITIONS(12);

Note the new syntax specifying the partition, where you can specify multiple partitions, separated by comma. The UPDATE STATISTICS statement reads the specified partitions and then merges their results with the existing statistic object to build the global statistics. Note the RESAMPLE clause; this is required as partition statistics need to have the same sample rates to be merged to build the global statistics. Although only the specified partition was scanned, you can see that SQL Server has rearranged the histogram. The last three steps now show data for the added partition. You can also compare the original with the new histogram for other minor differences:

RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS
197 2008-07-31 00:00:00.000 150 131 2
198 2008-08-12 00:00:00.000 300 36 9
199 2008-08-22 00:00:00.000 229 43 7
200 2008-09-03 00:00:00.000 363 37 11

DBCC results after the incremental update

If for any reason you want to disable the incremental statistics you can use the following statement to go back to the original behavior (or optionally just drop the statistics object and create a new one).

UPDATE STATISTICS dbo.TransactionHistory(incrstats) 
  WITH FULLSCAN, INCREMENTAL = OFF;

After disabling the incremental statistics trying to update a partition as shown previously will return the following error message:

Msg 9111, Level 16, State 1
UPDATE STATISTICS ON PARTITIONS syntax is not supported for non-incremental statistics.

Finally, you can also enable incremental statistics for your automatic statistics at the database level, if needed. This requires the INCREMENTAL = ON clause in the ALTER DATABASE statement and obviously also requires AUTO_CREATE_STATISTICS set to ON.