Erin Stellato

Another Reason to Avoid sp_updatestats

November 4, 2014 by in SQL Statistics | 2 Comments
SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

I've blogged previously about why I don't love sp_updatestats. I recently found another reason that it's not my friend. TL;DR: It doesn't update statistics on indexed views. Now, the documentation doesn't claim that it does, so there's no bug here. The MSDN documentation clearly states:

Runs UPDATE STATISTICS against all user-defined and internal tables in the current database.

But… how many of you thought about your indexed views and wondered whether those got updated? I admit I didn't. I forget about indexed views, which is unfortunate because they can be really powerful when used appropriately. They can also be a nightmare to unravel when you're troubleshooting, but I'm not going to argue their use today.  I just want you to be aware that they don't get updated by sp_updatestats, and see what options you have.

Setup

Since the World Series just ended, we're going to use the Baseball database for our testing. You can download it from the SQLskills Resources page. Once restored we'll create a copy of the dbo.Players table, named dbo.PlayerInfo, load a few thousand rows into it, and then create an indexed view that joins our new table to the PitchingPost table:

USE [BaseballData];
GO

CREATE TABLE [dbo].[PlayerInfo](
	[lahmanID] [int] NOT NULL,
	[playerID] [varchar](10) NULL DEFAULT (NULL),
	[managerID] [varchar](10) NULL DEFAULT (NULL),
	[hofID] [varchar](10) NULL DEFAULT (NULL),
	[birthYear] [int] NULL DEFAULT (NULL),
	[birthMonth] [int] NULL DEFAULT (NULL),
	[birthDay] [int] NULL DEFAULT (NULL),
	[birthCountry] [varchar](50) NULL DEFAULT (NULL),
	[birthState] [varchar](2) NULL DEFAULT (NULL),
	[birthCity] [varchar](50) NULL DEFAULT (NULL),
	[deathYear] [int] NULL DEFAULT (NULL),
	[deathMonth] [int] NULL DEFAULT (NULL),
	[deathDay] [int] NULL DEFAULT (NULL),
	[deathCountry] [varchar](50) NULL DEFAULT (NULL),
	[deathState] [varchar](2) NULL DEFAULT (NULL),
	[deathCity] [varchar](50) NULL DEFAULT (NULL),
	[nameFirst] [varchar](50) NULL DEFAULT (NULL),
	[nameLast] [varchar](50) NULL DEFAULT (NULL),
	[nameNote] [varchar](255) NULL DEFAULT (NULL),
	[nameGiven] [varchar](255) NULL DEFAULT (NULL),
	[nameNick] [varchar](255) NULL DEFAULT (NULL),
	[weight] [int] NULL DEFAULT (NULL),
	[height] [int] NULL,
	[bats] [varchar](1) NULL DEFAULT (NULL),
	[throws] [varchar](1) NULL DEFAULT (NULL),
	[debut] [varchar](10) NULL DEFAULT (NULL),
	[finalGame] [varchar](10) NULL DEFAULT (NULL),
	[college] [varchar](50) NULL DEFAULT (NULL),
	[lahman40ID] [varchar](9) NULL DEFAULT (NULL),
	[lahman45ID] [varchar](9) NULL DEFAULT (NULL),
	[retroID] [varchar](9) NULL DEFAULT (NULL),
	[holtzID] [varchar](9) NULL DEFAULT (NULL),
	[bbrefID] [varchar](9) NULL DEFAULT (NULL),
PRIMARY KEY CLUSTERED 
([lahmanID] ASC) ON [PRIMARY]
) ON [PRIMARY];
GO

INSERT INTO [dbo].[PlayerInfo]
           ([lahmanID]
           ,[playerID]
           ,[managerID]
           ,[hofID]
           ,[birthYear]
           ,[birthMonth]
           ,[birthDay]
           ,[birthCountry]
           ,[birthState]
           ,[birthCity]
           ,[deathYear]
           ,[deathMonth]
           ,[deathDay]
           ,[deathCountry]
           ,[deathState]
           ,[deathCity]
           ,[nameFirst]
           ,[nameLast]
           ,[nameNote]
           ,[nameGiven]
           ,[nameNick]
           ,[weight]
           ,[height]
           ,[bats]
           ,[throws]
           ,[debut]
           ,[finalGame]
           ,[college]
           ,[lahman40ID]
           ,[lahman45ID]
           ,[retroID]
           ,[holtzID]
           ,[bbrefID])
SELECT [lahmanID]
           ,[playerID]
           ,[managerID]
           ,[hofID]
           ,[birthYear]
           ,[birthMonth]
           ,[birthDay]
           ,[birthCountry]
           ,[birthState]
           ,[birthCity]
           ,[deathYear]
           ,[deathMonth]
           ,[deathDay]
           ,[deathCountry]
           ,[deathState]
           ,[deathCity]
           ,[nameFirst]
           ,[nameLast]
           ,[nameNote]
           ,[nameGiven]
           ,[nameNick]
           ,[weight]
           ,[height]
           ,[bats]
           ,[throws]
           ,[debut]
           ,[finalGame]
           ,[college]
           ,[lahman40ID]
           ,[lahman45ID]
           ,[retroID]
           ,[holtzID]
           ,[bbrefID]
FROM [dbo].[Players]
WHERE [lahmanID] <= 10000;

CREATE VIEW [PlayerPostSeason]
WITH SCHEMABINDING
AS
	SELECT 
		[p].[lahmanID], 
		[p].[nameFirst], 
		[p].[nameLast], 
		[p].[debut], 
		[p].[finalGame], 
		[pp].[yearID], 
		[pp].[round], 
		[pp].[teamID], 
		[pp].[W], 
		[pp].[L], 
		[pp].[G]
	FROM [dbo].[PlayerInfo] [p]
	JOIN [dbo].[PitchingPost] [pp] ON [p].[playerID] = [pp].[playerID];

CREATE UNIQUE CLUSTERED INDEX [CI_PlayerPostSeason] ON [PlayerPostSeason] ([lahmanID], [yearID], [round]);

CREATE NONCLUSTERED INDEX [NCI_PlayerPostSeason_Name] ON [PlayerPostSeason] ([nameFirst], [nameLast]);

If we check statistics for the clustered and nonclustered indexes, we see they exist:

DBCC SHOW_STATISTICS ('PlayerPostSeason', CI_PlayerPostSeason) WITH STAT_HEADER;
GO
DBCC SHOW_STATISTICS ('PlayerPostSeason', NCI_PlayerPostSeason_Name) WITH STAT_HEADER;
GO

Index view statistics after initial creationIndex view statistics after initial creation

Now we'll insert more rows into PlayerInfo:

INSERT INTO [dbo].[PlayerInfo]
           ([lahmanID]
           ,[playerID]
           ,[managerID]
           ,[hofID]
           ,[birthYear]
           ,[birthMonth]
           ,[birthDay]
           ,[birthCountry]
           ,[birthState]
           ,[birthCity]
           ,[deathYear]
           ,[deathMonth]
           ,[deathDay]
           ,[deathCountry]
           ,[deathState]
           ,[deathCity]
           ,[nameFirst]
           ,[nameLast]
           ,[nameNote]
           ,[nameGiven]
           ,[nameNick]
           ,[weight]
           ,[height]
           ,[bats]
           ,[throws]
           ,[debut]
           ,[finalGame]
           ,[college]
           ,[lahman40ID]
           ,[lahman45ID]
           ,[retroID]
           ,[holtzID]
           ,[bbrefID])
SELECT [lahmanID]
           ,[playerID]
           ,[managerID]
           ,[hofID]
           ,[birthYear]
           ,[birthMonth]
           ,[birthDay]
           ,[birthCountry]
           ,[birthState]
           ,[birthCity]
           ,[deathYear]
           ,[deathMonth]
           ,[deathDay]
           ,[deathCountry]
           ,[deathState]
           ,[deathCity]
           ,[nameFirst]
           ,[nameLast]
           ,[nameNote]
           ,[nameGiven]
           ,[nameNick]
           ,[weight]
           ,[height]
           ,[bats]
           ,[throws]
           ,[debut]
           ,[finalGame]
           ,[college]
           ,[lahman40ID]
           ,[lahman45ID]
           ,[retroID]
           ,[holtzID]
           ,[bbrefID]
FROM [dbo].[Players]
WHERE [lahmanID] > 10000;

And if we check sys.dm_db_stats_properties, we can see the row modifications:

SELECT  
	[sch].[name] AS [Schema],
	[so].[name] AS [ObjectName],
	[so].[type] AS [ObjectType],
    [ss].[name] AS [Statistic],
    [sp].[last_updated] AS [StatsLastUpdated] ,
    [sp].[rows] AS [RowsInTable] ,
    [sp].[rows_sampled] AS [RowsSampled] ,
    [sp].[modification_counter] AS [RowModifications]
FROM [sys].[objects] [so]
JOIN [sys].[stats] [ss] ON [so].[object_id] = [ss].[object_id]
JOIN [sys].[schemas] [sch] ON [so].[schema_id] = [sch].[schema_id]
OUTER APPLY [sys].[dm_db_stats_properties]([so].[object_id],
                                                   [ss].[stats_id]) sp
WHERE [so].[name] = 'PlayerPostSeason';

Rows modified in the indexed view, via sys.dm_db_stats_propertiesRows modified in the indexed view, via sys.dm_db_stats_properties

And just for fun, if we check sys.sysindexes, we can see the modifications there as well:

SELECT  [so].[name], [si].[name], [si].[rowcnt], [si].[rowmodctr]
FROM [sys].[sysindexes] [si]
JOIN [sys].[objects] [so] ON [si].[id] = [so].[object_id]
WHERE [so].[name] = 'PlayerPostSeason';

Rows modified in the indexed view, via sys.sysindexesRows modified in the indexed view, via sys.sysindexes

Now sys.sysindexes is deprecated, but if you remember from my previous post, that's what sp_updatestats uses to see what's been modified. But… the object list for sys.indexes is driven by the query against sys.objects, which, if you remember, filters on user tables ('U') and internal tables ('IT'). It does not include views ('V') in that filter. As such, when we run sp_updatestats and check the output (not included for brevity), there is no mention of our PlayerPostSeason view.

Therefore, if you have indexed views and you're relying on sp_updatestats to update your statistics, your view statistics are not getting updated. However, I would guess that most of you have the Auto Update Statistics option enabled for your databases. This is good, because with this option, view statistics will update if they've been invalidated. We know we've made over 2000 modifications to the indexes on PlayerPostSeason. If we query by a first name that's selective, our query plan should use the NCI_PlayerPostSeason_Name index, and because statistics are out of date, they should get updated. Let's check:

SELECT *
FROM [PlayerPostSeason]
WHERE [nameFirst] = 'Madison';
GO

Query plan from SELECT against nonclustered indexQuery plan from SELECT against nonclustered index

We can see in the plan that the NCI_PlayerPostSeason_Name nonclustered index was used, and if we check statistics:

Statistics after automatic updateStatistics after automatic update

Sure enough, the statistics for the nonclustered index have been updated.  But of course we don't want to rely on auto update to manage statistics, we want to be proactive. We've got two options:

  • Maintenance Task
  • Custom Script

The update statistics maintenance task does update view statistics. This is not specifically called out anywhere in the UI, but if we create a maintenance plan with the update statistics task and run it, the statistics for the indexed view are updated.  The drawback of update statistics maintenance task is that it's a sledge-hammer approach. It updates all statistics, regardless of whether it's needed (it's almost as bad as sp_updatestats). I prefer a custom script, where SQL Server only updates what's been modified. If you're not into rolling your own script, you can use Ola Hallengren's script. It's common to update statistics as part of your index rebuilds and reorgs.  For example, with Ola's script in the SQL Agent job you would have:

sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -Q "EXECUTE [dbo].[IndexOptimize] @Databases = 'BaseballData', @FragmentationLow = NULL, @FragmentationMedium = 'INDEX_REORGANIZE', @FragmentationHigh = 'INDEX_REBUILD', @FragmentationLevel1 = 5, @FragmentationLevel2 = 30, @UpdateStatistics = 'ALL', @OnlyModifiedStatistics = 'Y', @LogToTable = 'Y'" –b

With this option, if statistics have been modified, they will be updated, and if we check the [dbo].[IndexOptimize] stored procedure we can see where Ola checks for modifications:

        -- Has the data in the statistics been modified since the statistics was last updated?
        IF @CurrentStatisticsID IS NOT NULL AND @UpdateStatistics IS NOT NULL AND @OnlyModifiedStatistics = 'Y'
        BEGIN
          SET @CurrentCommand10 = ''
          IF @LockTimeout IS NOT NULL SET @CurrentCommand10 = 'SET LOCK_TIMEOUT ' + CAST(@LockTimeout * 1000 AS nvarchar) + '; '
          IF (@Version >= 10.504000 AND @Version < 11) OR @Version >= 11.03000
          BEGIN
            SET @CurrentCommand10 = @CurrentCommand10 + 'USE ' + QUOTENAME(@CurrentDatabaseName) 
              + '; IF EXISTS(SELECT * FROM sys.dm_db_stats_properties (@ParamObjectID, @ParamStatisticsID) 
                   WHERE modification_counter > 0) BEGIN SET @ParamStatisticsModified = 1 END'
          END
          ELSE
          BEGIN
            SET @CurrentCommand10 = @CurrentCommand10 + 'IF EXISTS(SELECT * FROM ' 
              + QUOTENAME(@CurrentDatabaseName) + '.sys.sysindexes sysindexes 
              WHERE sysindexes.[id] = @ParamObjectID AND sysindexes.[indid] = @ParamStatisticsID 
              AND sysindexes.[rowmodctr] <> 0) BEGIN SET @ParamStatisticsModified = 1 END'
          END

For versions which support the sys.dm_db_stats_properties DMF, Ola checks it for any statistics that have been modified, and for versions that do not support the new sys.dm_db_stats_properties DMF, the sys.sysindexes system table is checked. My only complaint here is that the script behaves the same way as sp_updatestats: if at least one row has been modified, the statistic will be updated.

If you're not into writing your own code for managing stats, then I would recommend sticking with Ola's script. But if you do want to target your updates a bit more, then I'd recommend using sys.dm_db_stats_properties. This DMF is only available for SQL Server 2008R2 SP2 and higher, and SQL Server 2012 SP1 and higher, so if you're on a lower version, you'll need to use sys.indexes. But for those of you with access to sys.dm_db_stats_properties, here's a query to get you started:

SELECT
	[sch].[name] AS [Schema],
	[so].[name] AS [ObjectName],
	[so].[type] AS [ObjectType],
	[ss].[name] AS [Statistic],
	[sp].[last_updated] AS [StatsLastUpdated] ,
	[sp].[rows] AS [RowsInTable] ,
	[sp].[rows_sampled] AS [RowsSampled] ,
	CAST(100 * [sp].[rows_sampled] / [sp].[rows] AS DECIMAL (18, 2)) AS [PercentSampled],
	[sp].[modification_counter] AS [RowModifications] ,
	CAST(100 * [sp].[modification_counter] / [sp].[rows] AS DECIMAL(18, 2)) AS [PercentChange]
FROM [sys].[objects] AS [so]
INNER JOIN [sys].[stats] AS [ss] ON [so].[object_id] = [ss].[object_id]
INNER JOIN [sys].[schemas] AS [sch] ON [so].[schema_id] = [sch].[schema_id]
OUTER APPLY [sys].[dm_db_stats_properties]([so].[object_id], [ss].[stats_id]) AS [sp]
WHERE [so].[type] IN ('U','V')
AND ((CAST(100 * [sp].[modification_counter] / [sp].[rows] AS DECIMAL(18,2)) >= 10.0))
ORDER BY CAST(100 * [sp].[modification_counter] / [sp].[rows] AS DECIMAL(18, 2)) DESC;

Note that with sys.objects we filter on tables and views; you could alter this to include system tables. You can then modify the predicate to only retrieve rows based on the percentage of rows modified, or perhaps a combination of modification percentage and number of rows (for tables with millions or billions of rows, that percentage might be lower than for small tables).

Summary

The take home message here is pretty clear: I don't recommend using sp_updatestats to manage statistics. Statistics are updated when one or more rows have changed (which is an extremely low threshold for updating stats) and stats for indexed views are not updated. This is not a comprehensive and efficient method for managing stats…and the update statistics task in a Maintenance Plan isn't much better. It updates the indexed view statistics, but it updates every statistic, regardless of modifications. A custom script is really the way to go, but understand that Ola Hallengren's script, if you're updating based on modification, also updates when only row has been modified (but it at least gets the indexed views). In the end, for the best control, look to roll your own script for managing statistics. I've given you the base query to start. If you can block off a couple hours to practice your T-SQL writing and then test it out, you'll have a working custom script ready for your databases before the holidays roll around.