Erin Stellato

Understanding What sp_updatestats Really Updates

SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Jonathan Kehayias is a Principal Consultant with SQLskills and the youngest MCM ever.

Jonathan’s Posts

When I was in Chicago a few weeks ago for one of our Immersion Events, an attendee had a statistics question. I won't go into all the details around the issue, but the attendee mentioned that stats were updated using sp_updatestats. This is a method to update statistics that I've never recommended; I have always recommended a combination of index rebuilds and UPDATE STATISTICS to keep statistics up to date. If you’re not familiar with sp_updatestats, it’s a command that is run for the entire database to update statistics. But as Kimberly pointed out to the attendee, sp_updatestats will update a statistic as long as it has had one row modified. Whoa. I immediately opened up Books Online, and for sp_updatestats you'll see this:

sp_updatestats updates only the statistics that require updating based on the rowmodctr information in the sys.sysindexes catalog view, thus avoiding unnecessary updates of statistics on unchanged rows

Now, I admit, I made an assumption on what "…require updating based on the rowmodctr information in the sys.sysindexes catalog view…" meant. I assumed that the update decision would follow the same logic that the Auto Update Statistics option follows, which is:

  • The table size has gone from 0 to >0 rows (test 1).
  • The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
  • The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).

This logic is not followed for sp_updatestats. In fact, the logic is so incredibly simple, it's scary: If one row is modified, the statistic is updated. One row. ONE ROW. What’s my concern? I’m worried about the overhead of updating statistics for a bunch of statistics that don't truly need to be updated.  Let's take a closer look at sp_updatestats.

We'll start off with a fresh copy of the AdventureWorks2012 database which you can download from Codeplex. I’m going to first update rows in a three different tables:

USE [AdventureWorks2012];
GO
SET NOCOUNT ON;
GO

UPDATE [Production].[Product]
SET [Name] = 'Bike Chain'
WHERE [ProductID] = 952;

UPDATE [Person].[Person]
SET [LastName] = 'Cameron'
WHERE [LastName] = 'Diaz';
GO

INSERT INTO Sales.SalesReason
(Name, ReasonType, ModifiedDate)
VALUES('Stats', 'Test', GETDATE());
GO 10000

We modified one row in Production.Product, 211 rows in Person.Person, and we added 10,000 rows to Sales.SalesReason. If the sp_updatestats procedure followed the same logic for updates as the Auto Update Statistics option, then only Sales.SalesReason would update because it had 10 rows to start (whereas the 211 rows updated in Person.Person represent about one percent of the table). However, if we dig into sp_updatestats, we can see that the logic used is different. Note that I’m only extracting the statements from within sp_updatestats that are used to determine what statistics get updated.

A cursor iterates through all user-defined tables and internal tables in the database:

declare ms_crs_tnames cursor local fast_forward read_only for
select name, object_id, schema_id, type from sys.objects o
where o.type = 'U' or o.type = 'IT'
open ms_crs_tnames
fetch next from ms_crs_tnames into @table_name, @table_id, @sch_id, @table_type

Another cursor loops through the statistics for each table, and excludes heaps and hypothetical indexes and statistics. Note that sys.sysindexes is used in sp_helpstats. Sysindexes is a SQL Server 2000 system table and is scheduled to be removed in a future version of SQL Server. This is interesting, as the other method to determine rows updated is the sys.dm_db_stats_properties DMF, which is only available in SQL 2008 R2 SP2 and SQL 2012 SP1.

set @index_names = cursor local fast_forward read_only for
select name, indid, rowmodctr
from sys.sysindexes
where id = @table_id
and indid > 0
and indexproperty(id, name, 'ishypothetical') = 0
order by indid

After a bit of preparation and additional logic, we get to an IF statement which reveals that sp_updatestats filters out statistics that haven’t had any rows updated… confirming that even if only one row has been modified, the statistic will be updated. There’s also a check for @is_ver_current, which is determined by a built-in, internal function.

if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0)))

A couple more checks related to sampling and compatibility level, and then the UPDATE statement executes for the statistic. Before we actually run sp_updatestats, we can query sys.sysindexes to see what statistics will update:

SELECT [o].[name], [si].[indid], [si].[name], [si].[rowmodctr], [si].[rowcnt], [o].[type]
FROM [sys].[objects] [o]
JOIN [sys].[sysindexes] [si] ON [o].[object_id] = [si].[id]
WHERE ([o].[type] = 'U' OR [o].[type] = 'IT')
AND [si].[indid] > 0
AND [si].[rowmodctr] <> 0
ORDER BY [o].[type] DESC, [o].[name];

In addition to the three tables that we modified, there’s another statistic for a user table (dbo.DatabaseLog) and three internal statistics that will be updated:

Statistics that will be updated
Statistics that will be updated

If we run sp_updatestats for the AdventureWorks database, the output lists every table and the statistic(s) updated. The output below is modified to only show updated statistics:

Updating [sys].[fulltext_avdl_1589580701]
[clust] has been updated…
1 index(es)/statistic(s) have been updated, 0 did not require update.

Updating [dbo].[DatabaseLog]
[PK_DatabaseLog_DatabaseLogID] has been updated…
1 index(es)/statistic(s) have been updated, 0 did not require update.

Updating [sys].[fulltext_avdl_1077578877]
[clust] has been updated…
1 index(es)/statistic(s) have been updated, 0 did not require update.

Updating [Person].[Person]
[PK_Person_BusinessEntityID], update is not necessary…
[IX_Person_LastName_FirstName_MiddleName] has been updated…
[AK_Person_rowguid], update is not necessary…
1 index(es)/statistic(s) have been updated, 2 did not require update.

Updating [Sales].[SalesReason]
[PK_SalesReason_SalesReasonID] has been updated…
1 index(es)/statistic(s) have been updated, 0 did not require update.

Updating [Production].[Product]
[PK_Product_ProductID], update is not necessary…
[AK_Product_ProductNumber], update is not necessary…
[AK_Product_Name] has been updated…
[AK_Product_rowguid], update is not necessary…
[_WA_Sys_00000013_75A278F5], update is not necessary…
[_WA_Sys_00000014_75A278F5], update is not necessary…
[_WA_Sys_0000000D_75A278F5], update is not necessary…
[_WA_Sys_0000000C_75A278F5], update is not necessary…
1 index(es)/statistic(s) have been updated, 7 did not require update.

Statistics for all tables have been updated.

The last line of the output is a bit misleading – statistics for all tables haven’t been updated, only the statistics that have had one row or more modified have been updated. And again, the drawback of that is that maybe resources were used that didn’t need to be. If a statistic only has one row modified, should it be updated? No. If it has 10,000 rows updated, should it be updated? Well, that depends. If the table only has 5,000 rows, then absolutely; if the table has 1 million rows, then no, as only one percent of the table has been modified.

The take-away here is that if you’re using sp_updatestats to update your statistics, you are most likely wasting resources, including CPU, I/O, and tempdb. Further, it takes time to update each statistic, and if you have a tight maintenance window you probably have other maintenance tasks that can execute in that time, instead of unnecessary updates. Finally, you’re probably not providing any performance benefits by updating statistics when so few rows have changed. The distribution change is likely insignificant if only a small percentage of rows have been modified, so the histogram and density values don’t end up changing that much. In addition, remember that updating statistics invalidates query plans that use those statistics. When those queries execute, plans be re-generated, and the plan will probably be exactly the same as it was before, because there was no significant change in the histogram. There’s a cost to re-compiling query plans – it’s not always easy to measure, but it shouldn’t be ignored.

A better method to manage statistics – because you do need to manage statistics – is to implement a scheduled job that updates based on the percentages of rows that have been modified. You can use the aforementioned query that interrogates sys.sysindexes, or you can use the query below that takes advantage of the new DMF added in SQL Server 2008 R2 SP2 and SQL Server 2012 SP1:

SELECT [sch].[name] + '.' + [so].[name] AS [TableName] ,
[ss].[name] AS [Statistic],
[sp].[last_updated] AS [StatsLastUpdated] ,
[sp].[rows] AS [RowsInTable] ,
[sp].[rows_sampled] AS [RowsSampled] ,
[sp].[modification_counter] AS [RowModifications]
FROM [sys].[stats] [ss]
JOIN [sys].[objects] [so] ON [ss].[object_id] = [so].[object_id]
JOIN [sys].[schemas] [sch] ON [so].[schema_id] = [sch].[schema_id]
OUTER APPLY [sys].[dm_db_stats_properties]([so].[object_id],
[ss].[stats_id]) sp
WHERE [so].[type] = 'U'
AND [sp].[modification_counter] > 0
ORDER BY [sp].[last_updated] DESC;

Realize that different tables may have different thresholds and you will need to tweak the query above for your databases. For some tables, waiting until 15% or 20% of the rows have been modified may be ok. But for others, you may need to update at 10% or even 5%, depending on the actual values and their skew. There is no silver bullet. As much as we love absolutes, they rarely exist in SQL Server and statistics is no exception. You still want to leave Auto Update Statistics enabled – it’s a safety that will kick in if you miss something, just like Auto Growth for your database files. But your best bet is to know your data, and implement a methodology that allows you to update statistics based on the percentage of rows changed.