Continuing my series of articles on latches, this time I’m going to discuss the DBCC_OBJECT_METADATA latch and show how it can be a major bottleneck for consistency checks prior to SQL Server 2016 under certain circumstances. The issue affects DBCC CHECKDB, DBCC CHECKTABLE, and DBCC CHECKFILEGROUP, but for clarity I’ll just reference DBCC CHECKDB for the rest of this post.
You might wonder why I’m writing about an issue that affects older versions, but there are still a huge number of SQL Server 2014 and older instances out there, so it's a valid topic for my series.
I strongly recommend you read the initial post in the series before this one, so you have all the general background knowledge about latches.
What Is the DBCC_OBJECT_METADATA Latch?
To explain this latch, I need to explain a little bit about how DBCC CHECKDB works.
Among the huge number of consistency checks that DBCC CHECKDB performs is a check of the correctness of nonclustered indexes. Specifically, DBCC CHECKDB makes sure:
- For every nonclustered index record in every nonclustered index, there is exactly one "matching" data record in the base table (either a heap or a clustered index)
- For every data record in a table, there is exactly one "matching" nonclustered index record in each nonclustered index that’s defined for the table, taking into account filtered indexes
Without going into too much depth on how this is done: for each data record in a table, DBCC CHECKDB constructs each nonclustered index record that should exist for each nonclustered index, and makes sure the constructed nonclustered index record exactly matches the actual nonclustered index record. If the nonclustered index has a computed column in it (either as part of the nonclustered index key or as an INCLUDEd column), DBCC CHECKDB has to figure out the computed column value to use when constructing the index records.
As well as the nonclustered index correctness checks, if there’s a persisted computed column in the definition of a table, then for each data record in the table, DBCC CHECKDB has to check that the persisted value is correct, regardless of whether that column is part of a nonclustered index or not.
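Purely as an illustrative sketch (the table, column, and index names here are made up, not from AdventureWorks), a schema like the following is the kind of thing that forces those expression-evaluator calls for every row during the checks: once for the persisted column value, and once for each nonclustered index record that has to be constructed.

-- Hypothetical table with a persisted computed column and a computed column used in an index
CREATE TABLE [dbo].[ComputedDemo]
(
    [OrderID]    INT IDENTITY (1, 1) NOT NULL PRIMARY KEY,
    [Quantity]   INT NOT NULL,
    [UnitPrice]  MONEY NOT NULL,
    -- Persisted computed column: DBCC CHECKDB validates the stored value for every data record
    [LineTotal]  AS ([Quantity] * [UnitPrice]) PERSISTED,
    -- Non-persisted (deterministic, precise) computed column used in the index below
    [PriceBand]  AS (CASE WHEN [UnitPrice] >= 100 THEN 1 ELSE 0 END)
);
GO

-- Nonclustered index with a computed column in the key: for every data record, pre-2016
-- DBCC CHECKDB must call the expression evaluator to build the matching index record
CREATE NONCLUSTERED INDEX [IX_ComputedDemo_PriceBand]
ON [dbo].[ComputedDemo] ([PriceBand])
INCLUDE ([LineTotal]);
GO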
So how does it figure out the computed column values?
The Query Processor provides a mechanism for calculating computed column values, called the "expression evaluator." DBCC CHECKDB calls that function, providing the appropriate metadata and the data record; the expression evaluator uses the stored definition of the computed column in the metadata, plus the values from the data record, and returns the computed column value for DBCC CHECKDB to use. The internal workings of the expression evaluator are outside the control of the DBCC code, but to be able to use the expression evaluator, a latch must be acquired first: the DBCC_OBJECT_METADATA latch.
How Does the Latch Become a Bottleneck?
Here’s the problem: there's only one acceptable mode in which the DBCC_OBJECT_METADATA latch can be acquired before using the expression evaluator, and that's EX mode (exclusive). And as you’ll know from reading the intro post to the series, only one thread at a time can hold the latch in EX mode.
Pulling all this information together: when a database has persisted computed columns, or nonclustered indexes that have computed columns in them, the expression evaluator has to be used. If the SQL Server edition is Enterprise, DBCC CHECKDB is able to use parallelism and so has multiple threads performing the various checks. And as soon as you have multiple threads trying to acquire a latch in EX mode, that latch becomes a bottleneck. Just how big a bottleneck it becomes depends on how much the expression evaluator needs to be used, so the more persisted computed columns or nonclustered indexes using computed columns there are, and the more rows there are in those tables, the bigger the bottleneck the DBCC_OBJECT_METADATA latch becomes.
But remember, this bottleneck only happens for versions of SQL Server earlier than SQL Server 2016. In SQL Server 2016 Microsoft decided to "fix" the bottleneck by turning off the checks of nonclustered indexes using computed columns by default and only doing them when the WITH EXTENDED_LOGICAL_CHECKS option is used.
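So on SQL Server 2016 and later, if you *do* want those checks, you have to ask for them explicitly; something like this (the database name is just an example):

-- SQL Server 2016+ only: opt back in to the nonclustered index checks involving computed columns
DBCC CHECKDB (N'AdventureWorks2016') WITH EXTENDED_LOGICAL_CHECKS, NO_INFOMSGS;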
Showing the Bottleneck
You can easily reproduce the bottleneck for yourself by running DBCC CHECKDB on a database that has either persisted computed columns or nonclustered indexes with computed columns in them, and the Microsoft-provided AdventureWorks database is a great example. You can download backups of AdventureWorks for your version of SQL Server from here. I ran some tests using an AdventureWorks2014 database on a SQL Server 2014 instance (on a 32-core Dell R720), and I enlarged the database to be a few hundred GB using Jonathan's scripts.
When I ran DBCC CHECKDB, with server MAXDOP set to 0, it took more than 5 hours to run. The LATCH_EX wait type accounted for about 30% of the waits, with each wait being just shy of 1 millisecond, and 99% of the LATCH_EX waits were for the DBCC_OBJECT_METADATA latch.
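If you want to see whether you're hitting the same thing on your own system, the latch wait statistics are easy to check while (or after) DBCC CHECKDB runs; a quick sketch:

-- Aggregate latch waits since the last clear/restart; a large share of wait time under
-- DBCC_OBJECT_METADATA while DBCC CHECKDB is running points at this bottleneck
SELECT
    [latch_class],
    [waiting_requests_count],
    [wait_time_ms],
    [max_wait_time_ms]
FROM sys.dm_os_latch_stats
WHERE [latch_class] = N'DBCC_OBJECT_METADATA';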
I looked for nonclustered indexes containing computed columns using the following code:
SELECT
    [s].[name] AS [Schema],
    [o].[name] AS [Object],
    [i].[name] AS [Index],
    [c].[name] AS [Column],
    [ic].*
FROM sys.columns [c]
JOIN sys.index_columns [ic]
    ON [ic].[object_id] = [c].[object_id]
    AND [ic].[column_id] = [c].[column_id]
JOIN sys.indexes [i]
    ON [i].[object_id] = [ic].[object_id]
    AND [i].[index_id] = [ic].[index_id]
JOIN sys.objects [o]
    ON [i].[object_id] = [o].[object_id]
JOIN sys.schemas [s]
    ON [o].[schema_id] = [s].[schema_id]
WHERE [c].[is_computed] = 1;
That code found six nonclustered indexes in the AdventureWorks2014 database. I disabled all six of the indexes (using ALTER INDEX … DISABLE) and re-ran DBCC CHECKDB and it completed in around 18 minutes. So the DBCC_OBJECT_METADATA latch bottleneck was a major factor in causing DBCC CHECKDB to run more than 16 times slower!
Summary
Unfortunately, disabling nonclustered indexes using computed columns (and then later re-enabling them using ALTER INDEX … REBUILD) is the *only* way to remove the DBCC_OBJECT_METADATA latch bottleneck in versions prior to SQL Server 2016 while still keeping all the other functionality of DBCC CHECKDB. Disabling nonclustered indexes likely isn’t something you’re going to want to do in a production environment unless you have a zero activity maintenance window. This means you’re probably only going to disable those nonclustered indexes to remove the bottleneck if your consistency checks are offloaded onto another server using the backup-copy-restore-CHECKDB method.
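On that restored copy, the pattern is straightforward; a sketch along these lines, where the index and table names are just placeholders for whatever the earlier query finds on your system:

-- On the restored copy only: disable the nonclustered indexes that use computed columns...
ALTER INDEX [IX_ComputedDemo_PriceBand] ON [dbo].[ComputedDemo] DISABLE;

-- ...run the full consistency checks without the DBCC_OBJECT_METADATA bottleneck...
DBCC CHECKDB (N'AdventureWorks2014') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- ...then re-enable the index (a rebuild is required to bring a disabled index back)
ALTER INDEX [IX_ComputedDemo_PriceBand] ON [dbo].[ComputedDemo] REBUILD;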
Another way of avoiding the bottleneck is to use the WITH PHYSICAL_ONLY option when running DBCC CHECKDB, but then you miss out on all the in-depth logical checks, so I'm not a big fan of recommending that as the solution.
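For reference, that looks like this:

-- Page-level checks only: skips all the logical checks, including the computed column ones
DBCC CHECKDB (N'AdventureWorks2014') WITH PHYSICAL_ONLY;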