Paul Randal

SQL Server Latches – Other Latches To Know About

September 16, 2021 by in SQL Performance | No Comments
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Jonathan Kehayias is a Principal Consultant with SQLskills and the youngest MCM ever.

Jonathan’s Posts

To finish up my short series of articles on latches, this time I’m going to discuss a few other latches in SQL Server you might see occasionally but don’t merit a complete article by themselves. As usual, I strongly recommend you read the initial post in the series before this one, so you have all the general background knowledge about latches.

The LOG_MANAGER Latch

The LOG_MANAGER latch is used for synchronization during some operations involving the transaction log, and there is one LOG_MANAGER latch per database (as each database has its own log manager). It can only be acquired in exclusive mode and can be a bottleneck during transaction log file growth. The scenario where it will become evident as a problem is:

  • The log file has a small autogrowth set
  • There are many concurrent connections generating transaction log records
  • The log file keeps having to grow

When the log file runs out of space, it will need to grow. The first thread to realize more log space is required acquires the LOG_MANAGER latch in EX mode and proceeds to grow the log file. Many other threads continue to try to generate log records and get into the queue for the LOG_MANAGER latch, so they can grow the log file. When the first thread releases the latch, the next one gets it and realizes the log has grown already, so drops it and continues. And so on and so on. Incidentally, this pattern of bottleneck is called a latch convoy.

You can kind of think of it as exactly the same bottleneck as with the FGCB_ADD_REMOVE latch I discussed earlier in the series, but with log file growth instead of data file growth. However, with the FGCB_ADD_REMOVE latch, usually the instance has instant file initialization enabled, so the file growth is very fast, but with the LOG_MANAGER latch, the log *must* be zero-initialized, and the time wasted in the latch queue is longer.

The solution for this bottleneck has three parts:

  1. Set the log file autogrowth correctly, so the log isn’t growing frequently
  2. Size the log correctly for the workload, so the log shouldn’t grow at all
  3. Make sure the log is clearing correctly, so the log doesn’t need to grow

If these are all in place, you shouldn’t see the LOG_MANAGER latch be a regular bottleneck, and I talk more about this in my post here.

The ACCESS_METHODS_DATASET_PARENT Latch

When either a heap or an index is being accessed, internally there’s an object called a HeapDataSetSession or IndexDataSetSession, respectively. When a parallel scan is being performed, the threads doing the actual work of the scan each have a “child” dataset (another instance of the two objects I just described), and the main dataset, which is really controlling the scan, is called the “parent.”

When one of the scan worker threads has exhausted the set of rows it’s supposed to scan, it needs to get a new range by accessing the parent dataset, which means acquiring the ACCESS_METHODS_DATASET_PARENT latch in exclusive mode. While this can seem like a bottleneck, it’s not really, and there’s nothing you can do to stop the threads performing a parallel scan from occasionally showing a LATCH_EX wait for this latch.

The question you should be asking yourself is: should this query be performing a parallel scan in the first place? It’s entirely possible something happened to force the query plan to include a parallel scan when that may not the most efficient way for the query to run. Examples of things that could cause a plan to change to a parallel scan include:

  • Out-of-date statistics
  • A missing or dropped nonclustered index
  • New code forcing a scan because of an implicit conversion—a data type mismatch between a column and a variable/parameter, which precludes the use of a nonclustered index
  • New code forcing a scan because arithmetic is being performed on a table column instead of a variable/parameter, which again precludes the use of a nonclustered index
  • Data growth occurring and a scan really is the most efficient plan

Or it could be this query requires a scan, in which case LATCH_EX waits for ACCESS_METHODS_DATASET_PARENT are just part of your environment.

The ACCESS_METHODS_HOBT_VIRTUAL_ROOT Latch

Each instance of this latch protects an entry in the Storage Engine metadata for a b-tree, specifically the page ID of the root page of the b-tree (the page at the top of the triangle that we generally think of as an index). I’m specifically saying b-tree and not index, as an index can have multiple partitions, each of which has a b-tree (essentially a portion of the overall index, but with low value and high value key constraints).

Every time a thread needs to traverse a b-tree, it has to start at the root page and work its way down to the leaf level. To read the metadata containing the page ID of the root page, the thread has to acquire the ACCESS_METHODS_HOBT_VIRTUAL_ROOT latch in SH mode, to make sure the page ID isn’t in the process of changing. When a thread needs to change the page ID of the root page, it has to acquire the latch in EX mode.

Why would the root page of a b-tree ever change? As the number of index records in the root page grows, eventually it will fill up and a page split will occur. When that happens, the current root page and the page it splits into become a new level in the b-tree, and a brand-new root page is created, with two index records in, pointing at the old root page and the page it split into. The new root page ID has to be entered into metadata, so the latch is acquired in EX mode. This will happen a few times quickly as an index on an empty table starts to be populated by inserts but isn’t something you’ll see as an ongoing performance bottleneck issue.

Summary

As I’m sure you’ve gathered from this series, understanding latches and latch bottlenecks involves knowing a bit more about what’s going on inside the Storage Engine than for general wait statistics analysis.

I usually advise people *not* to start performance troubleshooting by looking at latch statistics (via sys.dm_os_latch_stats) but instead to always start with wait statistics (see my post here) and only delve into latches if LATCH_EX or LATCH_SH are one of the top handful of waits on the SQL Server instance.

If you have any questions about latches, feel free to drop me a line.