Knee-Jerk Wait Statistics : PAGELATCH

Over the last 18 months I’ve been focusing on knee-jerk reactions to wait statistics analysis and other performance-tuning related topics, and in this post I’m going to continue that and discuss the PAGELATCH_XX waits. The XX at the end of the wait means that there are multiple types of PAGELATCH wait, and the most common examples are:

PAGELATCH_SH – (SHare) waiting for access to a data file page in memory so that the page contents can be read
PAGELATCH_EX or PAGELATCH_UP – (EXclusive or UPdate) waiting for access to a data file page in memory so that the page contents can be modified

When one of these wait types is the most prevalent on a server, the knee-jerk reaction is that the problem is something to do with I/O (i.e. confusion with the PAGEIOLATCH_XX wait type, which I covered in a post back in 2014) and someone tries adding more memory or tweaking the I/O subsystem. Neither of these reactions will have any effect at all, as the data file pages under contention are already in memory in the buffer pool!

In all cases, you can see whether you have a problem with PAGELATCH_XX contention using the sys.dm_os_waiting_tasks script on my blog or using a tool like Performance Advisor, as demonstrated (for a different wait type) in this post.

So what’s the source of the contention? First I’ll explain the background behind these wait types, and then I’ll discuss the two most common causes of PAGELATCH_XX contention.

Background: Latches

Before I go into some of the causes of PAGELATCH_XX waits, I want to explain why they even exist.

In any multi-threaded system, data structures that can be accessed and manipulated by multiple threads need to be protected to prevent scenarios such as:

Two threads updating a data structure concurrently, and some of the updates are lost
A thread updating a data structure concurrently with another thread reading the data structure, so the reading thread sees a mixture of old and new data

This is basic computer science, and SQL Server is no different, so all data structures inside SQL Server need to have multi-threaded access control.
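To make the lost-update scenario concrete, here is a minimal toy sketch in Python. It is not how SQL Server works internally; the two "threads" are just hand-interleaved steps against a made-up shared counter, so the outcome is deterministic.

```python
# Toy model: two "threads" each add 1 to a shared counter using an
# unprotected read-modify-write. The interleaving is written out by hand.
shared = {"counter": 0}

def read(state):
    return state["counter"]

def write(state, value):
    state["counter"] = value

# Unprotected: both threads read before either one writes.
a = read(shared)        # thread A sees 0
b = read(shared)        # thread B sees 0
write(shared, a + 1)    # thread A writes 1
write(shared, b + 1)    # thread B also writes 1, overwriting A's update
unprotected_result = shared["counter"]

# Protected: each read-modify-write completes before the next one starts,
# which is what exclusive access control enforces.
shared["counter"] = 0
for _ in range(2):
    write(shared, read(shared) + 1)
protected_result = shared["counter"]

print(unprotected_result, protected_result)  # prints: 1 2
```

One of the two increments disappears in the unprotected case, which is exactly the kind of outcome access control has to rule out.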

One of the mechanisms that SQL Server uses to do this is called a latch, where holding the latch in exclusive mode prevents other threads from accessing the data structure, and holding the latch in share mode prevents other threads from changing the data structure. SQL Server also uses spinlocks for some data structures and I discussed these in this post back in 2014.

But why is a data file page in memory protected by a latch, you might wonder? Well, a data file page is just a data structure, albeit a special purpose one, and so needs the same access controls as any other data structure. So when one thread needs to modify a data file page it needs to acquire an exclusive or update latch on the page, and if it can’t and needs to wait, the wait type PAGELATCH_EX or PAGELATCH_UP results.
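The share, update, and exclusive modes can be sketched as a small compatibility table. This is an illustrative Python model only: the helper names `can_grant` and `wait_type` are made up for this example, and SQL Server has additional latch modes (KP and DT) that are left out here.

```python
# Latch modes discussed in this post: SH (share), UP (update), EX (exclusive).
# Pairs are stored once, ordered SH < UP < EX.
MODE_ORDER = ["SH", "UP", "EX"]
COMPATIBLE = {
    ("SH", "SH"): True,
    ("SH", "UP"): True,
    ("SH", "EX"): False,
    ("UP", "UP"): False,
    ("UP", "EX"): False,
    ("EX", "EX"): False,
}

def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode already held."""
    for held in held_modes:
        pair = tuple(sorted((requested, held), key=MODE_ORDER.index))
        if not COMPATIBLE[pair]:
            return False
    return True

def wait_type(requested, held_modes):
    """Hypothetical helper: name the wait a blocked request would report."""
    return None if can_grant(requested, held_modes) else f"PAGELATCH_{requested}"

print(wait_type("SH", ["SH", "SH"]))  # None: readers do not block readers
print(wait_type("EX", ["SH"]))        # PAGELATCH_EX
print(wait_type("UP", ["UP"]))        # PAGELATCH_UP
```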

Classic tempdb Contention

PAGELATCH contention in tempdb is typically on allocation bitmaps and occurs with workloads with many concurrent connections creating and dropping small temporary tables (which are stored in tempdb).

When the first row is inserted into a temporary table, two pages must be allocated (a data page and an IAM page, which tracks the data page). These pages need to be marked as allocated in a special allocation page called a PFS page, and by default are allocated from special data extents (mixed extents) that are tracked by another allocation page called an SGAM page (details of these can be found in my old blog post here). When the temporary table is dropped, these pages need to be deallocated again, necessitating more changes to the PFS and SGAM pages.

If the temporary tables are small, and the cumulative size of all concurrently created temporary tables is less than 64MB, then all these allocation bitmap changes are centered on the very first PFS and SGAM pages in the tempdb data file (with page IDs (1:1) and (1:3) respectively). Updating one of these allocation pages requires latching the page, and only one thread at a time can be changing the page, so all other threads have to wait – with wait type PAGELATCH_UP.
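If you are looking at the resource a PAGELATCH wait is for, it comes in the form database_id:file_id:page_id (tempdb is database ID 2), and a little arithmetic tells you whether the page is an allocation bitmap. The following Python sketch assumes the usual intervals, which are not stated above: a PFS page every 8,088 pages, and a GAM/SGAM page every 511,232 pages. The function name `classify_page` is made up for this example.

```python
PAGE_SIZE_KB = 8
PFS_INTERVAL = 8088        # pages tracked by one PFS page (assumed interval)
GAM_INTERVAL = 511232      # pages tracked by one GAM or SGAM page (assumed interval)

def classify_page(resource_description):
    """Classify a '<database_id>:<file_id>:<page_id>' latch resource."""
    database_id, file_id, page_id = (int(p) for p in resource_description.split(":"))
    if page_id == 0:
        kind = "file header"
    elif page_id == 1 or page_id % PFS_INTERVAL == 0:
        kind = "PFS"
    elif page_id == 2 or page_id % GAM_INTERVAL == 0:
        kind = "GAM"
    elif page_id == 3 or (page_id - 1) % GAM_INTERVAL == 0:
        kind = "SGAM"
    else:
        kind = "other"
    return database_id, file_id, page_id, kind

# Space tracked by a single PFS page, in MB.
pfs_coverage_mb = PFS_INTERVAL * PAGE_SIZE_KB / 1024
print(pfs_coverage_mb)            # 63.1875, i.e. roughly the 64MB quoted above

print(classify_page("2:1:1"))     # (2, 1, 1, 'PFS')
print(classify_page("2:1:3"))     # (2, 1, 3, 'SGAM')
```

Waits on resources like 2:1:1 and 2:1:3 are the signature of the classic tempdb allocation contention described here.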

From SQL Server 2005 onwards, temporary tables can be cached when dropped, as long as they’re less than 8MB in size (and in SQL Server 2014 aren’t created in a stored procedure that also has DDL statements on the temporary table). This means that the next thread that executes the same query plan can take the temporary table out of the cache and not have to deal with the initial allocations. This cuts down on contention on the allocation bitmaps, but the temporary table cache isn’t very big, so workloads with hundreds of concurrent temporary table creates/drops will still see lots of contention.

It’s trivial to prevent the contention on the SGAM pages in tempdb by enabling documented trace flag 1118 on the server, which I say should be enabled on all servers across the world, and is actually the unchangeable default behavior for tempdb in SQL Server 2016.

Preventing contention on the PFS pages in tempdb is a bit more difficult. Assuming that the temporary tables are needed for performance, the trick is to have multiple data files for tempdb so that the allocations are done round-robin among the files, the contention is split over multiple PFS pages, and so the overall contention goes down. Unfortunately, there is no single right answer for how many data files you should have. You can read more about the generally accepted guidance on this in KB article 2154845 and in this blog post.
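As a rough illustration of why more data files help, here is a toy Python model. It assumes equally sized files and pure round-robin allocation, and it treats every allocation as one update to the first PFS page of whichever file it lands in. The function name and the 1..N file numbering are made up for this example.

```python
from collections import Counter

def pfs_updates_per_file(allocations, data_files):
    """Count allocation-bitmap updates landing on each file's first PFS page."""
    updates = Counter()
    for i in range(allocations):
        file_number = i % data_files + 1   # round-robin; numbering is illustrative only
        updates[file_number] += 1
    return updates

one_file = pfs_updates_per_file(1000, 1)
four_files = pfs_updates_per_file(1000, 4)

# Updates that must serialize on the busiest PFS page in each configuration.
print(max(one_file.values()), max(four_files.values()))  # prints: 1000 250
```

The total work is unchanged, but the number of threads queuing for any single PFS page drops in proportion to the number of files.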

Insert Hotspot

In user databases, a common cause of a high number of PAGELATCH_EX waits is an insert hotspot.

This can occur when a table has a clustered index with an int or bigint identity cluster key, and a row size that’s small enough so that many tens or more table rows can fit on a data page at the leaf level of the clustered index.

For such a table, if the workload involves many tens or hundreds of concurrent threads inserting into the table, many of the threads will generate rows with identity values (and hence cluster keys) that need to be inserted onto the same leaf-level data page.

Now remember that making any change to a data file page in memory requires an exclusive latch, so each of the threads trying to insert onto the same page must acquire the page’s latch exclusively. While each thread is holding the exclusive latch, the other threads will be waiting for PAGELATCH_EX for that page, essentially making the concurrent inserts into a hugely-bottlenecked synchronous process.
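Here is a small Python sketch of the hotspot. It is a toy model under stated assumptions: 100 concurrent threads, 200 rows per leaf page, and keys that fill pages strictly in order. The numbers and the `target_pages` helper are made up for illustration.

```python
from collections import Counter

def target_pages(keys, rows_per_page):
    """Map each cluster key to a leaf-page number, assuming keys fill pages in order."""
    return Counter(key // rows_per_page for key in keys)

# 100 concurrent threads, each handed the next identity value.
next_identity = 1_000_000
concurrent_inserts = range(next_identity, next_identity + 100)

hits = target_pages(concurrent_inserts, rows_per_page=200)

# Number of distinct pages touched, and inserts queued on the busiest page.
print(len(hits), max(hits.values()))  # prints: 1 100
```

All 100 inserts need the exclusive latch on the same page, so they effectively run one at a time.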

There are a few possible fixes for this problem:

Use a more random key, recognizing that this will lead to index fragmentation, so also use an index fill factor to help prevent page splits
Spread the inserts out in the table using some kind of artificial partitioning mechanism
Use a longer table row size so that fewer rows fit on each page (this is obviously the least palatable option)
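To get a feel for how two of these fixes spread the load, here is a toy comparison in Python. The assumptions are all made up for illustration: 100 concurrent identity values, 200 rows per page in the baseline, a hypothetical leading column computed as key % 8 for the artificial partitioning case, and 10 rows per page for the longer-row case.

```python
from collections import Counter

def hottest_page(page_of, keys):
    """Return (distinct pages touched, inserts on the busiest page)."""
    hits = Counter(page_of(k) for k in keys)
    return len(hits), max(hits.values())

keys = range(1_000_000, 1_000_100)   # 100 concurrent identity values

# Baseline: sequential key, 200 rows per page.
baseline = hottest_page(lambda k: k // 200, keys)

# Artificial partitioning: hypothetical leading column bucket = key % 8,
# so each bucket has its own insertion point in the index.
BUCKETS = 8
bucketed = hottest_page(lambda k: (k % BUCKETS, (k // BUCKETS) // 200), keys)

# Longer rows: only 10 rows fit on a page.
wide_rows = hottest_page(lambda k: k // 10, keys)

print(baseline, bucketed, wide_rows)  # prints: (1, 100) (8, 13) (10, 10)
```

In this model the busiest page goes from 100 queued inserts down to 13 or 10. Real results depend on the workload, so treat the numbers as directional only.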

I’ve seen an insert hotspot like this crop up when someone’s tried to remove index fragmentation problems by changing a random GUID cluster key to an int or bigint identity cluster key, but failed to test the new table schema under production loads.

Summary

Just as with other wait types, understanding exactly what PAGELATCH_XX waits mean is key to understanding how to troubleshoot them.

As far as general wait statistics are concerned, you can find more information about using them for performance troubleshooting in:

My SQLskills blog post series, starting with Wait statistics, or please tell me where it hurts
My Wait Types and Latch Classes library here
My Pluralsight online training course SQL Server: Performance Troubleshooting Using Wait Statistics
SQL Sentry Performance Advisor

Until next time, happy troubleshooting!

4 thoughts on “Knee-Jerk Wait Statistics : PAGELATCH”

Viktor says:

October 28, 2015 at 6:30 PM

Hi Paul!

Thanks for all this info, great explanation as usual!
Do you know what's Microsoft's official view on TF1118 these days? We know it's documented in KB articles, CSS Blogs, MSDN articles so it's documented and should be supported (and many people use it), still someone posted recently that a SQL health-check highlighted these along with other TFs as "bad things". Is this because of it being a trace flag per-se? The impact on user databases as you and the KB articles noted might only be slight extra disk space usage potentially only for new tables created or existing mixed extent allocated indexes when rebuilt. If you turn TF1118 off, an index rebuild will allocate to mixed extents again.

Also, do you know if with SQL2016, they use TF1118 behaviour by default only for tempdb in particular, or the behaviour is ON as if TF1118 was enabled globally on the server with a previous version of SQL Server? (thus mixed extent allocation is now removed for data pages completely server-wide?)

Paul Randal says:

October 29, 2015 at 8:06 PM

Hi Viktor – you're welcome :-)

Microsoft's view on TF1118 is that it should be enabled. In fact, it will be the unchangeable default for SQL Server 2016.

I think a health check that flags 1118 as a bad thing isn't a very good health check. Trace flags in general should be turned on for specific reasons, and any unrecognized trace flag should be reported, but 1118 is a widely-known and accepted good thing.

1118 always works globally, so mixed extents will effectively be gone for new allocations in SQL Server 2016.

Cheers

Viktor says:

January 25, 2016 at 3:15 PM

Hi Paul,

Thanks for your answer.
Just checked SQL2016 CTP 3.2, and mixed extent allocation seems to be gone for tempdb only, user db allocations still seem to be using them. Is this going to change, or the PG has changed their mind in the meantime, or am I missing something? :)

Cheers.

Paul Randal says:

February 8, 2016 at 10:36 PM

Yup – 1118 behavior is on for tempdb in SQL 2016 – it doesn't matter for other databases.
