Paul Randal

Knee-Jerk Performance Tuning : Incorrect Use of Temporary Tables

April 6, 2016 by in T-SQL Queries | 17 Comments
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Itzik is a T-SQL trainer, a co-founder of SolidQ, and blogs about T-SQL fundamentals and query tuning.

Itzik’s Posts

In this continuation of my "knee-jerk performance tuning" series, I'd like to discuss four common problems I see with using temporary tables. Any one of these problems can cripple a workload, so they're worth knowing about and looking for in your environment.

Problem 1: Using Temporary Tables Where They're Not Needed

Temporary Tables https://www.flickr.com/photos/tea_time/3890677277/https://www.flickr.com/photos/tea_time/3890677277/

Temporary tables have a variety of uses (probably the most common is to store an intermediate result set for later use), but you have to remember that when you introduce a temporary table into a query, you're interrupting the flow of data through the query processor.

Think of the population of a temporary table as a hard stop, as there's a query (let's call it the producer) to produce the intermediate result set, which is then stored in the temporary table in tempdb, and then the next query (let's call it the consumer) has to read the data from the temporary table again.

I've often found that some parts of a workload actually perform better when the temporary table is completely removed, so the data flows from the producer part of the query to the consumer part of the query without having to be persisted in tempdb, and the query optimizer can produce a more optimal overall plan.

You might now be thinking, "so why would someone use a temporary table if it makes things slower?" – and rightly so! In cases like that, I've found that the use of a temporary table has become institutionalized in the development team; someone found that using a temporary table increased performance many years ago, so temporary tables became the default design choice.

This can be a hard thing to change, especially if you have a senior developer or manager who's convinced that temporary tables should always be used. The simple thing to try is to pick an expensive query (for instance, a long-running one, or one that's executed many times per second) and remove one or more of the temporary tables to see whether performance increases without them. And if so, there's your proof to show the intransigents!

Problem 2: Lack of Filtering When Populating Temporary Tables

Even if you can't remove a temporary table, you may be able to drastically improve performance by making sure that the code that populates the temporary table is correctly filtering the data pulled from source tables.

I've lost count of the number of times I've seen a temporary table being populated with code that starts as SELECT *, includes a few unrestrictive joins, and has no WHERE clause, and then the later query that uses the temporary table only uses a few columns and has a WHERE clause to hugely scope down the number of rows.

I remember one case where a temporary table in a stored procedure was aggregating 15 years' worth of data from the main database, and then only the current year's data was being used. This was repeatedly causing tempdb to grow until it ran out of space on the disk volume, and the stored procedure would then fail.

Whenever you're populating a temporary table, only use the source table columns that are necessary, and only use the rows that are necessary – i.e. push the filter predicates up into the temporary table population code. Not only will this save space in tempdb, it will also save a lot of time from not having to copy unneeded data from the source table (and potentially remove the need to read source database pages from disk in the first place).

Problem 3: Incorrect Temporary Table Indexing

Just like with regular tables, you should only create the indexes that are actually going to be used by the later query code to help the query performance. I've seen plenty of cases where there's a nonclustered index per temporary table column, and single-column indexes that are chosen without analyzing the later code are often quite useless. Now combine useless nonclustered indexes with a lack of filtering when populating the temporary table, and you've got a recipe for enormous bloating of tempdb.

Also, in general, it's faster to create the indexes after the table has been populated. This gives the added bonus that the indexes will have accurate statistics, which can further help the query as the query optimizer will be able to do accurate cardinality estimation.

Having a bunch of nonclustered indexes that aren't used wastes not just disk space, but also the time needed to create them. If this is in code that's executed frequently, removing these unneeded indexes that are created every time the code runs can have a significant effect on overall performance.

Problem 4: tempdb Latch Contention

It's quite common for there to be a latching bottleneck in tempdb that can be traced back to temporary table usage. If there are lots of concurrent connections running code that creates and drops temporary tables, access to the database's allocation bitmaps in memory can become a significant bottleneck.

This is because only one thread at a time can be changing an allocation bitmap to mark pages (from the temp table) as allocated or deallocated, and so all the other threads have to wait, decreasing the workload throughput. Even though there has been a temporary table cache since SQL Server 2005, it's not very large, and there are restrictions on when the temporary table can be cached (e.g. only when it's less than 8MB in size).

Traditional ways to work around this problem have been to use trace flag 1118 and multiple tempdb data files (see this blog post for more info), but another thing to consider is to remove the temporary tables altogether!

Summary

Temporary tables can be very useful, but they're very easily and commonly used incorrectly. Whenever you're writing (or reviewing code) that's using a temporary table, consider the following:

  • Is this temporary table really needed?
  • Is the code that populates the table using the correct filtering to limit the temporary table size?
  • Are indexes created after table population (in general) and are the indexes being used by later code?

Paul White has a couple of great posts (here and here) about temporary object usage and caching that I recommend reading too.

And one last thing, if you decide not to use a temporary table, don't just replace it with a table variable, a common table expression, or a cursor (all of which are common ways that people try to "optimize away" the temporary table) – figure out the most efficient way to (re)write the code – there is no "one size fits all" answer.

Until next time, happy troubleshooting!