In several of my posts over the last year I’ve used the theme of people seeing a particular wait type and then reacting in a “knee-jerk” fashion to the wait being there. Typically this means following some poor Internet advice and taking a drastic, inappropriate action or jumping to a conclusion about what the root cause of the issue is and then wasting time and effort on a wild-goose chase.
One of the wait types where knee-jerk reactions are strongest, and where some of the poorest advice exists, is the CXPACKET wait. It’s also the wait type that is most commonly the top wait on people’s servers (according to my two big wait types surveys from 2010 and 2014 – see here for details), so I’m going to cover it in this post.
What does the CXPACKET wait type mean?
The simplest explanation is that CXPACKET means you’ve got queries running in parallel and you will *always* see CXPACKET waits for a parallel query. CXPACKET waits do NOT mean you have problematic parallelism – you need to dig deeper to determine that.
As an example of a parallel operator, consider the Repartition Streams operator, which has the following icon in graphical query plans:
And here’s a picture that shows what’s going on in terms of parallel threads for this operator, with degree-of-parallelism (DOP) equal to 4:
For DOP = 4, there will be four producer threads, pulling data from earlier in the query plan, the data then goes back out to the rest of the query plan through four consumer threads.
You can see the various threads in a parallel operator that are waiting for a resource using the sys.dm_os_waiting_tasks DMV, in the exec_context_id column (this post has my script for doing this).
There is always a 'control' thread for any parallel plan, which by historical accident is always thread ID 0. The control thread always registers a CXPACKET wait, with the duration equal to the length of time the plan takes to execute. Paul White has a great explanation of threads in parallel plans here.
The only time non-control threads will register CXPACKET waits is if they complete before the other threads in the operator. This can happen if one of the threads gets stuck waiting for a resource for a long time, so look to see what the wait type is of the thread not showing CXPACKET (using my script above) and troubleshoot appropriately. This can also happen because of a skewed work distribution among the threads, and I’ll go into more depth on that case in my next post here (it’s caused by out-of-date statistics and other cardinality estimation problems).
Note that in SQL Server 2016 SP2 and SQL Server 2017 RTM CU3, consumer threads no longer register CXPACKET waits. They register CXCONSUMER waits, which are benign and can be ignored. This is to reduce the number of CXPACKET waits being generated, and the remaining ones are more likely to be actionable.
Given that CXPACKET simply means you’ve got parallelism happening, the first thing to look at is whether you expect parallelism for the query that’s using it. My query will give you the query plan node ID where the parallelism is happening (it pulls out the node ID from the XML query plan if the wait type of the thread is CXPACKET) so look for that node ID and determine whether the parallelism makes sense.
One of the common cases of unexpected parallelism is when a table scan happens where you’re expecting a smaller index seek or scan. You’ll either see this in the query plan or you’ll see lots of PAGEIOLATCH_SH waits (discussed in detail here) along with the CXPACKET waits (a classic wait statistics pattern to look out for). There are a variety of causes of unexpected table scans, including:
- Missing nonclustered index so a table scan is the only alternative
- Out-of-date statistics so the Query Optimizer thinks a table scan is the best data access method to use
- An implicit conversion, because of a data type mismatch between a table column and a variable or parameter, which means a nonclustered index cannot be used
- Arithmetic being performed on a table column instead of a variable or parameter, which means a nonclustered index cannot be used
In all these cases, the solution is dictated by what you find the root cause to be.
But what if there’s no obvious root case and the query is just deemed expensive enough to warrant a parallel plan?
Among other things, the Query Optimizer decides to produce a parallel query plan if the serial plan has a higher cost than the
cost threshold for parallelism, an sp_configure setting for the instance. The cost threshold for parallelism (or CTFP) is set to five by default, which means that a plan doesn’t have to be very expensive to trigger the creation of a parallel plan.
One of the easiest ways to prevent unwanted parallelism is to increase the CTFP to a much higher number, with the higher you set it, the less likely parallel plans will be created. Some people advocate setting CTFP to somewhere between 25 and 50, but as with all tweakable settings, it’s best to test various values and see what works best for your environment. If you’d like a bit more of a programmatic method to help picking a good CTFP value, Jonathan wrote a blog post showing a query to analyze the plan cache and produce a suggested value for CTFP. As examples, we have one client with CTFP set to 200, and another set to the maximum – 32767 – as a way of forcibly preventing any parallelism whatsoever.
You might wonder why the second client had to use CTFP as a sledgehammer method for preventing parallelism when you’d think they could simply set the server ‘max degree of parallelism’ (or MAXDOP) to 1. Well, anyone with any permission level can specify a query MAXDOP hint and override the server MAXDOP setting, but CTFP cannot be overridden.
And that’s another method of limiting parallelism – setting a MAXDOP hint on the query you don’t want to go parallel.
You could also lower the server MAXDOP setting, but that’s a drastic solution as it can prevent everything from using parallelism. It’s common nowadays for servers to have mixed workloads, for instance with some OLTP queries and some reporting queries. If you lower the server MAXDOP, you’re going to hobble the performance of the reporting queries.
A better solution when there’s a mixed workload would be to use CTFP as I described above or to utilize Resource Governor (which is Enterprise-only I’m afraid). You can use Resource Governor to separate the workloads into workload groups, and then set a MAX_DOP (the underscore isn’t a typo) for each workload group. And the good thing about using Resource Governor is that the MAX_DOP cannot be overridden by a MAXDOP query hint.
Don't fall into the trap of thinking that CXPACKET waits automatically mean you’ve got bad parallelism happening, and certainly don’t follow some of the Internet advice I’ve seen of slamming the server by setting MAXDOP to 1. Take the time to investigate why you’re seeing CXPACKET waits and whether it’s something to be addressed or just an artifact of a workload that’s running correctly.
As far as general wait statistics are concerned, you can find more information about using them for performance troubleshooting in:
- My SQLskills blog post series, starting with Wait statistics, or please tell me where it hurts
- My Wait Types and Latch Classes library here
- My Pluralsight online training course SQL Server: Performance Troubleshooting Using Wait Statistics
- SQL Sentry Performance Advisor
In the next article in the series, I'll discuss skewed parallelism and give you a simple way to see it happening. Until then, happy troubleshooting!