More on CXPACKET Waits: Skewed Parallelism

In my previous post, I discussed CXPACKET waits and ways to prevent or limit parallelism. I also explained how the control thread in a parallel operation always registers a CXPACKET wait, and that sometimes non-control threads may also register CXPACKET waits. This can happen if one of the threads is blocked waiting for a resource (so all the other threads finish before it and register CXPACKET waits also), or if cardinality estimates are incorrect. In this post I’d like to explore the latter.

When cardinality estimates are incorrect, the parallel threads doing the query work are given uneven amounts of work to do. The typical case is where one thread is given all the work, or way more work than the other threads. This means that those threads that finish processing their rows (if they were even given any) before the slowest thread register a CXPACKET from the moment they finish until the slowest thread finishes. This problem can lead to a seeming explosion in CXPACKET waits occurring and is commonly called skewed parallelism, because the distribution of work between the parallel threads is skewed, not even.

Note that in SQL Server 2016 SP2 and SQL Server 2017 RTM CU3, consumer threads no longer register CXPACKET waits. They register CXCONSUMER waits, which are benign and can be ignored. This is to reduce the number of CXPACKET waits being generated, and the remaining ones are more likely to be actionable.

Example of Skewed Parallelism

I’ll walk through a contrived example to show how to identify such cases.

First off, I’ll create a scenario where a table has wildly inaccurate statistics, by manually setting the number of rows and pages in an UPDATE STATISTICS statement (don’t do this in production!):

USE [master];
GO

IF DB_ID (N'ExecutionMemory') IS NOT NULL
BEGIN
    ALTER DATABASE [ExecutionMemory] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [ExecutionMemory];
END
GO

CREATE DATABASE [ExecutionMemory];
GO
USE [ExecutionMemory];
GO

CREATE TABLE dbo.[Test] (
    [RowID] INT IDENTITY,
    [ParentID] INT,
    [CurrentValue] NVARCHAR (100),
    CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED ([RowID]));
GO

INSERT INTO dbo.[Test] ([ParentID], [CurrentValue])
SELECT 
    CASE WHEN ([t1].[number] % 3 = 0)
        THEN [t1].[number] – [t1].[number] % 6
        ELSE [t1].[number] END, 
    'Test' + CAST ([t1].[number] % 2 AS VARCHAR(11))
FROM [master].[dbo].[spt_values] AS [t1]
WHERE [t1].[type] = 'P';
GO

UPDATE STATISTICS dbo.[Test] ([PK_Test]) WITH ROWCOUNT = 10000000, PAGECOUNT = 1000000;
GO

So my table only has a few thousand rows in it, but I’ve faked it having 10 million rows.

Now I’ll create a contrived query to select the top 500 rows, which will go parallel as it thinks there are millions of rows to scan.

USE [ExecutionMemory];
GO

SET NOCOUNT ON;
GO

DECLARE @CurrentValue NVARCHAR (100);

WHILE (1=1)
SELECT TOP (500) 
    @CurrentValue = [CurrentValue]
FROM dbo.[Test]
ORDER BY NEWID() DESC;
GO

And set that running.

Viewing the CXPACKET Waits

Now I can look at the CXPACKET waits that are occurring using a simple script to look at the sys.dm_os_waiting_tasks DMV:

SELECT
    [owt].[session_id],
    [owt].[exec_context_id],
    [owt].[wait_duration_ms],
    [owt].[wait_type],
    [owt].[blocking_session_id],
    [owt].[resource_description],
    [er].[database_id],
    [eqp].[query_plan]
FROM sys.dm_os_waiting_tasks [owt]
INNER JOIN sys.dm_exec_sessions [es] ON
    [owt].[session_id] = [es].[session_id]
INNER JOIN sys.dm_exec_requests [er] ON
    [es].[session_id] = [er].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
WHERE
    [es].[is_user_process] = 1
ORDER BY
    [owt].[session_id],
    [owt].[exec_context_id];

If I execute this a few times, eventually I see some results showing skewed parallelism (I stripped off the query plan handle link and curtailed the resource description, for clarity, and notice I put in the code to grab the SQL text if you want that too):

session_id	exec_context_id	wait_duration_ms	wait_type	blocking_session_id	resource_description	database_id
`56`	`0`	`1`	`CXPACKET`	`NULL`	`exchangeEvent`	`13`
`56`	`1`	`1`	`CXPACKET`	`56`	`exchangeEvent`	`13`
`56`	`3`	`1`	`CXPACKET`	`56`	`exchangeEvent`	`13`
`56`	`4`	`1`	`CXPACKET`	`56`	`exchangeEvent`	`13`
`56`	`5`	`1`	`CXPACKET`	`56`	`exchangeEvent`	`13`
`56`	`6`	`1`	`CXPACKET`	`56`	`exchangeEvent`	`13`
`56`	`7`	`1`	`CXPACKET`	`56`	`exchangeEvent`	`13`

Results showing skewed parallelism in action

The control thread is the one with exec_context_id set to 0. The other parallel threads are those with exec_context_id higher than 0, and they’re all showing CXPACKET waits apart from one (note that exec_context_id = 2 is missing from the list). You’ll notice that they all list their own session_id as the one that’s blocking them, and that’s correct because all the threads are waiting for another thread from their own session_id to complete. The database_id is the database in whose context the query is being executed, not necessarily the database where the problem is, but it usually is unless the query is using three-part naming to execute in a different database.

Viewing the Cardinality Estimation Problem

With the query_plan column in the query output (which I removed for clarity), you can click on it to bring up the graphical plan and then right-click and select View with SQL Sentry Plan Explorer. This shows as below:

I can immediately see that there’s a cardinality estimate issue, as the Actual Rows for the Clustered Index Scan is only 2,048, compared to 10,000,000 Est (Estimated) Rows.

If I scroll across, I can see the distribution of rows across the parallel threads that were used:

Lo and behold, only a single thread was doing any work during the parallel portion of the plan – the one that didn’t show up in the sys.dm_os_waiting_tasks output above.

In this case, the fix is to update the statistics for the table.

In my contrived example that won’t work, as there haven’t been any modifications to the table, so I’ll re-run the set up script, leaving out the UPDATE STATISTICS statement.

The query plan then becomes:

Where there’s no cardinality problem and no parallelism either – problem solved!

Summary

If you see CXPACKET waits occurring, it’s easy to check for skewed parallelism, using the method described above. All the cases I’ve seen have been due to cardinality estimation issues of one kind or another, and often it’s simply a case of updating statistics.

As far as general wait statistics are concerned, you can find more information about using them for performance troubleshooting in:

My SQLskills blog post series, starting with Wait statistics, or please tell me where it hurts
My Wait Types and Latch Classes library here
My Pluralsight online training course SQL Server: Performance Troubleshooting Using Wait Statistics
SQL Sentry

Until next time, happy troubleshooting!

9 thoughts on “More on CXPACKET Waits: Skewed Parallelism”

Gustavo says:

October 2, 2015 at 4:50 PM

Wouldn't be great to have two different distinct wait types, CXPACKET_COORDINATOR and CXPACKET_WORKER ?
Paul Randal says:

October 3, 2015 at 12:21 AM

Agreed :-)
Randall Petty says:

August 10, 2016 at 6:31 PM

I tested some of these ideas yesterday and posted a result on the question and answer section:
https://answers.sqlperformance.com/questions/3794/still-skewed-parallelism-after-update-stats-with-f.html
Merav says:

July 3, 2017 at 12:56 PM

My apologies in case my questions sounds basic. I am an expert in Oracle Performance tuning, trying to get my way into the Sql Server world.. There are some resemblance of course, yet, there are some straggling also..
I don’t have the nice tool you use, I can only use queries in Management Studio, dm view, etc.

I run the query you gave in the post, but when I click on the query_plan field to open an XML plan, I don't see any actual rows field in the plan, only estimated rows indication.
Why is that, and where can I see the actual plan information (compare between estimated row to actual rows for each step in the plan, and for each thread).
BTW, I am working with sql server 2008.
Paul Randal says:

July 6, 2017 at 12:00 AM

No problem at all – everyone starts with zero knowledge of SQL Server. You only get the estimated plan that way. You'll need to execute the query manually through Management Studio and turn on the Include Actual Execution Plan button.
merav says:

July 6, 2017 at 2:08 AM

Thanks Paul for your answer.
So if I understand your answer correctly, you are saying that SQL Server does not expose the actual plan of currently running SQLs, and the only way to see actual plan information is only when I run the SQL manually by myself?
Another related question please –
Does SQL Server expose in some way the bind variables values used by the application while running the SQL? If so, where can I see it?
Paul Randal says:

July 6, 2017 at 7:46 PM

It does, but not until later versions – you're using a really old, unsupported version.

See https://dba.stackexchange.com/questions/142488/in-sql-server-is-there-a-way-to-determine-the-values-of-the-parameters-passed-t
Lajos Nagy says:

April 6, 2018 at 8:25 AM

69 0 99782 CXPACKET 69 exchangeEvent id=Port1bd4f920900 WaitType=e_waitPortOpen nodeId=0 60
69 0 99782 CXPACKET 69 exchangeEvent id=Port1bd4f920900 WaitType=e_waitPortOpen nodeId=0 60
69 0 99782 CXPACKET 69 exchangeEvent id=Port1bd4f920900 WaitType=e_waitPortOpen nodeId=0 60
69 0 99782 CXPACKET 69 exchangeEvent id=Port1bd4f920900 WaitType=e_waitPortOpen nodeId=0 60
69 0 99782 CXPACKET 69 exchangeEvent id=Port1bd4f920900 WaitType=e_waitPortOpen nodeId=0 60
69 0 99782 CXPACKET 69 exchangeEvent id=Port1bd4f920900 WaitType=e_waitPortOpen nodeId=0 60

I run your query as I wanted to know why I got CXPACKET wait type, but not really can follow what is happening here as the above mentioned facts cannot be applied.
Paul Randal says:

April 6, 2018 at 9:19 PM

I'm afraid it's impossible to tell from just that information and without seeing the query plan too. That might be normal for a producer/consumer operator.

Comments are closed.