Paul Randal

More on CXPACKET Waits: Skewed Parallelism

August 26, 2015 by in SQL Performance, Waits | 9 Comments
SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

In my previous post, I discussed CXPACKET waits and ways to prevent or limit parallelism. I also explained how the control thread in a parallel operation always registers a CXPACKET wait, and that sometimes non-control threads may also register CXPACKET waits. This can happen if one of the threads is blocked waiting for a resource (so all the other threads finish before it and register CXPACKET waits also), or if cardinality estimates are incorrect. In this post I’d like to explore the latter.

When cardinality estimates are incorrect, the parallel threads doing the query work are given uneven amounts of work to do. The typical case is where one thread is given all the work, or way more work than the other threads. This means that those threads that finish processing their rows (if they were even given any) before the slowest thread register a CXPACKET from the moment they finish until the slowest thread finishes. This problem can lead to a seeming explosion in CXPACKET waits occurring and is commonly called skewed parallelism, because the distribution of work between the parallel threads is skewed, not even.

Note that in SQL Server 2016 SP2 and SQL Server 2017 RTM CU3, consumer threads no longer register CXPACKET waits. They register CXCONSUMER waits, which are benign and can be ignored. This is to reduce the number of CXPACKET waits being generated, and the remaining ones are more likely to be actionable.

Example of Skewed Parallelism

I’ll walk through a contrived example to show how to identify such cases.

First off, I’ll create a scenario where a table has wildly inaccurate statistics, by manually setting the number of rows and pages in an UPDATE STATISTICS statement (don’t do this in production!):

USE [master];
GO

IF DB_ID (N'ExecutionMemory') IS NOT NULL
BEGIN
    ALTER DATABASE [ExecutionMemory] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [ExecutionMemory];
END
GO

CREATE DATABASE [ExecutionMemory];
GO
USE [ExecutionMemory];
GO

CREATE TABLE dbo.[Test] (
    [RowID] INT IDENTITY,
    [ParentID] INT,
    [CurrentValue] NVARCHAR (100),
    CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED ([RowID]));
GO

INSERT INTO dbo.[Test] ([ParentID], [CurrentValue])
SELECT 
    CASE WHEN ([t1].[number] % 3 = 0)
        THEN [t1].[number] – [t1].[number] % 6
        ELSE [t1].[number] END, 
    'Test' + CAST ([t1].[number] % 2 AS VARCHAR(11))
FROM [master].[dbo].[spt_values] AS [t1]
WHERE [t1].[type] = 'P';
GO

UPDATE STATISTICS dbo.[Test] ([PK_Test]) WITH ROWCOUNT = 10000000, PAGECOUNT = 1000000;
GO

So my table only has a few thousand rows in it, but I’ve faked it having 10 million rows.

Now I’ll create a contrived query to select the top 500 rows, which will go parallel as it thinks there are millions of rows to scan.

USE [ExecutionMemory];
GO

SET NOCOUNT ON;
GO

DECLARE @CurrentValue NVARCHAR (100);

WHILE (1=1)
SELECT TOP (500) 
    @CurrentValue = [CurrentValue]
FROM dbo.[Test]
ORDER BY NEWID() DESC;
GO

And set that running.

Viewing the CXPACKET Waits

Now I can look at the CXPACKET waits that are occurring using a simple script to look at the sys.dm_os_waiting_tasks DMV:

SELECT
    [owt].[session_id],
    [owt].[exec_context_id],
    [owt].[wait_duration_ms],
    [owt].[wait_type],
    [owt].[blocking_session_id],
    [owt].[resource_description],
    [er].[database_id],
    [eqp].[query_plan]
FROM sys.dm_os_waiting_tasks [owt]
INNER JOIN sys.dm_exec_sessions [es] ON
    [owt].[session_id] = [es].[session_id]
INNER JOIN sys.dm_exec_requests [er] ON
    [es].[session_id] = [er].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
WHERE
    [es].[is_user_process] = 1
ORDER BY
    [owt].[session_id],
    [owt].[exec_context_id];

If I execute this a few times, eventually I see some results showing skewed parallelism (I stripped off the query plan handle link and curtailed the resource description, for clarity, and notice I put in the code to grab the SQL text if you want that too):

session_id exec_context_id wait_duration_ms wait_type blocking_session_id resource_description database_id
56 0 1 CXPACKET NULL exchangeEvent 13
56 1 1 CXPACKET 56 exchangeEvent 13
56 3 1 CXPACKET 56 exchangeEvent 13
56 4 1 CXPACKET 56 exchangeEvent 13
56 5 1 CXPACKET 56 exchangeEvent 13
56 6 1 CXPACKET 56 exchangeEvent 13
56 7 1 CXPACKET 56 exchangeEvent 13

Results showing skewed parallelism in action

The control thread is the one with exec_context_id set to 0. The other parallel threads are those with exec_context_id higher than 0, and they’re all showing CXPACKET waits apart from one (note that exec_context_id = 2 is missing from the list). You’ll notice that they all list their own session_id as the one that’s blocking them, and that’s correct because all the threads are waiting for another thread from their own session_id to complete. The database_id is the database in whose context the query is being executed, not necessarily the database where the problem is, but it usually is unless the query is using three-part naming to execute in a different database.

Viewing the Cardinality Estimation Problem

With the query_plan column in the query output (which I removed for clarity), you can click on it to bring up the graphical plan and then right-click and select View with SQL Sentry Plan Explorer. This shows as below:

Plan Tree

I can immediately see that there’s a cardinality estimate issue, as the Actual Rows for the Clustered Index Scan is only 2,048, compared to 10,000,000 Est (Estimated) Rows.

If I scroll across, I can see the distribution of rows across the parallel threads that were used:

Rows distributed over threads

Lo and behold, only a single thread was doing any work during the parallel portion of the plan – the one that didn’t show up in the sys.dm_os_waiting_tasks output above.

In this case, the fix is to update the statistics for the table.

In my contrived example that won’t work, as there haven’t been any modifications to the table, so I’ll re-run the set up script, leaving out the UPDATE STATISTICS statement.

The query plan then becomes:

New, fixed plan

Where there’s no cardinality problem and no parallelism either – problem solved!

Summary

If you see CXPACKET waits occurring, it’s easy to check for skewed parallelism, using the method described above. All the cases I’ve seen have been due to cardinality estimation issues of one kind or another, and often it’s simply a case of updating statistics.

As far as general wait statistics are concerned, you can find more information about using them for performance troubleshooting in:

Until next time, happy troubleshooting!