A few weeks ago, an interesting question was asked on the #SQLHelp hashtag on Twitter about the impact of execution plans on the ASYNC_NETWORK_IO wait type, and it generated some differing opinions and a lot of good discussion.
https://twitter.com/shawndube/status/1225476846537650176
My immediate answer to this would be that someone is misinterpreting cause and effect here: the ASYNC_NETWORK_IO wait type is encountered when the Engine has results to send over TDS to the client, but there are no available TDS buffers on the connection to send them on. Generally speaking, this means that the client side is not consuming the results efficiently. However, based on the ensuing discussion, I became intrigued enough to test whether an execution plan would actually impact the ASYNC_NETWORK_IO waits significantly.
To summarize: Focusing on ASYNC_NETWORK_IO waits alone as a tuning metric is a mistake. The faster a query executes, the higher this wait type will likely accumulate, even if the client is consuming results as fast as possible. (Also see Greg's recent post about focusing on waits alone in general.)
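To see how much of this wait a particular connection is accruing, the per-session wait stats DMV (available since SQL Server 2016) can be queried directly. A minimal sketch, where session_id 55 is a placeholder for the connection under test:

```sql
-- Per-session wait stats, filtered to the ASYNC_NETWORK_IO wait type.
-- session_id 55 is a placeholder for the connection being tested.
SELECT session_id, wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = 55
  AND wait_type = N'ASYNC_NETWORK_IO';
```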
Test Configuration
To run these tests, I generated a very simple table based on an example that was emailed to me by another member of the community. That example did demonstrate a change in the wait type, but it also used an entirely different query between the two tests (with an additional table in the second test), and it included a comment to turn the results off, which removes the significant portion of this wait type to begin with. In other words, it was not just a plan change alone.
Note: I would like to point out that this is not a negative statement towards anyone at all; the ensuing discussion and the further tests that came from the original reproduction were very educational, and they led to further research towards understanding this wait type overall. The original reproduction DID demonstrate a difference, but with additional changes that weren’t part of the original question as posed.
DROP TABLE IF EXISTS [DemoTable];
CREATE TABLE [DemoTable] (
ID INT PRIMARY KEY,
FILLER VARCHAR(100)
);
INSERT INTO [DemoTable] WITH (TABLOCK)
SELECT TOP (250000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), REPLICATE('Z', 50)
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);
GO
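Before running any comparisons, a quick sanity check confirms that the load produced the expected 250,000 rows and shows how much data each query will stream back to the client:

```sql
-- Expect 250,000 rows, per the TOP clause in the load script.
SELECT COUNT(*) AS RowsLoaded FROM [DemoTable];
EXEC sp_spaceused N'DemoTable';
```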
Using this table as a base data set for testing different plan shapes using hints, the following queries were used:
SELECT t1.ID, t2.FILLER, t2.FILLER
FROM [DemoTable] t1
INNER HASH JOIN [DemoTable] t2 ON t1.ID = t2.ID;
SELECT t1.ID, t2.FILLER, t2.FILLER
FROM [DemoTable] t1
INNER MERGE JOIN [DemoTable] t2 ON t1.ID = t2.ID;
SELECT t1.ID, t2.FILLER, t2.FILLER
FROM [DemoTable] t1
INNER LOOP JOIN [DemoTable] t2 ON t1.ID = t2.ID;
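When comparing runs like these, it also helps to reset the server-wide wait statistics between tests so each query's waits can be read cleanly afterwards. A sketch of the pattern (appropriate on an isolated test machine only, since it clears cumulative stats for the whole instance):

```sql
-- Clear cumulative wait stats before a test run (test servers only).
DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);

-- ... run one of the hinted queries here ...

-- Then read the waits accumulated during the run.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'ASYNC_NETWORK_IO';
```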
Since I was running these queries on SQL Server 2019 CU1, the execution plans included the actual wait stats information related to the query execution.
Note: The optimizer would use a Merge Join without the hints being applied for this specific data set and query.
Initial Test Results
For the initial tests I simply used SSMS to run the queries and collected the actual execution plan for each one to compare the associated wait information, shown below. Note that for this size of data, the elapsed times are not significantly different, and neither are the wait times or wait counts for ASYNC_NETWORK_IO.
HASH JOIN
MERGE JOIN
LOOP JOIN
However, this wasn’t where I wanted to stop testing, because my own experience has repeatedly shown that Management Studio is a very inefficient consumer of results from SQL Server and can itself cause ASYNC_NETWORK_IO waits to occur. So I decided to change how I was testing things and moved to executing the queries with SQLCMD.
Testing with SQLCMD
Since I use SQLCMD a lot for demos while presenting, I created a testscript.sql file with the following contents:
PRINT 'Minimize Screen';
GO
WAITFOR DELAY '00:00:05';
GO
SELECT t1.ID, t2.FILLER, t2.FILLER
FROM [DemoTable] t1
INNER HASH JOIN [DemoTable] t2 ON t1.ID = t2.ID;
GO
SELECT t1.ID, t2.FILLER, t2.FILLER
FROM [DemoTable] t1
INNER MERGE JOIN [DemoTable] t2 ON t1.ID = t2.ID;
GO
SELECT t1.ID, t2.FILLER, t2.FILLER
FROM [DemoTable] t1
INNER LOOP JOIN [DemoTable] t2 ON t1.ID = t2.ID;
GO
This was executed from the command line as follows, and during the five second delay the window was minimized so that the console would not render and scroll the results during processing:
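The exact command line used isn't shown here, but a typical sqlcmd invocation for this script would look something like the following, where the server name and database are placeholders for whatever the test environment uses:

```shell
# -S server name, -d database, -E Windows authentication, -i input script file
sqlcmd -S localhost -d TestDB -E -i testscript.sql
```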
To capture the actual execution plans, I went with an Extended Events session collecting the query_post_execution_showplan event. In hindsight, on SQL Server 2019 I could have used query_post_execution_plan_profile instead, which relies on the lightweight query execution statistics profiling infrastructure (v3), but that event does not return the WaitStats or QueryTimeStats information unless query_post_execution_showplan is also enabled at the same time. Additionally, since this is an isolated testing machine with no other workload, the impact of the standard profiling is not really a concern here.
CREATE EVENT SESSION [Actual Plan] ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan
(ACTION(sqlserver.session_id));
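The session as defined has no target, which is fine for watching the events through the Watch Live Data viewer in SSMS; it just needs to be started before the tests and can be cleaned up afterwards:

```sql
-- Start collecting events before running the tests.
ALTER EVENT SESSION [Actual Plan] ON SERVER STATE = START;

-- ... run the tests, viewing events via Watch Live Data in SSMS ...

-- Stop and remove the session when finished.
ALTER EVENT SESSION [Actual Plan] ON SERVER STATE = STOP;
DROP EVENT SESSION [Actual Plan] ON SERVER;
```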
HASH JOIN
MERGE JOIN
LOOP JOIN
This didn’t turn out to be a faster way of executing the queries; performance was actually reduced by using the command line utility, even with the window minimized and not visibly scrolling the results. With the window open, the HASH execution time was 15,708ms and the ASYNC_NETWORK_IO wait time was 15,126ms. This demonstrates that, for the exact same results, the performance of the client consuming the results impacts both the wait time and the execution time of the query.
Parallelism Impact?
One of the things that I noticed was that only two of the plans had executed with parallelism, based on the existence of the CXPACKET and LATCH_EX waits in the execution plan XML. So I wondered what kind of impact forcing a serial execution plan would have on the execution of these same queries using OPTION (MAXDOP 1).
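The serial tests simply append the MAXDOP hint to each of the hinted queries from before; for example, the hash join variant would look like this:

```sql
SELECT t1.ID, t2.FILLER, t2.FILLER
FROM [DemoTable] t1
INNER HASH JOIN [DemoTable] t2 ON t1.ID = t2.ID
OPTION (MAXDOP 1);
```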
HASH JOIN
MERGE JOIN
LOOP JOIN
Notice here that the overall wait counts haven’t decreased significantly. Only the serial loop join plan shows a major change in the number of waits or the total amount of wait time associated with it, and in isolation this doesn’t mean it’s a positive benefit: the execution time for the query was not significantly improved, and there may be other factors that affected the outcome of that specific test.
The table below summarizes the ASYNC_NETWORK_IO wait time and count for each of the tests.
PlanType | Rows | WaitCount | WaitTime (ms) | ExecTime (ms) | AppName | MAXDOP 1 | Parallel |
---|---|---|---|---|---|---|---|
Hash | 250,000 | 6,635 | 4,394 | 4,961 | SSMS | N | Y |
Merge | 250,000 | 6,592 | 3,169 | 3,933 | SSMS | N | N |
Loop | 250,000 | 6,631 | 3,576 | 4,084 | SSMS | N | Y |
Hash | 250,000 | 6,610 | 11,321 | 11,874 | SQLCMD | N | Y |
Merge | 250,000 | 6,602 | 10,837 | 11,597 | SQLCMD | N | N |
Loop | 250,000 | 6,612 | 11,177 | 11,696 | SQLCMD | N | Y |
Hash | 250,000 | 6,379 | 4,047 | 4,619 | SSMS | Y | N |
Merge | 250,000 | 6,608 | 3,699 | 4,479 | SSMS | Y | N |
Loop | 250,000 | 5,385 | 2,083 | 3,918 | SSMS | Y | N |
Summary
While this post’s investigation doesn’t cover every single aspect of plan changes or the ASYNC_NETWORK_IO wait type, it does demonstrate that this wait is not significantly impacted by the execution plan used for a query’s execution. When performing analysis of a server as a whole, I would classify this wait type much like the CXPACKET wait type: normal to see for most workloads, and, unless it’s incredibly skewed and there are other performance problems pointing to slow consumption of results by clients (such as blocking where the lead blocker is waiting for ASYNC_NETWORK_IO), something to be ignored as just ‘part of the normal waits signature for the workload’.
If the accumulation is highly skewed, or the lead blocker is waiting on this wait type, then you need to look at the way the application is consuming the results and/or consider the size of the result set that is being returned to the client.
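One way to spot that situation, sketched here against the standard DMVs, is to look for sessions that are blocking others while they themselves are currently waiting on ASYNC_NETWORK_IO:

```sql
-- Find lead blockers whose current wait is ASYNC_NETWORK_IO,
-- i.e. sessions blocking others while waiting on the client.
SELECT r.session_id, r.wait_type, r.wait_time, r.blocking_session_id
FROM sys.dm_exec_requests AS r
WHERE r.wait_type = N'ASYNC_NETWORK_IO'
  AND EXISTS (SELECT 1
              FROM sys.dm_exec_requests AS b
              WHERE b.blocking_session_id = r.session_id);
```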