Paul White

Avoiding Sorts with Merge Join Concatenation

SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

The SQL Server query execution engine has two ways to implement a logical 'union all' operation, using the Concatenation and Merge Join Concatenation physical operators. While the logical operation is the same, there are important differences between the two physical operators that can make a tremendous difference to the efficiency of your execution plans.

The query optimizer does a reasonable job of choosing between the two options in many cases, but it is a long way from perfect in this area. This article describes the query tuning opportunities presented by Merge Join Concatenation, and details the internal behaviours and considerations you need to be aware of to make the most of it.

Concatenation

Concatenation Icon

The Concatenation operator is relatively simple: its output is the result of fully reading from each of its inputs in sequence. The Concatenation operator is an n-ary physical operator, meaning it can have '2…n' inputs. To illustrate, let's revisit the AdventureWorks-based example from my previous article, "Rewriting Queries to Improve Performance":

SELECT *
INTO dbo.TH
FROM Production.TransactionHistory;
 
CREATE UNIQUE CLUSTERED INDEX CUQ_TransactionID
ON dbo.TH (TransactionID);
 
CREATE NONCLUSTERED INDEX IX_ProductID
ON dbo.TH (ProductID);

The following query lists product and transaction IDs for six particular products:

SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 870 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 873 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 921 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 712 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 707 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 711;

It produces an execution plan featuring a Concatenation operator with six inputs, as seen in SQL Sentry Plan Explorer:

Concatenation Plan

The plan above features a separate Index Seek for each listed product ID, in the same order as specified in the query (reading top down). The topmost Index Seek is for product 870, the next one down is for product 873, then 921 and so on. None of that is guaranteed behaviour of course, it is just something interesting to observe.

I mentioned before that the Concatenation operator forms its output by reading from its inputs in sequence. When this plan is executed, there is a good chance that the result set will shows rows for product 870 first, then 873, 921, 712, 707, and finally product 711. Again, this is not guaranteed because we did not specify an ORDER BY clause, but it does show how Concatenation operates internally.

An SSIS "Execution Plan"

For reasons that will make sense in a moment, consider how we might design an SSIS package to perform the same task. We could certainly also write the whole thing as a single T-SQL statement in SSIS, but the more interesting option is to create a separate data source for each product, and use an SSIS "Union All" component in place of the SQL Server Concatenation operator:

SSIS Union All Data Flow

Now imagine we need the final output from that data flow in Transaction ID order. One option would be to add an explicit Sort component after the Union All:

SSIS Union All Data Flow With Sort

That would certainly do the job, but a skilled and experienced SSIS designer would realize there is a better option: read the source data for each product in Transaction ID order (utilizing the index), then use an order-preserving operation to combine the sets.

In SSIS, the component that combines rows from two sorted data flows into a single sorted data flow is called "Merge". A redesigned SSIS Data Flow that uses Merge to return the desired rows in Transaction ID order follows:

SSIS Merge Data Flow

Note that we need five separate Merge components because Merge is a binary component, unlike the SSIS "Union All" component, which was n-ary. The new Merge flow produces results in Transaction ID order, without requiring an expensive (and blocking) Sort component. Indeed, if we try to add a Sort on Transaction ID after the final Merge, SSIS shows a warning to let us know the stream is already sorted in the desired fashion:

SSIS Redundant Sort Warning

The point of the SSIS example can now be revealed. Look at the execution plan chosen by the SQL Server query optimizer when we ask it to return the original T-SQL query results in Transaction ID order (by adding an ORDER BY clause):

SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 870 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 873 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 921 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 712 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 707 UNION ALL
SELECT ProductID, TransactionID FROM dbo.TH WHERE ProductID = 711
ORDER BY TransactionID;

SQL Server Merge Concatenation Plan

The similarities to the SSIS Merge package are striking; even down to the need for five binary "Merge" operators. The one important difference is that SSIS has separate components for "Merge Join" and "Merge" whereas SQL Server uses the same core operator for both.

To be clear, the Merge Join (Concatenation) operators in the SQL Server execution plan are not performing a join; the engine merely reuses the same physical operator to implement order-preserving union all.

Writing Execution Plans in SQL Server

SSIS does not have a data flow specification language, nor an optimizer to turn such a specification into an executable Data Flow Task. It is up to the SSIS package designer to realize that an order-preserving Merge is possible, set component properties (such as sort keys) appropriately, then compare performance. This requires more effort (and skill) on the designer's part, but it does provide a very fine degree of control.

The situation in SQL Server is the opposite: we write a query specification using the T-SQL language, then depend on the query optimizer to explore implementation options and choose an efficient one. We do not have the option to construct an execution plan directly. Most of the time, this is highly desirable: SQL Server would no doubt be rather less popular if every query required us to write an SSIS-style package.

Nevertheless (as explained in my previous post), the plan chosen by the optimizer can be sensitive to the T-SQL used to describe the desired results. Repeating the example from that article, we could have written the original T-SQL query using an alternative syntax:

SELECT ProductID, TransactionID
FROM dbo.TH
WHERE ProductID IN (870, 873, 921, 712, 707, 711)
ORDER BY TransactionID;

This query specifies exactly the same result set as before, but the optimizer does not consider an order-preserving (merge concatenation) plan, choosing to scan the Clustered Index instead (a much less efficient option):

Clustered Index Scan

Leveraging Order Preservation in SQL Server

Avoiding unnecessary sorting can lead to significant efficiency gains, whether we are talking about SSIS or SQL Server. Achieving this goal can be more complicated and difficult in SQL Server because we do not have such fine-grained control over the execution plan, but there are still things we can do.

Specifically, understanding how the SQL Server Merge Join Concatenation operator works internally can help us to continue writing clear, relational T-SQL, while encouraging the query optimizer to consider order-preserving (merging) processing options where appropriate.

How Merge Join Concatenation Works

Merge Join Concatenation Icon

A regular Merge Join requires both inputs to be sorted on the join keys. Merge Join Concatenation, on the other hand, simply merges two already-ordered streams into a single ordered stream – there is no join, as such.

This begs the question: what exactly is the 'order' that is preserved?

In SSIS, we have to set sort key properties on the Merge inputs to define the ordering. SQL Server has no equivalent to this. The answer to the question above is a little complicated, so we will take it step by step.

Consider the following example, which requests a merge concatenation of two unindexed heap tables (the simplest case):

DECLARE @T1 AS TABLE (c1 int, c2 int, c3 int);
DECLARE @T2 AS TABLE (c1 int, c2 int, c3 int);

SELECT * FROM @T1 AS T1 
UNION ALL 
SELECT * FROM @T2 AS T2
OPTION (MERGE UNION);

These two tables have no indexes, and there is no ORDER BY clause. What ordering will the merge join concatenation 'preserve'? To give you a moment to think about that, let's first look at the execution plan produced for the query above in SQL Server versions before 2012:

Pre-2012 Plan

There is no Merge Join Concatenation, despite the query hint: prior to SQL Server 2012, this hint only works with UNION, not UNION ALL. To get a plan with the desired merge operator, we need to disable the implementation of a logical UNION ALL (UNIA) using the Concatenation (CON) physical operator. Please note that the following is undocumented and not supported for production use:

DECLARE @T1 AS TABLE (c1 int, c2 int, c3 int);
DECLARE @T2 AS TABLE (c1 int, c2 int, c3 int);

SELECT * FROM @T1 AS T1 
UNION ALL 
SELECT * FROM @T2 AS T2
OPTION (QUERYRULEOFF UNIAtoCON);

That query produces the same plan as SQL Server 2012 and 2014 do with the MERGE UNION query hint alone:

Merge Concat Plan 1

Perhaps unexpectedly, the execution plan features explicit sorts on both inputs to the merge. The sort properties are:

Sort Properties

It makes sense that an order-preserving merge requires a consistent input ordering, but why did it choose (c1, c2, c3) instead of, say (c3, c1, c2) or (c2, c3, c1)? As a starting point, merge concatenation inputs are sorted on the output projection list. The select-star in the query expands to (c1, c2, c3) so that is the order chosen.

Sort by Merge Output Projection List

To further illustrate the point, we can expand the select-star ourselves (as we should!) choosing a different order (c3, c2, c1) while we are at it:

DECLARE @T1 AS TABLE (c1 int, c2 int, c3 int);
DECLARE @T2 AS TABLE (c1 int, c2 int, c3 int);

SELECT c3, c2, c1 FROM @T1 AS T1 
UNION ALL 
SELECT c3, c2, c1 FROM @T2 AS T2
OPTION (MERGE UNION);

The sorts now change to match (c3, c2, c1):

Projection list order

Again, the query output order (assuming we were to add some data to the tables) is not guaranteed to be sorted as shown, because we have no ORDER BY clause. These examples are intended simply to show how the optimizer selects an initial input sort order, in the absence of any other reason to sort.

Conflicting Sort Orders

Now consider what happens if we leave the projection list as (c3, c2, c1) and add a requirement to order the query results by (c1, c2, c3). Will the inputs to the merge still sort on (c3, c2, c1) with a post-merge sort on (c1, c2, c3) to satisfy the ORDER BY?

DECLARE @T1 AS TABLE (c1 int, c2 int, c3 int);
DECLARE @T2 AS TABLE (c1 int, c2 int, c3 int);

SELECT c3, c2, c1 FROM @T1 AS T1 
UNION ALL 
SELECT c3, c2, c1 FROM @T2 AS T2
ORDER BY c1, c2, c3
OPTION (MERGE UNION);

No. The optimizer is smart enough to avoid sorting twice:

Projection list vs Order By

Sorting both inputs on (c1, c2, c3) is perfectly acceptable to the merge concatenation, so no double sort is required.

Note that this plan does guarantee that the order of results will be (c1, c2, c3). The plan looks the same as the earlier plans without ORDER BY, but not all the internal details are presented in user-visible execution plans.

The effect of uniqueness

When choosing a sort order for the merge inputs, the optimizer is also affected by any uniqueness guarantees that exist. Consider the following example, with five columns, but note the different column orders in the UNION ALL operation:

DECLARE @T1 AS TABLE (c1 int, c2 int, c3 int, c4 int, c5 int);
DECLARE @T2 AS TABLE (c1 int, c2 int, c3 int, c4 int, c5 int);

SELECT c5, c1, c2, c4, c3 FROM @T1 AS T1 
UNION ALL 
SELECT c5, c4, c3, c2, c1 FROM @T2 AS T2
OPTION (MERGE UNION);

The execution plan includes sorts on (c5, c1, c2, c4, c3) for table @T1 and (c5, c4, c3, c2, c1) for table @T2:

Two different sort orders

To demonstrate the effect of uniqueness on these sorts, we will add a UNIQUE constraint to column c1 in table T1, and column c4 in table T2:

DECLARE @T1 AS TABLE (c1 int UNIQUE, c2 int, c3 int, c4 int, c5 int);
DECLARE @T2 AS TABLE (c1 int, c2 int, c3 int, c4 int UNIQUE, c5 int);

SELECT c5, c1, c2, c4, c3 FROM @T1 AS T1 
UNION ALL 
SELECT c5, c4, c3, c2, c1 FROM @T2 AS T2
OPTION (MERGE UNION);

The point about uniqueness is that the optimizer knows that it can stop sorting as soon as it encounters a column that is guaranteed to be unique. Sorting by additional columns after a unique key is encountered will not affect the final sort order, by definition.

With the UNIQUE constraints in place, the optimizer can simplify the (c5, c1, c2, c4, c3) sort list for T1 to (c5, c1) because c1 is unique. Similarly, the (c5, c4, c3, c2, c1) sort list for T2 is simplified to (c5, c4) because c4 is a key:

Uniqueness simplification

Parallelism

The simplification due to a unique key is not perfectly implemented. In a parallel plan, the streams are partitioned so that all rows for the same instance of the merge end up on the same thread. This data set partitioning is based on the merge columns, and not simplified by the presence of a key.

The following script uses unsupported trace flag 8649 to generate a parallel plan for the previous query (which is unchanged otherwise):

DECLARE @T1 AS TABLE (c1 int UNIQUE, c2 int, c3 int, c4 int, c5 int);
DECLARE @T2 AS TABLE (c1 int, c2 int, c3 int, c4 int UNIQUE, c5 int);

SELECT c5, c1, c2, c4, c3 FROM @T1 AS T1 
UNION ALL 
SELECT c5, c4, c3, c2, c1 FROM @T2 AS T2
OPTION (MERGE UNION, QUERYTRACEON 8649);

Parallel repartitioning

The sort lists are simplified as before, but the Repartition Streams operators still partition over all columns. If this simplification were implemented consistently, the repartitioning operators would also operate on (c5, c1) and (c5, c4) alone.

Problems with non-unique indexes

The way the optimizer reasons about the sorting requirements for merge concatenation can result in unnecessary sort problems, as the next example shows:

CREATE TABLE #T1 (c1 int, c2 int, c3 int, c4 int, c5 int);
CREATE TABLE #T2 (c1 int, c2 int, c3 int, c4 int, c5 int);
CREATE CLUSTERED INDEX cx ON #T1 (c1);
CREATE CLUSTERED INDEX cx ON #T2 (c1);

SELECT * FROM #T1 AS T1
UNION ALL 
SELECT * FROM #T2 AS T2
ORDER BY c1
OPTION (MERGE UNION);

DROP TABLE #T1, #T2;

Looking at the query and available indexes, we would expect an execution plan that performs an ordered scan of the clustered indexes, using merge join concatenation to avoid the need for any sorting. This expectation is fully justified, because the clustered indexes provide the ordering specified in the ORDER BY clause. Unfortunately, the plan we actually get includes two sorts:

Non-unique clustered index

There is no good reason for these sorts, they only appear because the query optimizer's logic is imperfect. The merge output column list (c1, c2, c3, c4, c5) is a superset of the ORDER BY, but there is no unique key to simplify that list. As a result of this gap in the optimizer's reasoning, it concludes that the merge requires its input sorted on (c1, c2, c3, c4, c5).

We can verify this analysis by modifying the script to make one of the clustered indexes unique:

CREATE TABLE #T1 (c1 int, c2 int, c3 int, c4 int, c5 int);
CREATE TABLE #T2 (c1 int, c2 int, c3 int, c4 int, c5 int);
CREATE CLUSTERED INDEX cx ON #T1 (c1);
CREATE UNIQUE CLUSTERED INDEX cx ON #T2 (c1);

SELECT * FROM #T1 AS T1
UNION ALL 
SELECT * FROM #T2 AS T2
ORDER BY c1
OPTION (MERGE UNION);

DROP TABLE #T1, #T2;

The execution plan now only has a sort above the table with the non-unique index:

One unique index

If we now make both clustered indexes unique, no sorts appear:

CREATE TABLE #T1 (c1 int, c2 int, c3 int, c4 int, c5 int);
CREATE TABLE #T2 (c1 int, c2 int, c3 int, c4 int, c5 int);
CREATE UNIQUE CLUSTERED INDEX cx ON #T1 (c1);
CREATE UNIQUE CLUSTERED INDEX cx ON #T2 (c1);

SELECT * FROM #T1 AS T1
UNION ALL 
SELECT * FROM #T2 AS T2
ORDER BY c1;

DROP TABLE #T1, #T2;

With both indexes unique, the initial merge input sort lists can be simplified to column c1 alone. The simplified list then matches the ORDER BY clause exactly, so no sorts are needed in the final plan:

Both indexes unique

Notice we do not even need the query hint in this last example to get the optimal execution plan.

Final Thoughts

Eliminating sorts in an execution plan can be tricky. In some cases, it can be as simple as modifying an existing index (or providing a new one) to deliver rows in the required order. The query optimizer does a reasonable job overall when appropriate indexes are available.

In (many) other cases however, avoiding sorts can require a much deeper understanding of the execution engine, the query optimizer, and plan operators themselves. Avoiding sorts is undoubtedly an advanced query tuning topic, but also an incredibly rewarding one when everything comes right.