UNION ALL Optimization - SQLPerformance.com

The concatenation of two or more data sets is most commonly expressed in T-SQL using the UNION ALL clause. Given that the SQL Server optimizer can often reorder things like joins and aggregates to improve performance, it is quite reasonable to expect that SQL Server would also consider reordering concatenation inputs, where this would provide an advantage. For example, the optimizer could consider the benefits of rewriting A UNION ALL B as B UNION ALL A.

In fact, the SQL Server optimizer does not do this. More precisely, there was some limited support for concatenation input reordering in SQL Server releases up to 2008 R2, but this was removed in SQL Server 2012, and has not resurfaced since.

SQL Server 2008 R2

Intuitively, the order of concatenation inputs only matters if there is a row goal. By default, SQL Server optimizes execution plans on the basis that all qualifying rows will be returned to the client. When a row goal is in effect, the optimizer tries to find an execution plan that will produce the first few rows quickly.

Row goals can be set in a number of ways, for example using TOP, a FAST n query hint, or by using EXISTS (which by its nature needs to find at most one row). Where there is no row goal (i.e. the client requires all rows), it does not generally matter in which order the concatenation inputs are read: Each input will be fully processed eventually in any case.

The limited support in versions up to SQL Server 2008 R2 applies where there is a goal of exactly one row. In this specific circumstance, SQL Server will reorder concatenation inputs on the basis of expected cost.

This is not done during cost-based optimization (as one might expect), but rather as a last-minute post-optimization rewrite of the normal optimizer output. This arrangement has the advantage of not increasing the cost-based plan search space (potentially one alternative for each possible reordering), while still producing a plan that is optimized to return the first row quickly.

Examples

The following examples use two tables with identical contents: A million rows of integers from one to a million. One table is a heap with no nonclustered indexes; the other has a unique clustered index:

CREATE TABLE dbo.Expensive
(
    Val bigint NOT NULL
);

CREATE TABLE dbo.Cheap
(
    Val bigint NOT NULL, 
    
    CONSTRAINT [PK dbo.Cheap Val]
        UNIQUE CLUSTERED (Val)
);
GO
INSERT dbo.Cheap WITH (TABLOCKX)
    (Val)
SELECT TOP (1000000)
    Val = ROW_NUMBER() OVER (ORDER BY SV1.number)
FROM master.dbo.spt_values AS SV1
CROSS JOIN master.dbo.spt_values AS SV2
ORDER BY
    Val
OPTION (MAXDOP 1);
GO
INSERT dbo.Expensive WITH (TABLOCKX)
    (Val)
SELECT
    C.Val
FROM dbo.Cheap AS C
OPTION (MAXDOP 1);

No Row Goal

The following query looks for the same rows in each table, and returns the concatenation of the two sets:

SELECT E.Val 
FROM dbo.Expensive AS E 
WHERE 
    E.Val BETWEEN 751000 AND 751005

UNION ALL

SELECT C.Val
FROM dbo.Cheap AS C 
WHERE 
    C.Val BETWEEN 751000 AND 751005;

The execution plan produced by the query optimizer is:

UNION ALL without a row goal

The warning on the root SELECT operator is alerting us to the obvious missing index on the heap table. The warning on the Table Scan operator is added by Sentry One Plan Explorer. It is drawing our attention to the I/O cost of the residual predicate hidden within the scan.

The order of the inputs to the Concatenation does not matter here, because we have not set a row goal. Both inputs will be fully read to return all result rows. Of interest (though this is not guaranteed) notice that the order of the inputs follows the textual order of the original query. Observe also that the order of the final result rows is not specified either, since we did not use a top-level ORDER BY clause. We will assume that is deliberate and final ordering is inconsequential to the task at hand.

If we reverse the written order of the tables in the query like so:

SELECT C.Val
FROM dbo.Cheap AS C 
WHERE 
    C.Val BETWEEN 751000 AND 751005

UNION ALL

SELECT E.Val 
FROM dbo.Expensive AS E 
WHERE 
    E.Val BETWEEN 751000 AND 751005;

The execution plan follows the change, accessing the clustered table first (again, this is not guaranteed):

UNION ALL with reversed inputs

Both queries may be expected to have the same performance characteristics, as they perform the same operations, just in a different order.

With a Row Goal

Clearly, the lack of indexing on the heap table will normally make finding specific rows more expensive, compared with the same operation on the clustered table. If we ask the optimizer for a plan that returns the first row quickly, we would expect SQL Server to reorder the concatenation inputs so the cheap clustered table is consulted first.

Using the query that mentions the heap table first, and using a FAST 1 query hint to specify the row goal:

SELECT E.Val 
FROM dbo.Expensive AS E 
WHERE 
    E.Val BETWEEN 751000 AND 751005

UNION ALL

SELECT C.Val
FROM dbo.Cheap AS C 
WHERE 
    C.Val BETWEEN 751000 AND 751005
OPTION (FAST 1);

The estimated execution plan produced on an instance of SQL Server 2008 R2 is:

UNION ALL with a row goal on 2008 R2

Notice that the concatenation inputs have been reordered to reduce the estimated cost of returning the first row. Note also that the missing index and residual I/O warnings have disappeared. Neither issue is of consequence with this plan shape when the goal is to return a single row as quickly as possible.

The same query executed on SQL Server 2016 (using either cardinality estimation model) is:

UNION ALL with a row goal on 2016

SQL Server 2016 has not reordered the concatenation inputs. The Plan Explorer I/O warning has returned, but sadly the optimizer has not produced a missing index warning this time (though it is relevant).

General reordering

As mentioned, the post-optimization rewrite that reorders concatenation inputs is only effective for:

SQL Server 2008 R2 and earlier
A row goal of exactly one

If we genuinely only want one row returned, rather than a plan optimized to return the first row quickly (but which will ultimately still return all rows), we can use a TOP clause with a derived table or common table expression (CTE):

SELECT TOP (1)
    UA.Val
FROM
(
    SELECT E.Val 
    FROM dbo.Expensive AS E 
    WHERE 
        E.Val BETWEEN 751000 AND 751005

    UNION ALL

    SELECT C.Val
    FROM dbo.Cheap AS C 
    WHERE 
        C.Val BETWEEN 751000 AND 751005
) AS UA;

On SQL Server 2008 R2 or earlier, this produces the optimal reordered-input plan:

UNION ALL with TOP on 2008 R2

On SQL Server 2012, 2014, and 2016 no post-optimization reordering occurs:

UNION ALL with TOP on 2012-2016

If we want more than one row returned, for example using TOP (2), the desired rewrite will not be applied on SQL Server 2008 R2 even if a FAST 1 hint is also used. In that situation, we need to resort to tricks like using TOP with a variable and an OPTIMIZE FOR hint:

DECLARE @TopRows bigint = 2; -- Number of rows actually needed

SELECT TOP (@TopRows)
    UA.Val
FROM
(
    SELECT E.Val 
    FROM dbo.Expensive AS E 
    WHERE 
        E.Val BETWEEN 751000 AND 751005

    UNION ALL

    SELECT C.Val
    FROM dbo.Cheap AS C 
    WHERE 
        C.Val BETWEEN 751000 AND 751005
) AS UA
OPTION (OPTIMIZE FOR (@TopRows = 1)); -- Just a hint

The query hint is sufficient to set a row goal of one, while the runtime value of the variable ensures the desired number of rows (2) is returned.

The actual execution plan on SQL Server 2008 R2 is:

UNION ALL with variable and OPTIMIZE FOR on 2008 R2

Both rows returned come from the reordered seek input, and the Table Scan is not executed at all. Plan Explorer shows the row counts in red because the estimate was for one row (due to the hint) whereas two rows were encountered at run time.

Without UNION ALL

This issue is also not limited to queries written explicitly with UNION ALL. Other constructions such as EXISTS and OR can also result in the optimizer introducing a concatenation operator, which may suffer from the lack of input reordering. There was a recent question on Database Administrators Stack Exchange with exactly this issue. Transforming the query from that question to use our example tables:

SELECT
    CASE
        WHEN 
            EXISTS 
            (
                SELECT 1
                FROM dbo.Expensive AS E
                WHERE E.Val BETWEEN 751000 AND 751005
            ) 
            OR EXISTS 
            (
                SELECT 1
                FROM dbo.Cheap AS C 
                WHERE C.Val BETWEEN 751000 AND 751005
            ) 
            THEN 1
        ELSE 0
    END;

The execution plan on SQL Server 2016 has the heap table on the first input:

CASE subquery on 2016

On SQL Server 2008 R2 the order of the inputs is optimized to reflect the single row goal of the semi join:

CASE subquery on 2008 R2

In the more optimal plan, the heap scan is never executed.

Workarounds

In some cases, it will be apparent to the query writer that one of the concatenation inputs will always be cheaper to run than the others. If that is true, it is quite valid to rewrite the query so that the cheaper concatenation inputs appear first in written order. Of course this means the query writer needs to be aware of this optimizer limitation, and prepared to rely on undocumented behaviour.

A more difficult issue arises when the cost of the concatenation inputs varies with the circumstances, perhaps depending on parameter values. Using OPTION (RECOMPILE) will not help on SQL Server 2012 or later. That option may assist on SQL Server 2008 R2 or earlier, but only if the single row goal requirement is also met.

If there are concerns about relying on observed behaviour (query plan concatenation inputs matching the query textual order) a plan guide can be used to force the plan shape. Where different input orders are optimal for different circumstances, multiple plan guides may be used, where the conditions can be accurately coded in advance. This is hardly ideal though.

Final Thoughts

The SQL Server query optimizer does in fact contain a cost-based exploration rule, UNIAReorderInputs, which is capable of generating concatenation input order variations and exploring alternatives during cost-based optimization (not as a single-shot post-optimization rewrite).

This rule is not currently enabled for general use. As far as I can tell, it is only activated when a plan guide or USE PLAN hint is present. This allows the engine to successfully force a plan that was generated for a query that qualified for the input-reordering rewrite, even when the current query does not qualify.

My sense is that this exploration rule is deliberately limited to this use, because queries that would benefit from concatenation input reordering as part of cost-based optimization are considered not sufficiently common, or perhaps because there is a concern that the extra effort would not pay off. My own view is that Concatenation operator input reordering should always be explored when a row goal is in effect.

It is also a shame that the (more limited) post-optimization rewrite is not effective in SQL Server 2012 or later. This might have been due to a subtle bug, but I could not find anything about this in the documentation, knowledge base, or on Connect. I have added a new Connect item here.

Update 9 August 2017: This is now fixed under trace flag 4199 for SQL Server 2014 and 2016, see KB 4023419:

FIX: Query with UNION ALL and a row goal may run slower in SQL Server 2014 or later versions when it's compared to SQL Server 2008 R2