MERGE statement (introduced in SQL Server 2008) allows us to perform a mixture of
DELETE operations using a single statement. The Halloween Protection issues for
MERGE are mostly a combination of the requirements of the individual operations, but there are some important differences and a couple of interesting optimizations that apply only to
Avoiding the Halloween Problem with MERGE
We start by looking again at the Demo and Staging example from part two:
CREATE TABLE dbo.Demo ( SomeKey integer NOT NULL, CONSTRAINT PK_Demo PRIMARY KEY (SomeKey) ); CREATE TABLE dbo.Staging ( SomeKey integer NOT NULL ); INSERT dbo.Staging (SomeKey) VALUES (1234), (1234); CREATE NONCLUSTERED INDEX c ON dbo.Staging (SomeKey); INSERT dbo.Demo SELECT s.SomeKey FROM dbo.Staging AS s WHERE NOT EXISTS ( SELECT 1 FROM dbo.Demo AS d WHERE d.SomeKey = s.SomeKey );
As you may recall, this example was used to show that an
INSERT requires Halloween Protection when the insert target table is also referenced in the
SELECT part of the query (the
EXISTS clause in this case). The correct behaviour for the
INSERT statement above is to try to add both 1234 values, and to consequently fail with a
PRIMARY KEY violation. Without phase separation, the
INSERT would incorrectly add one value, completing without an error being thrown.
The INSERT execution plan
The code above has one difference from that used in part two; a nonclustered index on the Staging table has been added. The
INSERT execution plan still requires Halloween Protection though:
The MERGE execution plan
Now try the same logical insert expressed using
MERGE dbo.Demo AS d USING dbo.Staging AS s ON s.SomeKey = d.SomeKey WHEN NOT MATCHED BY TARGET THEN INSERT (SomeKey) VALUES (s.SomeKey);
In case you are not familiar with the syntax, the logic there is to compare rows in the Staging and Demo tables on the SomeKey value, and if no matching row is found in the target (Demo) table, we insert a new row. This has exactly the same semantics as the previous
INSERT...WHERE NOT EXISTS code, of course. The execution plan is quite different however:
Notice the lack of an Eager Table Spool in this plan. Despite that, the query still produces the correct error message. It seems SQL Server has found a way to execute the
MERGE plan iteratively while respecting the logical phase separation required by the SQL standard.
The hole-filling optimization
In the right circumstances, the SQL Server optimizer can recognize that the
MERGE statement is hole-filling, which is just another way of saying that the statement only adds rows where there is an existing gap in the target table's key.
For this optimization to be applied, the values used in the
WHEN NOT MATCHED BY TARGET clause must exactly match the
ON part of the
USING clause. Also, the target table must have a unique key (a requirement satisfied by the
PRIMARY KEY in the present case). Where these requirements are met, the
MERGE statement does not require protection from the Halloween Problem.
Of course, the
MERGE statement is logically no more or less hole-filling than the original
INSERT...WHERE NOT EXISTS syntax. The difference is that the optimizer has complete control over implementing the
MERGE statement, whereas the
INSERT syntax would require it to reason about the wider semantics of the query. A human can easily see that the
INSERT is also hole-filling, but the optimizer does not think about things in the same way we do.
To illustrate the exact matching requirement I mentioned, consider the following query syntax, which does not benefit from the hole-filling optimization. The result is full Halloween Protection provided by an Eager Table Spool:
MERGE dbo.Demo AS d USING dbo.Staging AS s ON s.SomeKey = d.SomeKey WHEN NOT MATCHED THEN INSERT (SomeKey) VALUES (s.SomeKey * 1);
The only difference there is the multiplication by one in the
VALUES clause – something which does not change the logic of the query, but which is enough to prevent the hole-filling optimization being applied.
Hole-filling with Nested Loops
In the previous example, the optimizer chose to join the tables using a Merge join. The hole-filling optimization can also be applied where a Nested Loops join is chosen, but this requires an extra uniqueness guarantee on the source table, and an index seek on the inner side of the join. To see this in action, we can clear out the existing staging data, add uniqueness to the nonclustered index, and try the
-- Remove existing duplicate rows TRUNCATE TABLE dbo.Staging; -- Convert index to unique CREATE UNIQUE NONCLUSTERED INDEX c ON dbo.Staging (SomeKey) WITH (DROP_EXISTING = ON); -- Sample data INSERT dbo.Staging (SomeKey) VALUES (1234), (5678); -- Hole-filling merge MERGE dbo.Demo AS d USING dbo.Staging AS s ON s.SomeKey = d.SomeKey WHEN NOT MATCHED THEN INSERT (SomeKey) VALUES (s.SomeKey);
The resulting execution plan again uses the hole-filling optimization to avoid Halloween Protection, using a nested loops join and an inner-side seek into the target table:
Avoiding unnecessary index traversals
Where the hole-filling optimization applies, the engine may also apply a further optimization. It can remember the current index position while reading the target table (processing one row at a time, remember) and reuse that information when performing the insert, instead of seeking down the b-tree to find the insert location. The reasoning is that the current read position is very likely to be on the same page where the new row should be inserted. Checking that the row does in fact belong on this page is very fast, since it involves checking only the lowest and highest keys currently stored there.
The combination of eliminating the Eager Table Spool and saving an index navigation per row can provide a significant benefit in OLTP workloads, provided the execution plan is retrieved from cache. The compilation cost for
MERGE statements is rather higher than for
DELETE, so plan reuse is an important consideration. It is also helpful to ensure that pages have sufficient free space to accommodate new rows, avoiding page splits. This is typically achieved through normal index maintenance and the assignment of a suitable
I mention OLTP workloads, which typically feature a large number of relatively small changes, because the
MERGE optimizations may not be a good choice where a large number of are rows processed per statement. Other optimizations like minimally-logged
INSERTs cannot currently be combined with hole-filling. As always, the performance characteristics should be benchmarked to ensure the expected benefits are realized.
The hole-filling optimization for
MERGE inserts may be combined with updates and deletes using additional
MERGE clauses; each data-changing operation is assessed separately for the Halloween Problem.
Avoiding the join
The final optimization we will look at can be applied where the
MERGE statement contains update and delete operations as well as a hole-filling insert, and the target table has a unique clustered index. The following example shows a common
MERGE pattern where unmatched rows are inserted, and matching rows are updated or deleted depending on an additional condition:
CREATE TABLE #T ( col1 integer NOT NULL, col2 integer NOT NULL, CONSTRAINT PK_T PRIMARY KEY (col1) ); CREATE TABLE #S ( col1 integer NOT NULL, col2 integer NOT NULL, CONSTRAINT PK_S PRIMARY KEY (col1) ); INSERT #T (col1, col2) VALUES (1, 50), (3, 90); INSERT #S (col1, col2) VALUES (1, 40), (2, 80), (3, 90);
MERGE statement required to make all the required changes is remarkably compact:
MERGE #T AS t USING #S AS s ON t.col1 = s.col1 WHEN NOT MATCHED THEN INSERT VALUES (s.col1, s.col2) WHEN MATCHED AND t.col2 - s.col2 = 0 THEN DELETE WHEN MATCHED THEN UPDATE SET t.col2 -= s.col2;
The execution plan is quite surprising:
No Halloween Protection, no join between the source and target tables, and it's not often you will see a Clustered Index Insert operator followed by a Clustered Index Merge to the same table. This is another optimization targeted at OLTP workloads with high plan reuse and suitable indexing.
The idea is to read a row from the source table and immediately try to insert it into the target. If a key violation results, the error is suppressed, the Insert operator outputs the conflicting row it found, and that row is then processed for an update or delete operation using the Merge plan operator as normal.
If the original insert succeeds (without a key violation) processing continues with the next row from the source (the Merge operator only processes updates and deletes). This optimization primarily benefits
MERGE queries where most source rows result in an insert. Again, careful benchmarking is required to ensure performance is better than using separate statements.
MERGE statement provides several unique optimization opportunities. In the right circumstances, it can avoid the need to add explicit Halloween Protection compared with an equivalent
INSERT operation, or perhaps even a combination of
DELETE statements. Additional
MERGE-specific optimizations can avoid the index b-tree traversal that is usually needed to locate the insert position for a new row, and may also avoid the need to join the source and target tables completely.
In the final part of this series, we will look at how the query optimizer reasons about the need for Halloween protection, and identify some more tricks it can employ to avoid the need to add Eager Table Spools to execution plans that change data.