The Halloween Problem - Part 2 - SQLPerformance.com

[ Part 1 | Part 2 | Part 3 | Part 4 ]

In the first part of this series, we saw how the Halloween Problem applies to UPDATE queries. To recap briefly, the problem was that an index used to locate records to update had its keys modified by the update operation itself (another good reason to use included columns in an index rather than extending the keys). The query optimizer introduced an Eager Table Spool operator to separate the reading and writing sides of the execution plan to avoid the problem. In this post, we will see how the same underlying issue can affect INSERT and DELETE statements.

Insert Statements

Now we know a bit about the conditions that require Halloween Protection, it is quite easy to create an INSERT example that involves reading from and writing to the keys of the same index structure. The simplest example is duplicating rows in a table (where adding new rows inevitably modifies the keys of the clustered index):

CREATE TABLE dbo.Demo
(
    SomeKey integer NOT NULL,

    CONSTRAINT PK_Demo
        PRIMARY KEY (SomeKey)
);

INSERT dbo.Demo
SELECT SomeKey FROM dbo.Demo;

The problem is that newly inserted rows might be encountered by the reading side of the execution plan, potentially resulting in a loop that adds rows forever (or at least until some resource limit is reached). The query optimizer recognizes this risk, and adds an Eager Table Spool to provide the necessary phase separation:

A more realistic example

You probably don’t often write queries to duplicate every row in a table, but you likely do write queries where the target table for an INSERT also appears somewhere in the SELECT clause. One example is adding rows from a staging table that do not already exist in the destination:

CREATE TABLE dbo.Staging
(
    SomeKey integer NOT NULL
);

-- Sample data
INSERT dbo.Staging
    (SomeKey)
VALUES
    (1234),
    (1234);

-- Test query
INSERT dbo.Demo
SELECT s.SomeKey
FROM dbo.Staging AS s
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Demo AS d
    WHERE d.SomeKey = s.SomeKey
);

The execution plan is:

The problem in this case is subtly different, though still an example of the same core issue. There is no value ‘1234’ in the target Demo table, but the Staging table contains two such entries. Without phase separation, the first ‘1234’ value encountered would be inserted successfully, but the second check would find that the value ‘1234’ now exists and would not attempt to insert it again. The statement as a whole would complete successfully.

This might produce a desirable outcome in this particular case (and might even seem intuitively correct) but it is not a correct implementation. The SQL standard requires that data modification queries execute as if the three phases of reading, writing and checking constraints occur completely separately (see part one).

Searching for all rows to insert as a single operation, we should select both ‘1234’ rows from the Staging table, since this value does not exist in the target yet. The execution plan should therefore try to insert both ‘1234’ rows from the Staging table, resulting in a primary key violation:

Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK_Demo'.
Cannot insert duplicate key in object 'dbo.Demo'.
The duplicate key value is (1234).
The statement has been terminated.

The phase separation provided by the Table Spool ensures that all checks for existence are completed before any changes are made to the target table. If you run the query in SQL Server with the sample data above, you will receive the (correct) error message.

Halloween Protection is required for INSERT statements where the target table is also referenced in the SELECT clause.

Delete Statements

We might expect the Halloween Problem not to apply to DELETE statements, since it shouldn’t really matter if we try to delete a row multiple times. We can modify our staging table example to remove rows from the Demo table that do not exist in Staging:

TRUNCATE TABLE dbo.Demo;
TRUNCATE TABLE dbo.Staging;

INSERT dbo.Demo (SomeKey) VALUES (1234);

DELETE dbo.Demo
WHERE NOT EXISTS 
(
    SELECT 1 
    FROM dbo.Staging AS s 
    WHERE s.SomeKey = dbo.Demo.SomeKey
);

This test seems to validate our intuition because there is no Table Spool in the execution plan:

This type of DELETE does not require phase separation because each row has a unique identifier (an RID if the table is a heap, clustered index key(s) and possibly a uniquifier otherwise). This unique row locator is a stable key – there is no mechanism by which it can change during execution of this plan, so the Halloween Problem does not arise.

DELETE Halloween Protection

Nevertheless, there is at least one case where a DELETE requires Halloween protection: when the plan references a row in the table other than the one which is being deleted. This requires a self-join, commonly found when hierarchical relationships are modelled. A simplified example is shown below:

CREATE TABLE dbo.Test
(
    pk char(1) NOT NULL,
    ref char(1) NULL,

    CONSTRAINT PK_Test
        PRIMARY KEY (pk)
);

INSERT dbo.Test
    (pk, ref)
VALUES
    ('B', 'A'),
    ('C', 'B'),
    ('D', 'C');

There really ought to be a same-table foreign key reference defined here, but let’s ignore that design failing for a moment – the structure and data are nonetheless valid (and it is sadly quite common to find foreign keys omitted in the real world). Anyway, the task at hand is to delete any row where the ref column points to a non-existent pk value. The natural DELETE query matching this requirement is:

DELETE dbo.Test
WHERE NOT EXISTS 
(
    SELECT 1 
    FROM dbo.Test AS t2 
    WHERE t2.pk = dbo.Test.ref
);

The query plan is:

Notice this plan now features a costly Eager Table Spool. Phase separation is required here because otherwise results could depend on the order in which rows are processed:

If the execution engine starts with the row where pk = B, it would find no matching row (ref = A and there is no row where pk = A). If execution then moves on to the row where pk = C, it would also be deleted because we just removed row B pointed to by its ref column. The end result would be that iterative processing in this order would delete all the rows from the table, which is clearly incorrect.

On the other hand, if the execution engine processed the row with pk =D first, it would find a matching row (ref = C). Assuming execution continued in reverse pk order, the only row deleted from the table would be the one where pk = B. This is the correct result (remember the query should execute as if the read, write, and validation phases had occurred sequentially and without overlaps).

Phase separation for constraint validation

As an aside, we can see another example of phase separation if we add a same-table foreign key constraint to the previous example:

DROP TABLE dbo.Test;

CREATE TABLE dbo.Test
(
    pk char(1) NOT NULL,
    ref char(1) NULL,

    CONSTRAINT PK_Test
        PRIMARY KEY (pk),

    CONSTRAINT FK_ref_pk
        FOREIGN KEY (ref)
        REFERENCES dbo.Test (pk)
);

INSERT dbo.Test
    (pk, ref)
VALUES
    ('B', NULL),
    ('C', 'B'),
    ('D', 'C');

The execution plan for the INSERT is:

The insert itself does not require Halloween protection since the plan does not read from the same table (the data source is an in-memory virtual table represented by the Constant Scan operator). The SQL standard does however require that phase 3 (constraint checking) occurs after the writing phase is complete. For this reason, a phase separation Eager Table Spool is added to the plan after the Clustered Index Index, and just before each row is checked to make sure the foreign key constraint remains valid.

If you are starting to think that translating a set-based declarative SQL modification query to a robust iterative physical execution plan is a tricky business, you are beginning to see why update processing (of which Halloween Protection is but a very small part) is the most complex part of the Query Processor.

DELETE statements require Halloween Protection where a self-join of the target table is present.

Summary

Halloween Protection can be an expensive (but necessary) feature in execution plans that change data (where ‘change’ includes all SQL syntax that adds, changes or removes rows). Halloween Protection is required for UPDATE plans where a common index structure’s keys are both read and modified, for INSERT plans where the target table is referenced on the reading side of the plan, and for DELETE plans where a self-join on the target table is performed.

The next part in this series will cover some special Halloween Problem optimizations that apply only to MERGE statements.

[ Part 1 | Part 2 | Part 3 | Part 4 ]

4 thoughts on “The Halloween Problem – Part 2”

Pingback: Paul White: Page Free Space : Halloween Protection – The Complete Series
SomewhereSomehow says:

February 18, 2014 at 7:16 AM

Just noticed in one plan, and wanted to share. For HP It is not even necessary, that table should have any index.
This will also have Eager Spool for HP:
CREATE TABLE dbo.Demo(SomeKey int); INSERT dbo.Demo SELECT SomeKey FROM dbo.Demo;
The same problem of reading one record twice if not splitting phases (good plain illustration in "Ramireddy's blog – A case of avoiding Eager spool").
Paul White says:

February 18, 2014 at 10:51 AM

That's right. I was careful to include heaps as well as indexes in part 1. There are more details on heap-specific issues in part 4. Thanks for the feedback though.
SomewhereSomehow says:

February 18, 2014 at 11:42 AM

Ahh, I see now, the one in "Three-phase update strategy" section? I saw it when read earlier, but missed to link it with today's simple case "insert heap select * from heap" and was quite surprised to see spool in the plan! And even more, when I re-read the "Insert Statements" section and not found an explanation there. Now it's clear, I had to read from the very beginning to refresh the picture!

Good case anyway, for me it was a little bit counterintuitive at first sight (because first HP originated from the index problem and mind tends to consider indexes only), but makes perfect sense when you imagine the process.

I've read part 4, as well, as all the other, long time ago, they are great, and I referring them from time to time, when I forget some details =)
Thanks.

Comments are closed.