Paul White

Indexed View Maintenance in Execution Plans

SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

Though they come with many restrictions and some important implementation caveats, indexed views are still a very powerful SQL Server feature when correctly employed in the right circumstances. One common use is to provide a pre-aggregated view of underlying data, giving users the ability to query results directly without incurring the costs of processing the underlying joins, filters, and aggregates every time a query is executed.

Although new Enterprise Edition features like columnar storage and batch mode processing have transformed the performance characteristics of many large queries of this type, there is still no faster way to obtain a result than to avoid all the underlying processing completely, no matter how efficient that processing might have become.

Before indexed views (and their more limited cousins, computed columns) were added to the product, database professionals would sometimes write complex multi-trigger code to present the results of an important query in a real table. This sort of arrangement is notoriously difficult to get right in all circumstances, particularly where concurrent changes to the underlying data are frequent.

The indexed views feature makes all this much easier, where it is sensibly and correctly applied. The database engine takes care of everything needed to ensure data read from an indexed view matches the underlying query and table data at all times.

Incremental Maintenance

SQL Server keeps indexed view data synchronized with the underlying query by automatically updating the view indexes appropriately whenever data changes in the base tables. The cost of this maintenance activity is borne by the process changing the base data. The extra operations needed to maintain the view indexes are silently added to the execution plan for the original insert, update, delete, or merge operation. In the background, SQL Server also takes care of more subtle issues concerning transaction isolation, for example ensuring correct handling for transactions running under snapshot or read committed snapshot isolation.

Constructing the extra execution plan operations needed to maintain the view indexes correctly is not a trivial matter, as anyone who has attempted a "summary table maintained by trigger code" implementation will know. The complexity of the task is one of the reasons that indexed views have so many restrictions. Limiting the supported surface area to inner joins, projections, selections (filters), and the SUM and COUNT_BIG aggregates reduces the implementation complexity considerably.

Indexed views are maintained incrementally. This means the query processor determines the net effect of the base table changes on the view, and applies only those changes necessary to bring the view up to date. In simple cases, it can calculate the necessary deltas from just the base table changes and the data currently stored in the view. Where the view definition contains joins, the indexed view maintenance portion of the execution plan will need to access the joined tables as well, but this can usually be performed efficiently, given appropriate base table indexes.

To simplify the implementation further, SQL Server always uses the same basic plan shape (as a starting point) to implement indexed view maintenance operations. The normal facilities provided by the query optimizer are employed to simplify and optimize the standard maintenance shape as appropriate. We will now turn to an example to help bring these concepts together.

Example 1 – Single Row Insert

Suppose we have the following simple table and indexed view:

CREATE TABLE dbo.T1 
(
    GroupID integer NOT NULL, 
    Value   integer NOT NULL
);
GO
INSERT dbo.T1
    (GroupID, Value)
VALUES
    (1, 1),
    (1, 2),
    (2, 3),
    (2, 4),
    (2, 5);
GO
CREATE VIEW dbo.IV
WITH SCHEMABINDING
AS
SELECT
    T1.GroupID, 
    SumValue = SUM(T1.Value),
    NumRows = COUNT_BIG(*)
FROM dbo.T1 AS T1
WHERE
    T1.GroupID BETWEEN 1 AND 5
GROUP BY 
    T1.GroupID;
GO
CREATE UNIQUE CLUSTERED INDEX cuq
ON dbo.IV (GroupID);

After that script is run, the data in the sample table looks like this:

Sample data

And the Indexed view contains:

Indexed View Contents

The simplest example of an indexed view maintenance plan for this setup occurs when we add a single row to the base table:

INSERT dbo.T1
    (GroupID, Value)
VALUES
    (3, 6);

The execution plan for this insert is shown below:

Single Row Insert Execution Plan

Following the numbers in the diagram, the operation of this execution plan proceeds as follows:

  1. The Table Insert operator adds the new row to the base table. This is the only plan operator associated with the base table insert; all remaining operators are concerned with the maintenance of the indexed view.
  2. The Eager Table Spool saves the inserted row data to temporary storage.
  3. The Sequence operator ensures the top branch of the plan runs to completion before the next branch in the Sequence is activated. In this special case (inserting a single row), it would be valid to remove the Sequence (and the spools at positions 2 and 4), directly connecting the Stream Aggregate input to the output of the Table Insert. This possible optimization is not implemented, so the Sequence and Spools remain.
  4. This Eager Table Spool is associated with the spool at position 2 (it has a Primary Node ID property that provides this link explicitly). The spool replays rows (one row in the present case) from the same temporary storage written to by the primary spool. As mentioned above, the spools and positions 2 and 4 are unnecessary, and feature simply because they exist in the generic template for indexed view maintenance.
  5. The Stream Aggregate computes the sum of Value column data in the inserted set, and counts the number of rows present per view-key group. The output is the incremental data needed to keep the view synchronized with the base data. Note, the Stream Aggregate does not have a Group By element because the query optimizer knows only a single value is being processed. However, the optimizer does not apply similar logic to replace the aggregates with projections (the sum of a single value is just the value itself, and the count will always be one for a single row insert). Computing the sum and count aggregates for a single row of data is not an expensive operation, so this missed optimization is not much to be concerned about.
  6. The join relates each calculated incremental change to an existing key in the indexed view. The join is an outer join because the newly-inserted data might not correspond to any existing data in the view.
  7. This operator locates the row to be modified in the view.
  8. The Compute Scalar has two important responsibilities. First, it determines whether each incremental change will affect an existing row in the view, or whether a new row will have to be created. It does this by checking to see if the outer join produced a null from the view side of the join. Our sample insert is for group 3, which does not currently exist in the view, so a new row will be created. The second function of the Compute Scalar is to calculate new values for the view columns. If a new row is to be added to the view, this is simply the result of the incremental sum from the Stream Aggregate. If an existing row in the view is to be updated, the new value is the existing value in the view row plus the incremental sum from the Stream Aggregate.
  9. This Eager Table Spool is for Halloween Protection. It is required for correctness when an insert operation affects a table that is also referenced on the data access side of the query. It is technically not required if the single-row maintenance operation results in an update to an existing view row, but it remains in the plan anyway.
  10. The final operator in the plan is labelled as an Update operator, but it will perform either an Insert or an Update for each row it receives depending on the value of the "action code" column added by the Compute Scalar at node 8. More generally, this update operator is capable of inserts, updates, and deletes.

There is quite a bit of detail there, so to summarize:

  • The aggregate groups data changes by the unique clustered key of the view. It computes the net effect of the base table changes on each column per key.
  • The outer join connects the per-key incremental changes to existing rows in the view.
  • The compute scalar calculates whether a new row should be added to the view, or an existing row updated. It computes the final column values for the view insert or update operation.
  • The view update operator inserts a new row or updates an existing one as directed by the action code.

Example 2 – Multi-row Insert

Believe it or not, the single-row base table insert execution plan discussed above was subject to a number of simplifications. Although some possible further optimizations were missed (as noted), the query optimizer still managed to remove some operations from the general indexed view maintenance template, and reduce the complexity of others.

Several of these optimizations were allowed because we were inserting just a single row, but others were enabled because the optimizer was able to see the literal values being added to the base table. For example, the optimizer could see that the group value inserted would pass the predicate in the WHERE clause of the view.

If we now insert two rows, with the values "hidden" in local variables, we get a slightly more complex plan:

DECLARE
    @Group1 integer = 4,
    @Value1 integer = 7,
    @Group2 integer = 5,
    @Value2 integer = 8;

INSERT dbo.T1
    (GroupID, Value)
VALUES
    (@Group1, @Value1),
    (@Group2, @Value2);

Multi-row hidden values execution plan

The new or changed operators are annotated as before:

  1. The Constant Scan provides the values to insert. Previously, an optimization for single-row inserts allowed this operator to be omitted.
  2. A explicit Filter operator is now required to check that the groups inserted to the base table match the WHERE clause in the view. As it happens, both new rows will pass the test, but the optimizer cannot see the values in the variables to know this in advance. Additionally, it would not be safe to cache a plan that skipped this filter because a future reuse of the plan could have different values in the variables.
  3. A Sort is now required to ensure the rows arrive at the Stream Aggregate in group order. The sort was previously removed because it is pointless to sort one row.
  4. The Stream Aggregate now has a "group by" property, matching the unique clustered key of the view.
  5. This Sort is required to present rows in view-key, action code order, which is required for correct operation of the Collapse operator. Sort is a fully blocking operator so there is no longer any need for an Eager Table Spool for Halloween Protection.
  6. The new Collapse operator combines an adjacent insert and delete on the same key value into a single update operation. This operator is not actually required in this case, because no deletion action codes can be generated (only inserts and updates). This appears to be an oversight, or perhaps something left in for safety reasons. The automatically-generated parts of an update query plan can become extremely complex, so it is hard know for sure.

The properties of the Filter (derived from the view's WHERE clause) are:

Filter properties

The Stream Aggregate groups by the view key, and computes the sum and count aggregates per group:

Stream Aggregate properties

The Compute Scalar identifies the action to take per row (insert or update in this case), and computes the value to insert or update in the view:

Compute Scalar properties

The action code is given an expression label of [Act1xxx]. Valid values are 1 for an update, 3 for a delete, and 4 for an insert. This action expression results in an insert (code 4) if no matching row was found in the view (i.e. the outer join returned a null for the NumRows column). If a matching row was found, the action code is 1 (update).

Note that NumRows is the name given to the required COUNT_BIG(*) column in the view. In a plan that could result in deletions from the view, the Compute Scalar would detect when this value would become zero (no rows for the current group) and generate a delete action code (3).

The remaining expressions maintain the sum and count aggregates in the view. Notice though that the expression labels [Expr1009] and [Expr1010] are not new; they refer to the labels created by the Stream Aggregate. The logic is straightforward: if a matching row was not found, the new value to insert is just the value computed at the aggregate. If a matching row in the view was found, the updated value is the current value in the row plus the increment computed by the aggregate.

Finally, the view update operator (shown as a Clustered Index Update in SSMS) shows the action column reference ([Act1013] defined by the Compute Scalar):

image

Example 3 – Multi-row Update

So far we have only looked at inserts to the base table. The execution plans for a deletion are very similar, with just a few minor differences in the detailed calculations. This next example therefore moves on to look at the maintenance plan for a base table update:

DECLARE 
    @Group1 integer = 1,
    @Group2 integer = 2,
    @Value integer = 1;

UPDATE dbo.T1
SET Value = Value + @Value
WHERE GroupID IN (@Group1, @Group2);

As before, this query uses variables to hide literal values from the optimizer, preventing some simplifications from being applied. It is also careful to update two separate groups, preventing optimizations that can be applied when the optimizer knows only a single group (a single row of the indexed view) will be affected. The annotated execution plan for the update query is below:

Multi-row update

The changes and point of interest are:

  1. The new Split operator turns each base table row update into a separate delete and insert operation. Each update row is split into two separate rows, doubling the number of rows after this point in the plan. Split is part of the split-sort-collapse pattern needed to protect against incorrect transient unique key violation errors.
  2. The Stream Aggregate is modified to account for incoming rows that can specify either a delete or an insert (due to the Split, and determined by an action code column in the row). An insert row contributes the original value in sum aggregates; the sign is reversed for delete action rows. Similarly, the row count aggregate here counts insert rows as +1 and delete rows as –1.
  3. The Compute Scalar logic is also modified to reflect that the net effect of the changes per group might require an eventual insert, update, or delete action against the materialized view. It is not actually possible for this particular update query to result in a row being inserted or deleted against this view, but the logic required to deduce that is beyond the optimizer's current reasoning abilities. A slightly different update query or view definition could indeed result in a mixture of insert, delete, and update view actions.
  4. The Collapse operator is highlighted purely for its role in the split-sort-collapse pattern mentioned above. Note that it only collapses deletes and inserts on the same key; unmatched deletes and inserts after the Collapse are perfectly possible (and quite usual).

As before, the key operator properties to look at to understand the indexed view maintenance work are the Filter, Stream Aggregate, Outer Join, and Compute Scalar.

Example 4 – Multi-row Update with Joins

To complete the overview of indexed view maintenance execution plans, we will need a new example view that joins several tables together, and includes a projection in the select list:

CREATE TABLE dbo.E1 (g integer NULL, a integer NULL);
CREATE TABLE dbo.E2 (g integer NULL, a integer NULL);
CREATE TABLE dbo.E3 (g integer NULL, a integer NULL);
GO
INSERT dbo.E1 (g, a) VALUES (1, 1);
INSERT dbo.E2 (g, a) VALUES (1, 1);
INSERT dbo.E3 (g, a) VALUES (1, 1);
GO
CREATE VIEW dbo.V1
WITH SCHEMABINDING
AS
SELECT 
    g = E1.g, 
    sa1 = SUM(ISNULL(E1.a, 0)), 
    sa2 = SUM(ISNULL(E2.a, 0)), 
    sa3 = SUM(ISNULL(E3.a, 0)), 
    cbs = COUNT_BIG(*) 
FROM dbo.E1 AS E1
JOIN dbo.E2 AS E2
    ON E2.g = E1.g
JOIN dbo.E3 AS E3
    ON E3.g = E2.g
WHERE
    E1.g BETWEEN 1 AND 5
GROUP BY
    E1.g;
GO
CREATE UNIQUE CLUSTERED INDEX cuq
ON dbo.V1 (g);

To ensure correctness, one of the indexed view requirements is that a sum aggregate cannot operate on an expression that might evaluate to null. The view definition above uses ISNULL to meet that requirement. A sample update query that produces a pretty comprehensive index maintenance plan component is shown below, together with the execution plan it produces:

UPDATE dbo.E1 
SET g = g + 1, 
    a = a + 1;

Join maintenance plan

The plan looks quite large and complicated now, but most of the elements are exactly as we have already seen. The key differences are:

  1. The top branch of the plan includes a number of extra Compute Scalar operators. These could be more compactly arranged, but essentially they are present to capture the pre-update values of the non-grouping columns. The Compute Scalar to the left of the Table Update captures the post-update value of column "a", with the ISNULL projection applied.
  2. The new Compute Scalars in this area of the plan compute the value produced by the ISNULL expression on each source table. In general, projections on the joined tables in the view will be represented by Compute Scalars here. The sorts in this area of the plan are present purely because the optimizer chose a merge join strategy for cost reasons (remember, merge requires join-key sorted input).
  3. The two join operators are new, and simply implement the joins in the view definition. These joins always appear before the Stream Aggregate that computes the incremental effect of the changes on the view. Note that a change to a base table can result in a row that used to meet the join criteria no longer joining, and vice versa. All these potential complexities are handled correctly (given the indexed view restrictions) by the Stream Aggregate producing a summary of the changes per view key after the joins have been performed.

Final Thoughts

That last plan represents pretty much the full template for maintaining an indexed view, though the addition of nonclustered indexes to the view would add additional operators spooled off the output of the view update operator as well. Aside from an extra Split (and a Sort and Collapse combination if the view's nonclustered index is unique), there is nothing very special about this possibility. Adding an output clause to the base table query can also produce some interesting extra operators, but again, these are not particular to indexed view maintenance per se.

To summarise the complete overall strategy:

  • Base table changes are applied as normal; pre-update values may be captured.
  • A split operator may be used to transform updates into delete/insert pairs.
  • An eager spool saves base table change information to temporary storage.
  • All tables in the view are accessed, except the updated base table (which is read from the spool).
  • Projections in the view are represented by Compute Scalars.
  • Filters in the view are applied. Filters may be pushed into scans or seeks as residuals.
  • Joins specified in the view are performed.
  • An aggregate computes net incremental changes grouped by clustered view key.
  • The incremental change set is outer joined to the view.
  • A Compute Scalar calculates an action code (insert/update/delete against the view) for each change, and computes the actual values to be inserted or updated. The computational logic is based on the output of the aggregate and the result of the outer join to the view.
  • Changes are sorted into view key and action code order, and collapsed to updates as appropriate.
  • Finally, the incremental changes are applied to the view itself.

As we have seen, the normal set of tools available to the query optimizer are still applied to the automatically-generated parts of the plan, meaning that one or more of the steps above may be simplified, transformed, or removed entirely. However, the basic shape and operation of the plan remains intact.

If you have been following along with the code examples, you can use the following script to clean up:

DROP VIEW dbo.V1;
DROP TABLE dbo.E3, dbo.E2, dbo.E1;
DROP VIEW dbo.IV;
DROP TABLE dbo.T1;