The Eager Index Spool and The Optimizer

Introduction

An Eager Index Spool reads all rows from its child operator into an indexed worktable, before it starts returning rows to its parent operator. In some respects, an eager index spool is the ultimate missing index suggestion, but it is not reported as such.

Cost assessment

Inserting rows into an indexed worktable is relatively low-cost, but not free. The optimizer must consider that the work involved saves more than it costs. For that to work out in the spool’s favour, the plan must be estimated to consume rows from the spool more than once. Otherwise, it might as well skip the spool, and just do the underlying operation that one time.

To be accessed more than once, the spool must appear on the inner side of a nested loops join operator.
Each iteration of the loop should seek to a particular index spool key value provided by the outer side of the loop.

That means that the join needs to be an apply, not a nested loops join. For the difference between the two, please see my article Apply versus Nested Loops Join.

Notable features

While an eager index spool may only appear on the inner side of an nested loops apply, it is not a “performance spool”. An eager index spool cannot be disabled with trace flag 8690 or the NO_PERFORMANCE_SPOOL query hint.

Rows inserted into the index spool are not normally pre-sorted into index key order, which can result in index page splits. Undocumented trace flag 9260 can be used to generate a Sort operator before the index spool to avoid this. The downside is the extra sorting cost may dissuade the optimizer from choosing the spool option at all.

SQL Server does not support parallel inserts to a b-tree index. This means that everything below a parallel eager index spool runs on a single thread. The operators below the spool are still (misleadingly) marked with the parallelism icon. One thread is chosen to write to the spool. The other threads wait on EXECSYNC while that completes. Once the spool is populated, it may be read from by parallel threads.

Index spools do not tell the optimizer they support output ordered by the spool’s index keys. If sorted output from the spool is required, you may see an unnecessary Sort operator. Eager index spools should often be replaced by a permanent index anyway, so this is a minor concern much of the time.

There are five optimizer rules that can generate an Eager Index Spool option (known internally as an index on-the-fly). We will look at three of these in detail to understand where eager index spools come from.

SelToIndexOnTheFly

This is the most common one. It matches one or more relational selections (a.k.a filters or predicates) just above a data-access operator. The SelToIndexOnTheFly rule replaces the predicates with a seek predicate on an eager index spool.

Demo

An AdventureWorks sample database example is shown below:

SELECT
    P.ProductID,
    P.[Name],
    P.SafetyStockLevel,
    TH.Quantity
FROM Production.Product AS P
CROSS APPLY
(
    SELECT MAX(TH.Quantity)
    FROM Production.TransactionHistory AS TH
    WHERE 
        TH.ProductID = P.ProductID
        AND TH.Quantity < P.SafetyStockLevel
    GROUP BY ()
) AS TH (Quantity)
WHERE
    P.[Name] LIKE N'A%';

This execution plan has an estimated cost of 3.0881 units. Some points of interest:

The Nested Loops Inner Join operator is an apply, with ProductID and SafetyStockLevel from the Product table as outer references.
On the first iteration of the apply, the Eager Index Spool is fully populated from the Clustered Index Scan of the TransactionHistory table.
The spool’s worktable has a clustered index keyed on (ProductID, Quantity).
Rows matching the predicates TH.ProductID = P.ProductID and TH.Quantity < P.SafetyStockLevel are answered by the spool using its index. This is true for every iteration of the apply, including the first one.
The TransactionHistory table is only scanned once.

Sorted input to the spool

It is possible to enforce sorted input to the eager index spool, but this does affect estimated cost, as noted in the introduction. For the example above, enabling the undocumented trace flag produces a plan without a spool:

SELECT
    P.ProductID,
    P.[Name],
    P.SafetyStockLevel,
    TH.Quantity
FROM Production.Product AS P
CROSS APPLY
(
    SELECT
        MAX(TH.Quantity)
    FROM Production.TransactionHistory AS TH
    WHERE 
        TH.ProductID = P.ProductID
        AND TH.Quantity < P.SafetyStockLevel
    GROUP BY ()
) AS TH (Quantity)
WHERE
    P.[Name] LIKE N'A%'
OPTION (QUERYTRACEON 9260);

The estimated cost of this Index Seek and Key Lookup plan is 3.11631 units. This is more than the cost of the plan with an index spool alone, but less than the plan with an index spool and sorted input.

To see a plan with sorted input to the spool, we need to increase the expected number of loop iterations. This gives the spool a chance to repay the extra cost of the Sort. One way to expand the number of rows expected from the Product table is to make the Name predicate less restrictive:

SELECT
    P.ProductID,
    P.[Name],
    P.SafetyStockLevel,
    TH.Quantity
FROM Production.Product AS P
CROSS APPLY
(
    SELECT
        MAX(TH.Quantity)
    FROM Production.TransactionHistory AS TH
    WHERE 
        TH.ProductID = P.ProductID
        AND TH.Quantity < P.SafetyStockLevel
    GROUP BY ()
) AS TH (Quantity)
WHERE
    P.[Name] LIKE N'[A-P]%'
OPTION (QUERYTRACEON 9260);

This gives us an execution plan with sorted input to the spool:

JoinToIndexOnTheFly

This rule transforms an inner join to an apply, with an eager index spool on the inner side. At least one of the join predicates must be an inequality for this rule to be matched.

This is a much more specialized rule than SelToIndexOnTheFly, but the idea is much the same. In this case, the selection (predicate) being transformed to an index spool seek is associated with the join. The transformation from join to apply allows the join predicate to be moved from the join itself to the inner side of the apply.

Demo

SELECT
    P.ProductID,
    P.[Name],
    P.SafetyStockLevel,
    Quantity = MAX(TH.Quantity)
FROM Production.Product AS P
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
    AND TH.Quantity < P.SafetyStockLevel
WHERE
    P.[Name] LIKE N'[A-P]%'
GROUP BY
    P.ProductID,
    P.[Name],
    P.SafetyStockLevel
OPTION (LOOP JOIN);

As before, we can request sorted input to the spool:

SELECT
    P.ProductID,
    P.[Name],
    P.SafetyStockLevel,
    Quantity = MAX(TH.Quantity)
FROM Production.Product AS P
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
    AND TH.Quantity < P.SafetyStockLevel
WHERE
    P.[Name] LIKE N'[A-P]%'
GROUP BY
    P.ProductID,
    P.[Name],
    P.SafetyStockLevel
OPTION (LOOP JOIN, QUERYTRACEON 9260);

This time, the extra cost of sorting has encouraged the optimizer to choose a parallel plan.

An unwelcome side-effect is the Sort operator spills to tempdb. The total memory grant available for sorting is sufficient, but it is evenly split between parallel threads (as usual). As noted in the introduction, SQL Server does not support parallel inserts to a b-tree index, so the operators below the eager index spool run on a single thread. This single thread only gets a fraction of the memory grant, so the Sort spills to tempdb.

This side-effect is perhaps one reason the trace flag is undocumented and unsupported.

SelSTVFToIdxOnFly

This rule does the same thing as SelToIndexOnTheFly, but for a streaming table-valued function (sTVF) row source. These sTVFs are used extensively internally to implement DMVs and DMFs among other things. They appear in modern execution plans as Table Valued Function operators (originally as remote table scans).

In the past, many of these sTVFs could not accept correlated parameters from an apply. They could accept literals, variables, and module parameters, just not apply outer references. There are still warnings about this in the documentation, but they are somewhat out of date now.

Anyway, the point is that sometimes it is not possible for SQL Server to pass an apply outer reference as a parameter to an sTVF. In that situation, it can make sense to materialize part of the sTVF result in an eager index spool. The present rule provides that ability.

Demo

The next code example shows a DMV query that is successfully converted from a join to an apply. Outer references are passed as parameters to the second DMV:

-- Transformed to an apply
-- Outer reference passed as a parameter
SELECT
    DES.session_id,
    DES.login_time,
    DESWS.waiting_tasks_count
FROM sys.dm_exec_sessions AS DES
JOIN sys.dm_exec_session_wait_stats AS DESWS
    ON DESWS.session_id = DES.session_id
OPTION (FORCE ORDER);

The plan properties of the wait stats TVF show the input parameters. The second parameter value is provided as an outer reference from the sessions DMV:

It is a shame that sys.dm_exec_session_wait_stats is a view, not a function, because that prevents us writing an apply directly.

The rewrite below is enough to defeat the internal conversion:

-- Rewrite to avoid TVF parameter trickery
SELECT
    DES.session_id,
    DES.login_time,
    DESWS.waiting_tasks_count
FROM sys.dm_exec_sessions AS DES
JOIN sys.dm_exec_session_wait_stats AS DESWS
    ON DESWS.session_id >= DES.session_id
    AND DESWS.session_id <= DES.session_id
OPTION (FORCE ORDER);

With the session_id predicates now not consumed as parameters, the SelSTVFToIdxOnFly rule is free to convert them to an eager index spool:

I don’t want to leave you with the impression that tricky rewrites are needed to get an eager index spool over a DMV source – it just makes for an easier demo. If you do happen to encounter a query with DMV joins that produces a plan with an eager spool, at least you know how it got there.

You can’t create indexes on DMVs, so you may need to use a hash or merge join if the execution plan does not perform well enough.

Recursive CTEs

The remaining two rules are SelIterToIdxOnFly and JoinIterToIdxOnFly. They are direct counterparts to SelToIndexOnTheFly and JoinToIndexOnTheFly for recursive CTE data sources. These are extremely rare in my experience, so I am not going to provide demos for them. (Just so the Iter part of the rule name makes sense: It comes from the fact that SQL Server implements tail recursion as nested iteration.)

When a recursive CTE is referenced multiple times on the inside of an apply, a different rule (SpoolOnIterator) can cache the result of the CTE:

WITH R AS
(
    SELECT 1 AS n 
    UNION ALL
    SELECT R.n + 1 
    FROM R 
    WHERE R.n < 10
)
SELECT
    R1.n
FROM R AS R1
CROSS JOIN R AS R2;

The execution plan features a rare Eager Row Count Spool:

Final Thoughts

Eager index spools are often a sign that a useful permanent index is missing from the database schema. This is not always the case, as the streaming table-valued function examples show.