I have previously written about the benefits of using
NOEXPAND hints, even in Enterprise Edition. The details are all in the linked article, but to summarize briefly:
- SQL Server will only automatically create statistics on an indexed view when a
NOEXPANDtable hint is used. Omitting this hint can lead to execution plan warnings about missing statistics that cannot be resolved by creating statistics manually.
- SQL Server will only use automatically or manually created view statistics in cardinality estimation calculations when the query references the view directly and a
NOEXPANDhint is used. For all but the most trivial view definitions, this means the quality of cardinality estimates is likely to be lower when this hint is not used, often resulting in less optimal execution plans.
- The lack of, or inability to use, view statistics can cause the optimizer to guess at cardinality estimates, even where base table statistics are available. This can happen where part of the query plan is replaced with an indexed view reference by the automatic view matching feature, but view statistics are not available, as described above.
There is another consequence of not using the
NOEXPAND hint, which I mentioned in passing a couple of years ago in my article, Optimizer Limitations with Filtered Indexes:
NOEXPANDhints are needed even in Enterprise Edition to ensure the uniqueness guarantee provided by the view indexes is used by the optimizer.
This article examines that statement, and its implications in more detail.
The following script creates a simple table and indexed view:
CREATE TABLE dbo.T
col1 integer NOT NULL
INSERT dbo.T WITH (TABLOCKX)
FROM master.dbo.spt_values AS SV
SV.type = N'P';
CREATE VIEW dbo.VT
FROM dbo.T AS T;
CREATE UNIQUE CLUSTERED INDEX cuq
ON dbo.VT (col1);
That creates a single column heap table, and an unrestricted view of the same table with a unique clustered index. This is not intended to be a realistic use case for an indexed view; but it will help illustrate the key points with the minimum of distractions. The important point is that the base table here has no indexes at all (not even a clustered index) but the view does, and that index is unique.
The Example Query
Consider the following simple query against the base table:
FROM dbo.T AS T;
The execution plan you will see for this query depends on the edition of SQL Server in use. If not Enterprise Edition (or equivalent) you will see a plan like this:
The SQL Server query optimizer has chosen to scan the base table and apply the specified distinctness using a Distinct Sort operator. This plan shape is fully expected, since automatic indexed view matching is not available outside Enterprise Edition. I am going to stop saying "Enterprise Edition or equivalent" from this point forward, but please continue to infer that I mean any edition that supports automatic view matching when I say, "Enterprise Edition" from now on.
The EXPAND VIEWS hint
This is a bit of an aside, but to get the same plan on Enterprise Edition, we need to use an
EXPAND VIEWS query hint:
FROM dbo.T AS T
OPTION (EXPAND VIEWS);
It might seem a bit odd to use this hint when there are no view references in the query, but that is how it works. The
EXPAND VIEWS hint effectively specifies that indexed view matching should be disabled while compiling and optimizing the query. To be clear: Without this hint, Enterprise Edition may otherwise match (parts of) the query to one or more indexed views.
With Automatic View Matching Enabled
EXPAND VIEWS hint, compiling the same query on Developer Edition (for example) produces a different plan:
The application of indexed view matching means the execution plan features a scan of the view clustered index instead of a base table scan.
The same plan is produced in this case if the query references the view directly (instead of the base table):
FROM dbo.VT AS V;
In all editions, the view reference is expanded before query optimization begins. In Enterprise-equivalent editions, the expanded form may be matched back to the view later on. This is a key concept to understand when thinking about how the query compiler and optimizer use indexed views in SQL Server.
The Stream Aggregate
The most interesting difference between the two plans we have seen so far is the Stream Aggregate in the view-matched plan. If you look at the estimated costs of the Table Scan and View Scan operators, you will see they are exactly the same. The optimizer did not decide to use the indexed view because it made accessing the data any cheaper. Rather, scanning the view index allows the
DISTINCT requirement to be implemented as a Stream Aggregate, rather than a Hash Aggregate or Distinct Sort (as in the first plan).
A Stream Aggregate requires input ordered by the grouping column(s). In this case, the distinct is equivalent to grouping by the single column, and the view's unique clustered index provides the necessary ordering guarantee. The optimizer's cost model identifies the Stream Aggregate as a cheaper option than a Distinct Sort or Hash Aggregate for this query. This is the basis for the optimizer choosing to access the indexed view when automatic view matching is available.
With all that said and understood, the Stream Aggregate is still unexpected: Given the uniqueness guarantee provided by the view index, there is no need to perform this grouping operation at all. The unique clustered index already ensures the column contains no duplicates.
This, in a nutshell, is the problem. When automatic view matching is used, the optimizer recognises the ordering guarantee provided by the view index, but not the uniqueness guarantee.
Using a NOEXPAND hint
To get the ideal execution plan to this query, we need to reference the view directly and use a
NOEXPAND table hint:
FROM dbo.VT AS V WITH (NOEXPAND);
This gives us the plan an experienced database person would expect; one that correctly recognises that the distinct operation is redundant and can be removed:
A Second Example
Failing to take advantage of uniqueness guarantee provided by an view index can have other effects on the final execution plan. Consider now a self join of the indexed view (again, just to illustrate a concept – this is not intended to be a realistic query):
FROM dbo.VT AS V1
JOIN dbo.VT AS V2
ON V2.col1 = V1.col1;
Using Developer Edition the execution plan chosen does not access the indexed view at all, and features a hash join (sometimes an indication that a useful index is missing):
Now let us try exactly the same query, but with a
NOEXPAND hint on each view reference:
FROM dbo.VT AS V1 WITH (NOEXPAND)
JOIN dbo.VT AS V2 WITH (NOEXPAND)
ON V2.col1 = V1.col1;
The execution plan now features two indexed view accesses and a merge join:
This new plan has a much lower estimated cost than the hash join plan, so why did the optimizer not choose this option before? We can see why by adding a merge join hint to the original query:
FROM dbo.VT AS V1
JOIN dbo.VT AS V2
ON V2.col1 = V1.col1
OPTION (MERGE JOIN);
This gives a similar-looking plan that chooses to access the view even though
NOEXPAND was not specified:
The overall estimated cost of this plan is higher than both previous examples. The Merge Join in this plan also accounts for a higher proportion of the total estimated cost than before (98% versus 48.2%).
The reason for this can be seen by looking at the properties of the merge join. In the
NOEXPAND plan, it was a one-to-many merge join. In the plan directly above, it is a many-to-many merge join. The optimizer's cost model assigns a higher cost to many-to-many merge joins because a tempdb worktable is needed to handle any duplicates.
The guarantees provided by a unique index can be a powerful optimization tool, so it is a shame that automatic index matching is currently unable to take advantage of it. The potential benefits go beyond eliminating unnecessary aggregations or enabling a one-to-many merge join as seen in the preceding simple examples. In general, in can be tough to spot that an execution plan is sub-optimal because the optimizer missed taking advantage of a uniqueness guarantee.
This optimizer limitation does not only apply to the unique clustered index that a view must have in order to be materialized. In more complex scenarios, additional nonclustered indexes may also be present on the view; perhaps to reflect cross-table relationships that are difficult to enforce or represent otherwise. If these nonclustered indexes are defined to be unique, the optimizer will overlook these guarantees as well, if automatic index matching is used.
Adding this to the limitations around the creation and usage of statistical information, it seems that relying on automatic view matching can result in inferior execution plans. The safest option is probably to reference indexed views explicitly, and to use a
NOEXPAND hint every time – at least until these issues are addressed in the product.
I should stress that the issue described in this article only applies to the uniqueness guarantee provided by a unique view index. If the optimizer can get the required uniqueness information another way, the chances are good that optimization problems will be avoided.
For example, there may be a suitable unique index on a base table referenced by the view. Or, in the case of a view that contains aggregation, the optimizer can already infer a useful uniqueness guarantee from the view's
GROUP BY clause. The common practice of adding a view clustered index to the grouping keys adds no extra uniqueness information in that case.
Nevertheless, there are times where this "uniqueness oversight" can mean you will get better quality execution plans by using an explicit view reference and
NOEXPAND hints, even in Enterprise Edition.