Introduction
Since their introduction in SQL Server 2005, window functions like ROW_NUMBER
and RANK
have proven to be extremely useful in solving a wide variety of common T-SQL problems. In an attempt to generalize such solutions, database designers often look to incorporate them into views to promote code encapsulation and reuse. Unfortunately, a limitation in the SQL Server query optimizer often means that views* containing window functions do not perform as well as expected. This post works through an illustrative example of the problem, details the reasons, and provides a number of workarounds.
*This problem can also occur in derived tables, common table expressions and in-line functions, but I see it most often with views because they are intentionally written to be more generic.
Note: The limitation described here was first fixed in SQL Server 2017 CU 30. Optimizer fixes must be enabled using trace flag 4199 or the database scoped configuration option. The fix is standard behaviour without optimizer hotfixes under compatibility level 160 (SQL Server 2022).
Window functions
Window functions are distinguished by the presence of an OVER()
clause and come in three varieties:
- Ranking window functions
ROW_NUMBER
RANK
DENSE_RANK
NTILE
- Aggregate window functions
MIN
,MAX
,AVG
,SUM
COUNT
,COUNT_BIG
CHECKSUM_AGG
STDEV
,STDEVP
,VAR
,VARP
- Analytic window functions
LAG
,LEAD
FIRST_VALUE
,LAST_VALUE
PERCENT_RANK
,PERCENTILE_CONT
,PERCENTILE_DISC
,CUME_DIST
The ranking and aggregate window functions were introduced in SQL Server 2005, and considerably extended in SQL Server 2012. The analytic window functions are new for SQL Server 2012.
All of the window functions listed above are susceptible to the optimizer limitation detailed in this article.
Example
Using the AdventureWorks sample database, the task at hand is to write a query that returns all product #878 transactions that occurred on the most recent date available. There are all sorts of ways to express this requirement in T-SQL, but we will choose to write a query that uses a windowing function. The first step is to find transaction records for product #878 and rank them in date order descending:
SELECT
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
ORDER BY
rnk;
The results of the query are as expected, with six transactions occurring on the most recent date available. The execution plan contains a warning triangle, alerting us to a missing index:
As usual for missing index suggestions, we need to remember that the recommendation is not the result of a through analysis of the query – it is more of an indication that we need to think a bit about how this query accesses the data it needs.
The suggested index would certainly be more efficient than scanning the table completely, since it would allow an index seek to the particular product we are interested in. The index would also cover all the columns needed, but it would not avoid the sort (by TransactionDate
descending). The ideal index for this query would allow a seek on ProductID
, return the selected records in reverse TransactionDate
order, and cover the other returned columns:
CREATE NONCLUSTERED INDEX ix
ON Production.TransactionHistory
(ProductID, TransactionDate DESC)
INCLUDE
(ReferenceOrderID, Quantity);
With that index in place, the execution plan is much more efficient. The clustered index scan has been replaced by a range seek, and an explicit sort is no longer necessary:
The final step for this query is to limit the results to just those rows that rank #1. We cannot filter directly in the WHERE
clause of our query because window functions may only appear in the SELECT
and ORDER BY
clauses.
We can workaround this restriction using a derived table, common table expression, function, or view. On this occasion, we will use a common table expression (aka an in-line view):
WITH RankedTransactions AS
(
SELECT
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
)
SELECT
TransactionID,
ReferenceOrderID,
TransactionDate,
Quantity
FROM RankedTransactions
WHERE
rnk = 1;
The execution plan is the same as before, with an extra Filter to return only rows ranked #1:
The query returns the six equally ranked rows we expect:
Generalizing the query
It turns out that our query is very useful, so the decision is taken to generalize it and store the definition in a view. For this to work for any product, we need to do two things: return the ProductID
from the view, and partition the ranking function by product:
CREATE VIEW dbo.MostRecentTransactionsPerProduct
WITH SCHEMABINDING
AS
SELECT
sq1.ProductID,
sq1.TransactionID,
sq1.ReferenceOrderID,
sq1.TransactionDate,
sq1.Quantity
FROM
(
SELECT
th.ProductID,
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
PARTITION BY th.ProductID
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
) AS sq1
WHERE
sq1.rnk = 1;
Selecting all the rows from the view results in the following execution plan and correct results:
We can now find the most recent transactions for product 878 with a much simpler query on the view:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = 878;
Our expectation is that the execution plan for this new query will be exactly the same as before we created the view. The query optimizer should be able to push the filter specified in the WHERE
clause down into the view, resulting in an index seek.
We need to stop and think a bit at this point, however. The query optimizer can only produce execution plans that are guaranteed to produce the same results as the logical query specification – is it safe to push our WHERE
clause into the view?<
The answer is yes, so long as the column we are filtering on appears in the PARTITION BY
clause of the window function in the view. The reasoning is that eliminating complete groups (partitions) from the window function will not affect the ranking of rows returned by the query. The question is, does the SQL Server query optimizer know this? The answer depends on which version of SQL Server we are running.
SQL Server 2005 execution plan
A look at the Filter properties in this plan shows it applying two predicates:
The ProductID = 878
predicate has not been pushed down into the view, resulting in a plan that scans our index, ranking every row in the table before filtering for product #878 and rows ranked #1.
The SQL Server 2005 query optimizer cannot push suitable predicates past a window function in a lower query scope (view, common table expression, in-line function or derived table). This limitation applies to all SQL Server 2005 builds.
SQL Server 2008+ execution plan
This is the execution plan for the same query on SQL Server 2008 or later:
The ProductID
predicate has been successfully pushed past the ranking operators, replacing the index scan with the efficient index seek.
The 2008 query optimizer includes a new simplification rule SelOnSeqPrj
(select on sequence project) that is able to push safe outer-scope predicates past window functions. To produce the less efficient plan for this query in SQL Server 2008 or later, we have to temporarily disable this query optimizer feature:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = 878
OPTION (QUERYRULEOFF SelOnSeqPrj);
Unfortunately, the SelOnSeqPrj
simplification rule only works when the predicate performs a comparison with a constant. For that reason, the following query produces the sub-optimal plan on SQL Server 2008 and later:
DECLARE @ProductID INT = 878;
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = @ProductID;
The problem can still occur even where the predicate uses a constant value. SQL Server may decide to auto-parameterize trivial queries (one for which an obvious best plan exists). If auto-parameterization is successful, the optimizer sees a parameter instead of a constant, and the SelOnSeqPrj
rule is not applied.
For queries where auto-parameterization is not attempted (or where it is determined to be unsafe), the optimization may still fail, if the database option for FORCED PARAMETERIZATION
is on. Our test query (with the constant value 878) is not safe for auto-parameterization, but the forced parameterization setting overrides this, resulting in the inefficient plan:
ALTER DATABASE AdventureWorks
SET PARAMETERIZATION FORCED;
GO
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = 878;
GO
ALTER DATABASE AdventureWorks
SET PARAMETERIZATION SIMPLE;
SQL Server 2008+ Workaround
To allow the optimizer to ‘see’ a constant value for query that references a local variable or parameter we can add an OPTION (RECOMPILE)
query hint:
DECLARE @ProductID INT = 878;
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = @ProductID
OPTION (RECOMPILE);
Note: The pre-execution (‘estimated’) execution plan still shows an index scan because the value of the variable is not actually set yet. When the query is executed, however, the execution plan shows the desired index seek plan:
The SelOnSeqPrj
rule does not exist in SQL Server 2005, so OPTION (RECOMPILE)
cannot help there. In case you are wondering, the OPTION (RECOMPILE)
workaround results in a seek even if the database option for forced parameterization is on.
All versions workaround #1
In some cases, it is possible to replace the problematic view, common table expression, or derived table with a parameterized in-line table-valued function:
CREATE FUNCTION dbo.MostRecentTransactionsForProduct
(
@ProductID integer
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
SELECT
sq1.ProductID,
sq1.TransactionID,
sq1.ReferenceOrderID,
sq1.TransactionDate,
sq1.Quantity
FROM
(
SELECT
th.ProductID,
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
PARTITION BY th.ProductID
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = @ProductID
) AS sq1
WHERE
sq1.rnk = 1;
This function explicitly places the ProductID
predicate in the same scope as the window function, avoiding the optimizer limitation. Written to use the in-line function, our example query becomes:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsForProduct(878) AS mrt;
This produces the desired index seek plan on all versions of SQL Server that support window functions. This workaround produces a seek even where the predicate references a parameter or local variable – OPTION (RECOMPILE)
is not required.<
The function body could of course be simplified to remove the now-redundant PARTITION BY
clause, and to no longer return the ProductID
column. I left the definition the same as the view it replaced to more clearly illustrate the cause of the execution plan differences.
All versions workaround #2
The second workaround only applies to ranking window functions that are filtered to return rows numbered or ranked #1 (using ROW_NUMBER
, RANK
, or DENSE_RANK
). This is a very common usage however, so it is worth mentioning.
An additional benefit is that this workaround can produce plans that are even more efficient than the index seek plans seen previously. As a reminder, the previous best plan looked like this:
That execution plan ranks 1,918 rows even though it ultimately returns only 6. We can improve this execution plan by using the window function in an ORDER BY
clause instead of ranking rows and then filtering for rank #1:
SELECT TOP (1) WITH TIES
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
ORDER BY
RANK() OVER (
ORDER BY th.TransactionDate DESC);
That query nicely illustrates the use of a window function in the ORDER BY
clause, but we can do even better, eliminating the window function completely:
SELECT TOP (1) WITH TIES
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
ORDER BY
th.TransactionDate DESC;
This plan reads only 7 rows from the table to return the same 6-row result set. Why 7 rows? The Top operator is running in WITH TIES
mode:
It continues to request one row at a time from its subtree until the TransactionDate changes. The seventh row is required for the Top to be sure that no more tied-value rows will qualify.
We can extend the logic of the query above to replace the problematic view definition:
ALTER VIEW dbo.MostRecentTransactionsPerProduct
WITH SCHEMABINDING
AS
SELECT
p.ProductID,
Ranked1.TransactionID,
Ranked1.ReferenceOrderID,
Ranked1.TransactionDate,
Ranked1.Quantity
FROM
-- List of product IDs
(SELECT ProductID FROM Production.Product) AS p
CROSS APPLY
(
-- Returns rank #1 results for each product ID
SELECT TOP (1) WITH TIES
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = p.ProductID
ORDER BY
th.TransactionDate DESC
) AS Ranked1;
The view now uses a CROSS APPLY
to combine the results of our optimized ORDER BY
query for each product. Our test query is unchanged:
DECLARE @ProductID integer;
SET @ProductID = 878;
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = @ProductID;
Both pre- and post-execution plans show an index seek without needing an OPTION (RECOMPILE)
query hint. The following is a post-execution (‘actual’) plan:
If the view had used ROW_NUMBER
instead of RANK
, the replacement view would simply have omitted the WITH TIES
clause on the TOP (1)
. The new view could also be written as a parameterized in-line table-valued function of course.
One could argue that the original index seek plan with the rnk = 1
predicate could also be optimized to only test 7 rows. After all, the optimizer should know that rankings are produced by the Sequence Project operator in strict ascending order, so execution could end as soon as a row with a rank greater than one is seen. The optimizer does not contain this logic today, however.
Final Thoughts
People are often disappointed by the performance of views that incorporate window functions. The reason can often be traced back to the optimizer limitation described in this post (or perhaps because the view designer did not appreciate that predicates applied to the view must appear in the PARTITION BY
clause to be safely pushed down).
I do want to emphasise that this limitation does not just apply to views, and neither is it limited to ROW_NUMBER
, RANK
, and DENSE_RANK
. You should be aware of this limitation when using any function with an OVER
clause in a view, common table expression, derived table, or in-line table-valued function.
SQL Server 2005 users that encounter this issue are faced with the choice of rewriting the view as a parameterized in-line table-valued function, or using the APPLY
technique (where applicable).
SQL Server 2008 users have the extra option of using an OPTION (RECOMPILE)
query hint if the issue can be solved by allowing the optimizer to see a constant instead of a variable or parameter reference. Remember to check post-execution plans when using this hint though: the pre-execution plan cannot generally show the optimal plan.
Hi Paul,
Great post, heres another option for the
rank=1
check, you need to flip your index round for this to work.You would guess that the query optimiser could figure this out from what you are asking but with a bit of prompting, it finally gets what you want it to do.
Hi Lee,
Yes, that was one of the reasons I wrote, "There are all sorts of ways to express this requirement in T-SQL, but we will choose to write a query that uses a windowing function."
The particular index and rewrite you use there was the subject of my first ever blog post for SQLblog.com:
The Segment Top Query Optimization
The particular query I used was just an example to allow me to talk about window functions, but thanks for the feedback anyway.
Paul
"CREATE VIEW dbo.MostRecentTransactionsPerProduct
WITH SCHEMABINDING"
I have noticed you regularly use the 'WITH SCHEMABINDING' option on your VIEWs. Do you have post or something already written for the pros and cons of this? None of my views/UDFs are schemabound now and i am comtemplating the value of it because it is the requirement for putting an index on a view. However, introducing something like this into a well established process and sizable schema seems scary. Thoughts?
Hi Dan,
There are important reasons to use schema binding with functions (see my recent posts on the Halloween Problem on this site for details). As far as views are concerned, this is partly habit, but I do find it useful when cleaning up test scripts – I can't drop the tables I create and leave the views hanging. More practically, I would often forget to run sp_refreshview or sp_refreshsqlmodule when changing an object that some view depends on, without which unexpected results could occur.
Paul
Paul, I learned a LOT from this article.. thanks so much for writing it and documenting everything so well!
Great article Paul. Also it would be great if you also update it with Sql server 2012 enhancement (if any you think as mentioned in http://technet.microsoft.com/en-us/library/ms189461)
Thanks
The article already mentions the 2012 extensions and notes that they are susceptible to this limitation as well.
Great Article Paul, Very much useful for me to reduce the view execution duration. Earlier my view took 2 mins to display top 100 records now it is taking less than 1 sec. Great…
Great Post.