The ISO/IEC 9075:2016 standard (SQL:2016) defines a feature called nested window functions. This feature allows you to nest two kinds of window functions as an argument of a window aggregate function. The idea is to allow you to refer to either a row number, or to a value of an expression, at strategic markers in windowing elements. The markers give you access to the first or last row in the partition, the first or last row in the frame, the current outer row, and the current frame row. This idea is very powerful, enabling you to apply filtering and other kinds of manipulations within your window function that are sometimes difficult to achieve otherwise. You can also use nested window functions to easily emulate other features, like RANGE-based frames. This feature is currently not available in T-SQL. I posted a suggestion to improve SQL Server by adding support for nested window functions. Make sure to add your vote if you feel that this feature could be beneficial to you.
What nested window functions are not about
At the date of this writing, there’s not much information available out there about the true standard nested window functions. What makes it harder is that I don’t know of any platform that implemented this feature yet. In fact, running a web search for nested window functions returns mostly coverage of and discussions about nesting grouped aggregate functions within windowed aggregate functions. For example, suppose that you want to query the Sales.OrderValues view in the TSQLV5 sample database, and return for each customer and order date, the daily total of the order values, and the running total until the current day. Such a task involves both grouping and windowing. You group the rows by the customer ID and the order date, and apply a running sum on top of the group sum of the order values, like so:
USE TSQLV5; -- http://tsql.solidq.com/SampleDatabases/TSQLV5.zip
SELECT custid, orderdate, SUM(val) AS daytotal,
SUM(SUM(val)) OVER(PARTITION BY custid
ORDER BY orderdate
ROWS UNBOUNDED PRECEDING) AS runningsum
FROM Sales.OrderValues
GROUP BY custid, orderdate;
This query generates the following output, shown here in abbreviated form:
custid orderdate daytotal runningsum ------- ---------- -------- ---------- 1 2018-08-25 814.50 814.50 1 2018-10-03 878.00 1692.50 1 2018-10-13 330.00 2022.50 1 2019-01-15 845.80 2868.30 1 2019-03-16 471.20 3339.50 1 2019-04-09 933.50 4273.00 2 2017-09-18 88.80 88.80 2 2018-08-08 479.75 568.55 2 2018-11-28 320.00 888.55 2 2019-03-04 514.40 1402.95 ...
Even though this technique is pretty cool, and even though web searches for nested window functions return mainly such techniques, that’s not what the SQL standard means by nested window functions. Since I couldn’t find any information out there on the topic, I just had to figure it out from the standard itself. Hopefully, this article will increase awareness of the true nested window functions feature, and cause people to turn to Microsoft and ask to add support for it in SQL Server.
What nested window functions are about
Nested window functions include two functions that you can nest as an argument of a window aggregate function. Those are the nested row number function, and the nested value_of expression at row function.
Nested row number function
The nested row number function allows you to refer to the row number of strategic markers in windowing elements. Here’s the syntax of the function:
The row markers that you can specify are:
- BEGIN_PARTITION
- END_PARTITION
- BEGIN_FRAME
- END_FRAME
- CURRENT_ROW
- FRAME_ROW
The first four markers are self-explanatory. As for the last two, the CURRENT_ROW marker represents the current outer row, and the FRAME_ROW represents the current inner frame row.
As an example for using the nested row number function, consider the following task. You need to query the Sales.OrderValues view, and return for each order some of its attributes, as well as the difference between the current order value and the customer average, but excluding the first and last customer orders from the average.
This task is achievable without nested window functions, but the solution involves quite a few steps:
WITH C1 AS
(
SELECT custid, val,
ROW_NUMBER() OVER( PARTITION BY custid
ORDER BY orderdate, orderid ) AS rownumasc,
ROW_NUMBER() OVER( PARTITION BY custid
ORDER BY orderdate DESC, orderid DESC ) AS rownumdesc
FROM Sales.OrderValues
),
C2 AS
(
SELECT custid, AVG(val) AS avgval
FROM C1
WHERE 1 NOT IN (rownumasc, rownumdesc)
GROUP BY custid
)
SELECT O.orderid, O.custid, O.orderdate, O.val,
O.val - C2.avgval AS diff
FROM Sales.OrderValues AS O
LEFT OUTER JOIN C2
ON O.custid = C2.custid;
Here’s the output of this query, shown here in abbreviated form:
orderid custid orderdate val diff -------- ------- ---------- -------- ------------ 10411 10 2018-01-10 966.80 -570.184166 10743 4 2018-11-17 319.20 -809.813636 11075 68 2019-05-06 498.10 -1546.297500 10388 72 2017-12-19 1228.80 -358.864285 10720 61 2018-10-28 550.00 -144.744285 11052 34 2019-04-27 1332.00 -1164.397500 10457 39 2018-02-25 1584.00 -797.999166 10789 23 2018-12-22 3687.00 1567.833334 10434 24 2018-02-03 321.12 -1329.582352 10766 56 2018-12-05 2310.00 1015.105000 ...
Using nested row number functions, the task is achievable with a single query, like so:
SELECT orderid, custid, orderdate, val,
val - AVG( CASE
WHEN ROW_NUMBER(FRAME_ROW) NOT IN
( ROW_NUMBER(BEGIN_PARTITION), ROW_NUMBER(END_PARTITION) ) THEN val
END )
OVER( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS diff
FROM Sales.OrderValues;
Also, the currently supported solution requires at least one sort in the plan, and multiple passes over the data. The solution using nested row number functions has all the potential to get optimized with reliance on index order, and a reduced number of passes over the data. This, of course, is implementation dependent though.
Nested value_of expression at row function
The nested value_of expression at row function enables you to interact with a value of an expression at the same strategic row markers mentioned earlier in an argument of a window aggregate function. Here’s the syntax of this function:
VALUE OF <expression> AT <row marker> [<delta>] [, <default>]
>) OVER(<specification>)
As you can see, you can specify a certain negative or positive delta with respect to the row marker, and optionally provide a default value in case a row doesn’t exist at the specified position.
This capability gives you a lot of power when you need to interact with different points in windowing elements. Consider the fact that as powerful as window functions may be compared to alternative tools like subqueries, what window functions don’t support is a basic concept of a correlation. Using the CURRENT_ROW marker you get access to the outer row, and this way emulate correlations. At the same time you get to benefit from all of the advantages that window functions have compared to subqueries.
As an example, suppose that you need to query the Sales.OrderValues view, and return for each order some of its attributes, as well as the difference between the current order value and the customer average, but excluding orders placed on the same date as the current order date. This requires a capability similar to a correlation. With the nested value_of expression at row function, using the CURRENT_ROW marker, this is achievable easily like so:
SELECT orderid, custid, orderdate, val,
val - AVG( CASE WHEN orderdate <> VALUE OF orderdate AT CURRENT_ROW THEN val END )
OVER( PARTITION BY custid ) AS diff
FROM Sales.OrderValues;
This query is supposed to generate the following output:
orderid custid orderdate val diff -------- ------- ---------- -------- ------------ 10248 85 2017-07-04 440.00 180.000000 10249 79 2017-07-05 1863.40 1280.452000 10250 34 2017-07-08 1552.60 -854.228461 10251 84 2017-07-08 654.06 -293.536666 10252 76 2017-07-09 3597.90 1735.092728 10253 34 2017-07-10 1444.80 -970.320769 10254 14 2017-07-11 556.62 -1127.988571 10255 68 2017-07-12 2490.50 617.913334 10256 88 2017-07-15 517.80 -176.000000 10257 35 2017-07-16 1119.90 -153.562352 ...
If you’re thinking that this task is achievable just as easily with correlated subqueries, in this simplistic case you’d be right. The same can be achieved with the following query:
SELECT O1.orderid, O1.custid, O1.orderdate, O1.val,
O1.val - ( SELECT AVG(O2.val)
FROM Sales.OrderValues AS O2
WHERE O2.custid = O1.custid
AND O2.orderdate <> O1.orderdate ) AS diff
FROM Sales.OrderValues AS O1;
However, remember that a subquery operates on an independent view of the data, whereas a window function operates on the set that is provided as input to the logical query processing step that handles the SELECT clause. Usually, the underlying query has extra logic like joins, filters, grouping, and such. With subqueries, you either need to prepare a preliminary CTE, or repeat the logic of the underlying query also in the subquery. With window functions, there’s no need to repeat any of the logic.
For example, say that you were supposed to operate only on shipped orders (where the shipped date is not NULL) that were handled by employee 3. The solution with the window function needs to add the filter predicates only once, like so:
SELECT orderid, custid, orderdate, val,
val - AVG( CASE WHEN orderdate <> VALUE OF orderdate AT CURRENT_ROW THEN val END )
OVER( PARTITION BY custid ) AS diff
FROM Sales.OrderValues
WHERE empid = 3 AND shippeddate IS NOT NULL;
This query is supposed to generate the following output:
orderid custid orderdate val diff -------- ------- ---------- -------- ------------- 10251 84 2017-07-08 654.06 -459.965000 10253 34 2017-07-10 1444.80 531.733334 10256 88 2017-07-15 517.80 -1022.020000 10266 87 2017-07-26 346.56 NULL 10273 63 2017-08-05 2037.28 -3149.075000 10283 46 2017-08-16 1414.80 534.300000 10309 37 2017-09-19 1762.00 -1951.262500 10321 38 2017-10-03 144.00 NULL 10330 46 2017-10-16 1649.00 885.600000 10332 51 2017-10-17 1786.88 495.830000 ...
The solution with the subquery needs to add the filter predicates twice—once in the outer query and once in the subquery—like so:
SELECT O1.orderid, O1.custid, O1.orderdate, O1.val,
O1.val - ( SELECT AVG(O2.val)
FROM Sales.OrderValues AS O2
WHERE O2.custid = O1.custid
AND O2.orderdate <> O1.orderdate
AND empid = 3
AND shippeddate IS NOT NULL) AS diff
FROM Sales.OrderValues AS O1
WHERE empid = 3 AND shippeddate IS NOT NULL;
It’s either this, or adding a preliminary CTE that takes care of all of the filtering and any other logic. Anyway you look at it, with subqueries, there are more complexity layers involved.
The other benefit in nested window functions is that if we had support for those in T-SQL, it would have been easy to emulate the missing full support for the RANGE window frame unit. The RANGE option is supposed to enable you to define dynamic frames that are based on an offset from the ordering value in the current row. For example, suppose that you need to compute for each customer order from the Sales.OrderValues view the moving average value of the last 14 days. According to the SQL standard, you can achieve this using the RANGE option and the INTERVAL type, like so:
SELECT orderid, custid, orderdate, val,
AVG(val) OVER( PARTITION BY custid
ORDER BY orderdate
RANGE BETWEEN INTERVAL '13' DAY PRECEDING
AND CURRENT ROW ) AS movingavg14days
FROM Sales.OrderValues;
This query is supposed to generate the following output:
orderid custid orderdate val movingavg14days -------- ------- ---------- ------- --------------- 10643 1 2018-08-25 814.50 814.500000 10692 1 2018-10-03 878.00 878.000000 10702 1 2018-10-13 330.00 604.000000 10835 1 2019-01-15 845.80 845.800000 10952 1 2019-03-16 471.20 471.200000 11011 1 2019-04-09 933.50 933.500000 10308 2 2017-09-18 88.80 88.800000 10625 2 2018-08-08 479.75 479.750000 10759 2 2018-11-28 320.00 320.000000 10926 2 2019-03-04 514.40 514.400000 10365 3 2017-11-27 403.20 403.200000 10507 3 2018-04-15 749.06 749.060000 10535 3 2018-05-13 1940.85 1940.850000 10573 3 2018-06-19 2082.00 2082.000000 10677 3 2018-09-22 813.37 813.370000 10682 3 2018-09-25 375.50 594.435000 10856 3 2019-01-28 660.00 660.000000 ...
At the date of this writing, this syntax is not supported in T-SQL. If we had support for nested window functions in T-SQL, you would have been able to emulate this query with the following code:
SELECT orderid, custid, orderdate, val,
AVG( CASE WHEN DATEDIFF(day, orderdate, VALUE OF orderdate AT CURRENT_ROW)
BETWEEN 0 AND 13
THEN val END )
OVER( PARTITION BY custid
ORDER BY orderdate
RANGE UNBOUNDED PRECEDING ) AS movingavg14days
FROM Sales.OrderValues;
What’s not to like?
Cast your vote
The standard nested window functions seem like a very powerful concept that enables a lot of flexibility in interacting with different points in windowing elements. I’m quite surprised that I cannot find any coverage of the concept other than in the standard itself, and that I don’t see many platforms implementing it. Hopefully this article will increase awareness for this feature. If you feel that it could be useful for you to have it available in T-SQL, make sure to cast your vote!