Microsoft recently released the first public preview of SQL Server 2022. This release has a number of T-SQL improvements. In this article I focus on windowing and NULL-related improvements. These include the new WINDOW clause and the windowing NULL treatment clause.
I’ll be using the sample database TSQLV6 in the examples in this article. You can download this sample database here.
The WINDOW Clause
The WINDOW clause is part of the ISO/IEC SQL standard. It allows you to name parts of a window specification—or an entire one—and then use the window name in the OVER clause of your query’s window functions. This clause allows you to shorten your code by avoiding the repetition of identical parts of your window specifications. This clause is now available in Azure SQL Database and SQL Server 2022, provided you use database compatibility level 160 or higher.
The WINDOW clause is located between the query’s HAVING and ORDER BY clauses:
FROM
WHERE
GROUP BY
HAVING
WINDOW
ORDER BY
The WINDOW clause has the following syntax:
WINDOW window_name AS ( [ reference_window_name ] [ <window partition clause> ] [ <window order clause> ] [ <window frame clause> ] )
As an example where the WINDOW clause can be handy in shortening your code, consider the following query:
USE TSQLV6;
SELECT orderid, custid, orderdate, qty, val,
SUM(qty) OVER( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING ) AS runsumqty,
SUM(val) OVER( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
ORDER BY custid, orderdate, orderid;
This query generates the following output:
orderid custid orderdate qty val runsumqty runsumval ----------- ----------- ---------- ----------- ------- ----------- ---------- 10643 1 2021-08-25 38 814.50 38 814.50 10692 1 2021-10-03 20 878.00 58 1692.50 10702 1 2021-10-13 21 330.00 79 2022.50 10835 1 2022-01-15 17 845.80 96 2868.30 10952 1 2022-03-16 18 471.20 114 3339.50 11011 1 2022-04-09 60 933.50 174 4273.00 10308 2 2020-09-18 6 88.80 6 88.80 10625 2 2021-08-08 18 479.75 24 568.55 10759 2 2021-11-28 10 320.00 34 888.55 10926 2 2022-03-04 29 514.40 63 1402.95
In this query you can see two window functions using identical window specifications, including window partitioning, ordering and framing clauses. To shorten the query, you can use the WINDOW clause to name a window specification with all three elements, say as W, and then specify OVER W in both window functions, like so:
SELECT orderid, custid, orderdate, qty, val,
SUM(qty) OVER W AS runsumqty,
SUM(val) OVER W AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW W AS ( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;
As you can see, when the window name represents the whole window specification that you need and not just part of it, you specify the window name right after the OVER clause without parentheses.
You may have noticed in the WINDOW clause’s syntax one window name specification can have a reference to another window name. This is especially useful when your query has different window functions with different window specifications and one window specification is the same as part of another. Consider the following query as an example:
SELECT orderid, custid, orderdate, qty, val,
ROW_NUMBER() OVER( PARTITION BY custid
ORDER BY orderdate, orderid ) AS ordernum,
MAX(orderdate) OVER( PARTITION BY custid ) AS maxorderdate,
SUM(qty) OVER( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING ) AS runsumqty,
SUM(val) OVER( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
ORDER BY custid, orderdate, orderid;
This query generates the following output:
orderid custid orderdate qty val ordernum maxorderdate runsumqty runsumval -------- ------- ---------- ---- ------- --------- ------------ ----------- ----------- 10643 1 2021-08-25 38 814.50 1 2022-04-09 38 814.50 10692 1 2021-10-03 20 878.00 2 2022-04-09 58 1692.50 10702 1 2021-10-13 21 330.00 3 2022-04-09 79 2022.50 10835 1 2022-01-15 17 845.80 4 2022-04-09 96 2868.30 10952 1 2022-03-16 18 471.20 5 2022-04-09 114 3339.50 11011 1 2022-04-09 60 933.50 6 2022-04-09 174 4273.00 10308 2 2020-09-18 6 88.80 1 2022-03-04 6 88.80 10625 2 2021-08-08 18 479.75 2 2022-03-04 24 568.55 10759 2 2021-11-28 10 320.00 3 2022-03-04 34 888.55 10926 2 2022-03-04 29 514.40 4 2022-03-04 63 1402.95
The MAX function’s window specification has only a window partition clause. The ROW_NUMBER function’s window specification has a window partition clause that is the same as the MAX function’s, plus a window order clause. Both SUM functions have the same window partition and order clauses as the ROW_NUMBER function’s, plus a window frame clause.
The recursive capability of the WINDOW clause’s syntax allows you to shorten the query’s code, like so:
SELECT orderid, custid, orderdate, qty, val,
ROW_NUMBER() OVER PO AS ordernum,
MAX(orderdate) OVER P AS maxorderdate,
SUM(qty) OVER POF AS runsumqty,
SUM(val) OVER POF AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION BY custid ),
PO AS ( P ORDER BY orderdate, orderid ),
POF AS ( PO ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;
The order of the window name definitions in the WINDOW clause is insignificant. For example, the following code is valid and has the same meaning as the above query:
SELECT orderid, custid, orderdate, qty, val,
ROW_NUMBER() OVER PO AS ordernum,
MAX(orderdate) OVER P AS maxorderdate,
SUM(qty) OVER POF AS runsumqty,
SUM(val) OVER POF AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW POF AS ( PO ROWS UNBOUNDED PRECEDING ),
PO AS ( P ORDER BY orderdate, orderid ),
P AS ( PARTITION BY custid )
ORDER BY custid, orderdate, orderid;
Note, though, you can't use multiple window name references in one window name specification. You're limited to only one window name reference, plus any relevant additional window specification elements. For example, the following code isn’t valid for this reason:
SELECT orderid, custid, orderdate, qty, val,
SUM(qty) OVER ( P O F ) AS runsumqty,
SUM(val) OVER ( P O F ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION BY custid ),
O AS ( ORDER BY orderdate, orderid ),
F AS ( ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;
This code generates the following error:
Incorrect syntax near 'O'.
You're allowed to mix one window name and additional windowing elements in a window specification, like so:
SELECT orderid, custid, orderdate, qty, val,
ROW_NUMBER() OVER ( P ORDER BY orderdate, orderid ) AS ordernum,
MAX(orderdate) OVER P AS maxorderdate
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION BY custid )
ORDER BY custid, orderdate, orderid;
This query generates the following output:
orderid custid orderdate qty val ordernum maxorderdate ----------- ----------- ---------- ----------- ------- -------------------- ------------ 10643 1 2021-08-25 38 814.50 1 2022-04-09 10692 1 2021-10-03 20 878.00 2 2022-04-09 10702 1 2021-10-13 21 330.00 3 2022-04-09 10835 1 2022-01-15 17 845.80 4 2022-04-09 10952 1 2022-03-16 18 471.20 5 2022-04-09 11011 1 2022-04-09 60 933.50 6 2022-04-09 10308 2 2020-09-18 6 88.80 1 2022-03-04 10625 2 2021-08-08 18 479.75 2 2022-03-04 10759 2 2021-11-28 10 320.00 3 2022-03-04 10926 2 2022-03-04 29 514.40 4 2022-03-04
As I mentioned before, when a window name represents the whole window specification, like with the MAX function in this query, you specify the window name right after the OVER clause without parentheses. When the window name is only part of the window specification, like with the ROW_NUMBER function in this query, you specify the window name followed by the rest of the windowing elements within parentheses.
By now, you know you're allowed to recursively define one window name based on another. However, in case it wasn’t obvious, cyclic references aren’t allowed. For example, the following query is valid since the window name definitions aren’t cyclic:
SELECT 'This is valid'
WINDOW W1 AS (), W2 AS (W1), W3 AS (W2);
This query generates the following output:
------------- This is valid
However, the following query is invalid since the window name definitions are cyclic:
SELECT 'This is invalid'
WINDOW W1 AS (W2), W2 AS (W3), W3 AS (W1);
This code generates the following error:
Cyclic window references are not permitted.
Lastly, the scope of the defined window names is the immediate query/table expression, and can't cross table expression boundaries. For instance, if you define a window name in the inner query of a CTE, derived table, view or inline table valued function, the outer query won’t recognize the inner window name. As an example, the following query is invalid for this reason:
WITH C AS
(
SELECT orderid, custid, orderdate, qty, val,
SUM(qty) OVER W AS runsumqtyall
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW W AS ( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING )
)
SELECT *,
SUM(qty) OVER W AS runsumqty22
FROM C
WHERE orderdate >= '20220101';
This code generates the following error:
Window 'W' is undefined.
You have to define a window name you want to use in each of the scopes where you want to use it, like so:
WITH C AS
(
SELECT orderid, custid, orderdate, qty, val,
SUM(qty) OVER W AS runsumqtyall
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW W AS ( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING )
)
SELECT *,
SUM(qty) OVER W AS runsumqty22
FROM C
WHERE orderdate >= '20220101'
WINDOW W AS ( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING );
This query generates the following output:
orderid custid orderdate qty val runsumqtyall runsumqty22 ----------- ----------- ---------- ----------- ------- ------------ ----------- 10835 1 2022-01-15 17 845.80 96 17 10952 1 2022-03-16 18 471.20 114 35 11011 1 2022-04-09 60 933.50 174 95 10926 2 2022-03-04 29 514.40 63 29
Each of the scopes defines its own window name W, and they don’t have to be based on the same specification (though they are in this example).
The Windowing NULL Treatment Clause
The NULL treatment clause is part of the ISO/IEC SQL standard and is available to the offset window functions FIRST_VALUE, LAST_VALUE, LAG and LEAD. This clause has the following syntax:
<function>(<scalar_expression>[, <other args>]) [IGNORE NULLS | RESPECT NULLS] OVER( <specification> )
The RESPECT NULLS option is the default, in case you don’t indicate this clause. It means you want the function to return the value of <scalar_expression> in the requested position (first, last, previous, next), whether it's NULL or non-NULL. The IGNORE NULLS option introduces a new capability that people have been eagerly waiting to have in T-SQL for a long time. It means you want the function to return the value of <scalar_expression> in the requested position if it's non-NULL. However, if it is NULL, you want the function to keep going in the relevant direction (backward for LAST_VALUE and LAG, forward for FIRST_VALUE and LEAD) until a non-NULL value is found. If no non-NULL value is found, then it will return a NULL.
To illustrate the utility of this clause, I’ll use a table called T1 in my examples. Use the following code to create and populate T1:
DROP TABLE IF EXISTS dbo.T1;
CREATE TABLE dbo.T1
(
id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
col1 INT NULL,
col2 INT NULL
);
GO
INSERT INTO dbo.T1(id, col1, col2) VALUES
( 2, NULL, 200),
( 3, 10, NULL),
( 5, -1, NULL),
( 7, NULL, 202),
(11, NULL, 150),
(13, -12, 50),
(17, NULL, 180),
(19, NULL, 170),
(23, 1759, NULL);
Suppose the column id represents the chronological order of the events recorded in T1. Each row represents an event where one or more attribute values have changed. A NULL means the attribute retains whatever last non-NULL value it had up to that point.
Suppose you need to return the last known (non-NULL) col1 value per event. Without access to the NULL treatment clause, you'd need to use a fairly complex technique such as the following:
WITH C AS
(
SELECT id, col1,
MAX(CASE WHEN col1 IS NOT NULL THEN id END)
OVER(ORDER BY id
ROWS UNBOUNDED PRECEDING) AS grp
FROM dbo.T1
)
SELECT id, col1,
MAX(col1) OVER(PARTITION BY grp
ORDER BY id
ROWS UNBOUNDED PRECEDING) AS lastknowncol1
FROM C;
If you aren't already familiar with this technique, it can take a bit to figure out the logic here.
This code generates the following output:
id col1 lastknowncol1 ----------- ----------- ------------- 2 NULL NULL 3 10 10 5 -1 -1 7 NULL -1 11 NULL -1 13 -12 -12 17 NULL -12 19 NULL -12 23 1759 1759
Having access to the NULL treatment clause, you can easily achieve the same using the LAST_VALUE function with the IGNORE NULLS option, like so:
SELECT id, col1,
LAST_VALUE(col1) IGNORE NULLS OVER( ORDER BY id ROWS UNBOUNDED PRECEDING ) AS lastknowncol
FROM dbo.T1;
The difference is of course more dramatic if you need to apply this logic to multiple attributes.
Without access to the NULL treatment clause, you'd use the following code to return the last known col1 and col2 values:
WITH C AS
(
SELECT id, col1, col2,
MAX(CASE WHEN col1 IS NOT NULL THEN id END)
OVER(ORDER BY id
ROWS UNBOUNDED PRECEDING) AS grp1,
MAX(CASE WHEN col2 IS NOT NULL THEN id END)
OVER(ORDER BY id
ROWS UNBOUNDED PRECEDING) AS grp2
FROM dbo.T1
)
SELECT id,
col1,
MAX(col1) OVER(PARTITION BY grp1
ORDER BY id
ROWS UNBOUNDED PRECEDING) AS lastknowncol1,
col2,
MAX(col2) OVER(PARTITION BY grp2
ORDER BY id
ROWS UNBOUNDED PRECEDING) AS lastknowncol2
FROM C;
This code generates the following output:
id col1 lastknowncol1 col2 lastknowncol2 ----------- ----------- ------------- ----------- ------------- 2 NULL NULL 200 200 3 10 10 NULL 200 5 -1 -1 NULL 200 7 NULL -1 202 202 11 NULL -1 150 150 13 -12 -12 50 50 17 NULL -12 180 180 19 NULL -12 170 170 23 1759 1759 NULL 170
I should also note even though the table T1 has a supporting covering index with id as the key, each of the last known attribute calculations in the query above results in an explicit sort operator in the plan, as shown in Figure 1.
Figure 1: Plan for query without the NULL treatment clause
This fact makes this solution quite expensive.
Here’s the alternative using the NULL treatment clause:
SELECT id,
col1, LAST_VALUE(col1) IGNORE NULLS OVER W AS lastknowncol1,
col2, LAST_VALUE(col2) IGNORE NULLS OVER W AS lastknowncol2
FROM dbo.T1
WINDOW W AS ( ORDER BY id ROWS UNBOUNDED PRECEDING );
This solution is so much shorter and more elegant, and the optimization of the functions with this option can rely on an ordered scan of a supporting index, and thus avoid explicit sorting, as shown in the plan for this query in Figure 2.
Figure 2: Plan for query with the NULL treatment clause
As mentioned, the NULL treatment clause is available to all offset window functions (FIRST_VALUE, LAST_VALUE, LAG, and LEAD). Here’s an example using LAG to return the previous known value:
SELECT id, col1,
LAG(col1) IGNORE NULLS OVER ( ORDER BY id ) AS prevknowncol1
FROM dbo.T1;
This code generates the following output:
id col1 prevknowncol1 ----------- ----------- ------------- 2 NULL NULL 3 10 NULL 5 -1 10 7 NULL -1 11 NULL -1 13 -12 -1 17 NULL -12 19 NULL -12 23 1759 -12
Want to try to achieve the same without the NULL treatment clause? I bet you don’t!
Conclusion and Other T-SQL Improvements in SQL Server 2022
In this article I covered T-SQL improvements in SQL Server 2022 concerning window functions and NULL handling. I showed how to:
- Reuse parts of—or entire—window definitions with the WINDOW clause
- Control NULL treatment in offset window functions with the NULL treatment clause
SQL Server 2022 has additional T-SQL improvements, covered by Aaron Bertrand in this article:
- GREATEST / LEAST
- STRING_SPLIT
- DATE_BUCKET
- GENERATE_SERIES
Wow, they did it!? I asked you about this a long time ago, I'm glad it will be here soon! It will make some of my reporting queries look a lot cleaner.
Excellent coverage of two great new additions to T-SQL. I've been looking forward to the WINDOW clause for awhile and the new NULL handling features look very helpful.
Yes they did, Doug!
Thanks Mark!
Hi Itzik, Does the WINDOW clause have any relevance to the earlier article about reducing the number of sort operators when using windowing functions? I.e. can we achieve the same improvements by keeping the order of functions referring to windows? Or is this technique somehow affected e.g. do window definitions also have to stick to some order?
Hi Kamil,
I was planning on checking this since I was curious about it too. I’m without my laptop this week but will check when I can and let you know.
Hi again Kamil,
As a follow-up to my last response, I checked it, and I'm afraid the answer is no. That is, the order of the window definitions in the WINDOW clause has no effect on optimization.
So an attempt to use a similar trick to the one in Tip 4 with CTEs using the WINDOW clause doesn't work. For example, the following query results in four sorts and not the optimal two:
It was worth checking thought!
Hi Itzik, Thanks for checking.
Great examples, as always! Does using the WINDOW clause as shown in your examples offer a performance benefit, or does it only serve to make the code more concise and less repetitive (a good enough reason to use it regardless)?
Thanks Mark!
As for your question, there’s no performance side to the WINDOW clause. It just helps you shorten your code.