Itzik Ben-Gan

Fundamentals of Table Expressions, Part 12 – Inline Table-Valued Functions

October 13, 2021 by in T-SQL Queries | 4 Comments
SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Itzik is a T-SQL trainer, a co-founder of SolidQ, and blogs about T-SQL fundamentals and query tuning.

Itzik’s Posts

This article is the twelfth part in a series about named table expressions. So far I covered derived tables and CTEs, which are statement-scoped named table expressions, and views, which are reusable named table expressions. This month I introduce inline table-valued functions, or iTVFs, and describe their benefits compared to the other named table expressions. I also compare them with stored procedures, mainly focusing on differences in terms of default optimization strategy, and plan caching and reuse behavior. There’s a lot to cover in terms of optimization, so I’ll start the discussion this month and continue it next month.

In my examples I’ll use a sample database called TSQLV5. You can find the script that creates and populates it here and its ER diagram here.

What Is an Inline Table-Valued Function?

Compared to the previously covered named table expressions, iTVFs resemble mostly views. Like views, iTVFs are created as a permanent object in the database, and therefore are reusable by users who have permissions to interact with them. The main advantage iTVFs have compared to views is the fact that they support input parameters. So, the easiest way to describe an iTVF is as a parameterized view, although technically you create it with a CREATE FUNCTION statement and not with a CREATE VIEW statement.

It’s important not to confuse iTVFs with multi-statement table-valued functions (MSTVFs). The former is an inlinable named table expression based on a single query similar to a view and is the focus of this article. The latter is a programmatic module that returns a table variable as its output, with multi-statement flow in its body whose purpose is to fill the returned table variable with data.

Syntax

Here’s the T-SQL syntax for creating an iTVF:

CREATE [ OR ALTER ] FUNCTION [ < schema name > . ] < function name >

  [ (< input parameters >) ]

RETURNS TABLE

[ WITH < function attributes including SCHEMABINDING > ]

AS

RETURN

  < table expression > [ ; ]

Observe in the syntax the ability to define input parameters.

The purpose of the SCHEMABIDNING attribute is the same as with views and should be evaluated based on similar considerations. For details, see Part 10 in the series.

An Example

As an example for an iTVF, suppose you need to create a reusable named table expression that accepts as inputs a customer ID (@custid) and a number (@n) and returns the requested number of most recent orders from the Sales.Orders table for the input customer.

You cannot implement this task with a view since views lack support for input parameters. As mentioned, you can think of an iTVF as a parameterized view, and as such, it’s the right tool for this task.

Before implementing the function itself, here’s code to create a supporting index on the Sales.Orders table:

USE TSQLV5;
GO

CREATE INDEX idx_nc_cid_odD_oidD_i_eid
  ON Sales.Orders(custid, orderdate DESC, orderid DESC)
  INCLUDE(empid);

And here’s the code to create the function, named Sales.GetTopCustOrders:

CREATE OR ALTER FUNCTION Sales.GetTopCustOrders
  ( @custid AS INT, @n AS BIGINT )
RETURNS TABLE
AS
RETURN
  SELECT TOP (@n) orderid, orderdate, empid
  FROM Sales.Orders
  WHERE custid = @custid
  ORDER BY orderdate DESC, orderid DESC;
GO

Just like with base tables and views, when you’re after retrieving data, you specify iTVFs in the FROM clause of a SELECT statement. Here’s an example requesting the three most recent orders for customer 1:

SELECT orderid, orderdate, empid
FROM Sales.GetTopCustOrders(1, 3);

I’ll refer to this example as Query 1. The plan for Query 1 is shown in Figure 1.

Figure 1: Plan for Query 1

What’s Inline About iTVFs?

If you’re wondering about the source of the term inline in inline table-valued functions, it has to do with how they get optimized. The inlining concept is applicable to all four kinds of named table expressions T-SQL supports, and in part involves what I described in Part 4 in the series as unnesting/substitution. Make sure you revisit the relevant section in Part 4 if you need a refresher.

As you can see in Figure 1, thanks to the fact that the function got inlined, SQL Server was able to create an optimal plan that interacts directly with the underlying base table’s indexes. In our case, the plan performs a seek in the supporting index you created earlier.

iTVFs take the inlining concept a step further by applying parameter embedding optimization by default. Paul White describes parameter embedding optimization in his excellent article Parameter Sniffing, Embedding, and the RECOMPILE Options. With parameter embedding optimization, query parameter references are replaced with the literal constant values from the current execution, and then the code with the constants gets optimized.

Observe in the plan in Figure 1 that both the seek predicate of the Index Seek operator and the top expression of the Top operator show the embedded literal constant values 1 and 3 from the current query execution. They don’t show the parameters @custid and @n, respectively.

With iTVFs, parameter embedding optimization is used by default. With stored procedures, parameterized queries are optimized by default. You need to add OPTION(RECOMPILE) to a stored procedure’s query to request parameter embedding optimization. More details on optimization of iTVFs versus stored procedures, including implications, shortly.

Modifying Data Through iTVFs

Recall from Part 11 in the series that as long as certain requirements are met, named table expressions can be a target of modification statements. This ability applies to iTVFs similar to the way it applies to views. For instance, here’s code you could use to delete the three most recent orders of customer 1 (don't actually run this):

DELETE FROM Sales.GetTopCustOrders(1, 3);

Specifically in our database, attempting to run this code would fail due to referential integrity enforcement (the affected orders happen to have related order lines in the Sales.OrderDetails table), but it’s valid and supported code.

iTVFs vs. Stored Procedures

As mentioned earlier, the default query optimization strategy for iTVFs is different than the one for stored procedures. With iTVFs, the default is to use parameter embedding optimization. With stored procedures, the default is to optimize parameterized queries while applying parameter sniffing. To get parameter embedding for a stored procedure query, you need to add OPTION(RECOMPILE).

As with many optimization strategies and techniques, parameter embedding has its pluses and minuses.

The main plus is it enables query simplifications that can sometimes result in more efficient plans. Some of those simplifications are truly fascinating. Paul demonstrates this with stored procedures in his article, and I’ll demonstrate this with iTVFs next month.

The main minus of parameter embedding optimization is you don’t get efficient plan caching and reuse behavior like you do for parameterized plans. With each distinct combination of parameter values, you get a distinct query string, and hence a separate compilation that results in a separate cached plan. With iTVFs with constant inputs, you can get plan reuse behavior, but only if the same parameter values are repeated. Obviously, a stored procedure query with OPTION(RECOMPILE) won’t reuse a plan even when repeating the same parameter values, by request.

I’ll demonstrate three cases:

  1. Reusable plans with constants resulting from the default parameter embedding optimization for iTVF queries with constants
  2. Reusable parameterized plans resulting from the default optimization of parameterized stored procedure queries
  3. Nonreusable plans with constants resulting from parameter embedding optimization for stored procedure queries with OPTION(RECOMPILE)

Let’s start with case #1.

Use the following code to query our iTVF with @custid = 1 and @n = 3:

SELECT orderid, orderdate, empid
FROM Sales.GetTopCustOrders(1, 3);

As a reminder, this would be the second execution of the same code since you already executed it once with the same parameter values earlier, resulting in the plan shown in Figure 1.

Use the following code to query the iTVF with @custid = 2 and @n = 3 once:

SELECT orderid, orderdate, empid
FROM Sales.GetTopCustOrders(2, 3);

I’ll refer to this code as Query 2. The plan for Query 2 is shown in Figure 2.

Figure 2: Plan for Query 2

Recall that the plan in Figure 1 for Query 1 referred to the constant customer ID 1 in the seek predicate, whereas this plan refers to the constant customer ID 2.

Use the following code to examine query execution stats:

SELECT Q.plan_handle, Q.execution_count, T.text, P.query_plan
FROM sys.dm_exec_query_stats AS Q
  CROSS APPLY sys.dm_exec_sql_text(Q.plan_handle) AS T
  CROSS APPLY sys.dm_exec_query_plan(Q.plan_handle) AS P
WHERE T.text LIKE '%Sales.' + 'GetTopCustOrders(%';

This code generates the following output:

plan_handle         execution_count text                                           query_plan
------------------- --------------- ---------------------------------------------- ----------------
0x06000B00FD9A1...  1               SELECT ... FROM Sales.GetTopCustOrders(2, 3);  <ShowPlanXML...>
0x06000B00F5C34...  2               SELECT ... FROM Sales.GetTopCustOrders(1, 3);  <ShowPlanXML...>

(2 rows affected)

There are two separate plans created here: one for the query with customer ID 1, which was used twice, and another for the query with customer ID 2, which was used once. With a very large number of distinct combinations of parameter values, you’ll end up with a large number of compilations and cached plans.

Let’s proceed with case #2: the default optimization strategy of parameterized stored procedure queries. Use the following code to encapsulate our query in a stored procedure called Sales.GetTopCustOrders2:

CREATE OR ALTER PROC Sales.GetTopCustOrders2
  ( @custid AS INT, @n AS BIGINT )
AS
  SET NOCOUNT ON;

  SELECT TOP (@n) orderid, orderdate, empid
  FROM Sales.Orders
  WHERE custid = @custid
  ORDER BY orderdate DESC, orderid DESC;
GO

Use the following code to execute the stored procedure with @custid = 1 and @n = 3 twice:

EXEC Sales.GetTopCustOrders2 @custid = 1, @n = 3;
EXEC Sales.GetTopCustOrders2 @custid = 1, @n = 3;

The first execution triggers the optimization of the query, resulting in the parameterized plan shown in Figure 3:

Figure 3: Plan for Sales.GetTopCustOrders2 proc

Observe the reference to the parameter @custid in the seek predicate and to the parameter @n in the top expression.

Use the following code to execute the stored procedure with @custid = 2 and @n = 3 once:

EXEC Sales.GetTopCustOrders2 @custid = 2, @n = 3;

The cached parameterized plan shown in Figure 3 is reused again.

Use the following code to examine query execution stats:

SELECT Q.plan_handle, Q.execution_count, T.text, P.query_plan
FROM sys.dm_exec_query_stats AS Q
  CROSS APPLY sys.dm_exec_sql_text(Q.plan_handle) AS T
  CROSS APPLY sys.dm_exec_query_plan(Q.plan_handle) AS P
WHERE T.text LIKE '%Sales.' + 'GetTopCustOrders2%';

This code generates the following output:

plan_handle         execution_count text                                            query_plan
------------------- --------------- ----------------------------------------------- ----------------
0x05000B00F1604...  3               ...SELECT TOP (@n)...WHERE custid = @custid...; <ShowPlanXML...>

(1 row affected)

Only one parameterized plan was created and cached, and used three times, despite the changing customer ID values.

Let’s proceed to case #3. As mentioned, with stored procedure queries you may get parameter embedding optimization when using OPTION(RECOMPILE). Use the following code to alter the procedure query to include this option:

CREATE OR ALTER PROC Sales.GetTopCustOrders2
  ( @custid AS INT, @n AS BIGINT )
AS
  SET NOCOUNT ON;

  SELECT TOP (@n) orderid, orderdate, empid
  FROM Sales.Orders
  WHERE custid = @custid
  ORDER BY orderdate DESC, orderid DESC
  OPTION(RECOMPILE);
GO

Execute the proc with @custid = 1 and @n = 3 twice:

EXEC Sales.GetTopCustOrders2 @custid = 1, @n = 3;
EXEC Sales.GetTopCustOrders2 @custid = 1, @n = 3;

You get the same plan shown earlier in Figure 1 with the embedded constants.

Execute the proc with @custid = 2 and @n = 3 once:

EXEC Sales.GetTopCustOrders2 @custid = 2, @n = 3;

You get the same plan shown earlier in Figure 2 with the embedded constants.

Examine query execution stats:

SELECT Q.plan_handle, Q.execution_count, T.text, P.query_plan
FROM sys.dm_exec_query_stats AS Q
  CROSS APPLY sys.dm_exec_sql_text(Q.plan_handle) AS T
  CROSS APPLY sys.dm_exec_query_plan(Q.plan_handle) AS P
WHERE T.text LIKE '%Sales.' + 'GetTopCustOrders2%';

This code generates the following output:

plan_handle         execution_count text                                            query_plan
------------------- --------------- ----------------------------------------------- ----------------
0x05000B00F1604...  1               ...SELECT TOP (@n)...WHERE custid = @custid...; <ShowPlanXML...>

(1 row affected)

The execution count shows 1, reflecting just the last execution. SQL Server caches the last executed plan, so it can show statistics for that execution, but by request, it doesn’t reuse the plan. If you check the plan shown under the query_plan attribute, you’ll find it’s the one created for the constants in the last execution, shown earlier in Figure 2.

If you’re after fewer compilations, and efficient plan caching and reuse behavior, the default stored procedure optimization approach of parameterized queries is the way to go.

There’s a big advantage that an iTVF-based implementation has over a stored procedure-based one—when you need to apply the function to each row in a table, and pass columns from the table as inputs. For instance, suppose you need to return the three most recent orders for each customer in the Sales.Customers table. No query construct enables you to apply a stored procedure per row in a table. You could implement an iterative solution with a cursor, but it’s always a good day when you can avoid cursors. Combining the APPLY operator with an iTVF call, you can achieve the task nicely and cleanly, like so:

SELECT C.custid, O.orderid, O.orderdate, O.empid
FROM Sales.Customers AS C
  CROSS APPLY Sales.GetTopCustOrders( C.custid, 3 ) AS O;

This code generates the following output (abbreviated):

custid      orderid     orderdate  empid
----------- ----------- ---------- -----------
1           11011       2019-04-09 3
1           10952       2019-03-16 1
1           10835       2019-01-15 1
2           10926       2019-03-04 4
2           10759       2018-11-28 3
2           10625       2018-08-08 3
...

(263 rows affected)

The function call gets inlined, and the reference to the parameter @custid is replaced with the correlation C.custid. This results in the plan shown in Figure 4.

Figure 4: Plan for query with APPLY and Sales.GetTopCustOrders iTVF

The plan scans some index on the Sales.Customers table to get the set of customer IDs and applies a seek in the supporting index you created earlier on Sales.Orders per customer. There’s only one plan since the function got inlined in the outer query, turning into a correlated, or a lateral, join. This plan is highly efficient, especially when the custid column in Sales.Orders is very dense, meaning when there’s a small number of distinct customer IDs.

Of course, there are other ways to implement this task, such as using a CTE with the ROW_NUMBER function. Such a solution tends to work better than the APPLY-based one when the custid column in the Sales.Orders table has low density. Either way, the specific task I used in my examples is not so important for the purposes of our discussion. My point was to explain the different optimization strategies SQL Server employs with the different tools.

When you’re done, use the following code for cleanup:

DROP INDEX IF EXISTS idx_nc_cid_odD_oidD_i_eid ON Sales.Orders;

Summary and What’s Next

So, what have we learned from this?

An iTVF is a reusable parameterized named table expression.

SQL Server uses a parameter embedding optimization strategy with iTVFs by default, and a parameterized query optimization strategy with stored procedure queries. Adding OPTION(RECOMPILE) to a stored procedure query may result in parameter embedding optimization.

If you want to get fewer compilations and efficient plan caching and reuse behavior, parameterized procedure query plans is the way to go.

Plans for iTVF queries are cached and can be reused, so long as the same parameter values are repeated.

You can conveniently combine the use of the APPLY operator and an iTVF to apply the iTVF to each row from the left table, passing columns from the left table as inputs to the iTVF.

As mentioned, there’s a lot to cover about iTVFs optimization. This month I compared iTVFs and stored procedures in terms of the default optimization strategy and plan caching and reuse behavior. Next month I’ll dig more deeply into simplifications resulting from parameter embedding optimization.