Itzik Ben-Gan

Number series generator challenge solutions – Part 1

January 13, 2021 by in T-SQL Queries | 17 Comments
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

Last month I posted a challenge to create an efficient number series generator. The responses were overwhelming. There were many brilliant ideas and suggestions, with lots of applications well beyond this particular challenge. It made me realize how great it is to be part of a community, and that amazing things can be achieved when a group of smart people joins forces. Thanks Alan Burstein, Joe Obbish, Adam Machanic, Christopher Ford, Jeff Moden, Charlie, NoamGr, Kamil Kosno, Dave Mason and John Number2 for sharing your ideas and comments.

Initially I thought of writing just one article to summarize the ideas people submitted, but there were too many. So I’ll split the coverage to several articles. This month I’ll focus primarily on Charlie’s and Alan Burstein’s suggested improvements to the two original solutions that I posted last month in the form of the inline TVFs called dbo.GetNumsItzikBatch and dbo.GetNumsItzik. I’ll name the improved versions dbo.GetNumsAlanCharlieItzikBatch and dbo.GetNumsAlanCharlieItzik, respectively.

This is so exciting!

Itzik’s original solutions

As a quick reminder, the functions I covered last month use a base CTE that defines a table value constructor with 16 rows. The functions use a series of cascading CTEs, each applying a product (cross join) of two instances of its preceding CTE. This way, with five CTEs beyond the base one, you can get a set of up to 4,294,967,296 rows. A CTE called Nums uses the ROW_NUMBER function to produce a series of numbers starting with 1. Finally, the outer query computes the numbers in the requested range between the inputs @low and @high.

The dbo.GetNumsItzikBatch function uses a dummy join to a table with a columnstore index to get batch processing. Here’s the code to create the dummy table:

CREATE TABLE dbo.BatchMe(col1 INT NOT NULL, INDEX idx_cs CLUSTERED COLUMNSTORE);

And here’s the code defining the dbo.GetNumsItzikBatch function:

CREATE OR ALTER FUNCTION dbo.GetNumsItzikBatch(@low AS BIGINT, @high AS BIGINT)
  RETURNS TABLE
AS
RETURN

  WITH
    L0 AS ( SELECT 1 AS c 
            FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
                        (1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
    L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
    L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),
    L3 AS ( SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B ),
    Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
              FROM L3 )
  SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
  FROM Nums LEFT OUTER JOIN dbo.BatchMe ON 1 = 0
  ORDER BY rownum;

I used the following code to test the function with "Discard results after execution" enabled in SSMS:

SELECT n FROM dbo.GetNumsItzikBatch(1, 100000000) OPTION(MAXDOP 1);

Here are the performance stats that I got for this execution:

CPU time = 16985 ms,  elapsed time = 18348 ms.

The dbo.GetNumsItzik function is similar, only it doesn’t have a dummy join, and normally gets row mode processing throughout the plan. Here’s the function’s definition:

CREATE OR ALTER FUNCTION dbo.GetNumsItzik(@low AS BIGINT, @high AS BIGINT)
  RETURNS TABLE
AS
RETURN

  WITH
    L0 AS ( SELECT 1 AS c 
            FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
                        (1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
    L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
    L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),
    L3 AS ( SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B ),
    Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
              FROM L3 )
  SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
  FROM Nums
  ORDER BY rownum;

Here’s the code I used to test function:

SELECT n FROM dbo.GetNumsItzik(1, 100000000) OPTION(MAXDOP 1);

Here are the performance stats that I got for this execution:

CPU time = 19969 ms,  elapsed time = 21229 ms.

Alan Burstein's and Charlie's improvements

Alan and Charlie suggested several improvements to my functions, some with moderate performance implications and some with more dramatic ones. I’ll start with Charlie’s findings regarding compilation overhead and constant folding. I’ll then cover Alan’s suggestions, including 1-based versus @low-based sequences (also shared by Charlie and Jeff Moden), avoiding unnecessary sort, and computing a numbers range in opposite order.

Compilation time findings

As Charlie noted, a number series generator is often used to generate series with very small numbers of rows. In those cases, the compile time of the code can become a substantial part of the total query processing time. That’s especially important when using iTVFs, since unlike with stored procedures, it’s not the parameterized query code that gets optimized, rather the query code after parameter embedding takes place. In other words, the parameters are substituted with the input values prior to optimization, and the code with the constants gets optimized. This process can have both negative and positive implications. One of the negative implications is that you get more compilations as the function is called with different input values. For this reason, compile times should definitely be taken into account—especially when using the function very frequently with small ranges.

Here are the compilation times Charlie found for the various base CTE cardinalities:

2: 22ms
4: 9ms
16: 7ms
256: 35ms

It’s curious to see that among these, 16 is the optimal, and that there’s a very dramatic jump when you go up to the next level, which is 256. Recall that the functions dbo.GetNumsItzikBacth and dbo.GetNumsItzik use base CTE cardinality of 16.

Constant folding

Constant folding is often a positive implication that in the right conditions may be enabled thanks to the parameter embedding process that an iTVF experiences. For example, suppose that your function has an expression @x + 1, where @x is an input parameter of the function. You invoke the function with @x = 5 as input. The inlined expression then becomes 5 + 1, and if eligible for constant folding (more on this shortly), then becomes 6. If this expression is part of a more elaborate expression involving columns, and is applied to many millions of rows, this can result in nonnegligible savings in CPU cycles.

The tricky part is that SQL Server is very picky about what to constant fold and what to not constant fold. For example, SQL Server will not constant fold col1 + 5 + 1, nor will it fold 5 + col1 + 1. But it will fold 5 + 1 + col1 to 6 + col1. I know. So, for example, if your function returned SELECT @x + col1 + 1 AS mycol1 FROM dbo.T1, you could enable constant folding with the following small alteration: SELECT @x + 1 + col1 AS mycol1 FROM dbo.T1. Don’t believe me? Examine the plans for the following three queries in the PerformanceV5 database (or similar queries with your data) and see for yourself:

SELECT orderid + 5 + 1 AS myorderid FROM dbo.orders;

SELECT 5 + orderid + 1 AS myorderid FROM dbo.orders;

SELECT 5 + 1 + orderid AS myorderid FROM dbo.orders;

I got the following three expressions in the Compute Scalar operators for these three queries, respectively:

[Expr1003] = Scalar Operator([PerformanceV5].[dbo].[Orders].[orderid]+(5)+(1))

[Expr1003] = Scalar Operator((5)+[PerformanceV5].[dbo].[Orders].[orderid]+(1))

[Expr1003] = Scalar Operator((6)+[PerformanceV5].[dbo].[Orders].[orderid])

See where I’m going with this? In my functions I used the following expression to define the result column n:

@low + rownum - 1 AS n

Charlie realized that with the following small alteration, he can enable constant folding:

@low - 1 + rownum AS n

For example, the plan for the earlier query that I provided against dbo.GetNumsItzik, with @low = 1, originally had the following expression defined by the Compute Scalar operator:

[Expr1154] = Scalar Operator((1)+[Expr1153]-(1))

After applying the above minor change, the expression in the plan becomes:

[Expr1154] = Scalar Operator((0)+[Expr1153])

That’s brilliant!

As for the performance implications, recall that the performance statistics that I got for the query against dbo.GetNumsItzikBatch before the change were the following:

CPU time = 16985 ms,  elapsed time = 18348 ms.

Here are the numbers I got after the change:

CPU time = 16375 ms,  elapsed time = 17932 ms.

Here are the numbers that I got for the query against dbo.GetNumsItzik originally:

CPU time = 19969 ms,  elapsed time = 21229 ms.

And here are the numbers after the change:

CPU time = 19266 ms,  elapsed time = 20588 ms.

Performance improved just a bit by a few percent. But wait, there’s more! If you need to process the data ordered, the performance implications can be much more dramatic, as I’ll get to later in the section about ordering.

1-based versus @low-based sequence and opposite row numbers

Alan, Charlie and Jeff noted that in the vast majority of the real-life cases where you need a range of numbers, you need it to start with 1, or sometimes 0. It’s far less common to need a different starting point. So it could make more sense to have the function always return a range that starts with, say, 1, and when you need a different starting point, apply any computations externally in the query against the function.

Alan actually came up with an elegant idea to have the inline TVF return both a column that starts with 1 (simply the direct result of the ROW_NUMBER function) aliased as rn, and a column that starts with @low aliased as n. Since the function gets inlined, when the outer query interacts only with the column rn, the column n doesn’t even get evaluated, and you get the performance benefit. When you do need the sequence to start with @low, you interact with the column n and pay the applicable extra cost, so there’s no need to add any explicit external computations. Alan even suggested adding a column called op that computes the numbers in opposite order, and interact with it only when in need for such a sequence. The column op is based on the computation: @high + 1 – rownum. This column has significance when you need to process the rows in descending number ordering as I demonstrate later in the ordering section.

So, let’s apply Charlie’s and Alan’s improvements to my functions.

For the batch mode version, make sure that you create the dummy table with the columnstore index first, if it’s not already present:

CREATE TABLE dbo.BatchMe(col1 INT NOT NULL, INDEX idx_cs CLUSTERED COLUMNSTORE);

Then use the following definition for the dbo.GetNumsAlanCharlieItzikBatch function:

CREATE OR ALTER FUNCTION dbo.GetNumsAlanCharlieItzikBatch(@low AS BIGINT = 1, @high AS BIGINT)
  RETURNS TABLE
AS
RETURN

  WITH
    L0 AS ( SELECT 1 AS c 
            FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
                        (1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
    L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
    L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),
    L3 AS ( SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B ),
    Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
              FROM L3 )
  SELECT TOP(@high - @low + 1)
     rownum AS rn,
     @high + 1 - rownum AS op,
     @low - 1 + rownum AS n
  FROM Nums LEFT OUTER JOIN dbo.BatchMe ON 1 = 0
  ORDER BY rownum;

Here’s an example for using the function:

SELECT *
FROM dbo.GetNumsAlanCharlieItzikBatch(-2, 3) AS F
ORDER BY rn;

This code generates the following output:

rn  op  n
--- --- ---
1   3   -2
2   2   -1
3   1   0
4   0   1
5   -1  2
6   -2  3

Next, test the function’s performance with 100M rows, first returning the column n:

SELECT n FROM dbo.GetNumsAlanCharlieItzikBatch(1, 100000000) OPTION(MAXDOP 1);

Here are the performance statistics that I got for this execution:

CPU time = 16375 ms,  elapsed time = 17932 ms.

As you can see, there’s a small improvement compared to dbo.GetNumsItzikBatch in both CPU and elapsed time thanks to the constant folding that took place here.

Test the function, only this time returning the column rn:

SELECT rn FROM dbo.GetNumsAlanCharlieItzikBatch(1, 100000000) OPTION(MAXDOP 1);

Here are the performance statistics that I got for this execution:

CPU time = 15890 ms,  elapsed time = 18561 ms.

CPU time further reduced, though elapsed time seems to have increased a bit in this execution compared to when querying the column n.

Figure 1 has the plans for both queries.

Figure 1: Plans for GetNumsAlanCharlieItzikBatch returning n versus rn

You can clearly see in the plans that when interacting with the column rn, there’s no need for the extra Compute Scalar operator. Also notice in the first plan the outcome of the constant folding activity that I described earlier, where @low – 1 + rownum was inlined to 1 – 1 + rownum, and then folded into 0 + rownum.

Here’s the definition of the row-mode version of the function called dbo.GetNumsAlanCharlieItzik:

CREATE OR ALTER FUNCTION dbo.GetNumsAlanCharlieItzik(@low AS BIGINT = 1, @high AS BIGINT)
  RETURNS TABLE
AS
RETURN

  WITH
    L0 AS ( SELECT 1 AS c 
            FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
                        (1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
    L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
    L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),
    L3 AS ( SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B ),
    Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
              FROM L3 )
  SELECT TOP(@high - @low + 1)
     rownum AS rn,
     @high + 1 - rownum AS op,
     @low - 1 + rownum AS n
  FROM Nums
  ORDER BY rownum;

Use the following code to test the function, first querying the column n:

SELECT n FROM dbo.GetNumsAlanCharlieItzik(1, 100000000) OPTION(MAXDOP 1);

Here are the performance statistics that I got:

CPU time = 19047 ms,  elapsed time = 20121 ms.

As you can see, it’s a bit faster than dbo.GetNumsItzik.

Next, query the column rn:

SELECT rn FROM dbo.GetNumsAlanCharlieItzik(1, 100000000) OPTION(MAXDOP 1);

The performance numbers further improve on both the CPU and the elapsed time fronts:

CPU time = 17656 ms,  elapsed time = 18990 ms.

Ordering considerations

The aforementioned improvements are certainly interesting, and the performance impact is nonnegligible, but not very significant. A much more dramatic and profound performance impact can be observed when you need to process the data ordered by the number column. This could be as simple as needing to return the rows ordered, but is just as relevant for any order-based processing need, e.g., a Stream Aggregate operator for grouping and aggregation, a Merge Join algorithm for joining, and so forth.

When querying dbo.GetNumsItzikBatch or dbo.GetNumsItzik and ordering by n, the optimizer doesn’t realize that the underlying ordering expression @low + rownum – 1 is order-preserving with respect to rownum. The implication is a bit similar to that of a non-SARGable filtering expression, only with an ordering expression this results in an explicit Sort operator in the plan. The extra sort affects the response time. It also affects scaling, which typically becomes n log n instead of n.

To demonstrate this, query dbo.GetNumsItzikBatch, requesting the column n, ordered by n:

SELECT n
FROM dbo.GetNumsItzikBatch(1,100000000)
ORDER BY n
OPTION(MAXDOP 1);

I got the following performance stats:

CPU time = 34125 ms,  elapsed time = 39656 ms.

The run time more than doubles compared to the test without the ORDER BY clause.

Test the dbo.GetNumsItzik function in a similar way:

SELECT n
FROM dbo.GetNumsItzik(1,100000000)
ORDER BY n
OPTION(MAXDOP 1);

I got the following numbers for this test:

CPU time = 52391 ms,  elapsed time = 55175 ms.

Also here run time more than doubles compared to the test without the ORDER BY clause.

Figure 2 has the plans for both queries.

Figure 2: Plans for GetNumsItzikBatch and GetNumsItzik ordering by n

In both cases you can see the explicit Sort operator in the plans.

When querying dbo.GetNumsAlanCharlieItzikBatch or dbo.GetNumsAlanCharlieItzik and ordering by rn the optimizer doesn’t need to add a Sort operator to the plan. So you could return n, but order by rn, and this way avoid a sort. What’s a bit shocking, though—and I mean it in a good way—is that the revised version of n which experiences constant folding, is order-preserving! It’s easy for the optimizer to realize that 0 + rownum is an order-preserving expression with respect to rownum, and thus avoid a sort.

Try it. Query dbo.GetNumsAlanCharlieItzikBatch, returning n, and ordering by either n or rn, like so:

SELECT n
FROM dbo.GetNumsAlanCharlieItzikBatch(1,100000000)
ORDER BY n -- same with rn
OPTION(MAXDOP 1);

I got the following performance numbers:

CPU time = 16500 ms,  elapsed time = 17684 ms.

That’s of course thanks to the fact that there was no need for a Sort operator in the plan.

Run a similar test against dbo.GetNumsAlanCharlieItzik:

SELECT n
FROM dbo.GetNumsAlanCharlieItzik(1,100000000)
ORDER BY n -- same with rn
OPTION(MAXDOP 1);

I got the following numbers:

CPU time = 19546 ms,  elapsed time = 20803 ms.

Figure 3 has the plans for both queries:


Figure 3: Plans for GetNumsAlanCharlieItzikBatch and GetNumsAlanCharlieItzik ordering by n or rn

Observe that there’s no Sort operator in the plans.

Makes you want to sing…

All you need is constant folding
All you need is constant folding
All you need is constant folding, constant folding
Constant folding is all you need

Thank you Charlie!

But what if you need to return or process the numbers in descending order? The obvious attempt is to use ORDER BY n DESC, or ORDER BY rn DESC, like so:

SELECT n
FROM dbo.GetNumsAlanCharlieItzikBatch(1,100000000)
ORDER BY n DESC
OPTION(MAXDOP 1);

SELECT n
FROM dbo.GetNumsAlanCharlieItzikBatch(1,100000000)
ORDER BY rn DESC
OPTION(MAXDOP 1);

Unfortunately, though, both cases result in an explicit sort in the plans, as shown in Figure 4.

Figure 4: Plans for GetNumsAlanCharlieItzikBatch ordering by n or rn descending

This is where Alan’s clever trick with the column op becomes a lifesaver. Return the column op while ordering by either n or rn, like so:

SELECT op
FROM dbo.GetNumsAlanCharlieItzikBatch(1,100000000)
ORDER BY n
OPTION(MAXDOP 1);

The plan for this query is shown in Figure 5.

Figure 5: Plan for GetNumsAlanCharlieItzikBatch returning op and ordering by n or rn ascending

You get the data back ordered by n descending and there’s no need for a sort in the plan.

Thank you Alan!

Performance summary

So what have we learned from all of this?

Compilation times can be a factor, especially when using the function frequently with small ranges. On a logarithmic scale with a base 2, sweet 16 seems to be a nice magical number.

Understand the peculiarities of constant folding and use them to your advantage. When an iTVF has expressions that involve parameters, constants and columns, put the parameters and constants in the leading portion of the expression. This will increase the likelihood for folding, reduce CPU overhead, and increase the likelihood for order preservation.

It’s ok to have multiple columns that are used for different purposes in an iTVF, and query the relevant ones in each case without worrying about paying for the ones that aren’t referenced.

When you need the number sequence returned in opposite order, use the original n or rn column in the ORDER BY clause with ascending order, and return the column op, which computes the numbers in inverse order.

Figure 6 summarizes the performance numbers that I got in the various tests.

Figure 6: Performance summary

Next month I’ll continue exploring additional ideas, insights and solutions to the number series generator challenge.