Itzik Ben-Gan

Overlooked T-SQL Gems

SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

My good friend Aaron Bertrand inspired me to write this article. He reminded me of how sometimes we take things for granted when they seem obvious to us and don’t always bother checking the full story behind them. The relevance to T-SQL is that sometimes we assume that we know everything there is to know about certain T-SQL features, and don’t always bother checking the documentation to see if there’s more to them. In this article I cover a number of T-SQL features that are either often entirely overlooked, or that support parameters or capabilities that are often overlooked. If you have examples of your own of T-SQL gems that are often overlooked, please do share those in the comments section of this article.

Before you start reading this article ask yourself what do you know about the following T-SQL features: EOMONTH, TRANSLATE, TRIM, CONCAT and CONCAT_WS, LOG, cursor variables, and MERGE with OUTPUT.

In my examples I’ll use a sample database called TSQLV5. You can find the script that creates and populates this database here, and its ER diagram here.

EOMONTH has a second parameter

The EOMONTH function was introduced in SQL Server 2012. Many people think that it supports only one parameter holding an input date, and that it simply returns the end-of-month date that corresponds to the input date.

Consider a slightly more sophisticated need to compute the end of the previous month. For example, suppose that you need to query the Sales.Orders table, and return orders that were placed at the end of the previous month.

One way to achieve this is to apply the EOMONTH function to SYSDATETIME to get the end-of-month date of the current month, and then apply the DATEADD function to subtract a month from the result, like so:

USE TSQLV5; 

SELECT orderid, orderdate
FROM Sales.Orders
WHERE orderdate = EOMONTH(DATEADD(month, -1, SYSDATETIME()));

Note that if you actually run this query in the TSQLV5 sample database you will get an empty result since the last order date recorded in the table is May 6th, 2019. However, if the table had orders with an order date that falls on the last day of the previous month, the query would have returned those.

What many people don’t realize is that EOMONTH supports a second parameter where you indicate how many months to add, or subtract. Here’s the [fully documented] syntax of the function:

EOMONTH ( start_date [, month_to_add ] )

Our task can be achieved more easily and naturally by simply specifying -1 as the second parameter to the function, like so:

SELECT orderid, orderdate
FROM Sales.Orders
WHERE orderdate = EOMONTH(SYSDATETIME(), -1);

TRANSLATE is sometimes simpler than REPLACE

Many people are familiar with the REPLACE function and how it works. You use it when you want to replace all occurrences of one substring with another in an input string. Sometimes, though, when you have multiple replacements that you need to apply, using REPLACE is a bit tricky and results in convoluted expressions.

As an example, suppose that you are given an input string @s that contains a number with Spanish formatting. In Spain they use a period as the separator for groups of thousands, and a comma as the decimal separator. You need to convert the input to US formatting, where a comma is used as the separator for groups of thousands, and a period as the decimal separator.

Using one call to the REPLACE function, you can replace only all occurrences of one character or substring with another. To apply two replacements (periods to commas and commas to periods) you need to nest function calls. The tricky part is that if you use REPLACE once to change periods to commas, and then a second time against the result to change commas to periods, you end up with only periods. Try it:

DECLARE @s AS VARCHAR(20) = '123.456.789,00';

SELECT REPLACE(REPLACE(@s, '.', ','), ',', '.');

You get the following output:

123.456.789.00

If you want to stick to using the REPLACE function, you need three function calls. One to replace periods with a neutral character that you know that cannot normally appear in the data (say, ~). Another against the result to replace all commas with periods. Another against the result to replace all occurrences of the temporary character (~ in our example) with commas. Here’s the complete expression:

DECLARE @s AS VARCHAR(20) = '123.456.789,00';
SELECT REPLACE(REPLACE(REPLACE(@s, '.', '~'), ',', '.'), '~', ',');

This time you get the right output:

123,456,789.00

It’s kind of doable, but it results in a long and convoluted expression. What if you had more replacements to apply?

Many people are not aware that SQL Server 2017 introduced a new function called TRANSLATE that simplifies such replacements a great deal. Here’s the function’s syntax:

TRANSLATE ( inputString, characters, translations )

The second input (characters) is a string with the list of the individual characters that you want to replace, and the third input (translations) is a string with the list of the corresponding characters that you want replace the source characters with. This naturally means that the second and third parameters must have the same number of characters. What’s important about the function is that it doesn’t do separate passes for each of the replacements. If it did, it would have potentially resulted in the same bug as in the first example I showed using the two calls to the REPLACE function. Consequently, handling our task becomes a no-brainer:

DECLARE @s AS VARCHAR(20) = '123.456.789,00';
SELECT TRANSLATE(@s, '.,', ',.');

This code generates the desired output:

123,456,789.00

That’s pretty neat!

TRIM is more than LTRIM(RTRIM())

SQL Server 2017 introduced support for the function TRIM. Many people, myself included, initially just assume that it’s no more than a simple shortcut to LTRIM(RTRIM(input)). However, if you check the documentation, you realize that it’s actually more powerful than that.

Before I go into the details, consider the following task: given an input string @s, remove leading and trailing slashes (backward and forward). As an example, suppose that @s contains the following string:

//\\ remove leading and trailing backward (\) and forward (/) slashes \\//

The desired output is:

 remove leading and trailing backward (\) and forward (/) slashes 

Note that the output should retain the leading and trailing spaces.

If you didn’t know of TRIM’s full capabilities, here’s one way you might have solved the task:

DECLARE @s AS VARCHAR(100) = '//\\ remove leading and trailing backward (\) and forward (/) slashes \\//';

SELECT
  TRANSLATE(TRIM(TRANSLATE(TRIM(TRANSLATE(@s, ' /', '~ ')), ' \', '^ ')), ' ^~', '\/ ')
    AS outputstring;

The solution starts by using TRANSLATE to replace all spaces with a neutral character (~) and forward slashes with spaces, then using TRIM to trim leading and trailing spaces from the result. This step essentially trims leading and trailing forward slashes, temporarily using ~ instead of original spaces. Here’s the outcome of this step:

\\~remove~leading~and~trailing~backward~(\)~and~forward~( )~slashes~\\

The second step then uses TRANSLATE to replace all spaces with another neutral character (^) and backward slashes with spaces, then using TRIM to trim leading and trailing spaces from the result. This step essentially trims leading and trailing backward slashes, temporarily using ^ instead of intermediate spaces. Here’s the outcome of this step:

~remove~leading~and~trailing~backward~( )~and~forward~(^)~slashes~

The last step uses TRANSLATE to replace spaces with backward slashes, ^ with forward slashes, and ~ with spaces, generating the desired output:

 remove leading and trailing backward (\) and forward (/) slashes 

As an exercise, try solving this task with a pre-SQL Server 2017 compatible solution where you cannot use TRIM and TRANSLATE.

Back to SQL Server 2017 and above, if you did bother checking the documentation, you would have discovered that TRIM is more sophisticated that what you thought initially. Here’s the function’s Syntax:

TRIM ( [ characters FROM ] string )

The optional characters FROM part allows you to specify one or more characters that you want trimmed from the beginning and end of the input string. In our case, all you need to do is specify '/\' as this part, like so:

DECLARE @s AS VARCHAR(100) = '//\\ remove leading and trailing backward (\) and forward (/) slashes \\//';

SELECT TRIM( '/\' FROM @s) AS outputstring;

That’s a pretty significant improvement compared to the previous solution!

CONCAT and CONCAT_WS

If you’ve been working with T-SQL for a while you know how awkward it is to deal with NULLs when you need to concatenate strings. As an example, consider the location data recorded for employees in the HR.Employees table:

SELECT empid, country, region, city
FROM HR.Employees;

This query generates the following output:

empid       country         region          city
----------- --------------- --------------- ---------------
1           USA             WA              Seattle
2           USA             WA              Tacoma
3           USA             WA              Kirkland
4           USA             WA              Redmond
5           UK              NULL            London
6           UK              NULL            London
7           UK              NULL            London
8           USA             WA              Seattle
9           UK              NULL            London

Notice that for some employees the region part is irrelevant and an irrelevant region is represented by a NULL. Suppose that you need to concatenate the location parts (country, region and city), using a comma as a separator, but ignoring NULL regions. When the region is relevant, you want the result to have the form <coutry>,<region>,<city> and when the region is irrelevant you want the result to have the form <country>,<city>. Normally, concatenating something with a NULL produces a NULL result. You can change this behavior by turning off the CONCAT_NULL_YIELDS_NULL session option, but I wouldn’t recommend enabling nonstandard behavior.

If you didn’t know of the existence of the CONCAT and CONCAT_WS functions, you would have probably used ISNULL or COALESCE to replace a NULL with an empty string, like so:

SELECT empid, country + ISNULL(',' + region, '') + ',' + city AS location
FROM HR.Employees;

Here’s the output of this query:

empid       location
----------- -----------------------------------------------
1           USA,WA,Seattle
2           USA,WA,Tacoma
3           USA,WA,Kirkland
4           USA,WA,Redmond
5           UK,London
6           UK,London
7           UK,London
8           USA,WA,Seattle
9           UK,London

SQL Server 2012 introduced the function CONCAT. This function accepts a list of character string inputs and concatenates them, and while doing so, it ignores NULLs. So using CONCAT you can simplify the solution like this:

SELECT empid, CONCAT(country, ',' + region, ',', city) AS location
FROM HR.Employees;

Still, you have to explicitly specify the separators as part of the function’s inputs. To make our lives even easier, SQL Server 2017 introduced a similar function called CONCAT_WS where you start by indicating the separator, followed by the items that you want to concatenate. With this function the solution is further simplified like so:

SELECT empid, CONCAT_WS(',', country, region, city) AS location
FROM HR.Employees;

The next step is of course mindreading. On April 1st, 2020 Microsoft is planning to release CONCAT_MR. The function will accept an empty input, and figure out automatically which elements you want it to concatenate by reading your mind. The query will then look like this:

SELECT empid, CONCAT_MR() AS location
FROM HR.Employees;

LOG has a second parameter

Similar to the EOMONTH function, many people don’t realize that starting already with SQL Server 2012, the LOG function supports a second parameter that allows you to indicate the logarithm’s base. Prior to that, T-SQL supported the function LOG(input) which returns the natural logarithm of the input (using the constant e as the base), and LOG10(input) which uses 10 as the base.

Not being aware of the existence of the second parameter to the LOG function, when people wanted to compute Logb(x), where b is a base other than e and 10, they often did it the long way. You could rely on the following equation:

Logb(x) = Loga(x)/Loga(b)

As an example, to compute Log2(8), you rely on the following equation:

Log2(8) = Loge(8)/Loge(2)

Translated to T-SQL, you apply the following calculation:

DECLARE @x AS FLOAT = 8, @b AS INT = 2;
SELECT LOG(@x) / LOG(@b);

Once you realize that LOG supports a second parameter where you indicate the base, the calculation simply becomes:

DECLARE @x AS FLOAT = 8, @b AS INT = 2;
SELECT LOG(@x, @b);

Cursor variable

If you’ve been working with T-SQL for a while, you probably had lots of chances to work with cursors. As you know, when working with a cursor, you typically use the following steps:

  • Declare the cursor
  • Open the cursor
  • Iterate through the cursor records
  • Close the cursor
  • Deallocate the cursor

As an example, suppose that you need to perform some task per database in your instance. Using a cursor, you would normally use code similar to the following:

DECLARE @dbname AS sysname;

DECLARE C CURSOR FORWARD_ONLY STATIC READ_ONLY FOR
  SELECT name FROM sys.databases;

OPEN C;

FETCH NEXT FROM C INTO @dbname;

WHILE @@FETCH_STATUS = 0
BEGIN
  PRINT N'Handling database ' + QUOTENAME(@dbname) + N'...';
  /* ... do your thing here ... */
  FETCH NEXT FROM C INTO @dbname;
END;

CLOSE C;
DEALLOCATE C;

The CLOSE command releases the current result set and frees locks. The DEALLOCATE command removes a cursor reference, and when the last reference is deallocated, frees the data structures comprising the cursor. If you try running the above code twice without the CLOSE and DEALLOCATE commands, you will get the following error:

Msg 16915, Level 16, State 1, Line 4
A cursor with the name 'C' already exists.
Msg 16905, Level 16, State 1, Line 6
The cursor is already open.

Make sure you do run the CLOSE and DEALLOCATE commands before you continue.

Many people don’t realize that when they need to work with a cursor in only one batch, which is the most common case, instead of using a regular cursor you can work with a cursor variable. Like any variable, the scope of a cursor variable is only the batch where it was declared. This means that as soon as a batch finishes, all variables expire. Using a cursor variable, once a batch finishes, SQL Server closes and deallocates it automatically, saving you the need to run the CLOSE and DEALLOCATE command explicitly.

Here’s the revised code using a cursor variable this time:

DECLARE @dbname AS sysname, @C AS CURSOR;

SET @C = CURSOR FORWARD_ONLY STATIC READ_ONLY FOR
  SELECT name FROM sys.databases;

OPEN @C;

FETCH NEXT FROM @C INTO @dbname;

WHILE @@FETCH_STATUS = 0
BEGIN
  PRINT N'Handling database ' + QUOTENAME(@dbname) + N'...';
  /* ... do your thing here ... */
  FETCH NEXT FROM @C INTO @dbname;
END;

Feel free to execute it multiple times and notice that this time you don’t get any errors. It’s just cleaner, and you don’t have to worry about keeping cursor resources if you forgot to close and deallocate the cursor.

MERGE with OUTPUT

Since the inception of the OUTPUT clause for modification statements in SQL Server 2005, it turned out to be a very practical tool whenever you wanted to return data from modified rows. People use this feature regularly for purposes like archiving, auditing and many other use cases. One of the annoying things about this feature, though, is that if you use it with INSERT statements, you’re only allowed to return data from the inserted rows, prefixing the output columns with inserted. You don’t have access to the source table’s columns, even though sometimes you do need to return columns from the source alongside columns from the target.

As an example, consider the tables T1 and T2, which you create and populate by running the following code:

DROP TABLE IF EXISTS dbo.T1, dbo.T2;
GO

CREATE TABLE dbo.T1(keycol INT NOT NULL IDENTITY PRIMARY KEY, datacol VARCHAR(10) NOT NULL);

CREATE TABLE dbo.T2(keycol INT NOT NULL IDENTITY PRIMARY KEY, datacol VARCHAR(10) NOT NULL);

INSERT INTO dbo.T1(datacol) VALUES('A'),('B'),('C'),('D'),('E'),('F');

Notice that an identity property is used to generate the keys in both tables.

Suppose that you need to copy some rows from T1 to T2; say, the ones where keycol % 2 = 1. You want to use the OUTPUT clause to return the newly generated keys in T2, but you also want to return alongside those keys the respective source keys from T1. The intuitive expectation is to use the following INSERT statement:

INSERT INTO dbo.T2(datacol)
    OUTPUT T1.keycol AS T1_keycol, inserted.keycol AS T2_keycol
  SELECT datacol FROM dbo.T1 WHERE keycol % 2 = 1;

Unfortunately though, as mentioned, the OUTPUT clause doesn’t allow you to refer to columns from the source table, so you get the following error:

Msg 4104, Level 16, State 1, Line 2
The multi-part identifier "T1.keycol" could not be bound.

Many people don’t realize that oddly this limitation doesn’t apply to the MERGE statement. So even though it’s a bit awkward, you can convert your INSERT statement into a MERGE statement, but in order to do so, you need the MERGE predicate to always be false. This will activate the WHEN NOT MATCHED clause and apply the only supported INSERT action there. You can use a dummy false condition such as 1 = 2. Here’s the complete converted code:

MERGE INTO dbo.T2 AS TGT
USING (SELECT keycol, datacol FROM dbo.T1 WHERE keycol % 2 = 1) AS SRC 
  ON 1 = 2
WHEN NOT MATCHED THEN
  INSERT(datacol) VALUES(SRC.datacol)
OUTPUT SRC.keycol AS T1_keycol, inserted.keycol AS T2_keycol;

This time the code runs successfully, producing the following output:

T1_keycol   T2_keycol
----------- -----------
1           1
3           2
5           3

Hopefully, Microsoft will enhance support for the OUTPUT clause in the other modification statements to allow returning columns from the source table as well.

Conclusion

Don’t assume, and RTFM! :-)