A common scenario in many client-server applications is allowing the end user to dictate the sort order of results. Some people want to see the lowest priced items first, some want to see the newest items first, and some want to see them alphabetically. This is a complex thing to achieve in Transact-SQL because you can't just say:
CREATE PROCEDURE dbo.SortOnSomeTable
@SortColumn NVARCHAR(128) = N'key_col',
@SortDirection VARCHAR(4) = 'ASC'
AS
BEGIN
... ORDER BY @SortColumn;
-- or
... ORDER BY @SortColumn @SortDirection;
END
GO
This is because T-SQL does not allow variables in these locations. If you just use @SortColumn, you receive:
The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.
(And when the error message says, "an expression referencing a column name," you might find it ambiguous, and I agree. But I can assure you that this does not mean a variable is a suitable expression.)
If you try to append @SortDirection, the error message is a little more opaque:
Incorrect syntax near '@SortDirection'.
There are a few ways around this, and your first instinct might be to use dynamic SQL, or to introduce the CASE expression. But as with most things, there are complications that can force you down one path or another. So which one should you use? Let's explore how these solutions might work, and compare the impacts on performance for a few different approaches.
Sample Data
Using a catalog view we all probably understand quite well, sys.all_objects, I created the following table based on a cross join, limiting the table to 100,000 rows (I wanted data that filled many pages but that didn't take significant time to query and test):
CREATE DATABASE OrderBy;
GO
USE OrderBy;
GO
SELECT TOP (100000)
key_col = ROW_NUMBER() OVER (ORDER BY s1.[object_id]), -- a BIGINT with clustered index
s1.[object_id], -- an INT without an index
name = s1.name -- an NVARCHAR with a supporting index
COLLATE SQL_Latin1_General_CP1_CI_AS,
type_desc = s1.type_desc -- an NVARCHAR(60) without an index
COLLATE SQL_Latin1_General_CP1_CI_AS,
s1.modify_date -- a datetime without an index
INTO dbo.sys_objects
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id];
(The COLLATE trick is because many catalog views have different columns with different collations, and this ensures that the two columns will match for the purposes of this demo.)
Then I created a typical clustered / non-clustered index pair that might exist on such a table, prior to optimization (I can't use object_id for the key, because the cross join creates duplicates):
CREATE UNIQUE CLUSTERED INDEX key_col ON dbo.sys_objects(key_col);
CREATE INDEX name ON dbo.sys_objects(name);
Use Cases
As mentioned above, users may want to see this data ordered in a variety of ways, so let's set out some typical use cases we want to support (and by support, I mean demonstrate):
- Ordered by key_col ascending ** default if user doesn't care
- Ordered by object_id (ascending/descending)
- Ordered by name (ascending/descending)
- Ordered by type_desc (ascending/descending)
- Ordered by modify_date (ascending/descending)
We'll leave the key_col ordering as the default because it should be the most efficient if the user doesn't have a preference; since the key_col is an arbitrary surrogate that should mean nothing to the user (and may not even be exposed to them), there is no reason to allow reverse sorting on that column.
Approaches That Don't Work
The most common approach I see when someone first starts to tackle this problem is introducing control-of-flow logic to the query. They expect to be able to do this:
SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY
IF @SortColumn = 'key_col'
key_col
IF @SortColumn = 'object_id'
[object_id]
IF @SortColumn = 'name'
name
...
IF @SortDirection = 'ASC'
ASC
ELSE
DESC;
This obviously doesn't work. Next I see CASE being introduced incorrectly, using similar syntax:
SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY CASE @SortColumn
WHEN 'key_col' THEN key_col
WHEN 'object_id' THEN [object_id]
WHEN 'name' THEN name
...
END CASE @SortDirection WHEN 'ASC' THEN ASC ELSE DESC END;
This is closer, but it fails for two reasons. One is that CASE is an expression that returns exactly one value of a specific data type; this merges data types that are incompatible and therefore will break the CASE expression. The other is that there is no way to conditionally apply the sort direction this way without using dynamic SQL.
Approaches That Do Work
The three primary approaches I've seen are as follows:
Group compatible types and directions together
In order to use CASE with ORDER BY, there must be a distinct expression for each combination of compatible types and directions. In this case we would have to use something like this:
CREATE PROCEDURE dbo.Sort_CaseExpanded
@SortColumn NVARCHAR(128) = N'key_col',
@SortDirection VARCHAR(4) = 'ASC'
AS
BEGIN
SET NOCOUNT ON;
SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY
CASE WHEN @SortDirection = 'ASC' THEN
CASE @SortColumn
WHEN 'key_col' THEN key_col
WHEN 'object_id' THEN [object_id]
END
END,
CASE WHEN @SortDirection = 'DESC' THEN
CASE @SortColumn
WHEN 'key_col' THEN key_col
WHEN 'object_id' THEN [object_id]
END
END DESC,
CASE WHEN @SortDirection = 'ASC' THEN
CASE @SortColumn
WHEN 'name' THEN name
WHEN 'type_desc' THEN type_desc
END
END,
CASE WHEN @SortDirection = 'DESC' THEN
CASE @SortColumn
WHEN 'name' THEN name
WHEN 'type_desc' THEN type_desc
END
END DESC,
CASE WHEN @SortColumn = 'modify_date'
AND @SortDirection = 'ASC' THEN modify_date
END,
CASE WHEN @SortColumn = 'modify_date'
AND @SortDirection = 'DESC' THEN modify_date
END DESC;
END
You might say, wow, that's an ugly bit of code, and I would agree with you. I think this is why a lot of folks cache their data on the front end and let the presentation tier deal with juggling it around in different orders. :-)
You can collapse this logic a little bit further by converting all the non-string types into strings that will sort correctly, e.g.
CREATE PROCEDURE dbo.Sort_CaseCollapsed
@SortColumn NVARCHAR(128) = N'key_col',
@SortDirection VARCHAR(4) = 'ASC'
AS
BEGIN
SET NOCOUNT ON;
SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY
CASE WHEN @SortDirection = 'ASC' THEN
CASE @SortColumn
WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12)
WHEN 'object_id' THEN
RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0')
+ REPLICATE('0', 23) + RTRIM([object_id]), 24)
WHEN 'name' THEN name
WHEN 'type_desc' THEN type_desc
WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120)
END
END,
CASE WHEN @SortDirection = 'DESC' THEN
CASE @SortColumn
WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12)
WHEN 'object_id' THEN
RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0')
+ REPLICATE('0', 23) + RTRIM([object_id]), 24)
WHEN 'name' THEN name
WHEN 'type_desc' THEN type_desc
WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120)
END
END DESC;
END
Still, it's a pretty ugly mess, and you have to repeat the expressions twice to deal with the different sort directions. I would also suspect that using OPTION RECOMPILE on that query would prevent you from being stung by parameter sniffing. Except in the default case, it's not like the majority of the work being done here is going to be compilation.
Apply a rank using window functions
I discovered this neat trick from AndriyM, though it is most useful in cases where all of the potential ordering columns are of compatible types, otherwise the expression used for ROW_NUMBER() is equally complex. The most clever part is that in order to switch between ascending and descending order, we simply multiply the ROW_NUMBER() by 1 or -1. We can apply it in this situation as follows:
CREATE PROCEDURE dbo.Sort_RowNumber
@SortColumn NVARCHAR(128) = N'key_col',
@SortDirection VARCHAR(4) = 'ASC'
AS
BEGIN
SET NOCOUNT ON;
;WITH x AS
(
SELECT key_col, [object_id], name, type_desc, modify_date,
rn = ROW_NUMBER() OVER (
ORDER BY CASE @SortColumn
WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12)
WHEN 'object_id' THEN
RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0')
+ REPLICATE('0', 23) + RTRIM([object_id]), 24)
WHEN 'name' THEN name
WHEN 'type_desc' THEN type_desc
WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120)
END
) * CASE @SortDirection WHEN 'ASC' THEN 1 ELSE -1 END
FROM dbo.sys_objects
)
SELECT key_col, [object_id], name, type_desc, modify_date
FROM x
ORDER BY rn;
END
GO
Again, OPTION RECOMPILE can help here. Also, you might notice in some of these cases that ties are handled differently by the various plans – when ordering by name, for example, you will usually see key_col come through in ascending order within each set of duplicate names, but you may also see the values mixed up. To provide more predictable behavior in the event of ties, you can always add an additional ORDER BY clause. Note that if you were to add key_col to the first example, you'll need to make it an expression so that key_col is not listed in the ORDER BY twice (you can do this using key_col + 0, for example).
Dynamic SQL
A lot of people have reservations about dynamic SQL – it's impossible to read, it's a breeding ground for SQL injection, it leads to plan cache bloat, it defeats the purpose of using stored procedures… Some of these are simply untrue, and some of them are easy to mitigate. I've added some validation here that could just as easily be added to any of the above procedures:
CREATE PROCEDURE dbo.Sort_DynamicSQL
@SortColumn NVARCHAR(128) = N'key_col',
@SortDirection VARCHAR(4) = 'ASC'
AS
BEGIN
SET NOCOUNT ON;
-- reject any invalid sort directions:
IF UPPER(@SortDirection) NOT IN ('ASC','DESC')
BEGIN
RAISERROR('Invalid parameter for @SortDirection: %s', 11, 1, @SortDirection);
RETURN -1;
END
-- reject any unexpected column names:
IF LOWER(@SortColumn) NOT IN (N'key_col', N'object_id', N'name', N'type_desc', N'modify_date')
BEGIN
RAISERROR('Invalid parameter for @SortColumn: %s', 11, 1, @SortColumn);
RETURN -1;
END
SET @SortColumn = QUOTENAME(@SortColumn);
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY ' + @SortColumn + ' ' + @SortDirection + ';';
EXEC sp_executesql @sql;
END
Performance Comparisons
I created a wrapper stored procedure for each procedure above, so that I could easily test all scenarios. The four wrapper procedures look like this, with the procedure name varying of course:
CREATE PROCEDURE dbo.Test_Sort_CaseExpanded
AS
BEGIN
SET NOCOUNT ON;
EXEC dbo.Sort_CaseExpanded; -- default
EXEC dbo.Sort_CaseExpanded N'name', 'ASC';
EXEC dbo.Sort_CaseExpanded N'name', 'DESC';
EXEC dbo.Sort_CaseExpanded N'object_id', 'ASC';
EXEC dbo.Sort_CaseExpanded N'object_id', 'DESC';
EXEC dbo.Sort_CaseExpanded N'type_desc', 'ASC';
EXEC dbo.Sort_CaseExpanded N'type_desc', 'DESC';
EXEC dbo.Sort_CaseExpanded N'modify_date', 'ASC';
EXEC dbo.Sort_CaseExpanded N'modify_date', 'DESC';
END
And then using SQL Sentry Plan Explorer, I generated actual execution plans (and the metrics to go along with it) with the following queries, and repeated the process 10 times to sum up total duration:
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
EXEC dbo.Test_Sort_CaseExpanded;
--EXEC dbo.Test_Sort_CaseCollapsed;
--EXEC dbo.Test_Sort_RowNumber;
--EXEC dbo.Test_Sort_DynamicSQL;
GO 10
I also tested the first three cases with OPTION RECOMPILE (doesn't make much sense for the dynamic SQL case, since we know it will be a new plan each time), and all four cases with MAXDOP 1 to eliminate parallelism interference. Here are the results:
Conclusion
For outright performance, dynamic SQL wins every time (though only by a small margin on this data set). The ROW_NUMBER() approach, while clever, was the loser in each test (sorry AndriyM).
It gets even more fun when you want to introduce a WHERE clause, never mind paging. These three are like the perfect storm for introducing complexity to what starts out as a simple search query. The more permutations your query has, the more likely you'll want to throw readability out the window and use dynamic SQL in combination with the "optimize for ad hoc workloads" setting to minimize the impact of single-use plans in your plan cache.
Hi,
If you add parameters to sp_executesql it should save the plan cache, isn't it?
Thanks
MCKLMT well, it depends. You can't parameterize column names or table names into sp_executesql, you must construct the first parameter with those names embedded first. So if you have 30 different combinations of ORDER BY something ASC|DESC, you'll end up with as many as 30 different plans.
But it's better to have 30 differents cached plans than always recalculate the plan of the query, isn't it? :)
Again, it depends. How many queries like this do you have? Is your server under memory pressure or CPU pressure? (Usually it's the former rather than the latter.) I'm not saying having 30 plans is bad, I'm just making sure people are aware of what that means. And as the post points out, I *am* advocating dynamic SQL, especially in cases where there are many variations of the query. So I'm not quite sure if you're arguing with me or not, but I think we're in agreement. :-)
Thanks for your explanations! ;)
May I recommend a visit to http://www.sommarskog.se ?
@Henrik, sure, I recommend Erland's articles all the time, and it is a great place to learn about dynamic SQL in general. He treats a small subset of potential "dynamic" ORDER BY solutions here: http://sommarskog.se/dynamic_sql.html#Order_by – but he doesn't address other potential solutions that I've covered here, nor does he elaborate on the performance aspect. He also seems to imply that it is better to write elaborate CASE statements than to use dynamic SQL, and I'm not sure I agree with that general suggestion in all cases.
I was thinking of http://sommarskog.se/dyn-search-2008.html, but Yes, it is nice to see things from a new angle, as you do in your post.
I have used your case approach to do conditional group by. I do it so that I can reuse a complex join condition while summarizing data differently (by company, by year, by product…). In this case I tend to be inserting into a temp table anyway, so I don't really think that it costs all that much.
As an app developer (not a DBA), In cases where I would use dynamic sql, I would tend to move the query closer to my code.
Hi Mike, have you looked into GROUPING SETS? This was introduced in SQL Server 2008 and may simplify some of your queries.
http://blogs.msdn.com/b/craigfr/archive/2007/10/11/grouping-sets-in-sql-server-2008.aspx
http://msdn.microsoft.com/en-us/library/bb510427%28v=sql.100%29
http://msdn.microsoft.com/en-us/library/bb522495%28v=sql.100%29
Grouping sets look compelling. Unfortunately we are skipping SQL Server 2008 and will move to 2012 some time in 2014.
I used a very simple method for my scenario. I have the sorting criteria set in a separate table. My solution is like this:
DECLARE @LIST VARCHAR(4000)
SET @LIST = SUBSTRING( ( SELECT ‘, ‘ + CASE pfr.PRIORITY_CD WHEN ‘ACCT’ THEN ‘Acct’ ELSE ‘PlayerID’ END + CASE pfr.IS_ASC WHEN 1 THEN ‘ ASC’ ELSE ‘ DESC’ END
FROM dbo.PRIORITY_COMP_SETTING pcs WITH(NOLOCK) ORDER BY pcs.PRIORITY_IND FOR XML PATH (”) ),2,20000)
–PRINT @LIST SELECT * FROM dbo.players ORDER BY ( SELECT @LIST)
The only thing to keep in mind is to specify column names in single quotes in case you are directly writing the logic in order by clause instead of using a varchar type variable. I compared two methods and found that cost of using a varchar type variable is lower.
I suppose this method should apply to your example in blog and other examples in comments section.
Guys, please let me know whether this approach works for you. Please let me know your suggestions regarding performance as well.
Thanks,
Deepak Khandelwal
Guys, sorry, its not working, i have to use dynamic sql.
Does not work, gives no errors, but resultset has unpredicted order !!
A had to sort 1 date column dynamically (ASC or DESC).
When you convert the value of the field you want to sort to a numeric one (ISO date format, 112), you can multiply by 1 or -1 and so change the sorting.
declare @sort = 1
if @orderby = 'desc' set @sort = -1
…… ORDER BY convert(float,convert(varchar,datefield,112)) * @sort
this can also be done with text columns when you convert the letters to numeric values.
Yes, while this works, I don't think it will be very efficient, because even in cases where there is an index to support the desired sorting, the computations against the column will lead to a full scan and an additional sort operation anyway.
Also, please use
CHAR(8)
and notVARCHAR
without length (more details here).Really really helpful. Thank you so much