Aaron Bertrand

Conditional Order By

SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Itzik is a T-SQL trainer, a co-founder of SolidQ, and blogs about T-SQL fundamentals and query tuning.

Itzik’s Posts

A common scenario in many client-server applications is allowing the end user to dictate the sort order of results. Some people want to see the lowest priced items first, some want to see the newest items first, and some want to see them alphabetically. This is a complex thing to achieve in Transact-SQL because you can't just say:

CREATE PROCEDURE dbo.SortOnSomeTable
  @SortColumn    NVARCHAR(128) = N'key_col',
  @SortDirection VARCHAR(4)    = 'ASC'
AS
BEGIN
  ... ORDER BY @SortColumn;

  -- or

  ... ORDER BY @SortColumn @SortDirection;
END
GO

This is because T-SQL does not allow variables in these locations. If you just use @SortColumn, you receive:

Msg 1008, Level 16, State 1, Line x
The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.

(And when the error message says, "an expression referencing a column name," you might find it ambiguous, and I agree. But I can assure you that this does not mean a variable is a suitable expression.)

If you try to append @SortDirection, the error message is a little more opaque:

Msg 102, Level 15, State 1, Line x
Incorrect syntax near '@SortDirection'.

There are a few ways around this, and your first instinct might be to use dynamic SQL, or to introduce the CASE expression. But as with most things, there are complications that can force you down one path or another. So which one should you use? Let's explore how these solutions might work, and compare the impacts on performance for a few different approaches.

 

Sample Data

Using a catalog view we all probably understand quite well, sys.all_objects, I created the following table based on a cross join, limiting the table to 100,000 rows (I wanted data that filled many pages but that didn't take significant time to query and test):

CREATE DATABASE OrderBy;
GO
USE OrderBy;
GO

SELECT TOP (100000) 
  key_col = ROW_NUMBER() OVER (ORDER BY s1.[object_id]), -- a BIGINT with clustered index
  s1.[object_id],             -- an INT without an index
  name = s1.name              -- an NVARCHAR with a supporting index
              COLLATE SQL_Latin1_General_CP1_CI_AS,
  type_desc = s1.type_desc    -- an NVARCHAR(60) without an index
              COLLATE SQL_Latin1_General_CP1_CI_AS,
  s1.modify_date              -- a datetime without an index
INTO       dbo.sys_objects 
FROM       sys.all_objects AS s1 
CROSS JOIN sys.all_objects AS s2
ORDER BY   s1.[object_id];

(The COLLATE trick is because many catalog views have different columns with different collations, and this ensures that the two columns will match for the purposes of this demo.)

Then I created a typical clustered / non-clustered index pair that might exist on such a table, prior to optimization (I can't use object_id for the key, because the cross join creates duplicates):

CREATE UNIQUE CLUSTERED INDEX key_col ON dbo.sys_objects(key_col);

CREATE INDEX name ON dbo.sys_objects(name);
 

Use Cases

As mentioned above, users may want to see this data ordered in a variety of ways, so let's set out some typical use cases we want to support (and by support, I mean demonstrate):

  • Ordered by key_col ascending ** default if user doesn't care
  • Ordered by object_id (ascending/descending)
  • Ordered by name (ascending/descending)
  • Ordered by type_desc (ascending/descending)
  • Ordered by modify_date (ascending/descending)

We'll leave the key_col ordering as the default because it should be the most efficient if the user doesn't have a preference; since the key_col is an arbitrary surrogate that should mean nothing to the user (and may not even be exposed to them), there is no reason to allow reverse sorting on that column.

 

Approaches That Don't Work

The most common approach I see when someone first starts to tackle this problem is introducing control-of-flow logic to the query. They expect to be able to do this:

SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY 
IF @SortColumn = 'key_col'
    key_col
IF @SortColumn = 'object_id'
    [object_id]
IF @SortColumn = 'name'
    name
...
IF @SortDirection = 'ASC'
    ASC
ELSE
    DESC;

This obviously doesn't work. Next I see CASE being introduced incorrectly, using similar syntax:

SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY CASE @SortColumn 
    WHEN 'key_col'   THEN key_col
    WHEN 'object_id' THEN [object_id]
    WHEN 'name'      THEN name
    ... 
    END CASE @SortDirection WHEN 'ASC' THEN ASC ELSE DESC END;

This is closer, but it fails for two reasons. One is that CASE is an expression that returns exactly one value of a specific data type; this merges data types that are incompatible and therefore will break the CASE expression. The other is that there is no way to conditionally apply the sort direction this way without using dynamic SQL.

 

Approaches That Do Work

The three primary approaches I've seen are as follows:

Group compatible types and directions together

In order to use CASE with ORDER BY, there must be a distinct expression for each combination of compatible types and directions. In this case we would have to use something like this:

CREATE PROCEDURE dbo.Sort_CaseExpanded
  @SortColumn    NVARCHAR(128) = N'key_col',
  @SortDirection VARCHAR(4)    = 'ASC'
AS
BEGIN
  SET NOCOUNT ON;

  SELECT key_col, [object_id], name, type_desc, modify_date
  FROM dbo.sys_objects
  ORDER BY 
    CASE WHEN @SortDirection = 'ASC' THEN
      CASE @SortColumn 
        WHEN 'key_col'   THEN key_col
        WHEN 'object_id' THEN [object_id] 
      END
    END,
    CASE WHEN @SortDirection = 'DESC' THEN
      CASE @SortColumn 
        WHEN 'key_col'   THEN key_col
        WHEN 'object_id' THEN [object_id]
      END
    END DESC,
    CASE WHEN @SortDirection = 'ASC' THEN
      CASE @SortColumn 
        WHEN 'name'      THEN name
        WHEN 'type_desc' THEN type_desc 
      END
    END,
    CASE WHEN @SortDirection = 'DESC' THEN
      CASE @SortColumn 
        WHEN 'name'      THEN name
        WHEN 'type_desc' THEN type_desc 
      END
    END DESC,
    CASE WHEN @SortColumn = 'modify_date' 
      AND @SortDirection = 'ASC' THEN modify_date 
    END,
    CASE WHEN @SortColumn = 'modify_date' 
      AND @SortDirection = 'DESC' THEN modify_date 
    END DESC;
END

You might say, wow, that's an ugly bit of code, and I would agree with you. I think this is why a lot of folks cache their data on the front end and let the presentation tier deal with juggling it around in different orders. :-)

You can collapse this logic a little bit further by converting all the non-string types into strings that will sort correctly, e.g.

CREATE PROCEDURE dbo.Sort_CaseCollapsed
  @SortColumn    NVARCHAR(128) = N'key_col',
  @SortDirection VARCHAR(4)    = 'ASC'
AS
BEGIN
  SET NOCOUNT ON;

  SELECT key_col, [object_id], name, type_desc, modify_date
  FROM dbo.sys_objects
  ORDER BY 
    CASE WHEN @SortDirection = 'ASC' THEN
      CASE @SortColumn 
        WHEN 'key_col'     THEN RIGHT('000000000000' + RTRIM(key_col), 12)
        WHEN 'object_id'   THEN 
	  RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0') 
	   + REPLICATE('0', 23) + RTRIM([object_id]), 24)
        WHEN 'name'        THEN name
        WHEN 'type_desc'   THEN type_desc 
	WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120)
      END
    END,
    CASE WHEN @SortDirection = 'DESC' THEN
      CASE @SortColumn 
        WHEN 'key_col'     THEN RIGHT('000000000000' + RTRIM(key_col), 12)
        WHEN 'object_id'   THEN 
	  RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0') 
	   + REPLICATE('0', 23) + RTRIM([object_id]), 24)
        WHEN 'name'      THEN name
        WHEN 'type_desc' THEN type_desc 
	WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120)
    END
  END DESC;
END

Still, it's a pretty ugly mess, and you have to repeat the expressions twice to deal with the different sort directions. I would also suspect that using OPTION RECOMPILE on that query would prevent you from being stung by parameter sniffing. Except in the default case, it's not like the majority of the work being done here is going to be compilation.

Apply a rank using window functions

I discovered this neat trick from AndriyM, though it is most useful in cases where all of the potential ordering columns are of compatible types, otherwise the expression used for ROW_NUMBER() is equally complex. The most clever part is that in order to switch between ascending and descending order, we simply multiply the ROW_NUMBER() by 1 or -1. We can apply it in this situation as follows:

CREATE PROCEDURE dbo.Sort_RowNumber
  @SortColumn    NVARCHAR(128) = N'key_col',
  @SortDirection VARCHAR(4)    = 'ASC'
AS
BEGIN
  SET NOCOUNT ON;

  ;WITH x AS
  (
    SELECT key_col, [object_id], name, type_desc, modify_date,
      rn = ROW_NUMBER() OVER (
        ORDER BY CASE @SortColumn 
          WHEN 'key_col'     THEN RIGHT('000000000000' + RTRIM(key_col), 12)
          WHEN 'object_id'   THEN 
	    RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0') 
             + REPLICATE('0', 23) + RTRIM([object_id]), 24)
          WHEN 'name'        THEN name
          WHEN 'type_desc'   THEN type_desc 
          WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120)
      END
      ) * CASE @SortDirection WHEN 'ASC' THEN 1 ELSE -1 END
    FROM dbo.sys_objects
  )
  SELECT key_col, [object_id], name, type_desc, modify_date
  FROM x
  ORDER BY rn;
END
GO

Again, OPTION RECOMPILE can help here. Also, you might notice in some of these cases that ties are handled differently by the various plans – when ordering by name, for example, you will usually see key_col come through in ascending order within each set of duplicate names, but you may also see the values mixed up. To provide more predictable behavior in the event of ties, you can always add an additional ORDER BY clause. Note that if you were to add key_col to the first example, you'll need to make it an expression so that key_col is not listed in the ORDER BY twice (you can do this using key_col + 0, for example).

Dynamic SQL

A lot of people have reservations about dynamic SQL – it's impossible to read, it's a breeding ground for SQL injection, it leads to plan cache bloat, it defeats the purpose of using stored procedures… Some of these are simply untrue, and some of them are easy to mitigate. I've added some validation here that could just as easily be added to any of the above procedures:

CREATE PROCEDURE dbo.Sort_DynamicSQL
  @SortColumn    NVARCHAR(128) = N'key_col',
  @SortDirection VARCHAR(4)    = 'ASC'
AS
BEGIN
  SET NOCOUNT ON;

  -- reject any invalid sort directions:
  IF UPPER(@SortDirection) NOT IN ('ASC','DESC')
  BEGIN
    RAISERROR('Invalid parameter for @SortDirection: %s', 11, 1, @SortDirection);
    RETURN -1;
  END 

  -- reject any unexpected column names:
  IF LOWER(@SortColumn) NOT IN (N'key_col', N'object_id', N'name', N'type_desc', N'modify_date')
  BEGIN
    RAISERROR('Invalid parameter for @SortColumn: %s', 11, 1, @SortColumn);
    RETURN -1;
  END 

  SET @SortColumn = QUOTENAME(@SortColumn);

  DECLARE @sql NVARCHAR(MAX);

  SET @sql = N'SELECT key_col, [object_id], name, type_desc, modify_date
               FROM dbo.sys_objects
               ORDER BY ' + @SortColumn + ' ' + @SortDirection + ';';

  EXEC sp_executesql @sql;
END
 

Performance Comparisons

I created a wrapper stored procedure for each procedure above, so that I could easily test all scenarios. The four wrapper procedures look like this, with the procedure name varying of course:

CREATE PROCEDURE dbo.Test_Sort_CaseExpanded
AS
BEGIN
	SET NOCOUNT ON;

	EXEC dbo.Sort_CaseExpanded; -- default
	EXEC dbo.Sort_CaseExpanded N'name',        'ASC';
	EXEC dbo.Sort_CaseExpanded N'name',        'DESC';
	EXEC dbo.Sort_CaseExpanded N'object_id',   'ASC';
	EXEC dbo.Sort_CaseExpanded N'object_id',   'DESC';
	EXEC dbo.Sort_CaseExpanded N'type_desc',   'ASC';
	EXEC dbo.Sort_CaseExpanded N'type_desc',   'DESC';
	EXEC dbo.Sort_CaseExpanded N'modify_date', 'ASC';
	EXEC dbo.Sort_CaseExpanded N'modify_date', 'DESC';
END

And then using SQL Sentry Plan Explorer, I generated actual execution plans (and the metrics to go along with it) with the following queries, and repeated the process 10 times to sum up total duration:

DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
EXEC dbo.Test_Sort_CaseExpanded;
--EXEC dbo.Test_Sort_CaseCollapsed;
--EXEC dbo.Test_Sort_RowNumber;
--EXEC dbo.Test_Sort_DynamicSQL;
GO 10

I also tested the first three cases with OPTION RECOMPILE (doesn't make much sense for the dynamic SQL case, since we know it will be a new plan each time), and all four cases with MAXDOP 1 to eliminate parallelism interference. Here are the results:

Performance Results : Duration (milliseconds)

 

Conclusion

For outright performance, dynamic SQL wins every time (though only by a small margin on this data set). The ROW_NUMBER() approach, while clever, was the loser in each test (sorry AndriyM).

It gets even more fun when you want to introduce a WHERE clause, never mind paging. These three are like the perfect storm for introducing complexity to what starts out as a simple search query. The more permutations your query has, the more likely you'll want to throw readability out the window and use dynamic SQL in combination with the "optimize for ad hoc workloads" setting to minimize the impact of single-use plans in your plan cache.