FORMAT() is nice and all, but…

June 29, 2015 by Aaron Bertrand in T-SQL Queries

Back when SQL Server 2012 was still in beta, I blogged about the new FORMAT() function: SQL Server v.Next (Denali) : CTP3 T-SQL Enhancements : FORMAT().

At that time, I was so excited about the new functionality that I didn't even think to do any performance testing. I did address this in a more recent blog post, but solely in the context of stripping time from a datetime: Trimming time from datetime – a follow-up.

Last week, my good friend Jason Horner (blog | @jasonhorner) trolled me with these tweets:

My issue with this is just that FORMAT() looks convenient, but it is extremely inefficient compared to other approaches (oh, and that AS VARCHAR thing is bad too; more on that below). If you're doing this onesy-twosy and for small result sets, I wouldn't worry too much about it; but at scale, it can get pretty expensive. Let me illustrate with an example. First, let's create a small table with 1,000 pseudo-random dates:

SELECT TOP (1000) d = DATEADD(DAY, CHECKSUM(NEWID())%1000, o.create_date)
  INTO dbo.dtTest
  FROM sys.all_objects AS o
  ORDER BY NEWID();
GO
CREATE CLUSTERED INDEX d ON dbo.dtTest(d);
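
(Not part of the original script, but a quick sanity check you can run at this point: confirm you really did get 1,000 rows, and see how wide a range the CHECKSUM(NEWID()) day offsets produced.)

-- optional sanity check: row count plus earliest/latest generated date
SELECT [rows] = COUNT(*), first_d = MIN(d), last_d = MAX(d)
  FROM dbo.dtTest;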

Now, let's prime the cache with the data from this table, and illustrate three of the common ways people tend to present just the date:

SELECT d, 
  CONVERT(DATE, d), 
  CONVERT(CHAR(10), d, 120),
  FORMAT(d, 'yyyy-MM-dd')
FROM dbo.dtTest;
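
As a quick aside on the AS VARCHAR habit mentioned earlier (this snippet is my own illustration, not something from the tweets or the original tests): declaring varchar without a length is risky because the default length depends on context; it is 30 characters inside CAST/CONVERT, but only 1 character in a variable declaration, so values can be silently truncated.

-- bare VARCHAR means VARCHAR(1) in a DECLARE...
DECLARE @s VARCHAR = 'yyyy-MM-dd';
SELECT truncated_variable = @s;                        -- returns just 'y'
-- ...but VARCHAR(30) in CAST/CONVERT, which happens to be long enough here
SELECT via_convert = CONVERT(VARCHAR, SYSDATETIME());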

Now, let's perform individual queries that use these different techniques. We'll run each of them 5 times, in the following variations:

  1. Selecting all 1,000 rows
  2. Selecting TOP (1) ordered by the clustered index key
  3. Assigning to a variable (which forces a full scan, but prevents SSMS rendering from interfering with performance)

Here is the script:

-- select all 1,000 rows
GO
SELECT d FROM dbo.dtTest;
GO 5
SELECT d = CONVERT(DATE, d) FROM dbo.dtTest;
GO 5
SELECT d = CONVERT(CHAR(10), d, 120) FROM dbo.dtTest;
GO 5
SELECT d = FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest;
GO 5

-- select top 1
GO
SELECT TOP (1) d FROM dbo.dtTest ORDER BY d;
GO 5
SELECT TOP (1) CONVERT(DATE, d) FROM dbo.dtTest ORDER BY d;
GO 5
SELECT TOP (1) CONVERT(CHAR(10), d, 120) FROM dbo.dtTest ORDER BY d;
GO 5
SELECT TOP (1) FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest ORDER BY d;
GO 5

-- force scan but leave SSMS mostly out of it
GO
DECLARE @d DATE;
SELECT @d = d FROM dbo.dtTest;
GO 5
DECLARE @d DATE;
SELECT @d = CONVERT(DATE, d) FROM dbo.dtTest;
GO 5
DECLARE @d CHAR(10);
SELECT @d = CONVERT(CHAR(10), d, 120) FROM dbo.dtTest;
GO 5
DECLARE @d CHAR(10);
SELECT @d = FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest;
GO 5

Now, we can measure the performance with the following query (my system is pretty quiet; on yours, you may need to perform more advanced filtering than just execution_count):

SELECT 
  [t] = CONVERT(CHAR(255), t.[text]), 
  s.total_elapsed_time, 
  avg_elapsed_time = CONVERT(DECIMAL(12,2),s.total_elapsed_time / 5.0),
  s.total_worker_time, 
  avg_worker_time = CONVERT(DECIMAL(12,2),s.total_worker_time / 5.0),
  s.total_clr_time
FROM sys.dm_exec_query_stats AS s 
CROSS APPLY sys.dm_exec_sql_text(s.[sql_handle]) AS t
WHERE s.execution_count = 5
  AND t.[text] LIKE N'%dbo.dtTest%'
ORDER BY s.last_execution_time;
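
If your instance is busier than mine, one option (my suggestion, not something from the original post) is to also filter on when the plans were compiled, so that only the batches from this test run qualify, rather than relying on execution_count alone:

SELECT 
  [t] = CONVERT(CHAR(255), t.[text]), 
  s.total_elapsed_time, 
  avg_elapsed_time = CONVERT(DECIMAL(12,2), s.total_elapsed_time / 5.0),
  s.total_worker_time, 
  avg_worker_time = CONVERT(DECIMAL(12,2), s.total_worker_time / 5.0),
  s.total_clr_time
FROM sys.dm_exec_query_stats AS s 
CROSS APPLY sys.dm_exec_sql_text(s.[sql_handle]) AS t
WHERE t.[text] LIKE N'%dbo.dtTest%'
  -- assumes you started the test batches within the last 10 minutes
  AND s.creation_time >= DATEADD(MINUTE, -10, GETDATE())
ORDER BY s.last_execution_time;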

Results in my case were fairly consistent:

All durations below are in microseconds.

Variation         | Query (truncated)                                             | total_elapsed | avg_elapsed | total_clr
----------------- | ------------------------------------------------------------- | ------------- | ----------- | ---------
SELECT 1,000 rows | SELECT d FROM dbo.dtTest ORDER BY d;                          |         1,170 |      234.00 |         0
                  | SELECT d = CONVERT(DATE, d) FROM dbo.dtTest ORDER BY d;       |         2,437 |      487.40 |         0
                  | SELECT d = CONVERT(CHAR(10), d, 120) FROM dbo.dtTest ORD ...  |       151,521 |   30,304.20 |         0
                  | SELECT d = FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest ORDER ...  |       240,152 |   48,030.40 |   107,258
SELECT TOP (1)    | SELECT TOP (1) d FROM dbo.dtTest ORDER BY d;                  |           251 |       50.20 |         0
                  | SELECT TOP (1) CONVERT(DATE, d) FROM dbo.dtTest ORDER BY ...  |           440 |       88.00 |         0
                  | SELECT TOP (1) CONVERT(CHAR(10), d, 120) FROM dbo.dtTest ...  |           301 |       60.20 |         0
                  | SELECT TOP (1) FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest O ...  |         1,094 |      218.80 |       589
Assign variable   | DECLARE @d DATE; SELECT @d = d FROM dbo.dtTest;               |           639 |      127.80 |         0
                  | DECLARE @d DATE; SELECT @d = CONVERT(DATE, d) FROM dbo.d ...  |           644 |      128.80 |         0
                  | DECLARE @d CHAR(10); SELECT @d = CONVERT(CHAR(10), d, 12 ...  |         1,972 |      394.40 |         0
                  | DECLARE @d CHAR(10); SELECT @d = FORMAT(d, 'yyyy-MM-dd') ...  |       118,062 |   23,612.40 |    98,556

And to visualize the avg_elapsed_time output:

FORMAT() is clearly the loser: avg_elapsed_time results (microseconds)

What we can learn from these results (again):

  1. First and foremost, FORMAT() is expensive.
  2. FORMAT() can, admittedly, provide more flexibility, with format strings that are more intuitive and consistent with those in other languages like C#. However, in addition to its overhead, you may have to use the older approach anyway, since FORMAT() is only valid in SQL Server 2012 and newer; CONVERT() style numbers are cryptic and less exhaustive, but they work on every supported version (see the quick comparison after this list).
  3. Even the standby CONVERT() method can be drastically expensive (though only severely so in the case where SSMS had to render the results - it clearly handles strings differently than date values).
  4. Just pulling the datetime value directly out of the database was always most efficient. You should profile what additional time it takes for your application to format the date as desired at the presentation tier; it's highly likely that you're not going to want SQL Server to get involved with pretty formatting at all (and in fact many would argue that the presentation tier is where that logic belongs anyway).
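
For reference on point 2, here is a quick side-by-side of the two syntaxes (just an illustration of mine, not part of the original tests): the style number is cryptic but works on any version, while the FORMAT() pattern is self-describing but requires SQL Server 2012 or newer.

SELECT
  via_convert = CONVERT(CHAR(10), SYSDATETIME(), 120),  -- style 120 = ODBC canonical (yyyy-mm-dd hh:mi:ss), truncated to the date by CHAR(10)
  via_format  = FORMAT(SYSDATETIME(), 'yyyy-MM-dd');    -- .NET-style format string, SQL Server 2012+ only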

We're only talking microseconds here, but we're also only talking 1,000 rows. Scale that out to your actual table sizes, and the impact of choosing the wrong formatting approach could be devastating.
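
If you want to see how much wider the gap gets at a more realistic size, one approach (my variation, not part of the original post; dbo.dtTestBig is a made-up name) is to build a larger copy of the table with a cross join and re-run the same batches against it:

-- build a 1,000,000-row version of the test table; CHECKSUM(NEWID()) still supplies the pseudo-random day offsets
SELECT TOP (1000000) d = DATEADD(DAY, CHECKSUM(NEWID()) % 1000, o1.create_date)
  INTO dbo.dtTestBig
  FROM sys.all_objects AS o1
  CROSS JOIN sys.all_objects AS o2;
GO
CREATE CLUSTERED INDEX d ON dbo.dtTestBig(d);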

If you want to try out this experiment on your own machine, I've uploaded a sample script: FormatIsNiceAndAllBut.sql_.zip