Aaron Bertrand

Performance Surprises and Assumptions : DATEADD

SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Jonathan Kehayias is a Principal Consultant with SQLskills and the youngest MCM ever.

Jonathan’s Posts

Back in 2013, I wrote about a bug in the optimizer where the 2nd and 3rd arguments to DATEDIFF() can be swapped – which can lead to incorrect row count estimates and, in turn, poor execution plan selection:

This past weekend, I learned about a similar situation, and made the immediate assumption that it was the same problem. After all, the symptoms seemed nearly identical:

  1. There was a date/time function in the WHERE clause.
    • This time it was DATEADD() instead of DATEDIFF().
  2. There was an obviously incorrect row count estimate of 1, compared to an actual row count of over 3 million.
    • This was actually an estimate of 0, but SQL Server always rounds up such estimates to 1.
  3. A poor plan selection was made (in this case, a loop join was chosen) due to the low estimate.

The offending pattern looked like this:

WHERE [datetime2(7) column] >= DATEADD(DAY, -365, SYSUTCDATETIME());

The user tried several variations, but nothing changed; they eventually managed to work around the problem by changing the predicate to:

WHERE DATEDIFF(DAY, [column], SYSUTCDATETIME()) <= 365;

This got a better estimate (the typical 30% inequality guess); so not quite right. And while it eliminated the loop join, there are two major problems with this predicate:

  1. It is not the same query, since it is now looking for 365 day boundaries to have passed, as opposed to being greater than a specific point in time 365 days ago. Statistically significant? Maybe not. But sill, technically, not the same.
  2. Applying the function against the column makes the entire expression non-sargable - leading to a full scan. When the table only contains a little over a year of data, this is not a big deal, but as the table gets larger, or the predicate becomes narrower, this will become a problem.

Again, I jumped to the conclusion that the DATEADD() operation was the problem, and recommended an approach that didn't rely on DATEADD() - building a datetime from all of the parts of the current time, allowing me to subtract a year without using DATEADD():

WHERE [column] >= DATETIMEFROMPARTS(
      DATEPART(YEAR,   SYSUTCDATETIME())-1, 
      DATEPART(MONTH,  SYSUTCDATETIME()),
      DATEPART(DAY,    SYSUTCDATETIME()),
      DATEPART(HOUR,   SYSUTCDATETIME()), 
      DATEPART(MINUTE, SYSUTCDATETIME()),
      DATEPART(SECOND, SYSUTCDATETIME()), 0);

In addition to being bulky, this had some problems of its own, namely that a bunch of logic would have to be added to properly account for leap years. First, so that it doesn't fail if it happens to run on February 29th, and second, to include exactly 365 days in all cases (instead of 366 during the year following a leap day). Easy fixes, of course, but they make the logic much uglier - especially because the query needed to exist inside a view, where intermediate variables and multiple steps are not possible.

In the meantime, the OP filed a Connect item, dismayed by the 1-row estimate:

Then Paul White (@SQL_Kiwi) came along and, like many times before, shed some additional light onto the problem. He shared a related Connect item filed by Erland Sommarskog back in 2011:

Essentially, the problem is that a poor estimate can be made not simply when SYSDATETIME() (or SYSUTCDATETIME()) appears, as Erland originally reported, but when any datetime2 expression is involved in the predicate (and perhaps only when DATEADD() is also used). And it can go both ways - if we swap >= for <=, the estimate becomes the whole table, so it seems that the optimizer is looking at the SYSDATETIME() value as a constant, and completely ignoring any operations like DATEADD() that are performed against it.

Paul shared that the workaround is simply to use a datetime equivalent when calculating the date, before converting it to the proper data type. In this case, we can swap out SYSUTCDATETIME() and change it to GETUTCDATE():

WHERE [column] >= CONVERT(datetime2(7), DATEADD(DAY, -365, GETUTCDATE()));

Yes, this results in a tiny loss of precision, but so could a dust particle slowing down your finger on its way to pressing the F5 key. The important thing is that a seek could still be used and the estimates were correct - almost perfect, in fact:

DATEADD/DATEDIFF estimates vs. actuals

The reads are similar because the table contains data almost exclusively from the past year, so even a seek becomes a range scan of most of the table. The row counts are not identical because (a) the second query cuts off at midnight and (b) the third query includes an extra day of data due to the leap day earlier this year. In any case, this still demonstrates how we can get closer to proper estimates by eliminating DATEADD(), but the proper fix is to remove the direct combination of DATEADD() and datetime2.

To further illustrate how the estimates are getting it wrong, you can see that if we pass different arguments and directions to the original query and Paul's re-write, the number of estimated rows for the former is always based on the current time - they don't change with the number of days passed in (whereas Paul's is relatively accurate every time):

Estimates vs. actuals for two approachesActual rows for the first query are slightly lower because this was executed after a lengthy nap

The estimates won't always be this good; my table just has relatively stable distribution. I populated it with the following query and then updated statistics with fullscan, in case you want to try this out on your own:

-- OP's table definition:
CREATE TABLE dbo.DateaddRepro 
(
  SessionId  int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
  CreatedUtc datetime2(7) NOT NULL DEFAULT SYSUTCDATETIME()
);
GO
 
CREATE NONCLUSTERED INDEX [IX_User_Session_CreatedUtc]
ON dbo.DateaddRepro(CreatedUtc) INCLUDE (SessionId);
GO
 
INSERT dbo.DateaddRepro(CreatedUtc)
SELECT dt FROM 
(
  SELECT TOP (3150000) dt = DATEADD(HOUR, (s1.[precision]-ROW_NUMBER()
    OVER (PARTITION BY s1.[object_id] ORDER BY s2.[object_id])) / 15, GETUTCDATE())
  FROM sys.all_columns AS s1 CROSS JOIN sys.all_objects AS s2
) AS x;
 
UPDATE STATISTICS dbo.DateaddRepro WITH FULLSCAN;

SELECT DISTINCT SessionId FROM dbo.DateaddRepro 
WHERE /* pick your WHERE clause to test */;

I commented on the new Connect item, and will likely go back and touch up my Stack Exchange answer.

The moral of the story

Try to avoid combining DATEADD() with expressions that yield datetime2, especially on older versions of SQL Server (this was on SQL Server 2012). It can also be a problem, even on SQL Server 2016, when using the older cardinality estimation model (due to lower compatibility level, or explicit use of trace flag 9481). Problems like this are subtle and not always immediately obvious, so hopefully this serves as a reminder (maybe even for me the next time I come across a similar scenario). Like I suggested in the last post, if you have query patterns like this, check that you are getting correct estimates, and make a note somewhere to check them again whenever anything major changes in the system (like an upgrade or a service pack).