Aaron Bertrand

Another argument for stored procedures

SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Itzik is a T-SQL trainer, a co-founder of SolidQ, and blogs about T-SQL fundamentals and query tuning.

Itzik’s Posts

This is one of those religious/political debates that has been raging for years: should I use stored procedures, or should I put ad hoc queries in my application? I have always been a proponent of stored procedures, for a few reasons:

  1. I can't implement SQL injection protections if the query is constructed in the application code. The developers may be aware of parameterized queries but nothing is forcing them to use them properly.
  2. I can't tune a query that's embedded in application source code, nor can I enforce any best practices.
  3. If I do find an opportunity for query tuning, in order to deploy it, I have to re-compile and re-deploy the application code, as opposed to just changing the stored procedure.
  4. If the query is used in multiple places in the application, or in multiple applications, and it requires a change, I have to change it in multiple places, whereas with a stored procedure I only have to change it once (deployment issues aside).

I also see that a lot of people are ditching stored procedures in favor of ORMs. For simple applications this will probably go okay, but as your application gets more complex, you are likely to find that your ORM of choice is simply incapable of performing certain query patterns, *forcing* you to use a stored procedure. If it supports stored procedures, that is.

While I still find all of these arguments pretty compelling, they're not what I want to talk about today; I want to talk about performance.

A lot of arguments out there will simply say, "stored procedures perform better!" That may have been marginally true at some point, but since SQL Server added the ability to compile at the statement level rather than the object level, and has acquired powerful functionality like optimize for ad hoc workloads, this is no longer a very strong argument. Index tuning and sensible query patterns have a much greater impact on performance than choosing to use a stored procedure ever will; on modern versions, I doubt you will find many cases where the exact same query exhibits noticeable performance differences, unless you are also introducing other variables (such as running a procedure locally vs. an application in a different data center on a different continent).

That said, there is a performance aspect that is often overlooked when dealing with ad hoc queries: the plan cache. We can use optimize for ad hoc workloads to prevent single-use plans from filling up our cache (Kimberly Tripp (@KimberlyLTripp) of SQLskills.com has some great information about this here), and that affects single-use plans regardless of whether the queries are run from within a stored procedure or are run ad hoc. A different impact you might not notice, regardless of this setting, is when identical plans take up multiple slots in the cache because of differences in SET options or minor deltas in the actual query text. The whole "slow in the application, fast in SSMS" phenomenon has helped a lot of people resolve issues involving settings like SET ARITHABORT. Today I wanted to talk about query text differences and demonstrate something that surprises people every time I bring it up.

Cache to burn

Let's say we have a very simple system running AdventureWorks2012. And just to prove that it doesn't help, we have enabled optimize for ad hoc workloads:

EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'optimize for ad hoc workloads', 1;
GO
RECONFIGURE WITH OVERRIDE;

And then free the plan cache:

DBCC FREEPROCCACHE;

Now we generate a few simple variations to a query that is otherwise identical. These variations can potentially represent coding styles for two different developers – slight differences in white space, upper/lower case, etc.

SELECT TOP (1) SalesOrderID, OrderDate, SubTotal
FROM Sales.SalesOrderHeader
WHERE SalesOrderID >= 75120
ORDER BY OrderDate DESC;
GO

-- change >= 75120 to > 75119 (same logic since it's an INT)
GO

SELECT TOP (1) SalesOrderID, OrderDate, SubTotal
FROM Sales.SalesOrderHeader
WHERE SalesOrderID > 75119
ORDER BY OrderDate DESC;
GO

-- change the query to all lower case
GO

select top (1) salesorderid, orderdate, subtotal
from sales.salesorderheader
where salesorderid > 75119
order by orderdate desc;
GO

-- remove the parentheses around the argument for top
GO

select top 1 salesorderid, orderdate, subtotal
from sales.salesorderheader
where salesorderid > 75119
order by orderdate desc;
GO 

-- add a space after top 1
GO

select top 1  salesorderid, orderdate, subtotal
from sales.salesorderheader
where salesorderid > 75119
order by orderdate desc;
GO

-- remove the spaces between the commas
GO

select top 1  salesorderid,orderdate,subtotal
from sales.salesorderheader
where salesorderid > 75119
order by orderdate desc;
GO

If we run that batch once, and then check the plan cache, we see that we have 6 copies of, essentially, the exact same execution plan. This is because the query text is binary hashed, meaning case and white space do make a difference and can make otherwise identical queries look unique to SQL Server.

SELECT [text], size_in_bytes, usecounts, cacheobjtype
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
WHERE LOWER(t.[text]) LIKE '%ales.sales'+'orderheader%';

Results:

text size_in_bytes usecounts cacheobjtype
select top 1 salesorderid,o… 272 1 Compiled Plan Stub
select top 1 salesorderid, … 272 1 Compiled Plan Stub
select top 1 salesorderid, o… 272 1 Compiled Plan Stub
select top (1) salesorderid,… 272 1 Compiled Plan Stub
SELECT TOP (1) SalesOrderID,… 272 1 Compiled Plan Stub
SELECT TOP (1) SalesOrderID,… 272 1 Compiled Plan Stub

Results after first execution of "identical" queries

So, this isn't altogether wasteful, since the ad hoc setting has allowed SQL Server to only store small stubs on first execution. If we run the batch again though (without freeing the procedure cache), we see a slightly more alarming result:

text size_in_bytes usecounts cacheobjtype
select top 1 salesorderid,o… 49,152 1 Compiled Plan
select top 1 salesorderid, … 49,152 1 Compiled Plan
select top 1 salesorderid, o… 49,152 1 Compiled Plan
select top (1) salesorderid,… 49,152 1 Compiled Plan
SELECT TOP (1) SalesOrderID,… 49,152 1 Compiled Plan
SELECT TOP (1) SalesOrderID,… 49,152 1 Compiled Plan

Results after second execution of "identical" queries

The same thing happens for parameterized queries, regardless of whether parameterization is simple or forced. And the same thing happens when the ad hoc setting is not enabled, except that it happens sooner.

The net result is that this can produce a lot of plan cache bloat, even for queries that look identical – all the way down to two queries where one developer indents with a tab and the other indents with 4 spaces. I don't have to tell you that trying to enforce this type of consistency across a team can be anywhere from tedious to impossible. So in my mind this gives a strong nod to modularizing, giving in to DRY, and centralizing this type of query into a single stored procedure.

A caveat

Of course, if you place this query in a stored procedure, you're only going to have one copy of it, so you entirely avoid the potential for having multiple versions of the query with slightly different query text. Now, you could also argue that different users might create the same stored procedure with different names, and in each stored procedure there is a slight variation of the query text. While possible, I think that represents an entirely different problem. :-)