This is one of those religious/political debates that has been raging for years: should I use stored procedures, or should I put ad hoc queries in my application? I have always been a proponent of stored procedures, for a few reasons:
- I can't implement SQL injection protections if the query is constructed in the application code. The developers may be aware of parameterized queries, but nothing is forcing them to use them properly (see the sketch just after this list).
- I can't tune a query that's embedded in application source code, nor can I enforce any best practices.
- If I do find an opportunity for query tuning, in order to deploy it, I have to re-compile and re-deploy the application code, as opposed to just changing the stored procedure.
- If the query is used in multiple places in the application, or in multiple applications, and it requires a change, I have to change it in multiple places, whereas with a stored procedure I only have to change it once (deployment issues aside).
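To make that first bullet concrete, here is a minimal T-SQL sketch of the difference (Person.Person is just the AdventureWorks example table, and the @UserInput value is contrived); the concatenated form is injectable, the parameterized form is not:
DECLARE @UserInput nvarchar(100) = N'Smith''; DROP TABLE dbo.Orders; --';

-- Unsafe: the value is pasted into the statement text, so the injected
-- statement becomes part of the batch (dbo.Orders is a made-up table).
DECLARE @unsafe nvarchar(max) =
    N'SELECT BusinessEntityID FROM Person.Person WHERE LastName = N''' + @UserInput + N''';';
-- EXEC (@unsafe); -- don't actually run this

-- Safer: the value travels as a true parameter and is never parsed as SQL.
EXEC sys.sp_executesql
    N'SELECT BusinessEntityID FROM Person.Person WHERE LastName = @LastName;',
    N'@LastName nvarchar(100)',
    @LastName = @UserInput;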
I also see that a lot of people are ditching stored procedures in favor of ORMs. For simple applications this will probably go okay, but as your application gets more complex, you are likely to find that your ORM of choice is simply incapable of performing certain query patterns, *forcing* you to use a stored procedure. If it supports stored procedures, that is.
While I still find all of these arguments pretty compelling, they're not what I want to talk about today; I want to talk about performance.
A lot of arguments out there will simply say, "stored procedures perform better!" That may have been marginally true at some point, but since SQL Server added the ability to compile at the statement level rather than the object level, and has acquired powerful functionality like optimize for ad hoc workloads, this is no longer a very strong argument. Index tuning and sensible query patterns have a much greater impact on performance than choosing to use a stored procedure ever will; on modern versions, I doubt you will find many cases where the exact same query exhibits noticeable performance differences, unless you are also introducing other variables (such as running a procedure locally vs. an application in a different data center on a different continent).
That said, there is a performance aspect that is often overlooked when dealing with ad hoc queries: the plan cache. We can use optimize for ad hoc workloads to prevent single-use plans from filling up our cache (Kimberly Tripp (@KimberlyLTripp) of SQLskills.com has some great information about this here), and that affects single-use plans regardless of whether the queries are run from within a stored procedure or are run ad hoc. A different impact you might not notice, regardless of this setting, is when identical plans take up multiple slots in the cache because of differences in SET options or minor deltas in the actual query text. The whole "slow in the application, fast in SSMS" phenomenon has helped a lot of people resolve issues involving settings like SET ARITHABORT. Today I want to talk about query text differences and demonstrate something that surprises people every time I bring it up.
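You don't have to take the SET options point on faith; a quick sketch like the following exposes the set_options value that forms part of each plan's cache key (run the same query from two sessions with different SET ARITHABORT values and you should see two rows for the same text):
-- A sketch: show the SET options baked into each cached plan.
SELECT t.[text], pa.value AS set_options, p.usecounts
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
CROSS APPLY sys.dm_exec_plan_attributes(p.plan_handle) AS pa
WHERE pa.attribute = 'set_options';
-- add a filter on t.[text] for the specific query you are investigating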
Cache to burn
Let's say we have a very simple system running AdventureWorks2012. And just to prove that it doesn't help, we have enabled optimize for ad hoc workloads:
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'optimize for ad hoc workloads', 1;
GO
RECONFIGURE WITH OVERRIDE;
And then free the plan cache:
DBCC FREEPROCCACHE;
Now we generate a few simple variations of a query that is otherwise identical. These variations might simply reflect the coding styles of different developers – slight differences in white space, upper/lower case, etc.
SELECT TOP (1) SalesOrderID, OrderDate, SubTotal
FROM Sales.SalesOrderHeader
WHERE SalesOrderID >= 75120
ORDER BY OrderDate DESC;
GO
-- change >= 75120 to > 75119 (same logic since it's an INT)
GO
SELECT TOP (1) SalesOrderID, OrderDate, SubTotal
FROM Sales.SalesOrderHeader
WHERE SalesOrderID > 75119
ORDER BY OrderDate DESC;
GO
-- change the query to all lower case
GO
select top (1) salesorderid, orderdate, subtotal
from sales.salesorderheader
where salesorderid > 75119
order by orderdate desc;
GO
-- remove the parentheses around the argument for top
GO
select top 1 salesorderid, orderdate, subtotal
from sales.salesorderheader
where salesorderid > 75119
order by orderdate desc;
GO
-- add an extra space after top 1 (not visible here, but enough to change the text)
GO
select top 1 salesorderid, orderdate, subtotal
from sales.salesorderheader
where salesorderid > 75119
order by orderdate desc;
GO
-- remove the spaces after the commas
GO
select top 1 salesorderid,orderdate,subtotal
from sales.salesorderheader
where salesorderid > 75119
order by orderdate desc;
GO
If we run that batch once, and then check the plan cache, we see that we have 6 copies of, essentially, the exact same execution plan. This is because the query text is binary hashed, meaning case and white space do make a difference and can make otherwise identical queries look unique to SQL Server.
SELECT [text], size_in_bytes, usecounts, cacheobjtype
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
WHERE LOWER(t.[text]) LIKE '%ales.sales'+'orderheader%';
Results:
text | size_in_bytes | usecounts | cacheobjtype |
---|---|---|---|
select top 1 salesorderid,o… | 272 | 1 | Compiled Plan Stub |
select top 1 salesorderid, … | 272 | 1 | Compiled Plan Stub |
select top 1 salesorderid, o… | 272 | 1 | Compiled Plan Stub |
select top (1) salesorderid,… | 272 | 1 | Compiled Plan Stub |
SELECT TOP (1) SalesOrderID,… | 272 | 1 | Compiled Plan Stub |
SELECT TOP (1) SalesOrderID,… | 272 | 1 | Compiled Plan Stub |
Results after first execution of "identical" queries
So, this isn't altogether wasteful, since the ad hoc setting has allowed SQL Server to only store small stubs on first execution. If we run the batch again though (without freeing the procedure cache), we see a slightly more alarming result:
text | size_in_bytes | usecounts | cacheobjtype |
---|---|---|---|
select top 1 salesorderid,o… | 49,152 | 1 | Compiled Plan |
select top 1 salesorderid, … | 49,152 | 1 | Compiled Plan |
select top 1 salesorderid, o… | 49,152 | 1 | Compiled Plan |
select top (1) salesorderid,… | 49,152 | 1 | Compiled Plan |
SELECT TOP (1) SalesOrderID,… | 49,152 | 1 | Compiled Plan |
SELECT TOP (1) SalesOrderID,… | 49,152 | 1 | Compiled Plan |
Results after second execution of "identical" queries
The same thing happens for parameterized queries, regardless of whether parameterization is simple or forced. And the same thing happens when the ad hoc setting is not enabled, except that the full plans (and the space they consume) show up on the very first execution.
The net result is that this can produce a lot of plan cache bloat, even for queries that look identical – all the way down to two queries where one developer indents with a tab and the other indents with 4 spaces. I don't have to tell you that trying to enforce this type of consistency across a team can be anywhere from tedious to impossible. So in my mind this gives a strong nod to modularizing, giving in to DRY, and centralizing this type of query into a single stored procedure.
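As a rough sketch (the procedure name and parameter are mine, not anything official), the centralized version of the demo query could be as simple as:
CREATE PROCEDURE dbo.GetLatestSalesOrder  -- hypothetical name
    @MinSalesOrderID int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT TOP (1) SalesOrderID, OrderDate, SubTotal
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderID >= @MinSalesOrderID
    ORDER BY OrderDate DESC;
END
GO
Every caller now shares exactly one cached plan for this statement, regardless of how each application would have formatted the text (SET option differences aside).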
A caveat
Of course, if you place this query in a stored procedure, you're only going to have one copy of it, so you entirely avoid the potential for having multiple versions of the query with slightly different query text. Now, you could also argue that different users might create the same stored procedure with different names, and in each stored procedure there is a slight variation of the query text. While possible, I think that represents an entirely different problem. :-)
Great stuff. Two more wrinkles related to plan cache bloat that I want to mention.
1) Plan cache bloat can also represent excessive compile work. Especially in SQL Server 2012, where escalated spinlock contention has been observed (primarily due to recent increases in cores per socket, I believe), this can be another significant performance drag. As you mentioned, 'optimize for adhoc' can help. Forced parameterization may help, too.
2) If the system is already negatively impacted by imbalanced use of the separate NUMA node buffer pools, I believe that plan cache bloat can exacerbate the trouble. Not sure about SQL Server 2012, but in previous versions, although database cache, query memory, and partitioned locks were present in each of the NUMA node bpools, nonpartitioned locks and the plan cache were in buffer node 0. A bloated plan cache would then lead to an even smaller database cache in buffer node 0, driving down its PLE and pushing up physical reads into its database cache as contents expire before their next reference.
So, if a plan cache bloat issue is detected: consider 'optimize for adhoc' or 'forced parameterization'; consider means of making app code and queries more cache friendly; if on a NUMA server, consider disabling SQL Server NUMA detection with trace flag 8015 (and removing spinlock contention with trace flag 8048).
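For reference, forced parameterization is a per-database setting, along these lines (using the database from the post as the example):
ALTER DATABASE AdventureWorks2012 SET PARAMETERIZATION FORCED;
-- and back to the default once finished testing:
ALTER DATABASE AdventureWorks2012 SET PARAMETERIZATION SIMPLE;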
It's interesting that query_plan_hash and query_hash in sys.dm_exec_query_stats show the engine knows that the text and plans of 5 of the 6 queries are the same. It's a shame that the engine doesn't use them to eliminate duplicates.
Thanks Martin, yes it's true that the engine *could* make better use of these plans. Spoiler: I was planning to dig into more detail in a follow-up at some point. :-)
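In the meantime, a rough sketch along the lines of what you describe (grouping cached statements that the engine already hashes identically) can at least surface how much duplication is sitting in the cache:
-- Statements whose query_hash matches but whose text (sql_handle) differs.
SELECT qs.query_hash,
       COUNT(DISTINCT qs.sql_handle) AS distinct_texts,
       SUM(qs.execution_count) AS total_executions
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(DISTINCT qs.sql_handle) > 1
ORDER BY distinct_texts DESC;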
I don't buy that the same query written differently is a common occurrence. That would mean the application contains a significant amount of duplicated code, which is something developers avoid. They might use a query in multiple places but store the SQL string in a global variable or constant.
SET options are a more valid concern, I think. On the other hand the best practice seems to be: leave them at their defaults (IOW don't use them for application development). That makes the problem go away.
So we could either use sprocs, or not have duplicated queries and sane SET option values. Both approaches would solve the problem.
Using sys2.plan_cache_size, available here: http://sys2dmvs.codeplex.com/, it is possible to see exactly how much of the plan cache is being reused and how much is not.
I tend to agree with this. The weak point presented here is not the use of ad-hoc queries, but the underlying application issuing 6 different versions of the same SQL statement. Suggesting the use of stored procedures to harmonize the code to one statement is flawed because this can be achieved using good programming practice in the application, without having to maintain additional assets in the database.
Unfortunately I've been on the wrong end of several clean-up scenarios where this is exactly what happened. You can preach about best practices all you want, but folks still tend to do the wrong thing because it's easier. And still even if the developers of two individual apps that share functionality are following best practices by not repeating the same code within each of their apps, there isn't much hope in getting them both to follow the *exact* same syntax conventions (or go out of their way to develop a shared middle tier, particularly if the apps are released on different schedules). If you have a very disciplined team then maybe stored procedures don't fix anything, but they don't break anything either, and they can come in handy when your team is less disciplined.
Another case in point where all the best practice preaching in the world has done little to improve the situation: parameterized queries. How many applications out there are *still* vulnerable to SQL injection?
As a developer who also wears the DBA hat, a lot of this appears to boil down to division of responsibilities and exertion of control? An application containing the same query text multiple times is poorly coded; the same would apply to duplicate stored procedures… One point I rarely see mentioned is the situation of deploying a database to many installations; in this case, managing stored procedures with allowance for backwards compatibility strikes me as much pain for little gain? In a larger team, would engaging a DBA in code review alleviate the developer problems identified?
Well, I didn't mean for this to get too deep into policies and procedures, because not every environment has the ability to do this. But are you suggesting that a DBA reviewing two different applications 6 months apart is going to catch subtle differences in the text of two "identical" queries, like 3 spaces instead of 4 or a carriage return in a different spot? I'm merely suggesting that having the same ad hoc query repeated can lead to this symptom, and a substantial number of people out there are not even aware of the symptom, never mind proactively protect themselves from it. If you know about it and your team is disciplined enough to subvert it, and your DBA is willing to review application code, then by all means, go ahead and keep compiling your ad hoc code into your applications. If not, I am also not saying you must use stored procedures, but it *is* an argument in their favor.
I'm working with a team right now that consists of the kind of developer that needs to be forced into good practices. I hear you.
Aaron, thanks, this is a great, informative article, but I think it may miss a few key points.
This is a debate that is pretty much continual in my organisation; I am a developer/application architect, and I have these sorts of discussions with DBAs all the time.
The key point I want to put across is that, as a developer, I have no real issue with stored procedures, apart from the overhead and time they cost me to develop, deploy, maintain, etc.
Today's ORMs simplify my life as a developer by allowing me to mainly worry about my code and data model/table structure, and not the other 500-600 stored procedure database objects that need to be managed. You are correct in stating that many ORMs cannot handle complex queries, but I would rather deal with those instances on a case-by-case basis, as I suspect they would only be very minor in terms of percentage across all SQL calls from my application (5%?).
Your examples of SQL statement differences are all valid, but again they are negated by today's ORMs that generate the SQL for us. They will generate the same SQL every time (upper/lower case, spaces, parentheses, etc.) from the same application code.
I am getting my DBA to run your tests on parameterised queries where only the variable changes, to see if your example still holds under these conditions.
Again, thanks for the great article; it is definitely a good read.
If only the variable changes, then no, parameterization should work correctly and you won't have wasted plans in the cache. Depending on how code is passed in, as well as parameterization and other settings, though, you may see different issues; for example, do a search for parameter sniffing.
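To be concrete, a quick sketch like this (the same statement text via sp_executesql, with only the value changing) should leave a single reusable entry in the cache:
EXEC sys.sp_executesql
    N'SELECT TOP (1) SalesOrderID, OrderDate, SubTotal FROM Sales.SalesOrderHeader WHERE SalesOrderID >= @id ORDER BY OrderDate DESC;',
    N'@id int', @id = 75120;

EXEC sys.sp_executesql
    N'SELECT TOP (1) SalesOrderID, OrderDate, SubTotal FROM Sales.SalesOrderHeader WHERE SalesOrderID >= @id ORDER BY OrderDate DESC;',
    N'@id int', @id = 75119;

-- Expect one Prepared entry for this text, reused by both executions.
SELECT p.usecounts, p.objtype, t.[text]
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
WHERE p.objtype = N'Prepared'
AND t.[text] LIKE N'%SalesOrder'+'Header%';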
I don't think I missed any key points, I just stated the difficulties with dealing with ad hoc queries scattered across application code from a DBA's perspective, not a developer's.
About the ORM, what you state is true, when you use the same ORM, in fact the same version of the ORM, for a single application. What about multiple applications that use multiple ORMs or multiple versions of an ORM? I still think it's worth pointing out that slight differences in formatting *can* yield wasteful plan cache bloat. Does everyone on earth need to drop everything and worry about this? Of course not. But based on the discussions I've had out in the field, that number is certainly non-zero.
No worries at all mate, I totally get where you are coming from. I would say that I, as a developer, would be much more worried than the DBAs if I had applications doing ad hoc queries scattered across application code.
Multiple applications using different ORMs or ORM versions hitting the same application database would also be a huge worry for me as a developer, let alone the DBAs. That is what SOA is all about: only having a single point of access.
Anyway we digress, the article is a good one, well done.
Another important feature of ORMs is that they allow developers to compose queries from reusable parts in a typesafe and clean way. It is like concatenating SQL strings, but type-safe, injection free and in a productive programming language. It is like "temp views" that can be declared ad-hoc, stored in variables and passed around.
I fall into the "do it with procs" camp.
In addition to the points above, another benefit is viewing dependencies (through SSMS, sp_depends, or the DMVs). This helps when making changes to tables/views, as it gives you a heads-up as to where else code changes need to be made.
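For example, a quick check with one of the dependency DMVs (using the table from the post as the example):
-- Which modules (procs, views, functions) reference this table?
SELECT referencing_schema_name, referencing_entity_name
FROM sys.dm_sql_referencing_entities(N'Sales.SalesOrderHeader', N'OBJECT');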
Security might be a consideration too: I've worked on projects where it was mandated that the table structure was neither accessible nor visible to the middle tier. We had to write procs for all CRUD operations and use "with execute as". It was time-consuming (especially for simple SELECTs), but increased maintainability.