Note: This post was originally published only in our eBook, High Performance Techniques for SQL Server, Volume 3. You can find out about our eBooks here.
Over three years ago, I wrote a post on cursor options in SQL Server, and why you should override the defaults:
I wanted to post a follow-up to reiterate that – while you should never just accept the defaults – you really should think about which options are most applicable to your scenario. I also wanted to clarify a few items that came up in the comments on that post.
Andrew Kelly brought up a great point: a STATIC cursor makes a one-time copy of the results, puts them into tempdb, and then avoids any concurrency issues that may impact a DYNAMIC cursor. One option isn't a clear winner over the other in all cases; for example, you may have a lot of cursors (or cursors with very large result sets), and/or an already over-burdened tempdb, and may not want to offload any additional stress there. But it is something worth testing.
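To make the trade-off concrete, here is a minimal sketch of declaring each type explicitly (the table name is hypothetical):

```sql
-- STATIC: a one-time copy of the results is materialized in tempdb;
-- later changes to the underlying table are not visible through the cursor.
DECLARE c_static CURSOR STATIC
    FOR SELECT id FROM dbo.SomeTable;

-- DYNAMIC: no tempdb copy; the cursor sees changes to the underlying
-- data as you fetch, with the concurrency caveats mentioned above.
DECLARE c_dynamic CURSOR DYNAMIC
    FOR SELECT id FROM dbo.SomeTable;
```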
Fabiano also brought up a great point: both DYNAMIC and FAST_FORWARD cursors can be vulnerable to the Halloween problem (discussed by Paul White in a 4-part series, starting here). Paul also commented that FAST_FORWARD might not be vulnerable to the problem, depending on whether the optimizer chose a static or dynamic plan (Microsoft's Marc Friedman goes into great detail about that here).
Finally, I wanted to point out that not all default cursors are created equal. I ran some tests and checked how SQL Server decided to set cursor options under a variety of scenarios (validated using the sys.dm_exec_cursors dynamic management function). The code is pretty simple:
DECLARE c CURSOR FOR [...blah blah...];
SELECT properties FROM sys.dm_exec_cursors(@@SPID);
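For example, a self-contained sketch covering two of the scenarios (the #temp table and its column are just illustrations):

```sql
CREATE TABLE #t (id int);

-- Cursor based on a #temp table; per the table below, expect
-- Dynamic / Optimistic / Global in the properties column:
DECLARE c CURSOR FOR SELECT id FROM #t;
SELECT name, properties FROM sys.dm_exec_cursors(@@SPID);
DEALLOCATE c;

-- Cursor based on a constant; expect Snapshot / Read Only / Global:
DECLARE c CURSOR FOR SELECT 1;
SELECT name, properties FROM sys.dm_exec_cursors(@@SPID);
DEALLOCATE c;

DROP TABLE #t;
```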
Here are the results for the scenarios I tested:
| Cursor query is based on… | Type | Concurrency | Scope |
|---|---|---|---|
| a constant (FOR SELECT 1 or FOR SELECT SYSDATETIME()) | Snapshot | Read Only | Global |
| a #temp / ##temp table | Dynamic | Optimistic | Global |
| a user table / view | Dynamic | Optimistic | Global |
| a catalog view / DMV | Snapshot | Read Only | Global |
| a join #tmp -> user table / view | Dynamic | Optimistic | Global |
| a join #tmp -> catalog view / DMV | Snapshot | Read Only | Global |
| a join user table / view -> catalog view / DMV | Snapshot | Read Only | Global |
Credit where credit is due – this investigation was triggered by an answer from Jeroen Mostert on Stack Overflow.
So you should be aware that the default options for your cursor, if you don't override them, may be different depending on the query underlying the cursor. If you are expecting a specific behavior in any or all cases, get in the habit of explicitly specifying the options you want.
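For example (a minimal sketch; LOCAL FAST_FORWARD is the combination the earlier post recommends for the common read-only, forward-only case):

```sql
DECLARE @name sysname;

-- State the options you want explicitly, rather than inheriting defaults:
DECLARE c CURSOR LOCAL FAST_FORWARD
    FOR SELECT name FROM sys.databases;

OPEN c;
FETCH NEXT FROM c INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @name;
    FETCH NEXT FROM c INTO @name;
END
CLOSE c;
DEALLOCATE c;
```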
But really, the point is…
…stop using cursors. There really are very few problems today where the best solution is a cursor, especially if you are on SQL Server 2012 or better – where just about every problem traditionally solved by cursors can be solved using enhancements to window functions. If you still feel you need to use cursors, please take the advice in this post and its predecessor to determine which options you should use.
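To illustrate, a running total, long a textbook excuse for a cursor, collapses to a single statement with the window frame syntax added in SQL Server 2012 (table and column names are hypothetical):

```sql
SELECT OrderID,
       OrderDate,
       Amount,
       RunningTotal = SUM(Amount) OVER
           (ORDER BY OrderDate, OrderID
            ROWS UNBOUNDED PRECEDING)
FROM dbo.Orders;
```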
In the companies I worked for, cursors were used for debugging purposes / to break a giant task down into smaller pieces.
If you have, e.g., to create invoices for 10,000 customers and have very complicated business logic (with tons of conditions), it may be VERY hard to create a single statement that works for every customer in all circumstances. And even if you find such a statement (and it is fast enough), it is very hard to maintain. In such cases it may be much easier to break the process down, work on a single-customer basis, and create the invoices in a big cursor.
"If you have e.g. to create invoices for 10,000 customers and have a very complicated business logic (with tons of conditions) it may be VERY hard to create a single statment that works for every customer in all circumstances."
So don't. Create a set of statements, each targeted to a specific subset of customers with similar conditions.
Using a cursor commits you to a minimum of four statements just for cursor management, plus however many procedural conditional statements you have, plus the final commit statement, repeated for every single row in your recordset.
So while in simple cases you often can replace all of those statements with just one, it doesn't mean you have to consider only one statement in complex cases. Even a dozen or two well-written set-based statements are likely to perform better than a single cursor iterating through tens of thousands of rows and running procedural logic on each one individually.
Multiple set-based statements also allow for logical grouping of the conditions, making the result almost as easy to maintain as a cursor-based solution, if not easier.
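A hypothetical sketch of that shape (all names invented), with one targeted statement per customer segment:

```sql
-- Segment 1: retail customers with straightforward pricing
INSERT dbo.Invoices (CustomerID, Amount)
SELECT ol.CustomerID, SUM(ol.Quantity * ol.UnitPrice)
  FROM dbo.OrderLines AS ol
  JOIN dbo.Customers  AS c ON c.CustomerID = ol.CustomerID
 WHERE c.CustomerType = 'Retail'
 GROUP BY ol.CustomerID;

-- Segment 2: wholesale customers with volume discounts
INSERT dbo.Invoices (CustomerID, Amount)
SELECT ol.CustomerID, SUM(ol.Quantity * ol.UnitPrice * (1 - c.VolumeDiscount))
  FROM dbo.OrderLines AS ol
  JOIN dbo.Customers  AS c ON c.CustomerID = ol.CustomerID
 WHERE c.CustomerType = 'Wholesale'
 GROUP BY ol.CustomerID;

-- ...and so on: each statement handles one group of similar conditions,
-- instead of one statement (or one cursor loop) handling everything.
```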
@Thomas Franz,
While I agree that the particular case of invoicing customers may be more appropriately coded as "one invoice at a time" (especially for "real time" applications), "Set-Based" does not mean "All in one query". It's frequently much more effective to "Divide'n'Conquer" using several different statements to solve a given problem. Don't think about what you want to do to a row… rather think of what you want to do to a column (or columns).
Good article with some nice information. I agree that cursors should be avoided, but I can think of two times that I use them:
- Repetitively calling a stored procedure to update some data with parameter values that come from the cursor. It's important to call the stored procedure for 1) consistency in coding and maintainability, and 2) because it's doing multiple things, wrapped in a transaction based on the parameters passed in, that are too difficult to do with set-based logic.
- Performing cross-server updates based on what really comes down to a merge statement (if it's changed, do an update; if it's new, do an insert; if it no longer exists, delete it). If you're going to try to update 10,000+ records in bulk based on a cross-server join, it will take much longer and tie up more resources than running one at a time through a cursor and updating in chunks. You can use a loop with a counter, but that's really just a cursor in disguise, since it's still performing RBAR operations.
Maybe these don't really count as "traditional" uses of a cursor.
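For what it's worth, the "chunks" idea can also be expressed without a cursor; a hypothetical sketch (all names invented) that updates up to 5,000 differing rows per pass until none remain:

```sql
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    -- Each pass touches only rows that still differ from the source,
    -- so the loop terminates once everything is in sync:
    UPDATE TOP (5000) t
       SET t.Amount = s.Amount
      FROM dbo.LocalTarget AS t
      JOIN RemoteServer.RemoteDB.dbo.Source AS s
        ON s.ID = t.ID
     WHERE t.Amount <> s.Amount;

    SET @rows = @@ROWCOUNT;
END
```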
There are some problems that are very difficult to do any way other than cursors or cursor-like logic, and that remains true post-2012. Windowing functions like LAG are great and obviate some more RBAR scenarios, but sometimes it's more complicated than just looking at the previous row. I'm writing code to check SQL backups in CommVault for LSN chain breaks (like if somebody takes a log backup to disk without using CommVault) and figure out the number of steps needed to restore and the cumulative size. Since full or diff backups arrive in between the log backups, it's necessary to keep track of facts about the most recent backup of each type seen. There might be some set-based way to do it, but I'm convinced it would take so many passes that it would be much slower. The solution I settled on, after attending a talk by Jeff Moden last weekend, is a quirky update. Compared with a cursor updating the same temp table, the quirky update is something like 5 orders of magnitude faster. And it takes just one scan through the table to set the values correctly. 4 seconds to get through 100,000 rows is perf I can live with.
I've never suggested there aren't good uses for cursors – there are plenty. What I try to advocate to people is to not use cursors by default, which seems to be a common theme.
Oh and see this article for why I recommend staying away from quirky update:
http://sqlperformance.com/2012/07/t-sql-queries/running-totals
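For what it's worth, the carry-forward logic in the backup-chain scenario above ("the most recent backup of each type seen") can often be expressed in a single pass with a windowed aggregate instead of a quirky update; a hypothetical sketch (column names invented, and assuming LSNs increase over time):

```sql
SELECT b.backup_finish,
       b.backup_type,
       b.first_lsn,
       -- Most recent full backup's LSN seen so far, carried forward
       -- past each log backup in one scan:
       last_full_lsn = MAX(CASE WHEN b.backup_type = 'Full'
                                THEN b.first_lsn END)
                       OVER (ORDER BY b.backup_finish
                             ROWS UNBOUNDED PRECEDING)
  FROM #Backups AS b;
```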