One of the fixes included in Cumulative Update 11 for SQL Server 2008 R2 Service Pack 2 addresses an "incorrect deadlock" that can occur in a specific scenario (explained later in this article). Unfortunately, the fix introduces a new bug, where SELECT queries under RCSI (read committed snapshot isolation) start taking table-level intent-shared locks. As a consequence, you may see increased blocking (and potentially deadlocking) for RCSI queries after applying 2008 R2 SP2 CU11 (or later).
This will come as an unwelcome surprise to anyone accustomed to readers not blocking writers (and vice-versa) when using RCSI. There is no fix for the RCSI bug at the time of writing. In fact, the Connect item created by Eugene Karpovich to report the issue has been closed as "Won't Fix", though I understand this decision is currently under review.
Ordinarily, this issue might not be such a huge concern, because cumulative updates are generally not as widely applied as full service packs. However, Microsoft recently announced there will be a Final Service Pack 3 for SQL Server 2008 R2. This service pack will be a simple roll up of existing SP2 cumulative updates (up to and including CU13) but with no new fixes. The result of all this is that, unless something changes in the meantime, users applying SP3 will suddenly start being affected by the RCSI bug introduced in CU11.
edit: Just before this article was published, Microsoft confirmed this regression will be fixed in SP3.
The same "incorrect deadlock" bug (whose fix introduces the new bug) was also fixed in Cumulative Update 8 for SQL Server 2012 Service Pack 1 as described in KB2923460. The fix for SQL Server 2012 is different, and does not introduce the new RCSI problem.
SQL Server 2014 was never affected by either issue, as far as I can tell. There is certainly no documentation to indicate otherwise, and the tests I have performed on 2014 RTM, CU1, and CU2 do not reproduce either bug.
The 2008 R2 RCSI Bug
A SELECT query running under RCSI typically takes only a schema stability (Sch-S) lock, which is compatible with all other locks except a schema modification (Sch-M) lock. When CU11 (or later) is applied to a SQL Server 2008 R2 instance, these queries start taking a table-level intent-shared (Tab-IS) lock. The following test script can be used to demonstrate the difference in behaviours:
USE master;
GO
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET NOCOUNT ON;
GO
CREATE DATABASE RCSI;
GO
ALTER DATABASE RCSI
SET READ_COMMITTED_SNAPSHOT ON;
GO
ALTER DATABASE RCSI
SET ALLOW_SNAPSHOT_ISOLATION OFF;
GO
USE RCSI;
GO
CREATE TABLE dbo.Test
(
id integer IDENTITY NOT NULL,
col1 integer NOT NULL,
CONSTRAINT PK_Test
PRIMARY KEY CLUSTERED (id)
);
GO
INSERT dbo.Test
(col1)
VALUES
(1), (2), (3), (4);
GO
-- Show locks
DBCC TRACEON (1200, 3604, -1) WITH NO_INFOMSGS;
SELECT * FROM dbo.Test;
DBCC TRACEOFF (1200, 3604, -1) WITH NO_INFOMSGS;
GO
ALTER DATABASE RCSI
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;
USE master;
DROP DATABASE RCSI;
When run against an instance of SQL Server 2008 R2 without the bug, the debug output shows a single Sch-S lock taken for the test statement as expected:
Process releasing lock on OBJECT: 7:2105058535:0
When run against SQL Server 2008 R2 build 10.50.4302 (or higher) the output is similar to:
Process releasing lock on OBJECT: 7:2105058535:0
Notice the Sch-S lock has been replaced by a Tab-IS lock.
Implications and Mitigations
An intent-shared (IS) lock is still a very compatible lock, but it is not quite as concurrency-friendly as Sch-S. The lock compatibility matrix shows that an IS lock conflicts with:
- Sch-M (schema modification) – as per Sch-S
- BU (bulk update)
- X (exclusive)
The incompatibility with exclusive (X) locks means a read under RCSI will block if a concurrent process holds an exclusive lock on the same resource. Likewise, a writer that needs an exclusive lock will block if a concurrent RCSI reader holds an IS lock. Excusive locks are obtained whenever data is modified, and held to the end of the transaction, so the effect of the bug is that readers under RCSI will be blocked by concurrent writers (and vice versa) when they weren't before CU11 was applied.
A significant mitigating factor is that the bug only causes a table-level intent-shared lock to be taken. A concurrent writer that needs a table-level exclusive lock will cause blocking (and potentially a deadlock). However, concurrent writers that only require exclusive locks at a lower (e.g. row, page or partition) level will not cause blocking or a deadlock. At the table level, these writers will only acquire an intent-exclusive (IX) lock, which is compatible with Tab-IS. The exclusive locks taken at lower levels of granularity will not cause a conflict.
In most systems, table-level exclusive (Tab-X) locks will be relatively uncommon. Unless explicitly requested using a TABLOCKX hint, some possible causes of a Tab-X lock are:
- Lock escalation from a lower granularity
- Using SERIALIZABLE without a supporting index for key-range locks
A technical workaround is to add the (redundant) table hint WITH (READCOMMITTED)
to every table in every query that runs under RCSI. This happens to bypass the bug so only a Sch-S lock is taken, but it is hardly a practical proposition.
Despite these mitigations, taking Tab-IS for a read-only query under RCSI is still incorrect behaviour. I hope it can be fixed for SQL Server 2008 R2 before Service Pack 3 is released.
The "Incorrect Deadlock" Bug
As mentioned earlier, the RCSI bug is introduced as a side-effect of a fix for an "incorrect deadlock" bug. This earlier issue is documented for SQL Server 2008 R2 in KB2929464 and for SQL Server 2012 in KB2923460. Neither document is a model of clarity (or accuracy), but the underlying issue is quite interesting, so I want to spend a bit of time looking at it here.
Essentially, the deadlock occurs when:
- Three or more concurrent transactions read from the same table
- The UPDLOCK and TABLOCK hints are used in all three cases
- The database setting READ_COMMITTED_SNAPSHOT is ON
Note that it does not matter which isolation level the transactions run under. To reproduce the bug, first run the setup script below:
USE master;
GO
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
GO
CREATE DATABASE IncorrectDeadlock;
GO
ALTER DATABASE IncorrectDeadlock
SET READ_COMMITTED_SNAPSHOT ON;
GO
USE IncorrectDeadlock;
GO
CREATE TABLE dbo.Test
(
id integer IDENTITY NOT NULL,
col1 integer NOT NULL,
CONSTRAINT PK_Test
PRIMARY KEY CLUSTERED (id)
);
GO
INSERT dbo.Test
(col1)
VALUES
(1);
Next, run the following script in three separate connections (note that the transaction is left open):
USE IncorrectDeadlock;
GO
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
GO
BEGIN TRANSACTION;
SELECT
T.id,
T.col1
FROM dbo.Test AS T
WITH (UPDLOCK, TABLOCK);
At this point, the first session will have returned a result set and the other two will be blocked. The "incorrect deadlock" arises when the first session completes its transaction (either committing or rolling back). When this occurs, one of the other two sessions will report a deadlock:
The deadlock occurs because the two previously-blocked sessions hold Tab-IX (table-level intent-exclusive) and both want to convert their lock to Tab-X (table-level exclusive). Tab-IX is compatible with another Tab-IX, but not Tab-X. This is a conversion deadlock (and the irony here is that UPDLOCK is often used to avoid conversion deadlocks).
Feel free to vary the transaction isolation level for the three queries as you wish. The deadlock will occur regardless, so long as RCSI is enabled, with the same locks involved. When the tests are complete, remove the test database:
USE IncorrectDeadlock;
ALTER DATABASE IncorrectDeadlock
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;
USE master;
DROP DATABASE IncorrectDeadlock;
Analysis and Explanation
I personally don't recall ever having used UPDLOCK and TABLOCK together in my code. To me, this combination of hints seems intuitively odd because SQL Server does not have a table-level update lock. So, what does it even mean to specify UPDLOCK and TABLOCK hints together?
The documentation has this to say:
Specifies that update locks are to be taken and held until the transaction completes. UPDLOCK takes update locks for read operations only at the row-level or page-level. If UPDLOCK is combined with TABLOCK, or a table-level lock is taken for some other reason, an exclusive (X) lock will be taken instead.
This suggests that the hint combination ought to result in a single exclusive table lock. In fact, this is not quite the whole story:
In SQL Server 2000, combining UPDLOCK and TABLOCK hints results in Tab-S (a shared table lock) being taken followed by conversion to Tab-X (exclusive table lock) under all isolation levels except READ UNCOMMITTED. This sequence of locks can result in a deadlock where three or more sessions are involved: two sessions acquire Tab-S and both wait on the other to convert to Tab-X. Under READ UNCOMMITTED, SQL Server 2000 takes Sch-S then Tab-X, which is not prone to deadlock (just normal blocking).
In SQL Server 2005 onward (without the bug fix) the locks taken depend only on whether RCSI is enabled or not. If RCSI is enabled, all isolation levels take Tab-IX then convert to Tab-X. This sequence causes the deadlock the bug fix addresses.
If RCSI is not enabled, the matching isolation levels behave as they did under SQL Server 2000 (taking Tab-S then converting to Tab-X). The (new for 2005) snapshot isolation level takes Sch-S followed by Tab-X. As a consequence, SI and READ UNCOMMITTED are the only isolation levels not prone to this deadlock in the UPDLOCK, TABLOCK scenario when RCSI is not enabled.
The Deadlock Fix
The fix changes the locks taken when UPDLOCK and TABLOCK are specified together, for all isolation levels, and regardless of whether RCSI is enabled or not. After the fix is applied, UPDLOCK and TABLOCK cause the engine to acquire Tab-SIX (table-level shared with intent exclusive), which is then converted to Tab-X.
This avoids the deadlock scenario because Tab-SIX is incompatible with another Tab-SIX. Remember, the deadlock occurred when two processes held Tab-IX waiting to convert to Tab-X. With Tab-IX replaced by Tab-SIX, it is not possible for both to hold Tab-SIX at the same time. The result is a normal blocking scenario instead of a deadlock.
Final Thoughts
The "incorrect deadlock" fix does resolve one particular deadlock scenario, but it still does not result in the behaviour I imagine the people specifying UPDLOCK and TABLOCK envisaged. If SQL Server did have a Tab-U (table-level update) lock, it would prevent concurrent changes to the table but allow concurrent readers. This is what I imagine the intent of people using these hints together would be, and I can see how it might be useful.
The current implementation (where Tab-X is ultimately taken instead of the missing Tab-U) does not match this expectation because Tab-X prevents concurrent reads (unless a row-versioning isolation level is used). We might as well specify TABLOCKX in many cases. The fact that the fix also introduces a new bug (for users of SQL Server 2008 R2 only) is also unfortunate, particularly if the bug goes on to be included in 2008 R2 SP3.
Note that the deadlock fix is not being made available for SQL Server versions prior to 2008 R2. These versions will continue to have the complex locking behaviour for UPDLOCK and TABLOCK as described above.
My thanks to Eugene Karpovich who first brought this issue to my attention in a comment to my article on Data Modifications Under RCSI.
Any inherent reason SQL Server doesn't have a table-level U-lock? That would seem like a correct fix as well. There aren't any intent-update locks either, right?
Paul, thank you for the article and the additional information on the subject.
Also, just yesterday Microsoft changed the status of the Connect item back to Active – good news!
I have never really understood why SQL Server doesn't have a Tab-U lock. It has probably been discussed many times over the years, but I don't recall ever seeing an explanation I personally found completely satisfying. The lack of Tab-U seems like an odd loss of orthogonality to me, and one that seems to be at the root of the whole problem (as I intended to imply). Yes, Tab-U would be a correct fix. You can get IU locks at page level (associated with a Row-U lock) but not at the table level IIRC. I've never checked partition-level locking, that might also allow IU locks…?
Since SQL server can escalate to table-based lock both under memory pressure and when too many rows are locked, I do believe trace flag 1224 can mitigate (disables escalation due to number of locks). And trace flag 1211 would disable all lock escalation to table level – which may lead to different performance issues…
Just wondering if this fix made it into the QFE build 4321 for MS14-044?
Chris
What about switching from RCSI to SI as a possible workaround?
As the article says, "As a consequence, SI and READ UNCOMMITTED are the only isolation levels not prone to this deadlock in the UPDLOCK, TABLOCK scenario when RCSI is not enabled."
Moving to SI for just this reason would be a strange choice, however. It would seem easier to apply the fix rather than switching isolation levels.
I would also mention that a few considerations (which might get quite important based on my experience) need to be taken into account when working with the SI isolation (or, how we internally call it, TLSI – "transaction level snapshot isolation").
For example, under SI, a query that accesses a user table in another database requires this database to have ALLOW_SNAPSHOT_ISOLATION option ON, otherwise the query will fail.
Also, since SQL Server does not generate versions of system metadata, any committed DDL modification of an object will cause any subsequent reference of this object made under some SI transaction to fail it and roll back (provided that this transaction has started before the DDL modification was made). Under RCSI, such reference would succeed.
Thanks Paul and Eugene,
the "fix" you are talking about, does it exist today? I can't find it
The fix will be in SP3 for 2008 R2. There is no exact date for that release yet.
Thanks Paul.
do you know if M14-044 GDR will cause the same thing? GDR goes on top of sp2, while QFE goes on top of CU5+ I think.
SQL Server 2008 R2 SP3 was released today http://www.microsoft.com/en-us/download/details.aspx?id=44271
Tested and confirmed the bug is resolved (build 10.50.6000).