Paul White

The SNAPSHOT Isolation Level

June 30, 2014 by Paul White in SQL Performance

[ See the index for the whole series ]


Concurrency problems are hard in the same way that multi-threaded programming is hard. Unless serializable isolation is used, it can be tough to code T-SQL transactions that will always function correctly when other users are making changes to the database at the same time.

The potential problems can be non-trivial even if the 'transaction' in question is a simple single SELECT statement. For complex multi-statement transactions that read and write data, the potential for unexpected results and errors under high concurrency can quickly become overwhelming. Attempting to resolve subtle and hard-to-reproduce concurrency problems by applying random locking hints or other trial-and-error methods can be an extremely frustrating experience.

In many respects, the snapshot isolation level seems like a perfect solution to these concurrency problems. The basic idea is that each snapshot transaction behaves as if it were executed against its own private copy of the committed state of the database, taken at the moment the transaction started. Providing the whole transaction with an unchanging view of committed data obviously guarantees consistent results for read-only operations, but what about transactions that change data?

Snapshot isolation handles data changes optimistically, implicitly assuming that conflicts between concurrent writers will be relatively rare. Where a write conflict does occur, the first committer wins and the losing transaction has its changes rolled back. It is unfortunate for the rolled-back transaction, of course, but if this is a rare enough occurrence the benefits of snapshot isolation can easily outweigh the costs of an occasional failure and retry.
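As a practical note, row versioning must be enabled at the database level before a session can request snapshot isolation, and the usual response to losing a write conflict is simply to retry the transaction. The sketch below shows both ideas; the database name and the dbo.DoWork procedure are placeholders, and 3960 is the update conflict error number:

-- Enable snapshot isolation (hypothetical database name)
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Retry a snapshot transaction that might lose a write conflict
DECLARE @attempts integer = 0;
DECLARE @done bit = 0;

WHILE @done = 0
BEGIN
    BEGIN TRY
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        BEGIN TRANSACTION;

        EXECUTE dbo.DoWork; -- hypothetical procedure containing the real work

        COMMIT TRANSACTION;
        SET @done = 1;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        SET @attempts += 1;

        -- 3960 = update conflict: retry a few times, otherwise rethrow
        IF ERROR_NUMBER() <> 3960 OR @attempts >= 3
            THROW;
    END CATCH;
END;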

The relatively simple and clean semantics of snapshot isolation (when compared with the alternatives) can be a significant advantage, particularly for people who do not work exclusively in the database world and therefore don't know the various isolation levels well. Even for seasoned database professionals, a relatively 'intuitive' isolation level can be a welcome relief.

Of course, things are rarely as simple as they first appear, and snapshot isolation is no exception. The official documentation does a pretty good job of describing the major advantages and disadvantages of snapshot isolation, so the bulk of this article concentrates on exploring some of the less well-known and surprising issues you may encounter. First, though, a quick look at the logical properties of this isolation level:

ACID Properties and Snapshot Isolation

Snapshot isolation is not one of the isolation levels defined in the SQL Standard, but it is still often compared with the standard levels using the 'concurrency phenomena' defined there. For example, the following comparison table is reproduced from the SQL Server Technical Article, "SQL Server 2005 Row Versioning-Based Transaction Isolation" by Kimberly L. Tripp and Neal Graves:

[Table: ANSI concurrency phenomena (dirty read, non-repeatable read, phantom) by isolation level]

By providing a point-in-time view of committed data, snapshot isolation provides protection against all three concurrency phenomena shown there. Dirty reads are prevented because only committed data is visible, and the static nature of the snapshot prevents both non-repeatable reads and phantoms from being encountered.

However, this comparison (and the highlighted section in particular) only shows that the snapshot and serializable isolation levels prevent the same three specific phenomena. It does not mean they are equivalent in all respects. Importantly, the SQL-92 standard does not define serializable isolation in terms of the three phenomena alone. Section 4.28 of the standard gives the full definition:

The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

The extent and importance of the implied guarantees here are often missed. To state it in simple language:

Any serializable transaction that executes correctly when run alone will continue to execute correctly with any combination of concurrent transactions, or it will be rolled back with an error message (typically a deadlock in SQL Server's implementation).

Non-serializable isolation levels, including snapshot isolation, do not provide the same strong guarantees of correctness.

Stale Data

Snapshot isolation seems almost seductively simple. Reads always come from committed data as of a single point in time, and write conflicts are automatically detected and handled. How is this not a perfect solution for all concurrency-related difficulties?

One potential issue is that snapshot reads do not necessarily reflect the current committed state of the database. A snapshot transaction completely ignores any committed changes made by other concurrent transactions after the snapshot transaction begins. Another way to put that is to say a snapshot transaction sees stale, out-of-date data. While this behaviour might be exactly what is needed to generate an accurate point-in-time report, it might not be quite so suitable in other circumstances (for example, when used to enforce a rule in a trigger).
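A minimal sketch of the effect, using a throwaway dbo.Stale table and two connections (run the steps in the order shown):

-- Setup
CREATE TABLE dbo.Stale (x integer NOT NULL);
INSERT dbo.Stale (x) VALUES (1);

-- Connection 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT x FROM dbo.Stale; -- returns 1

-- Connection 2 (autocommit)
UPDATE dbo.Stale SET x = 2;

-- Connection 1
SELECT x FROM dbo.Stale; -- still returns 1, even though 2 is now committed
COMMIT TRANSACTION;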

Write Skew

Snapshot isolation is also vulnerable to a somewhat-related phenomenon known as write skew. Reading stale data plays a part in this, but this issue also helps clarify what snapshot 'write conflict detection' does and does not do.

Write skew occurs when two concurrent transactions each read data that the other transaction modifies. No write conflict occurs because the two transactions modify different rows. Neither transaction sees the changes made by the other, because both are reading from a point in time before those changes were made.

A classic example of write skew is the white and black marble problem, but I want to show another simple example here:

-- Create two empty tables
CREATE TABLE A (x integer NOT NULL);
CREATE TABLE B (x integer NOT NULL);

-- Connection 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
INSERT A (x) SELECT COUNT_BIG(*) FROM B;

-- Connection 2
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
INSERT B (x) SELECT COUNT_BIG(*) FROM A;
COMMIT TRANSACTION;

-- Connection 1
COMMIT TRANSACTION;

Under snapshot isolation, both tables in that script end up with a single row containing a zero value. This is a correct result, but it is not a serializable one: it does not correspond to any possible serial transaction execution order. In any truly serial schedule, one transaction must complete before the other starts, so the second transaction would count the row inserted by the first. This might sound like a technicality, but remember the powerful serializable guarantees only apply when transactions are truly serializable.
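For comparison, repeating the same sequence of statements with both connections at the serializable isolation level should not produce that outcome. The precise locking details depend on the physical design, but the expectation (a sketch, not a guarantee of the exact blocking point) is that the second transaction waits for the first, or a deadlock victim is chosen:

-- Both connections, before starting their transactions
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- With the same interleaving as above, connection 2's INSERT cannot proceed
-- while connection 1 still holds its locks on tables A and B, so it waits for
-- connection 1 to commit (or a deadlock is detected and one transaction is
-- rolled back). Either way, the final state corresponds to a serial order.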

A Conflict Detection Subtlety

A snapshot write conflict occurs whenever a snapshot transaction attempts to modify a row that has been modified by another transaction that committed after the snapshot transaction began. There are two subtleties here:

  1. The transactions do not actually have to change any data values; and
  2. The transactions do not have to modify any common columns.

The following script demonstrates both points:

-- Test table
CREATE TABLE dbo.Conflict
(
    ID1 integer UNIQUE,
    Value1 integer NOT NULL,
    ID2 integer UNIQUE,
    Value2 integer NOT NULL
);

-- Insert one row
INSERT dbo.Conflict
    (ID1, ID2, Value1, Value2)
VALUES
    (1, 1, 1, 1);

-- Connection 1
BEGIN TRANSACTION;

UPDATE dbo.Conflict
SET Value1 = 1
WHERE ID1 = 1;

-- Connection 2
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

UPDATE dbo.Conflict
SET Value2 = 1
WHERE ID2 = 1;

-- Connection 1
COMMIT TRANSACTION;

Notice the following:

  • Each transaction locates the same row using a different index;
  • Neither update results in a change to the data already stored; and
  • The two transactions 'update' different columns in the row.

In spite of all that, when the first transaction commits, the second transaction terminates with an update conflict error:

[Error message: Msg 3960, snapshot isolation transaction aborted due to update conflict]

Summary: Conflict detection always operates at the level of an entire row, and an 'update' does not have to actually change any data. (In case you were wondering, changes to off-row LOB or SLOB data also count as a change to the row for conflict detection purposes).

The Foreign Key Problem

Conflict detection also applies to the parent row in a foreign key relationship. When modifying a child row under snapshot isolation, a change to the parent row in another transaction can trigger a conflict. As before, this logic applies to the whole parent row – the parent update does not have to affect the foreign key column itself. Any operation on the child table that requires an automatic foreign key check in the execution plan can result in an unexpected conflict.

To demonstrate this, first create the following tables and sample data:

CREATE TABLE dbo.Dummy
(
    x integer NULL
);

CREATE TABLE dbo.Parent
(
    ParentID integer PRIMARY KEY,
    ParentValue integer NOT NULL
);

CREATE TABLE dbo.Child 
(
    ChildID integer PRIMARY KEY,
    ChildValue integer NOT NULL,
    ParentID integer NULL FOREIGN KEY REFERENCES dbo.Parent
);

INSERT dbo.Parent 
    (ParentID, ParentValue) 
VALUES (1, 1);

INSERT dbo.Child 
    (ChildID, ChildValue, ParentID) 
VALUES (1, 1, 1);

Now execute the following from two separate connections as indicated in the comments:

-- Connection 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT_BIG(*) FROM dbo.Dummy;

-- Connection 2 (any isolation level)
UPDATE dbo.Parent SET ParentValue = 1 WHERE ParentID = 1;

-- Connection 1
UPDATE dbo.Child SET ParentID = NULL WHERE ChildID = 1;
UPDATE dbo.Child SET ParentID = 1 WHERE ChildID = 1;

The read from the dummy table is there to ensure the snapshot transaction has officially started. Issuing BEGIN TRANSACTION is not enough to do this; we have to perform some sort of data access on a user table.
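One way to see this for yourself (a sketch) is to watch the sys.dm_tran_active_snapshot_database_transactions DMV from another session; a row for connection 1 only appears once the snapshot transaction has actually started by touching user data:

-- From any other session: list currently active snapshot transactions
SELECT session_id, transaction_id, transaction_sequence_num
FROM sys.dm_tran_active_snapshot_database_transactions;

-- Before the SELECT from dbo.Dummy runs, connection 1's session is absent;
-- after that read, a row for its session appears.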

The first update to the Child table does not cause a conflict because setting the referencing column to NULL does not require a parent table check in the execution plan (there is nothing to check). The query processor does not touch the parent row in the execution plan, so no conflict arises.

The second update to the Child table does trigger a conflict because a foreign key check is automatically performed. When the Parent row is accessed by the query processor, it is also checked for an update conflict. An error is raised in this case because the referenced Parent row has experienced a committed modification after the snapshot transaction started. Note that the Parent table modification did not affect the foreign key column itself.
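One way to see the difference (a sketch using estimated plans) is to compare the plans for the two child-table updates; only the second plan needs to reference dbo.Parent:

-- Show estimated plans as text (nothing is executed while this is on)
SET SHOWPLAN_TEXT ON;
GO
UPDATE dbo.Child SET ParentID = NULL WHERE ChildID = 1; -- plan touches dbo.Child only
UPDATE dbo.Child SET ParentID = 1 WHERE ChildID = 1;    -- plan seeks dbo.Parent (the FK check)
GO
SET SHOWPLAN_TEXT OFF;
GO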

An unexpected conflict can also occur if a change to the Child table references a Parent row that was created by a concurrent transaction (and that transaction committed after the snapshot transaction started).
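Continuing the demo, a sketch of that case (starting a fresh snapshot transaction on connection 1, with the tables as left by the previous script):

-- Connection 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT_BIG(*) FROM dbo.Dummy; -- start the snapshot transaction

-- Connection 2 (any isolation level): create and commit a brand-new parent row
INSERT dbo.Parent (ParentID, ParentValue) VALUES (2, 2);

-- Connection 1: point the child row at the new parent.
-- The foreign key check touches a parent row created after the snapshot
-- transaction started, so this statement fails with an update conflict error.
UPDATE dbo.Child SET ParentID = 2 WHERE ChildID = 1;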

Summary: A query plan that includes an automatic foreign key check can throw a conflict error if the referenced row has experienced any sort of modification (including creation!) since the snapshot transaction started.

The Truncate Table Issue

A snapshot transaction will fail with an error if any table it accesses has been truncated since the transaction began. This applies even if the truncated table had no rows to begin with, as the script below demonstrates:

CREATE TABLE dbo.AccessMe
(
    x integer NULL
);

CREATE TABLE dbo.TruncateMe
(
    x integer NULL
);

-- Connection 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT_BIG(*) FROM dbo.AccessMe;

-- Connection 2
TRUNCATE TABLE dbo.TruncateMe;

-- Connection 1
SELECT COUNT_BIG(*) FROM dbo.TruncateMe;

The final SELECT fails with an error:

[Error message: Msg 3961, snapshot isolation transaction failed because the object was modified by a DDL statement after the transaction started]

This is another subtle side-effect to check for before enabling snapshot isolation on an existing database.

Next Time

The next (and final) post in this series will talk about the read uncommitted isolation level (affectionately known as "nolock").

[ See the index for the whole series ]