Paul White

The SNAPSHOT Isolation Level

June 30, 2014 by Paul White in SQL Performance

[ See the index for the whole series ]


Concurrency problems are hard in the same way that multi-threaded programming is hard. Unless serializable isolation is used, it can be tough to code T-SQL transactions that will always function correctly when other users are making changes to the database at the same time.

The potential problems can be non-trivial even if the 'transaction' in question is a simple single SELECT statement. For complex multi-statement transactions that read and write data, the potential for unexpected results and errors under high concurrency can quickly become overwhelming. Attempting to resolve subtle and hard-to-reproduce concurrency problems by applying random locking hints or other trial-and-error methods can be an extremely frustrating experience.

In many respects, the snapshot isolation level seems like a perfect solution to these concurrency problems. The basic idea is that each snapshot transaction behaves as if it were executed against its own private copy of the committed state of the database, taken at the moment the transaction started. Providing the whole transaction with an unchanging view of committed data obviously guarantees consistent results for read-only operations, but what about transactions that change data?

Snapshot isolation handles data changes optimistically, implicitly assuming that conflicts between concurrent writers will be relatively rare. Where a write conflict does occur, the first committer wins and the losing transaction has its changes rolled back. It is unfortunate for the rolled-back transaction, of course, but if this is a rare enough occurrence the benefits of snapshot isolation can easily outweigh the costs of an occasional failure and retry.
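(The demonstration scripts later in this post assume that snapshot isolation has already been enabled on the test database. A minimal setup sketch follows; 'Sandbox' is a hypothetical database name, so substitute your own test database.)

-- Allow transactions to request SNAPSHOT isolation (run once, outside the demos)
ALTER DATABASE Sandbox SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Confirm the setting
SELECT name, snapshot_isolation_state_desc
FROM sys.databases
WHERE name = N'Sandbox';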

The relatively simple and clean semantics of snapshot isolation (when compared with the alternatives) can be a significant advantage, particularly for people who do not work exclusively in the database world and therefore don't know the various isolation levels well. Even for seasoned database professionals, a relatively 'intuitive' isolation level can be a welcome relief.

Of course, things are rarely as simple as they first appear, and snapshot isolation is no exception. The official documentation does a pretty good job of describing the major advantages and disadvantages of snapshot isolation, so the bulk of this article concentrates on exploring some of the less well-known and surprising issues you may encounter. First, though, a quick look at the logical properties of this isolation level:

ACID Properties and Snapshot Isolation

Snapshot isolation is not one of the isolation levels defined in the SQL Standard, but it is still often compared using the 'concurrency phenomena' defined there. For example, the following comparison table is reproduced from the SQL Server Technical Article, "SQL Server 2005 Row Versioning-Based Transaction Isolation" by Kimberly L. Tripp and Neal Graves:

[Table: ANSI concurrency phenomena (dirty read, non-repeatable read, phantom) by isolation level, reproduced from the whitepaper; the snapshot and serializable rows are highlighted]

By providing a point-in-time view of committed data, snapshot isolation provides protection against all three concurrency phenomena shown there. Dirty reads are prevented because only committed data is visible, and the static nature of the snapshot prevents both non-repeatable reads and phantoms from being encountered.

However, this comparison (and the highlighted section in particular) only shows that the snapshot and serializable isolation levels prevent the same three specific phenomena. It does not mean they are equivalent in all respects. Importantly, the SQL-92 standard does not define serializable isolation in terms of the three phenomena alone. Section 4.28 of the standard gives the full definition:

The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

The extent and importance of the implied guarantees here are often missed. To state it in simple language:

Any serializable transaction that executes correctly when run alone will continue to execute correctly with any combination of concurrent transactions, or it will be rolled back with an error message (typically a deadlock in SQL Server's implementation).

Non-serializable isolation levels, including snapshot isolation, do not provide the same strong guarantees of correctness.

Stale Data

Snapshot isolation seems almost seductively simple. Reads always come from committed data as of a single point in time, and write conflicts are automatically detected and handled. How is this not a perfect solution for all concurrency-related difficulties?

One potential issue is that snapshot reads do not necessarily reflect the current committed state of the database. A snapshot transaction completely ignores any committed changes made by other concurrent transactions after the snapshot transaction begins. Another way to put that is to say a snapshot transaction sees stale, out-of-date data. While this behaviour might be exactly what is needed to generate an accurate point-in-time report, it might not be quite so suitable in other circumstances (for example, when used to enforce a rule in a trigger).
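A quick sketch of that behaviour (the table, names, and values here are mine, not from the original article; snapshot isolation is assumed to be enabled):

-- Setup
CREATE TABLE dbo.Price (ItemID integer PRIMARY KEY, Amount integer NOT NULL);
INSERT dbo.Price (ItemID, Amount) VALUES (1, 100);

-- Connection 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT Amount FROM dbo.Price WHERE ItemID = 1; -- returns 100

-- Connection 2 (any isolation level)
UPDATE dbo.Price SET Amount = 200 WHERE ItemID = 1;

-- Connection 1
SELECT Amount FROM dbo.Price WHERE ItemID = 1; -- still returns 100 (stale)
COMMIT TRANSACTION;

-- Connection 1, after the commit (outside the snapshot transaction)
SELECT Amount FROM dbo.Price WHERE ItemID = 1; -- now returns 200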

Write Skew

Snapshot isolation is also vulnerable to a somewhat-related phenomenon known as write skew. Reading stale data plays a part in this, but this issue also helps clarify what snapshot 'write conflict detection' does and does not do.

Write skew occurs when two concurrent transactions each read data that the other transaction modifies. No write conflict occurs because the two transactions modify different rows. Neither transaction sees the changes made by the other, because both are reading from a point in time before those changes were made.

A classic example of write skew is the white and black marble problem, but I want to show another simple example here:

-- Create two empty tables
CREATE TABLE A (x integer NOT NULL);
CREATE TABLE B (x integer NOT NULL);

-- Connection 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
INSERT A (x) SELECT COUNT_BIG(*) FROM B;

-- Connection 2
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
INSERT B (x) SELECT COUNT_BIG(*) FROM A;
COMMIT TRANSACTION;

-- Connection 1
COMMIT TRANSACTION;

Under snapshot isolation, both tables in that script end up with a single row containing a zero value. This is a correct result, but it is not a serializable one: it does not correspond to any possible serial transaction execution order. In any truly serial schedule, one transaction must complete before the other starts, so the second transaction would count the row inserted by the first. This might sound like a technicality, but remember the powerful serializable guarantees only apply when transactions are truly serializable.
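To make the non-serializable outcome concrete, query both tables once the two transactions have committed (expected results, per the description above, shown in comments):

-- Run after both commits
SELECT x FROM A; -- 0 (B was empty in Connection 1's snapshot)
SELECT x FROM B; -- 0 (A was empty in Connection 2's snapshot)
-- In any serial schedule, one of these values would be 1.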

A Conflict Detection Subtlety

A snapshot write conflict occurs whenever a snapshot transaction attempts to modify a row that has been modified by another transaction that committed after the snapshot transaction began. There are two subtleties here:

  1. The transactions do not actually have to change any data values; and
  2. The transactions do not have to modify any common columns.

The following script demonstrates both points:

-- Test table
CREATE TABLE dbo.Conflict
(
    ID1 integer UNIQUE,
    Value1 integer NOT NULL,
    ID2 integer UNIQUE,
    Value2 integer NOT NULL
);

-- Insert one row
INSERT dbo.Conflict
    (ID1, ID2, Value1, Value2)
VALUES
    (1, 1, 1, 1);

-- Connection 1
BEGIN TRANSACTION;

UPDATE dbo.Conflict
SET Value1 = 1
WHERE ID1 = 1;

-- Connection 2
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

UPDATE dbo.Conflict
SET Value2 = 1
WHERE ID2 = 1;

-- Connection 1
COMMIT TRANSACTION;

Notice the following:

  • Each transaction locates the same row using a different index;
  • Neither update results in a change to the data already stored; and
  • The two transactions 'update' different columns in the row.

In spite of all that, when the first transaction commits the second transaction terminates with an update conflict error:

[Screenshot: error 3960, snapshot isolation transaction aborted due to update conflict on table 'dbo.Conflict']

Summary: Conflict detection always operates at the level of an entire row, and an 'update' does not have to actually change any data. (In case you were wondering, changes to off-row LOB or SLOB data also count as a change to the row for conflict detection purposes).
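Because a conflict can strike any snapshot write, the usual response is simply to retry. A minimal retry sketch of my own (not from the original article), using the dbo.Conflict table above and assuming 3960 is the update conflict error number:

DECLARE @retries integer = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        BEGIN TRANSACTION;

        UPDATE dbo.Conflict
        SET Value1 = Value1 + 1
        WHERE ID1 = 1;

        COMMIT TRANSACTION;
        BREAK; -- success, stop retrying
    END TRY
    BEGIN CATCH
        -- A conflict leaves the transaction doomed, so roll it back first
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 3960
            SET @retries -= 1; -- update conflict: try again
        ELSE
            THROW;             -- anything else: re-raise
    END CATCH;
END;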

The Foreign Key Problem

Conflict detection also applies to the parent row in a foreign key relationship. When modifying a child row under snapshot isolation, a change to the parent row in another transaction can trigger a conflict. As before, this logic applies to the whole parent row – the parent update does not have to affect the foreign key column itself. Any operation on the child table that requires an automatic foreign key check in the execution plan can result in an unexpected conflict.

To demonstrate this, first create the following tables and sample data:

CREATE TABLE dbo.Dummy
(
    x integer NULL
);

CREATE TABLE dbo.Parent
(
    ParentID integer PRIMARY KEY,
    ParentValue integer NOT NULL
);

CREATE TABLE dbo.Child 
(
    ChildID integer PRIMARY KEY,
    ChildValue integer NOT NULL,
    ParentID integer NULL FOREIGN KEY REFERENCES dbo.Parent
);

INSERT dbo.Parent 
    (ParentID, ParentValue) 
VALUES (1, 1);

INSERT dbo.Child 
    (ChildID, ChildValue, ParentID) 
VALUES (1, 1, 1);

Now execute the following from two separate connections as indicated in the comments:

-- Connection 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT_BIG(*) FROM dbo.Dummy;

-- Connection 2 (any isolation level)
UPDATE dbo.Parent SET ParentValue = 1 WHERE ParentID = 1;

-- Connection 1
UPDATE dbo.Child SET ParentID = NULL WHERE ChildID = 1;
UPDATE dbo.Child SET ParentID = 1 WHERE ChildID = 1;

The read from the dummy table is there to ensure the snapshot transaction has officially started. Issuing BEGIN TRANSACTION is not enough to do this; we have to perform some sort of data access on a user table.
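As a side note (not from the original article), one way to see whether the snapshot transaction is really underway is to query the row versioning DMV from another connection; before the SELECT against dbo.Dummy runs, the transaction should not appear here, and afterwards it should:

-- Run from any other connection
SELECT session_id, transaction_id, transaction_sequence_num, is_snapshot
FROM sys.dm_tran_active_snapshot_database_transactions;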

The first update to the Child table does not cause a conflict because setting the referencing column to NULL does not require a parent table check in the execution plan (there is nothing to check). Since the query processor never touches the parent row, no conflict can arise.

The second update to the Child table does trigger a conflict because a foreign key check is automatically performed. When the Parent row is accessed by the query processor, it is also checked for an update conflict. An error is raised in this case because the referenced Parent row has experienced a committed modification after the snapshot transaction started. Note that the Parent table modification did not affect the foreign key column itself.
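If you want to see the difference for yourself, one option (my own sketch, not part of the original demo) is to compare the estimated plans for the two updates. With SHOWPLAN enabled the statements are not executed, and only the second plan should reference dbo.Parent (a seek feeding an Assert that validates the foreign key):

SET SHOWPLAN_XML ON;
GO
UPDATE dbo.Child SET ParentID = NULL WHERE ChildID = 1; -- no access to dbo.Parent
UPDATE dbo.Child SET ParentID = 1 WHERE ChildID = 1;    -- includes a check against dbo.Parent
GO
SET SHOWPLAN_XML OFF;
GO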

An unexpected conflict can also occur if a change to the Child table references a Parent row that was created by a concurrent transaction (and that transaction committed after the snapshot transaction started).

Summary: A query plan that includes an automatic foreign key check can throw a conflict error if the referenced row has experienced any sort of modification (including creation!) since the snapshot transaction started.

The Truncate Table Issue

A snapshot transaction will fail with an error if any table it accesses has been truncated since the transaction began. This applies even if the truncated table had no rows to begin with, as the script below demonstrates:

CREATE TABLE dbo.AccessMe
(
    x integer NULL
);

CREATE TABLE dbo.TruncateMe
(
    x integer NULL
);

-- Connection 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT_BIG(*) FROM dbo.AccessMe;

-- Connection 2
TRUNCATE TABLE dbo.TruncateMe;

-- Connection 1
SELECT COUNT_BIG(*) FROM dbo.TruncateMe;

The final SELECT fails with an error:

[Screenshot: error 3961, snapshot isolation transaction failed because the object accessed by the statement has been modified by a DDL statement in another concurrent transaction]
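For comparison (my own observation, not from the original article), a concurrent DELETE would not cause this failure, because DELETE is a versioned data modification rather than a DDL operation:

-- Connection 2 (instead of the TRUNCATE above)
DELETE dbo.TruncateMe;

-- Connection 1
SELECT COUNT_BIG(*) FROM dbo.TruncateMe; -- succeeds, returning the snapshot view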

This is another subtle side-effect to check for before enabling snapshot isolation on an existing database.

Next Time

The next (and final) post in this series will talk about the read uncommitted isolation level (affectionately known as "nolock").

[ See the index for the whole series ]