Changes to a Writable Partition May Fail Unexpectedly

If you use table partitioning with one or more partitions stored on a read-only filegroup, SQL update and delete statements may fail with an error. Of course, this is the expected behaviour if any of the modifications would require writing to a read-only filegroup; however it is also possible to encounter this error condition where the changes are restricted to filegroups marked as read-write.

Sample Database

To demonstrate the issue, we will create a simple database with a single custom filegroup that we will later mark as being read-only. Note that you will need to add in the filename path to suit your test instance.

USE master;
GO
CREATE DATABASE Test;
GO
-- This filegroup will be marked read-only later
ALTER DATABASE Test
ADD FILEGROUP ReadOnlyFileGroup;
GO
-- Add a file to the new filegroup
ALTER DATABASE Test
ADD FILE
(
    NAME = 'Test_RO',
    FILENAME = '<...your path...>\MSSQL\DATA\Test_ReadOnly.ndf'
)
TO FILEGROUP ReadOnlyFileGroup;

Partition function and scheme

We will now create a basic partitioning function and scheme that will direct rows with data before 1 January 2000 to the read-only partition. Later data will be held in the read-write primary filegroup:

USE Test;
GO
CREATE PARTITION FUNCTION PF (datetime)
AS RANGE RIGHT 
FOR VALUES ({D '2000-01-01'});
GO
CREATE PARTITION SCHEME PS
AS PARTITION PF
TO (ReadOnlyFileGroup, [PRIMARY]);

The range right specification means that rows with the boundary value 1 January 2000 will be in the read-write partition.

Partitioned table and indexes

We can now create our test table:

CREATE TABLE dbo.Test
(
    dt datetime NOT NULL,
    c1 integer NOT NULL,
    c2 integer NOT NULL,

    CONSTRAINT PK_dbo_Test__c1_dt
        PRIMARY KEY CLUSTERED (dt)
        ON PS (dt)
)
ON PS (dt);
GO
CREATE NONCLUSTERED INDEX IX_dbo_Test_c1
ON dbo.Test (c1)
ON PS (dt);
GO
CREATE NONCLUSTERED INDEX IX_dbo_Test_c2
ON dbo.Test (c2)
ON PS (dt);

The table has a clustered primary key on the datetime column, and is also partitioned on that column. There are nonclustered indexes on the other two integer columns, which are partitioned in the same way (the indexes are aligned with the base table).

Sample data

Finally, we add a couple of rows of example data, and make the pre-2000 data partition read only:

INSERT dbo.Test WITH (TABLOCKX)
    (dt, c1, c2)
VALUES 
    ({D '1999-12-31'}, 1, 1), -- Read only
    ({D '2000-01-01'}, 2, 2); -- Writable
GO
ALTER DATABASE Test
MODIFY FILEGROUP 
    ReadOnlyFileGroup READ_ONLY;

You can use the following test update statements to confirm that data in the read-only partition cannot be modified, while data with a dt value on or after 1 January 2000 can be written to:

-- Will fail, as expected
UPDATE dbo.Test
SET c2 = 1
WHERE dt = {D '1999-12-31'};

-- Will succeed, as expected
UPDATE dbo.Test
SET c2 = 999
WHERE dt = {D '2000-01-01'};

-- Reset the value of c2
UPDATE dbo.Test
SET c2 = 2
WHERE dt = {D '2000-01-01'};

An Unexpected Failure

We have two rows: one read-only (1999-12-31); and one read-write (2000-01-01):

Table contents

Now try the following query. It identifies the same writable "2000-01-01" row that we just successfully updated, but uses a different where clause predicate:

UPDATE dbo.Test
SET c2 = 2
WHERE c1 = 2;

The estimated (pre-execution) plan is:

The four (!) Compute Scalars are not important for this discussion. They are used to determine if the nonclustered index needs to be maintained for each row that arrives at the Clustered Index Update operator.

The more interesting thing is that this update statement fails with an error similar to:

Msg 652, Level 16, State 1
The index "PK_dbo_Test__c1_dt" for table "dbo.Test" (RowsetId 72057594039042048) resides on a read-only filegroup ("ReadOnlyFileGroup"), which cannot be modified.

Not Partition Elimination

If you have worked with partitioning before, you may be thinking that 'partition elimination' might be the reason. The logic would go something like this:

In the previous statements, a literal value for the partitioning column was provided in the where clause, so SQL Server would be able to determine immediately which partition(s) to access. By changing the where clause to no longer reference the partitioning column, we have forced SQL Server to access every partition using a Clustered Index Scan.

That is all true, in general, but it is not the reason the update statement fails here.

The expected behaviour is that SQL Server should be able to read from any and all partitions during query execution. A data modification operation should only fail if the execution engine actually tries to modify a row stored on a read-only filegroup.

To illustrate, let us make a small change to the previous query:

UPDATE dbo.Test
SET c2 = 2,
    dt = dt
WHERE c1 = 2;

The where clause is exactly the same as before. The only difference is that we are now (deliberately) setting the partitioning column equal to itself. This will not change the value stored in that column, but it does affect the outcome. The update now succeeds (albeit with a more complex execution plan):

The optimizer has introduced new Split, Sort, and Collapse operators, and added the machinery necessary to maintain each potentially-affected nonclustered index separately (using a wide, or per-index strategy).

The Clustered Index Scan properties show that both partitions of the table were accessed when reading:

Clustered Index Scan properties

By contrast, the Clustered Index Update shows that only the read-write partition was accessed for writing:

Clustered Index Update properties

Each of the Nonclustered Index Update operators shows similar information: only the writable partition (#2) was modified at run time, so no error occurred.

The Reason Revealed

The new plan succeeds not because the nonclustered indexes are maintained separately; nor is it (directly) due to the Split-Sort-Collapse combination necessary to avoid transient duplicate key errors in the unique index.

The real reason is something I mentioned briefly in my previous article, "Optimizing Update Queries" – an internal optimization known as Rowset Sharing. When this is used, the Clustered Index Update shares the same underlying storage engine rowset as a Clustered Index Scan, Seek, or Key Lookup on the reading side of the plan.

With the Rowset Sharing optimization, SQL Server checks for offline or read-only filegroups when reading. In plans where the Clustered Index Update uses a separate rowset, the offline/read-only check is only performed for each row at the update (or delete) iterator.

Undocumented Workarounds

Let's get the fun, geeky, but impractical stuff out of the way first.

The shared rowset optimization can only be applied when the route from the clustered index seek, scan, or key lookup is a pipeline. No blocking or semi-blocking operators are allowed. Put another way, each row must be able to get from read source to write destination before the next row is read.

As a reminder, here is the sample data, statement, and execution plan for the failed update again:

Sample data

--Change the read-write row
UPDATE dbo.Test
SET c2 = 2
WHERE c1 = 2;

Halloween Protection

One way to introduce a blocking operator to the plan is to require explicit Halloween Protection (HP) for this update. Separating the read from the write with a blocking operator will prevent the rowset sharing optimization from being used (no pipeline). Undocumented and unsupported (test system only!) trace flag 8692 adds an Eager Table Spool for explicit HP:

-- Works (explicit HP)
UPDATE dbo.Test
SET c2 = 2
WHERE c1 = 2
OPTION (QUERYTRACEON 8692);

The actual execution plan (available because the error is no longer thrown) is:

The Sort in the Split-Sort-Collapse combination seen in the earlier successful update provides the blocking necessary to disable rowset sharing in that instance.

The Anti-Rowset Sharing Trace Flag

There is another undocumented trace flag that disables the rowset sharing optimization. This has the advantage of not introducing a potentially-expensive blocking operator. It cannot be used in practice of course (unless you contact Microsoft Support and get something in writing recommending you enable it, I suppose). Nevertheless, for entertainment purposes, here is trace flag 8746 in action:

-- Works (no rowset sharing)
UPDATE dbo.Test
SET c2 = 2
WHERE c1 = 2
OPTION (QUERYTRACEON 8746);

The actual execution plan for that statement is:

Feel free to experiment with different values (ones that actually change the stored values if you like) to convince yourself of the difference here. As mentioned in my previous post, you can also use undocumented trace flag 8666 to expose the rowset sharing property in the execution plan.

If you want to see the rowset sharing error with a delete statement, simply replace the update and set clauses with a delete, while using the same where clause.

Supported Workarounds

There are any number of potential ways to ensure that rowset sharing is not applied in real-world queries without using trace flags. Now that you know the core issue requires a shared and pipelined clustered index read and write plan, you can probably come up with your own. Even so, there are a couple of examples that are particularly worth looking at here.

Forced Index / Covering Index

One natural idea is to force the reading side of the plan to use a nonclustered index instead of the clustered index. We cannot add an index hint directly to the test query as written, but aliasing the table allow this:

UPDATE T
SET c2 = 2
FROM dbo.Test AS T WITH (INDEX(IX_dbo_Test_c1))
WHERE c1 = 2;

This might seem like the solution the query optimizer should have chosen in the first place, since we have a nonclustered index on the where clause predicate column c1. The execution plan shows why the optimizer chose as it did:

The cost of the Key Lookup is enough to convince the optimizer to use the clustered index for reading. The lookup is needed to fetch the current value of column c2, so the Compute Scalars can decide if the nonclustered index needs to be maintained.

Adding column c2 to the nonclustered index (key or include) would avoid the problem. The optimizer would choose the now-covering index instead of the clustered index.

That said, it is not always possible to anticipate which columns will be needed, or to include them all even if the set is known. Remember, the column is needed because c2 is in the set clause of the update statement. If the queries are ad-hoc (e.g. submitted by users or generated by a tool), every nonclustered index would need to include all columns to make this a robust option.

One interesting thing about the plan with the Key Lookup above is that it does not generate an error. This is despite the Key Lookup and Clustered Index Update using a Shared Rowset. The reason is that the nonclustered Index Seek locates the row with c1 = 2 before the Key Lookup touches the clustered index. The shared rowset check for offline / read-only filegroups is still performed at the lookup, but it does not touch the read-only partition, so no error is thrown. As a final (related) point of interest, note that the Index Seek touches both partitions, but the Key Lookup only hits one.

Excluding the read-only partition

A trivial solution is to rely on partition elimination so the reading side of the plan never touches the read-only partition. This can be done with an explicit predicate, for example either of these:

UPDATE dbo.Test
SET c2 = 2
WHERE c1 = 2
AND dt >= {D '2000-01-01'};

UPDATE dbo.Test
SET c2 = 2
WHERE c1 = 2
AND $PARTITION.PF(dt) > 1; -- Not partition #1

Where it is impossible, or inconvenient, to change every query to add a partition-elimination predicate, other solutions like updating through a view may be suitable. For example:

CREATE VIEW dbo.TestWritablePartitions
WITH SCHEMABINDING
AS
-- Only the writable portion of the table
SELECT
    T.dt,
    T.c1,
    T.c2
FROM dbo.Test AS T
WHERE
    $PARTITION.PF(dt) > 1;
GO
-- Succeeds
UPDATE dbo.TestWritablePartitions
SET c2 = 2
WHERE c1 = 2;

One disadvantage of using a view is that an update or delete that targets the read-only part of the base table will succeed with no rows affected, rather than failing with an error. An instead of trigger on the table or view might be a workaround for that in some situations, but may also introduce more problems…but I digress.

As mentioned previously, there are many potential supported solutions. The point of this article is to show how rowset sharing caused the unexpected update error.

4 thoughts on “Changes to a Writable Partition May Fail Unexpectedly”

Sofiane says:

December 14, 2017 at 7:43 PM

Very helpfull,

Thank you for the quality of this precision which is not documented by the provider (Microsoft)!
Sofiane says:

March 23, 2018 at 2:32 PM

Hi again,

I would like to share with you this time the perfect workaround solution that I put in place in December 2017 and which is running in production since March 2018.

You just put the partition key before the primary key (the unique column)
when you create the Primary Clustered Key/Index.

CREATE CONSTRAINT [PK_Table_Column] PRIMARY KEY CLUSTERED
(
[Partition_Key_Column] ASC,
[ID_Unique_Primary_Key_Column] ASC
)

Sofiane
sqL_handLe says:

June 18, 2018 at 9:10 PM

Hello Paul!
Above you mentioned:
"As mentioned in my previous post, you can also use undocumented trace flag 8666 to expose the rowset sharing property in the execution plan."

I tried to find the previous use of 8666 mentioned, but was only able to find it used in your "Internals of the Seven SQL Server Sorts" series. Did I overlook it somewhere else?
https://sqlperformance.com/2015/04/sql-plan/internals-of-the-seven-sql-server-sorts-part-1
https://sqlperformance.com/2015/05/sql-plan/internals-of-the-seven-sql-server-sorts-part-2
Paul White says:

June 19, 2018 at 9:24 AM

The reference to the "previous post" is the same one linked to a few paragraphs up, https://sqlperformance.com/2015/12/sql-plan/optimizing-update-queries

Comments are closed.