SQL Server 2014 CTP1 introduces low priority lock wait options for use with online index operations and partition switches.
For those leveraging online index management or index partitioning and partition switch operations in SQL Server 2012 Enterprise Edition, you may have at one point experienced blocking of your DDL operation as these operations still have some locking requirements.
To illustrate, imagine that I execute the following single-partition online index rebuild in SQL Server 2014 CTP1:
ALTER INDEX [ClusteredIndex_on_ps_ShipDate] ON [dbo].[FactInternetSales] REBUILD PARTITION = (37) WITH (ONLINE= ON);
And let's take a look at the locks acquired and released during this rebuild operation using Extended Events and the following session definition (this is target-less session and I watched the results via the "Watch Live Data" pane in SQL Server Management Studio):
CREATE EVENT SESSION [Online_Index_Rebuild_Locks_Taken] ON SERVER ADD EVENT sqlserver.lock_acquired( WHERE ([object_id]=(309576141))), ADD EVENT sqlserver.lock_released( WHERE ([object_id]=(309576141))) WITH ( MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF ); GO
The value 309576141 represents the object ID of the FactInternetSales table.
My online index rebuild of a single partition took 56 seconds to complete and after completion, I saw the following lock acquire and release activity:
As you can see from the output, although the rebuild is an online operation, it does involve the acquiring of locks in various modes over the lifecycle of the operation. Ideally the lock duration is minimal (for example – the timestamp is identical for the first
SCH_S lock acquired and released). But even with a minimal amount of locking you can certainly encounter concurrency issues depending on the transactions running against the index being rebuilt or switched into.
I mentioned at the beginning of this post that Microsoft introduced low priority lock wait options for online operations and partition switch operations in SQL Server 2014 CTP1. On the subject of partition switches, imagine I execute the following operation:
ALTER TABLE [AdventureWorksDW2012].[dbo].[FactInternetSales] SWITCH PARTITION 37 TO [AdventureWorksDW2012].[dbo].[staging_FactInternetSales];
To see the locks acquired and released for this operation, I modified my previously defined Extended Event session to include the applicable objects (source and target table). I saw the following:
The switch operation to an empty partition occurred in less than a second, but we still see that
SCH_M locks were required during the operation lifecycle on both the source and destination (309576141 being FactInternetSales and 398624463 being staging_FactInternetSales).
So again, while the duration of the locking can be extremely brief when there are no concurrent transactions accessing the objects in question, we know this isn't always possible and so our online index rebuild and partition switch operations can indeed get blocked.
So with this reality, SQL Server 2014 introduces the
WAIT_AT_LOW_PRIORITY argument that can be adjusted with
ABORT_AFTER_WAIT options for both the
ALTER INDEX and
ALTER TABLE commands that we can use for both online index and partition switch operations.
What does this allow us to do? First of all, let's talk about what the behavior was prior to SQL Server 2014. As an example, imagine that I have the following transaction open and uncommitted:
BEGIN TRANSACTION; DELETE [dbo].[staging_FactInternetSales];
If I tried to perform an
ALTER TABLE SWITCH to the staging_FactInternetSales table as a destination in a separate session, I'm going to be blocked and the request is just going to wait. Specifically for this example, I would be waiting with a
LCK_M_SCH_M wait type. Once I rollback or commit my transaction, the operation can move forward and complete.
Now if I'm using SQL Server 2014's
ABORT_AFTER_WAIT, I can leverage a few different options depending on my application requirements.
MAX_DURATION allows me to specify the number of minutes the online index rebuild or partition switch operation will wait. If the
MAX_DURATION value is reached, we can set what happens next based on the setting of
ABORT_AFTER_WAIT, which can be a value of
NONEmeans that the index operation will continue attempting the operation.
SELFmeans that if the
MAX_DURATIONis reached, the operation (the online index rebuild or partition switch) will be cancelled.
BLOCKERSis used, it will kill any transactions that are blocking the online index rebuild or partition switch operation (not an option, in my opinion, to be used lightly).
ALTER ANY CONNECTIONpermission for the request issuing the online index rebuild or partition switch operation.
The following code examples demonstrate different configuration variations.
Pre-2014 default behavior (wait indefinitely)
Executing the following will result in the behavior we're used to seeing pre-SQL Server 2014 – and it might still be what you'll want or expect for certain scenarios:
ALTER TABLE [AdventureWorksDW2012].[dbo].[FactInternetSales] SWITCH PARTITION 37 TO [AdventureWorksDW2012].[dbo].[staging_FactInternetSales] WITH (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 0 MINUTES, ABORT_AFTER_WAIT = NONE));
Wait 1 Minute and Cancel the DDL Operation
- The following example waits for 1 minute if there is a blocking transaction and will get a "lock request time out period exceeded" for the
SWITCHoperation if the maximum duration is reached:
ALTER TABLE [AdventureWorksDW2012].[dbo].[FactInternetSales] SWITCH PARTITION 37 TO [AdventureWorksDW2012].[dbo].[staging_FactInternetSales] WITH (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = SELF));
Wait 1 Minute and Kill the Blocker(s)
- This example waits for 1 minute if there is a blocking transaction and then will kill the blocking transactions (source or destination included), allowing the
SWITCHoperation to complete.
ALTER TABLE [AdventureWorksDW2012].[dbo].[staging_FactInternetSales] SWITCH PARTITION 37 TO [AdventureWorksDW2012].[dbo].[FactInternetSales] WITH (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = BLOCKERS));
In my example of a
DELETE inside an uncommitted transaction, there was no error in my SQL Server Management Studio window as I didn't have an actively running statement, but attempting another statement within that session returned the following error message (as my session had been killed):
A transport-level error has occurred when sending the request to the server. (provider: Shared Memory Provider, error: 0 – No process is on the other end of the pipe.)
Kill the Blocker(s) Immediately (Source or Destination for SWITCH)
- The following is an example of killing the blocker immediately – and in my example the switch happened in sub-second and indeed the session that was the blocker did get killed:
ALTER TABLE [AdventureWorksDW2012].[dbo].[FactInternetSales] SWITCH PARTITION 37 TO [AdventureWorksDW2012].[dbo].[staging_FactInternetSales] WITH (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 0 MINUTES, ABORT_AFTER_WAIT = BLOCKERS));
One last positive aspect I wanted to call out…
The SQL Server Error Log provides some by-default auditing of the low priority lock wait usage, including information about the
ABORT_AFTER_WAIT operation inline with the victim information:
Log SQL Server (Current – 9/10/2013 12:03:00 PM)
Process ID 57 was killed by an ABORT_AFTER_WAIT = BLOCKERS DDL statement on database_id = 5, object_id = 309576141.
And you'll also see separate entries for the original operation itself. For example:
This kind of logging is very useful for troubleshooting and auditing purposes and I'm glad to see it.