SQL Server 2014 CTP1 introduces extensions to online operation options that will good news for companies hosting very large databases that require little to no downtime.
To set the context, imagine that you’re using SQL Server 2012 Enterprise Edition for the online index management and index partitioning features and you attempt the following index rebuild on a partitioned table:
ALTER INDEX [PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber]
ON [dbo].[FactInternetSales]
REBUILD PARTITION = ALL
WITH (ONLINE= ON);
Testing this in SQL Server 2012, we are able to rebuild all partitions online without error. But what if we want to specify a specific partition instead of all partitions?
ALTER INDEX [PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber]
ON [dbo].[FactInternetSales]
REBUILD PARTITION = 1
WITH (ONLINE= ON);
Attempting this in SQL Server 2012 or earlier, you’ll see the following error message:
'ONLINE' is not a recognized ALTER INDEX REBUILD PARTITION option.
But starting with SQL Server 2014 (as of CTP1), online single partition index operations are now supported. And this is certainly a big deal for very large table maintenance scenarios where you would prefer, or indeed must break your overall maintenance into smaller pieces over a period of time. You may also want to do partition-level maintenance for only those partitions that actually require it – for example, those partitions that actually exceed a specific fragmentation level.
To test out this SQL Server 2014 CTP1 functionality I used the AdventureWorksDW2012 with a version of FactInternetSales that contains 61,847,552 rows, and partitioned by the ShipDate column.
Rebuilding all partitions online for the table using PARTITION = ALL
in my test environment took 3 minutes and 23 seconds. Regarding overall duration, my tests were for indexes that weren’t all that fragmented, so the 3 minutes and 23 second duration represents an average duration over a few tests. Also keep in mind that I didn’t have competing workloads running at the time, so the online rebuild is happening without having to compete with other significant workloads against the index in question.
The query execution plan shape for the online index rebuild using PARTITION = ALL
was as follows:
Execution plan for online rebuild of all partitions
Notice that the operations are parallel-enabled except for the Constant Scan operator. In the query execution plan you can see 39 rows in the outer reference Constant Scan that are being passed to the Distribute Streams operator and then driving the Nested Loop.
The significance of the 39 rows? The following query validates the maximum partition number count from sys.dm_db_partition_stats
. For my test environment, the result was 39 for the maximum partition number, matching what I saw for the Constant Scan actual rows:
SELECT MAX([partition_number]) AS [max_partition_number]
FROM [sys].[dm_db_partition_stats]
WHERE [object_id] = OBJECT_ID('FactInternetSales');
Now you’ll also notice the Online Index Insert operator in the previous plan. Removing the ONLINE = ON
option from my ALTER INDEX REBUILD
(making it an offline operation), and keeping the PARTITION = ALL
option, the only change was having an “Index Insert” operator instead of an “Online Index Insert” in the query execution plan – and also a reduction in duration, where my test showed a 1 minute and 9 seconds execution duration compared to the online 3 minutes and 23 seconds.
I then tested an online rebuild of one partition with 5,678,080 rows in it instead (remember the total table row count is 61,847,552 rows). For this test, the overall duration took exactly 1 minute and had the following query execution plan shape:
Execution plan for online rebuild of a single partition
The first observation is that this is a serial plan. Also note that I said I picked one partition out of the original 39, although that particular partition did represent ~ 9% of rows in the table overall. Also notice that the Constant Scan shows 1 row instead of 39, as I would expect.
What about the duration of a single partition, offline rebuild? In my test environment, this took 11 seconds compared to the online rebuild 1 minute. The query execution plan shape for the offline rebuild of a single partition was as follows:
Execution plan for offline rebuild of a single partition
Notice there is no Constant Scan or associated Nested Loops process and also notice that this plan now has parallel operators in it vs. the previous serial plan, even though they are both doing a Clustered Index Scan for 5,678,080 rows. Also doing a keyword search of “partition” in the XML plan text for the single partition offline parallel index operation didn’t result in any matches – compared to the serial plan, online single partition index operation which had Partitioned = “true” for the Clustered Index Scan and Online Index Insert physical operators.
Back to the main exploration…
Can I pick a few, but not all partitions in a single execution? Unfortunately not.
The ALTER INDEX
and ALTER TABLE
commands have the PARTITION = ALL
argument and then PARTITION = <partition number>
argument, but not the ability to list multiple partitions for a single rebuild operation. I’m not complaining too loudly about this though, as I’m happy to have the ability to rebuild a single partition online and it isn’t terribly complicated to execute the operation once for each rebuild, however the cumulative impact to duration was something I wanted to explore further.
How long would it take to rebuild all 39 partitions separately and online versus the PARTITION = ALL
duration of 3 minutes and 23 seconds?
We know that a benefit of online rebuilds is the ability to still access the associated table or index during the index operation. But in exchange for that online operation, we’ll lose the performance edge of the rebuild compared to an offline rebuild. And what I wanted to know next was how a one-by-one partition online rebuild would perform versus the PARTITION = ALL
alternative.
Executing 39 separate rebuild operations (one rebuild for each unique partition), the total execution duration was 9 minutes and 54 seconds compared to the PARTITION = ALL
which took 3 minutes and 23 seconds, so clearly the piecemeal approach is cumulatively not as fast as an online rebuild of all partitions in one statement. While I was able to do one partition at a time, the over-arching benefit is the ability to break apart our maintenance activities over time and keep access to the objects as they are being rebuilt, but if you’re looking for a shorter rebuild window, offline options are still the fastest, followed by online for PARTITION = ALL
and then in last place, doing one partition at a time.
The following table recaps the duration comparisons – and again, these tests were based on SQL Server 2014 CTP1 and a very specific table size and VM guest configuration, so pay more attention to the relative durations across tests rather than the durations themselves:
Test Description | Duration |
---|---|
Offline rebuild of all partitions | 1:09 |
Online rebuild of all partitions | 3:23 |
Online rebuild of one partition | 1:00 |
Offline rebuild of one partition | 0:11 |
Online rebuild of all partitions, one partition at a time | 9:54 |
Now there are other aspects to explore on this subject as well. Just because an operation is online doesn’t mean that there aren’t a few moments (or longer) where locks are still be held on the targeted object. Index operations still have locking behavior for online operations – and SQL Server 2014 has provided options for this as well which I’ll explore in a separate post.
Joe, great post. What about rebuilding the partitions in parallel, can you test that (or maybe parallel in batches of 5 if 39 windows takes too much effort)
Hi Denis,
I set up jobs for each partition so that I could kick them off via sp_start_job (leveraging the async nature of that, so everything nearly kicks off within the same timeframe).
Out of the 39 jobs, one succeeded and the other 38 failed with:
Could not proceed with index DDL operation on table 'FactInternetSales' because it conflicts with another concurrent operation that is already in progress on the object. The concurrent operation could be an online index operation on the same object or another concurrent operation that moves index pages like DBCC SHRINKFILE. [SQLSTATE 42000] (Error 1912). The step failed.
:-)
Thanks for the follow up Joe
looking forward to your article on locking behavior of online operations.
thanks for the great write up!
Thanks Theron! That one is arriving next week.