Joe Sack

Exploring SQL Server 2014 SELECT INTO Parallelism

SQL Sentry Essentials
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Register to Download

Featured Author

Erin Stellato is a Principal Consultant with SQLskills and a Microsoft Data Platform MVP.

Erin’s Posts

SQL Server 2014 CTP1 has been out for a few weeks now, and you've likely seen quite a bit of press about memory-optimized tables and updateable columnstore indexes. While these are certainly worthy of attention, in this post I wanted to explore the new SELECT … INTO parallelism improvement. The improvement is one of those ready-to-wear changes that, from the looks of it, will not require significant code changes in order to start benefiting from it. My explorations were performed using version Microsoft SQL Server 2014 (CTP1) – 11.0.9120.5 (X64), Enterprise Evaluation Edition.

Parallel SELECT … INTO

SQL Server 2014 introduces parallel-enabled SELECT ... INTO for databases and to test this feature I used the AdventureWorksDW2012 database and a version of the FactInternetSales table that had 61,847,552 rows in it (I was responsible for adding those rows; they don't come with the database by default).

Because this feature, as of CTP1, requires database compatibility level 110, for testing purposes I set the database to compatibility level 100 and executed the following query for my first test:

SELECT  [ProductKey],
        [OrderDateKey],
        [DueDateKey],
        [ShipDateKey],
        [CustomerKey],
        [PromotionKey],
        [CurrencyKey],
        [SalesTerritoryKey],
        [SalesOrderNumber],
        [SalesOrderLineNumber],
        [RevisionNumber],
        [OrderQuantity],
        [UnitPrice],
        [ExtendedAmount],
        [UnitPriceDiscountPct],
        [DiscountAmount],
        [ProductStandardCost],
        [TotalProductCost],
        [SalesAmount],
        [TaxAmt],
        [Freight],
        [CarrierTrackingNumber],
        [CustomerPONumber],
        [OrderDate],
        [DueDate],
        [ShipDate]
INTO dbo.FactInternetSales_V2
FROM dbo.FactInternetSales;

The query execution duration was 3 minutes and 19 seconds on my test VM and the actual query execution plan produced was as follows:

G1

SQL Server used a serial plan, as I expected. Notice also that my table had a nonclustered columnstore index on it which was scanned (I created this nonclustered columnstore index for use with other tests, but I'll show you the clustered columnstore index query execution plan later as well). The plan did not use parallelism and the Columnstore Index Scan used row execution mode instead of batch execution mode.

So next, I modified the database compatibility level (and note that there isn't a SQL Server 2014 compatibility level in CTP1 yet):

USE [master];
GO
ALTER DATABASE [AdventureWorksDW2012] SET COMPATIBILITY_LEVEL = 110;
GO

I dropped the FactInternetSales_V2 table and then re-executed my original SELECT ... INTO operation. This time the query execution duration was 1 minute and 7 seconds and the actual query execution plan was as follows:

G2

We now have a parallel plan and the only change I had to make was to the database compatibility level for AdventureWorksDW2012. My test VM has four vCPUs allocated to it, and the query execution plan distributed rows across four threads:

G3

The nonclustered Columnstore Index Scan, while using parallelism, didn't use batch execution mode. Instead it used row execution mode.

Here is a table to show the test results so far:

Scan Type Compatibility level Parallel SELECT … INTO Execution Mode Duration
Nonclustered Columnstore Index Scan 100 No Row 3:19
Nonclustered Columnstore Index Scan 110 Yes Row 1:07

 
So as a next test, I dropped the nonclustered columnstore index and re-executed the SELECT ... INTO query using both the database compatibility level 100 and 110.

The compatibility level 100 test took 5 minutes and 44 seconds to run, and the following plan was generated:

G4

The serial Clustered Index Scan took 2 minutes and 25 seconds longer than the serial nonclustered Columnstore Index Scan.

Using compatibility level 110, the query took 1 minute and 55 seconds to run, and the following plan was generated:

G5

Similar to the parallel nonclustered Columnstore Index Scan test, the parallel Clustered Index Scan distributed rows across four threads:

G6

The following table summarizes these two aforementioned tests:

Scan Type Compatibility level Parallel SELECT … INTO Execution Mode Duration
Clustered Index Scan 100 No Row (N/A) 5:44
Clustered Index Scan 110 Yes Row (N/A) 1:55

 
So then I wondered about the performance for a clustered columnstore index (new in SQL Server 2014), so I dropped the existing indexes and created a clustered columnstore index on the FactInternetSales table. I also had to drop the eight different foreign key constraints defined on the table before I could create the clustered columnstore index.

The discussion becomes somewhat academic, since I'm comparing SELECT ... INTO performance at database compatibility levels that didn't offer clustered columnstore indexes in the first place – nor did the earlier tests for nonclustered columnstore indexes at database compatibility level 100 – and yet it is interesting to see and compare the overall performance characteristics.

CREATE CLUSTERED COLUMNSTORE INDEX [CCSI_FactInternetSales] 
ON [dbo].[FactInternetSales] 
WITH (DROP_EXISTING = OFF);
GO

As an aside, the operation to create the clustered columnstore index on a 61,847,552 million row table took 11 minutes and 25 seconds with four available vCPUs (of which the operation leveraged them all), 4GB of RAM and virtual guest storage on OCZ Vertex SSDs. During that time the CPUs weren't pegged the entire time, but rather displayed peaks and valleys (a sampling of 60 seconds of CPU activity shown below):

G7

After the clustered columnstore index was created, I re-executed the two SELECT ... INTO tests. The compatibility level 100 test took 3 minutes and 22 seconds to run, and the plan was a serial one as expected (I'm showing the SQL Server Management Studio version of the plan since the clustered Columnstore Index Scan, as of SQL Server 2014 CTP1, is not yet fully recognized by Plan Explorer):

G8

Next I changed the database compatibility level to 110 and re-ran the test, which this time the query took 1 minute and 11 seconds and had the following actual execution plan:

G9

The plan distributed rows across four threads, and just like the nonclustered columnstore index, the execution mode of the clustered Columnstore Index Scan was row and not batch.

The following table summarizes all the tests within this post (in order of duration, low to high):

Scan Type Compatibility level Parallel SELECT … INTO Execution Mode Duration
Nonclustered Columnstore Index Scan 110 Yes Row 1:07
Clustered Columnstore Index Scan 110 Yes Row 1:11
Clustered Index Scan 110 Yes Row (N/A) 1:55
Nonclustered Columnstore Index Scan 100 No Row 3:19
Clustered Columnstore Index Scan 100 No Row 3:22
Clustered Index Scan 100 No Row (N/A) 5:44

 
A few observations:

  • I'm not sure if the difference between a parallel SELECT ... INTO operation against a nonclustered columnstore index versus clustered columnstore index is statistically significant. I'd need to do more tests, but I think I would wait to perform those until RTM.
  • I can safely say that the parallel SELECT ... INTO did significantly outperform the serial equivalents across a clustered index, nonclustered columnstore and clustered columnstore index tests.

It is worth mentioning that these results are for a CTP version of the product, and my tests should be viewed as something that could shift in behavior by RTM – so I was less interested in the standalone durations versus how those durations compared between serial and parallel conditions.

Some performance features require significant refactoring – but for the SELECT ... INTO improvement, all I had to do was bump up the database compatibility level in order to start seeing the benefits, which is definitely something I appreciate.