SQL Server introduced In-Memory OLTP objects in SQL Server 2014. There were many limitations in the initial release; some have been addressed in SQL Server 2016, and it’s expected that more will be addressed in the next release as the feature continues to evolve. Thus far, adoption of In-Memory OLTP does not seem very widespread, but as the feature matures, I expect more clients will start to ask about implementation. As with any major schema or code change, I recommend thorough testing to determine if In-Memory OLTP will provide the benefits expected. With that in mind, I was interested in seeing how performance changed for very simple INSERT, UPDATE, and DELETE statements with In-Memory OLTP. I was hopeful that if I could demonstrate latching or locking as a problem with disk-based tables, then the in-memory tables would provide a solution, as they are lock- and latch-free.
I developed the following test cases:
- A disk-based table with traditional stored procedures for DML.
- An In-Memory table with traditional stored procedures for DML.
- An In-Memory table with natively compiled procedures for DML.
I was interested in comparing performance of traditional stored procedures and natively compiled procedures, because one restriction of a natively compiled procedure is that any tables referenced must be In-Memory. While single-row, solitary modifications may be common in some systems, I often see modifications occurring within a larger stored procedure with multiple statements (SELECT and DML) accessing one or more tables. The In-Memory OLTP documentation strongly recommends using natively compiled procedures to get the most benefit in terms of performance. I wanted to understand how much it improved performance.
The Setup
I created a database with a memory-optimized filegroup (a sketch of that setup follows the list below) and then created three different tables in the database (one disk-based, two in-memory):
- DiskTable
- InMemTable_Temp1
- InMemTable_Temp2
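For context, a minimal sketch of the database setup is below. This is not the script from the download link; the database name, filegroup names, and file paths are placeholders, and it assumes the [DiskTables] filegroup referenced by the disk-based table's DDL.
-- Sketch only: placeholder names and paths; the memory-optimized filegroup's
-- FILENAME is a container directory, not a data file.
CREATE DATABASE [InMemoryOLTPDemo]
ON PRIMARY
(NAME = N'InMemoryOLTPDemo_Data', FILENAME = N'C:\SQLData\InMemoryOLTPDemo_Data.mdf'),
FILEGROUP [DiskTables]
(NAME = N'DiskTables_Data', FILENAME = N'C:\SQLData\DiskTables_Data.ndf'),
FILEGROUP [InMemData] CONTAINS MEMORY_OPTIMIZED_DATA
(NAME = N'InMemData_Container', FILENAME = N'C:\SQLData\InMemData_Container')
LOG ON
(NAME = N'InMemoryOLTPDemo_Log', FILENAME = N'C:\SQLData\InMemoryOLTPDemo_Log.ldf');
GO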
The DDL was nearly the same for all objects, accounting for on-disk versus in-memory where appropriate. DiskTable DDL vs. In-Memory DDL:
CREATE TABLE [dbo].[DiskTable] (
[ID] INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[Name] VARCHAR (100) NOT NULL, [Type] INT NOT NULL,
[c4] INT NULL, [c5] INT NULL, [c6] INT NULL, [c7] INT NULL,
[c8] VARCHAR(255) NULL, [c9] VARCHAR(255) NULL, [c10] VARCHAR(255) NULL, [c11] VARCHAR(255) NULL)
ON [DiskTables];
GO
CREATE TABLE [dbo].[InMemTable_Temp1]
(
[ID] INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT=1000000),
[Name] VARCHAR (100) NOT NULL, [Type] INT NOT NULL,
[c4] INT NULL, [c5] INT NULL, [c6] INT NULL, [c7] INT NULL,
[c8] VARCHAR(255) NULL, [c9] VARCHAR(255) NULL, [c10] VARCHAR(255) NULL, [c11] VARCHAR(255) NULL)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY = SCHEMA_AND_DATA);
GO
I also created nine stored procedures – one for each table/modification combination.
- DiskTable_Insert
- DiskTable_Update
- DiskTable_Delete
- InMemRegularSP_Insert
- InMemRegularSP_Update
- InMemRegularSP_Delete
- InMemCompiledSP_Insert
- InMemCompiledSP_Update
- InMemCompiledSP_Delete
Each stored procedure accepted an integer input to loop for that number of modifications. The stored procedures followed the same format; the only variations were the table accessed and whether the object was natively compiled. The complete code to create the database and objects can be found here, with example INSERT and UPDATE statements below:
CREATE PROCEDURE dbo.[DiskTable_Inserts]
@NumRows INT
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @Name INT;
  DECLARE @Type INT;
  DECLARE @ColInt INT;
  DECLARE @ColVarchar VARCHAR(255);
  DECLARE @RowLoop INT = 1;

  WHILE (@RowLoop <= @NumRows)
  BEGIN
    SET @Name = CONVERT (INT, RAND () * 1000) + 1;
    SET @Type = CONVERT (INT, RAND () * 100) + 1;
    SET @ColInt = CONVERT (INT, RAND () * 850) + 1;
    SET @ColVarchar = CONVERT (INT, RAND () * 1300) + 1;

    INSERT INTO [dbo].[DiskTable] (
      [Name], [Type], [c4], [c5], [c6], [c7], [c8], [c9], [c10], [c11]
    )
    VALUES (@Name, @Type, @ColInt, @ColInt + (CONVERT (INT, RAND () * 20) + 1),
      @ColInt + (CONVERT (INT, RAND () * 30) + 1), @ColInt + (CONVERT (INT, RAND () * 40) + 1),
      @ColVarchar, @ColVarchar + (CONVERT (INT, RAND () * 20) + 1), @ColVarchar + (CONVERT (INT, RAND () * 30) + 1),
      @ColVarchar + (CONVERT (INT, RAND () * 40) + 1));

    SELECT @RowLoop = @RowLoop + 1;
  END
END
GO
CREATE PROCEDURE [InMemUpdates_CompiledSP]
@NumRows INT
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH
(TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
  DECLARE @RowLoop INT = 1;
  DECLARE @ID INT;
  DECLARE @RowNum INT = @@SPID * (CONVERT (INT, RAND () * 1000) + 1);

  WHILE (@RowLoop <= @NumRows)
  BEGIN
    -- look up an existing ID via the IDs_* helper table, then update that row
    SELECT @ID = ID
    FROM [dbo].[IDs_InMemTable2]
    WHERE RowNum = @RowNum;

    UPDATE [dbo].[InMemTable_Temp2]
    SET [c4] = [c5] * 2
    WHERE [ID] = @ID;

    SET @RowLoop = @RowLoop + 1;
    SET @RowNum = @RowNum + (CONVERT (INT, RAND () * 10) + 1);
  END
END
GO
Note: The IDs_* tables were repopulated after each set of INSERTs completed, and were specific to the three different scenarios.
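The definition of the IDs_* tables is in the downloadable script; a rough sketch of what one might look like, based on how the compiled procedure reads [dbo].[IDs_InMemTable2], is below. The bucket count, durability option, and repopulation query are assumptions for illustration only.
-- Hypothetical sketch of one IDs_* lookup table: it maps a sequential RowNum to an
-- existing ID so the UPDATE/DELETE tests can pick existing rows semi-randomly.
-- It must be memory-optimized, since the natively compiled procedure references it.
CREATE TABLE [dbo].[IDs_InMemTable2]
(
  [RowNum] INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
  [ID] INT NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Assumed repopulation step, run after each set of INSERTs completes.
DELETE FROM [dbo].[IDs_InMemTable2];

INSERT INTO [dbo].[IDs_InMemTable2] ([RowNum], [ID])
SELECT ROW_NUMBER() OVER (ORDER BY [ID]), [ID]
FROM [dbo].[InMemTable_Temp2];
GO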
Testing Methodology
Testing was done using .cmd scripts which used sqlcmd to call a .sql script that executed the stored procedure.
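The general pattern is sketched below; the server name, database name, and script path are placeholders rather than the values used in the actual tests.
:: example .cmd script (placeholder server, database, and path)
sqlcmd -S YourServer -d YourTestDB -i "C:\Scripts\InMemRegularSP_Delete_1000.sql"
exit
The .sql script it calls simply executes the stored procedure with the desired number of modifications:
EXEC [dbo].[InMemRegularSP_Delete] 1000;
GO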
I used this approach to create one or more connections to the database that would run concurrently. In addition to understanding basic changes to performance, I also wanted to examine the effect of different workloads. These scripts were initiated from a separate machine to eliminate the overhead of instantiating connections. Each stored procedure was executed 1000 times by a connection, and I tested 1 connection, 10 connections, and 100 connections (1000, 10000, and 100000 modifications, respectively). I captured performance metrics using Query Store, and also captured Wait Statistics. With Query Store I could capture average duration and CPU for each stored procedure. Wait statistics data was captured for each connection using dm_exec_session_wait_stats, then aggregated for the entire test.
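As an illustration of how those metrics can be pulled, the queries below show one way to read per-procedure averages from Query Store and to snapshot a session's waits. This is a sketch rather than the exact capture script used for the tests, and the procedure name passed to OBJECT_ID is a placeholder.
-- Average duration and CPU per stored procedure from Query Store
-- (durations in sys.query_store_runtime_stats are reported in microseconds;
--  this is a simple, unweighted average across runtime stats intervals).
SELECT
  OBJECT_NAME(q.object_id) AS [procedure],
  SUM(rs.count_executions) AS executions,
  AVG(rs.avg_duration) AS avg_duration_us,
  AVG(rs.avg_cpu_time) AS avg_cpu_us
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
WHERE q.object_id = OBJECT_ID(N'dbo.DiskTable_Inserts')  -- placeholder procedure name
GROUP BY q.object_id;

-- Waits accumulated by the current session; captured per connection at the end of
-- each test and then aggregated across connections.
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @@SPID
ORDER BY wait_time_ms DESC;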
I ran each test four times and then calculated the overall averages for the data used in this post. Scripts used for workload testing can be downloaded from here.
Results
As one would predict, performance with In-Memory objects was better than with disk-based objects. However, an In-Memory table with a regular stored procedure sometimes had comparable or only slightly better performance compared to a disk-based table with a regular stored procedure. Remember: I was interested in understanding whether I really needed a compiled stored procedure to get a large benefit with an in-memory table. For this scenario, I did. In all cases, the in-memory table with the natively compiled procedure had significantly better performance. The two graphs below show the same data, but with different scales for the x-axis, to demonstrate that performance for regular stored procedures that modify data degraded with more concurrent connections.
The exception is INSERTs into the In-Memory table with the regular stored procedure. With 100 connections the average duration was over 8ms for the disk-based table, but less than 100 microseconds for the In-Memory table. The likely reason is the absence of locking and latching with the In-Memory table, which is supported by the wait statistics data:
Test | Top Wait: INSERT | Top Wait: UPDATE | Top Wait: DELETE |
---|---|---|---|
Disk Table - 1000 | WRITELOG | WRITELOG | WRITELOG |
InMemTable_RegularSP - 1000 | WRITELOG | WRITELOG | WRITELOG |
InMemTable_CompiledSP - 1000 | WRITELOG | MEMORY_ALLOCATION_EXT | MEMORY_ALLOCATION_EXT |
Disk Table - 10,000 | WRITELOG | WRITELOG | WRITELOG |
InMemTable_RegularSP - 10,000 | WRITELOG | WRITELOG | WRITELOG |
InMemTable_CompiledSP - 10,000 | WRITELOG | WRITELOG | MEMORY_ALLOCATION_EXT |
Disk Table - 100,000 | PAGELATCH_EX | WRITELOG | WRITELOG |
InMemTable_RegularSP - 100,000 | WRITELOG | WRITELOG | WRITELOG |
InMemTable_CompiledSP - 100,000 | WRITELOG | WRITELOG | WRITELOG |
Wait Statistics By Test
Wait statistics data is listed here based on Total Resource Wait Time (which generally also corresponded to the highest average resource wait time, but there were exceptions). The WRITELOG wait type is the limiting factor in this system the majority of the time. However, the PAGELATCH_EX waits for 100 concurrent connections running INSERT statements suggest that with additional load, the locking and latching behavior that exists with disk-based tables could become the limiting factor. In the UPDATE and DELETE scenarios with 10 and 100 connections for the disk-based table tests, the Average Resource Wait Time was highest for locks (LCK_M_X).
Conclusion
In-Memory OLTP can absolutely provide a performance boost for the right workload. The examples tested here, though, are extremely simple and should not, on their own, be judged as reason to migrate to an In-Memory solution. Multiple limitations still exist that must be considered, and thorough testing must be done before a migration occurs (particularly because migrating to an In-Memory table is an offline process). But for the right scenario, this new feature can provide a performance boost, as long as you understand that some underlying limitations will still exist regardless of whether the table is disk-based or In-Memory, such as transaction log speed for durable tables, though most likely to a lesser degree.
Thanks for the article! I definitely need to test this out with my in-memory workloads.
I tried to do something similar to this in 2015 using a T-SQL interpretation of Conway's 'Game of Life' https://paulbrewer.wordpress.com/2015/06/27/hekaton-the-good-the-bad-and-the-ugly/
The T-SQL is here – https://paulbrewer.wordpress.com/ca_stress_sql2014/
It was hard work with all the limitations: no outer joins, no CTEs, and so on.
Your article here is very good.
Paul
Thanks for creating this and sharing the execution time and wait stat data. I'm also curious, did CPU usage reach 100% during some of the tests?
Robert – will be interested in what you find out – definitely let me know! I'd like to develop some more complex tests. Just need to think of good real-world examples!
Thanks for reading Paul, and great post!
James – great question. I had captured PerfMon during the tests, but organizing that data with the Query Store and Wait Stats data wasn't easy, so I didn't include it here. But yes, during the 100 concurrent connections, for the DELETEs, CPU hit 100% for both disk-based tables and in-memory tables with regular stored procedures. For that same test, in-memory tables with natively compiled SPs did not get close to 100%.
I guess inserting one row at a time (insertion in a loop) is not a very realistic scenario, because in real life, if someone is inserting many rows, he/she will try to do the insertion in a batch.
I tested a simple insertion of one row using a stored procedure (compiled vs. ordinary) and got no difference. I used Adam Machanic's SQLQueryStress to simulate 200 threads calling the SP 50 times (10,000 calls of the SP) and cannot say I got anything better using In-Memory OLTP. This was strange, given that the official documentation says that "Systems which must process large numbers of SQL INSERTs concurrently are excellent candidates for the OLTP features."
…not necessarily.
First, a lot of applications, frameworks (think EF), and even developers won't know to operate in batches.
Second, it is pretty common to have applications with high concurrency where many different users are inserting individual rows. It is tough to simulate that in T-SQL but the same kind of waiting between singleton operations will still apply.
As for your specific tests, I'm not quite sure. In-Memory OLTP can be good for *some* scenarios and workloads, but not all.
I'd be interested in seeing your tests for the insertion using SQLQueryStress, and how you measured performance. In all the testing I've done, compiled SPs significantly outperform regular SPs, so I'm very curious to see why your scenario doesn't generate that result.