Erin Stellato

Testing DML Statements for In-Memory OLTP


In-Memory OLTP objects were introduced in SQL Server 2014. There were many limitations in the initial release; some have been addressed in SQL Server 2016, and more are expected to be addressed in the next release as the feature continues to evolve. Thus far, adoption of In-Memory OLTP does not seem very widespread, but as the feature matures, I expect more clients will start to ask about implementation. As with any major schema or code change, I recommend thorough testing to determine whether In-Memory OLTP will provide the expected benefits. With that in mind, I was interested in seeing how performance changed for very simple INSERT, UPDATE, and DELETE statements with In-Memory OLTP. I was hopeful that if I could demonstrate latching or locking as a problem with disk-based tables, then the in-memory tables would provide a solution, as they are lock- and latch-free.
I developed the following test cases:

  1. A disk-based table with traditional stored procedures for DML.
  2. An In-Memory table with traditional stored procedures for DML.
  3. An In-Memory table with natively compiled procedures for DML.

I was interested in comparing performance of traditional stored procedures and natively compiled procedures, because one restriction of a natively compiled procedure is that any tables referenced must be In-Memory. While single-row, solitary modifications may be common in some systems, I often see modifications occurring within a larger stored procedure with multiple statements (SELECT and DML) accessing one or more tables. The In-Memory OLTP documentation strongly recommends using natively compiled procedures to get the most benefit in terms of performance. I wanted to understand how much it improved performance.

The Set Up

I created a database with a memory-optimized filegroup (a minimal sketch of that setup follows the list below) and then created three different tables in the database (one disk-based, two in-memory):

  • DiskTable
  • InMemTable_Temp1
  • InMemTable_Temp2
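
The complete database-creation script is in the download linked below; as a minimal sketch, the database needs a regular filegroup for the disk-based table (the DDL below references one named [DiskTables]) and a filegroup that CONTAINS MEMORY_OPTIMIZED_DATA for the in-memory tables. The database name and file paths here are illustrative assumptions, not the exact values from the script:

CREATE DATABASE [InMemoryDML]	-- illustrative name; the real script may differ
ON PRIMARY
	(NAME = N'InMemoryDML', FILENAME = N'C:\SQLData\InMemoryDML.mdf'),
FILEGROUP [DiskTables]	-- referenced by the disk-based table DDL below
	(NAME = N'InMemoryDML_Disk', FILENAME = N'C:\SQLData\InMemoryDML_Disk.ndf'),
FILEGROUP [InMemoryData] CONTAINS MEMORY_OPTIMIZED_DATA	-- container is a folder, not a file
	(NAME = N'InMemoryDML_InMem', FILENAME = N'C:\SQLData\InMemoryDML_InMem')
LOG ON
	(NAME = N'InMemoryDML_log', FILENAME = N'C:\SQLData\InMemoryDML_log.ldf');
GO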

The DDL was nearly the same for all objects, accounting for on-disk versus in-memory where appropriate. DiskTable DDL vs. In-Memory DDL:

CREATE TABLE [dbo].[DiskTable] (
	[ID] INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, 
	[Name] VARCHAR (100) NOT NULL, [Type] INT NOT NULL,
	[c4] INT NULL, [c5] INT NULL, [c6] INT NULL, [c7] INT NULL, 
	[c8] VARCHAR(255) NULL, [c9] VARCHAR(255) NULL,	[c10] VARCHAR(255) NULL, [c11] VARCHAR(255) NULL)
ON [DiskTables];
GO

CREATE TABLE [dbo].[InMemTable_Temp1]
(
	[ID] INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT=1000000), 
	[Name] VARCHAR (100) NOT NULL, [Type] INT NOT NULL,
	[c4] INT NULL, [c5] INT NULL, [c6] INT NULL, [c7] INT NULL, 
	[c8] VARCHAR(255) NULL, [c9] VARCHAR(255) NULL,	[c10] VARCHAR(255) NULL, [c11] VARCHAR(255) NULL)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY = SCHEMA_AND_DATA);
GO

I also created nine stored procedures – one for each table/modification combination.

  • DiskTable_Insert
  • DiskTable_Update
  • DiskTable_Delete
  • InMemRegularSP_Insert
  • InMemRegularSP_Update
  • InMemRegularSP_Delete
  • InMemCompiledSP_Insert
  • InMemCompiledSP_Update
  • InMemCompiledSP_Delete

Each stored procedure accepted an integer input to loop for that number of modifications. The stored procedures followed the same format; the only variations were the table accessed and whether the object was natively compiled. The complete code to create the database and objects can be found here, with example INSERT and UPDATE statements below:

CREATE PROCEDURE dbo.[DiskTable_Inserts]
	@NumRows INT
AS
BEGIN 
  SET NOCOUNT ON;

  DECLARE @Name INT;
  DECLARE @Type INT;
  DECLARE @ColInt INT;
  DECLARE @ColVarchar VARCHAR(255);
  DECLARE @RowLoop INT = 1;

  WHILE (@RowLoop <= @NumRows)
	BEGIN

		-- generate pseudo-random values for this row
		SET @Name = CONVERT (INT, RAND () * 1000) + 1;
		SET @Type = CONVERT (INT, RAND () * 100) + 1;
		SET @ColInt = CONVERT (INT, RAND () * 850) + 1;
		SET @ColVarchar = CONVERT (INT, RAND () * 1300) + 1;

		INSERT INTO [dbo].[DiskTable] (
			[Name], [Type], [c4], [c5], [c6], [c7], [c8], [c9],	[c10], [c11]
			)
		VALUES (@Name, @Type, @ColInt, @ColInt + (CONVERT (INT, RAND () * 20) + 1), 
		@ColInt + (CONVERT (INT, RAND () * 30) + 1), @ColInt + (CONVERT (INT, RAND () * 40) + 1),
		@ColVarchar, @ColVarchar + (CONVERT (INT, RAND () * 20) + 1), @ColVarchar + (CONVERT (INT, RAND () * 30) + 1),
		@ColVarchar + (CONVERT (INT, RAND () * 40) + 1));

		SELECT @RowLoop = @RowLoop + 1;
	END
END
GO

CREATE PROCEDURE [InMemUpdates_CompiledSP]
	@NumRows INT
	WITH
		NATIVE_COMPILATION,
		SCHEMABINDING
AS
BEGIN ATOMIC
	WITH
		(TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')

	DECLARE @RowLoop INT = 1;
	DECLARE @ID INT;
	-- seed a session-specific starting RowNum so concurrent connections update different rows
	DECLARE @RowNum INT = @@SPID * (CONVERT (INT, RAND () * 1000) + 1);

	WHILE (@RowLoop <= @NumRows)
	BEGIN

		-- look up the ID of the row to modify from the helper table of IDs
		SELECT @ID = ID 
		FROM [dbo].[IDs_InMemTable2]
		WHERE RowNum = @RowNum;

		UPDATE [dbo].[InMemTable_Temp2]
		SET [c4] = [c5] * 2
		WHERE [ID] = @ID;

		SET @RowLoop = @RowLoop + 1
		SET @RowNum = @RowNum + (CONVERT (INT, RAND () * 10) + 1)

	END
END
GO
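
Each test script then simply executed the relevant procedure, passing the number of modifications to perform; the parameter value below is illustrative, not necessarily what each test script used:

EXEC [dbo].[DiskTable_Inserts] @NumRows = 100;
EXEC [InMemUpdates_CompiledSP] @NumRows = 100;
GO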

Note: The IDs_* tables were repopulated after each set of INSERTs completed, and were specific to the three different scenarios.
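 
The repopulation code is part of the downloadable script; the sketch below shows the general idea, under the assumption that each IDs_* table simply holds (RowNum, ID) pairs, as implied by the lookup in the compiled procedure above:

-- Sketch only: rebuild the helper table of (RowNum, ID) pairs after the INSERT tests.
DELETE FROM [dbo].[IDs_InMemTable2];

INSERT INTO [dbo].[IDs_InMemTable2] ([RowNum], [ID])
SELECT ROW_NUMBER() OVER (ORDER BY [ID]), [ID]
FROM [dbo].[InMemTable_Temp2];
GO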

Testing Methodology

Testing was done using .cmd scripts that used sqlcmd to call a script that executed the stored procedure, for example:

sqlcmd -S CAP\ROGERS -i"C:\Temp\SentryOne\InMemTable_RegularDeleteSP_100.sql"
exit

I used this approach to create one or more concurrent connections to the database. In addition to understanding basic changes in performance, I also wanted to examine the effect of different workloads. These scripts were initiated from a separate machine so that the overhead of instantiating connections stayed off the SQL Server host. Each stored procedure was executed 1000 times by a connection, and I tested 1 connection, 10 connections, and 100 connections (1,000, 10,000, and 100,000 modifications, respectively). I captured performance metrics using Query Store, and also captured wait statistics. With Query Store I could capture average duration and CPU for each stored procedure. Wait statistics data was captured for each connection using sys.dm_exec_session_wait_stats, then aggregated for the entire test.
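
For reference, the kind of queries involved look something like the sketch below; this is not the exact code used for the tests, and it assumes Query Store is enabled on the test database (Query Store reports avg_duration and avg_cpu_time in microseconds):

-- Average duration and CPU per stored procedure, from Query Store
-- (execution-weighted average across the captured runtime stats intervals).
SELECT
	OBJECT_NAME(q.object_id) AS [procedure],
	SUM(rs.count_executions) AS executions,
	SUM(rs.avg_duration * rs.count_executions) / SUM(rs.count_executions) AS avg_duration_us,
	SUM(rs.avg_cpu_time * rs.count_executions) / SUM(rs.count_executions) AS avg_cpu_us
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
WHERE q.object_id > 0	-- only statements that belong to a module (the stored procedures)
GROUP BY q.object_id
ORDER BY avg_duration_us DESC;

-- Wait statistics by wait type; in the tests this was captured per session
-- via sys.dm_exec_session_wait_stats and then aggregated across connections.
SELECT
	ws.wait_type,
	SUM(ws.wait_time_ms) AS total_wait_time_ms,
	SUM(ws.waiting_tasks_count) AS waiting_tasks
FROM sys.dm_exec_session_wait_stats AS ws
GROUP BY ws.wait_type
ORDER BY total_wait_time_ms DESC;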

I ran each test four times and then calculated the overall averages for the data used in this post. Scripts used for workload testing can be downloaded from here.

Results

As one would predict, performance with In-Memory objects was better than with disk-based objects. However, an In-Memory table with a regular stored procedure sometimes had comparable or only slightly better performance compared to a disk-based table with a regular stored procedure. Remember: I was interested in understanding whether I really needed a compiled stored procedure to get a large benefit with an in-memory table. For this scenario, I did. In all cases, the in-memory table with the natively compiled procedure had significantly better performance. The two graphs below show the same data, but with different scales for the x-axis, to demonstrate that performance for regular stored procedures that modify data degraded with more concurrent connections.

    DML Performance by Test and Workload

    DML Performance by Test and Workload [Modified scale]

The exception is INSERTs into the In-Memory table with the regular stored procedure. With 100 connections the average duration is over 8ms for the disk-based table, but less than 100 microseconds for the In-Memory table. The likely reason is the absence of locking and latching with the In-Memory table, and this is supported by the wait statistics data:

Test                              INSERT         UPDATE                  DELETE
Disk Table - 1,000                WRITELOG       WRITELOG                WRITELOG
InMemTable_RegularSP - 1,000      WRITELOG       WRITELOG                WRITELOG
InMemTable_CompiledSP - 1,000     WRITELOG       MEMORY_ALLOCATION_EXT   MEMORY_ALLOCATION_EXT
Disk Table - 10,000               WRITELOG       WRITELOG                WRITELOG
InMemTable_RegularSP - 10,000     WRITELOG       WRITELOG                WRITELOG
InMemTable_CompiledSP - 10,000    WRITELOG       WRITELOG                MEMORY_ALLOCATION_EXT
Disk Table - 100,000              PAGELATCH_EX   WRITELOG                WRITELOG
InMemTable_RegularSP - 100,000    WRITELOG       WRITELOG                WRITELOG
InMemTable_CompiledSP - 100,000   WRITELOG       WRITELOG                WRITELOG

Wait Statistics By Test

The wait statistics above are listed by Total Resource Wait Time (which generally also corresponded to the highest average resource wait time, though there were exceptions). The WRITELOG wait type was the limiting factor in this system the majority of the time. However, the PAGELATCH_EX waits for 100 concurrent connections running INSERT statements suggest that, with additional load, the locking and latching behavior of disk-based tables could become a limiting factor. In the UPDATE and DELETE scenarios with 10 and 100 connections against the disk-based table, the Average Resource Wait Time was highest for locks (LCK_M_X).

Conclusion

In-Memory OLTP can absolutely provide a performance boost for the right workload. The examples tested here, though, are extremely simple and should not be taken on their own as justification for migrating to an In-Memory solution. Multiple limitations still exist and must be considered, and thorough testing must be done before a migration occurs (particularly because migrating to an In-Memory table is an offline process). But for the right scenario, this new feature can provide a performance boost, as long as you understand that some underlying limitations will remain, such as transaction log speed for durable tables, though most likely to a reduced degree, regardless of whether the table is disk-based or in-memory.