Aaron Bertrand

Hekaton with a twist: In-memory TVPs – Part 2

SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Paul White is an independent SQL Server consultant specializing in performance tuning, execution plans, and the query optimizer.

Paul’s Posts

In my last post, I demonstrated that at small volumes, a memory-optimized TVP can deliver substantial performance benefits to typical query patterns.

To test at slightly higher scale, I made a copy of the SalesOrderDetailEnlarged table, which I had expanded to roughly 5,000,000 rows thanks to this script by Jonathan Kehayias (blog | @SQLPoolBoy)).

DROP TABLE dbo.SalesOrderDetailEnlarged;
GO

SELECT * INTO dbo.SalesOrderDetailEnlarged 
  FROM AdventureWorks2012.Sales.SalesOrderDetailEnlarged; -- 4,973,997 rows

CREATE CLUSTERED INDEX PK_SODE 
  ON dbo.SalesOrderDetailEnlarged(SalesOrderID, SalesOrderDetailID);

I also created three in-memory versions of this table, each with a different bucket count (fishing for a "sweet spot") – 16,384, 131,072, and 1,048,576. (You can use rounder numbers, but they get rounded up to the next power of 2 anyway.) Example:

CREATE TABLE [dbo].[SalesOrderDetailEnlarged_InMem_16K] -- and _131K and _1MM
(
	[SalesOrderID] [int] NOT NULL,
	[SalesOrderDetailID] [int] NOT NULL,
	[CarrierTrackingNumber] [nvarchar](25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[OrderQty] [smallint] NOT NULL,
	[ProductID] [int] NOT NULL,
	[SpecialOfferID] [int] NOT NULL,
	[UnitPrice] [money] NOT NULL,
	[UnitPriceDiscount] [money] NOT NULL,
	[LineTotal] [numeric](38, 6) NOT NULL,
	[rowguid] [uniqueidentifier] NOT NULL,
	[ModifiedDate] [datetime] NOT NULL
 PRIMARY KEY NONCLUSTERED HASH 
 (
	[SalesOrderID],
	[SalesOrderDetailID]
 ) WITH ( BUCKET_COUNT = 16384) -- and 131072 and 1048576
) WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA );
GO

INSERT dbo.SalesOrderDetailEnlarged_InMem_16K
  SELECT * FROM dbo.SalesOrderDetailEnlarged;

INSERT dbo.SalesOrderDetailEnlarged_InMem_131K
  SELECT * FROM dbo.SalesOrderDetailEnlarged;

INSERT dbo.SalesOrderDetailEnlarged_InMem_1MM
  SELECT * FROM dbo.SalesOrderDetailEnlarged;
GO

Notice I changed the bucket size from the previous example (256). When building the table, you want to pick the "sweet spot" for bucket size – you want to optimize the hash index for point lookups, meaning you want as many buckets as possible with as few rows in each bucket as possible. Of course if you create ~5 million buckets (since in this case, perhaps not a very good example, there are ~5 million unique combinations of values), you will have some memory utilization and garbage collection trade-offs to deal with. However if you try to stuff ~5 million unique values into 256 buckets, you're also going to experience some problems. In any case, this discussion goes way beyond the scope of my tests for this post.

To test against the standard table, I made similar stored procedures as in the previous tests:

CREATE PROCEDURE dbo.SODE_InMemory
  @InMemory dbo.InMemoryTVP READONLY
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @tn NVARCHAR(25);

  SELECT @tn = CarrierTrackingNumber
    FROM dbo.SalesOrderDetailEnlarged AS sode
    WHERE EXISTS (SELECT 1 FROM @InMemory AS t
    WHERE sode.SalesOrderID = t.Item);
END
GO

CREATE PROCEDURE dbo.SODE_Classic
  @Classic dbo.ClassicTVP READONLY
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @tn NVARCHAR(25);

  SELECT @tn = CarrierTrackingNumber
    FROM dbo.SalesOrderDetailEnlarged AS sode
    WHERE EXISTS (SELECT 1 FROM @Classic AS t
    WHERE sode.SalesOrderID = t.Item);
END
GO

So first, to look at the plans for, say, 1,000 rows being inserted into the table variables, and then running the procedures:

DECLARE @InMemory dbo.InMemoryTVP;
INSERT @InMemory SELECT TOP (1000) SalesOrderID
  FROM dbo.SalesOrderDetailEnlarged
  GROUP BY SalesOrderID ORDER BY NEWID();

DECLARE @Classic dbo.ClassicTVP;
INSERT @Classic SELECT Item FROM @InMemory;

EXEC dbo.SODE_Classic  @Classic  = @Classic;
EXEC dbo.SODE_InMemory @InMemory = @InMemory;

This time, we see that in both cases, the optimizer has chosen a clustered index seek against the base table and a nested loops join against the TVP. Some costing metrics are different, but otherwise the plans are quite similar:

Similar plans for in-memory TVP vs. classic TVP at higher scaleSimilar plans for in-memory TVP vs. classic TVP at higher scale

Comparing seek operator costs - Classic on the left, In-Memory on the rightComparing seek operator costs – Classic on the left, In-Memory on the right

The absolute value of the costs makes it seem like the classic TVP would be a lot less efficient than the In-Memory TVP. But I wondered if this would be true in practice (especially since the Estimated Number of Executions figure on the right seemed suspect), so of course, I ran some tests. I decided to check against 100, 1,000, and 2,000 values to be sent to the procedure.

DECLARE @values INT = 100; -- 1000, 2000

DECLARE @Classic dbo.ClassicTVP;
DECLARE @InMemory dbo.InMemoryTVP;

INSERT @Classic(Item) 
SELECT TOP (@values) SalesOrderID
  FROM dbo.SalesOrderDetailEnlarged
  GROUP BY SalesOrderID ORDER BY NEWID();

INSERT @InMemory(Item) SELECT Item FROM @Classic;

DECLARE @i INT = 1;

SELECT SYSDATETIME();

WHILE @i <= 10000
BEGIN
  EXEC dbo.SODE_Classic  @Classic  = @Classic;
  SET @i += 1;
END

SELECT SYSDATETIME();

SET @i = 1;

WHILE @i <= 10000
BEGIN
  EXEC dbo.SODE_InMemory @InMemory = @InMemory;
  SET @i += 1;
END

SELECT SYSDATETIME();

The performance results show that, at larger numbers of point lookups, using an In-Memory TVP leads to slightly diminishing returns, being slightly slower every time:

Results of 10,000 executions using classic and in-memory TVPs
Results of 10,000 executions using classic and in-memory TVPs

So, contrary to the impression you may have taken from my previous post, using an in-memory TVP is not necessarily beneficial in all cases.

Earlier I also looked at natively compiled stored procedures and in-memory tables, in combination with in-memory TVPs. Could this make a difference here? Spoiler: absolutely not. I created three procedures like this:

CREATE PROCEDURE [dbo].[SODE_Native_InMem_16K] -- and _131K and _1MM
  @InMemory dbo.InMemoryTVP READONLY
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER 
AS 
  BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english');
 
  DECLARE @tn NVARCHAR(25);
 
  SELECT @tn = CarrierTrackingNumber
    FROM dbo.SalesOrderDetailEnlarged_InMem_16K AS sode -- and _131K and _1MM
    INNER JOIN @InMemory AS t -- no EXISTS allowed here
    ON sode.SalesOrderID = t.Item;
END
GO

Another spoiler: I wasn't able to run these 9 tests with an iteration count of 10,000 - it took way too long. Instead I looped through and ran each procedure 10 times, ran that set of tests 10 times, and took the average. Here are the results:

InMem2_b
Results of 10 executions using in-memory TVPs and natively compiled stored procedures

Overall, this experiment was rather disappointing. Just looking at the sheer magnitude of the difference, with an on-disk table, the average stored procedure call was completed in an average of 0.0036 seconds. However, when everything was using in-memory technologies, the average stored procedure call was 1.1662 seconds. Ouch. It is highly likely I have just chosen a poor use case to demo overall, but it seemed at the time to be an intuitive "first try."

Conclusion

There is plenty more to test around this scenario, and I have more blog posts to follow. I haven't yet identified the optimal use case for in-memory TVPs at a larger scale, but hope that this post serves as a reminder that even though a solution seems optimal in one case, it is never safe to assume it is equally applicable to different scenarios. This is exactly how In-Memory OLTP should be approached: as a solution with a narrow set of use cases that absolutely must be validated before implemented in production.