Aaron Bertrand

Partitioning on a Budget

December 4, 2015 by in SQL Indexes | 6 Comments
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

Last year, I presented a solution to simulate Availability Group readable secondaries without investing in Enterprise Edition. Not to stop people from buying Enterprise Edition, as there are a lot of benefits outside of AGs, but more so for those that have no chance of ever having Enterprise Edition in the first place:

I try to be a relentless advocate for the Standard Edition customer; it is almost a running joke that surely – given the number of features it gets in each new release – that edition as a whole is on the deprecation path. In private meetings with Microsoft I have pushed for features to also be included in Standard Edition, especially with features that are much more beneficial to small businesses than those with unlimited hardware budget.

Enterprise Edition customers enjoy the manageability and performance benefits offered by table partitioning, but this feature is not available in Standard Edition. An idea struck me recently that there is a way to achieve at least some of partitioning's upsides on any edition, and it doesn't involve partitioned views. This is not to say that partitioned views are not a viable option worth considering; these are described well by others, including Daniel Hutmacher (Partitioned views over table partitioning) and Kimberly Tripp (Partitioned Tables v. Partitioned Views–Why are they even still around?). My idea is just a little simpler to implement.

Your New Hero: Filtered Indexes

Now, I know, this feature is a four-letter word to some; before you go any further, you should be happily comfortable with filtered indexes, or at least aware of their limitations. Some reading to give you some fair balance before I try to sell you on them:

Read all those? And you're still here? Great.

The TL;DR of this is that you can use filtered indexes to keep all of your "hot data" in a separate physical structure, and even on separate underlying hardware (you may have a fast SSD or PCIe drive available, but it can't hold the whole table).

A Quick Example

There are many use cases where a portion of the data is queried much more frequently than the rest – think of a retail store managing orders, a bakery scheduling wedding cake deliveries, or a football stadium measuring attendance and concession data. In these cases, most or all of the everyday query activity is concerned with "current" data.

Let's keep it simple; we'll create a database with a very narrow Orders table:

CREATE DATABASE PoorManPartition;
GO

USE PoorManPartition;
GO

CREATE TABLE dbo.Orders
(
  OrderID    INT IDENTITY(1,1) PRIMARY KEY,
  OrderDate  DATE NOT NULL DEFAULT SYSUTCDATETIME(),
  OrderTotal DECIMAL(8,2) --, ...other columns...
);

Now, let's say you have enough space on your fast storage to keep a month of data (with plenty of headroom to account for seasonality and future growth). We can add a new filegroup, and place a data file on the fast drive.

ALTER DATABASE PoorManPartition ADD FILEGROUP HotData;
GO

ALTER DATABASE PoorManPartition ADD FILE 
(
  Name       = N'HotData',
  FileName   = N'Z:\folder\HotData.mdf', 
  Size       = 100MB,
  FileGrowth = 25MB
) 
TO FILEGROUP HotData;

Now, let's create a filtered index on our HotData filegroup, where the filter includes everything from the beginning of November 2015, and the common columns involved in time-based queries are in the key or include list:

CREATE INDEX FilteredIndex 
  ON dbo.Orders(OrderDate) 
  INCLUDE(OrderTotal)
  WHERE OrderDate >= '20151101'
    AND OrderDate <  '20151201'
  ON HotData;

We can insert a few rows and check the execution plan to be sure that covered queries can, in fact, use the index:

INSERT dbo.Orders(OrderDate) VALUES('20151001'),('20151103'),('20151127');
GO

SELECT index_id, rows
  FROM sys.partitions
  WHERE object_id = OBJECT_ID(N'dbo.Orders');

/*
  Results:

   index_id    rows
   --------    ----
          1       3
          2       2
*/

SELECT OrderID, OrderDate, OrderTotal
  FROM dbo.Orders
  WHERE OrderDate >= '20151102'
    AND OrderDate <  '20151106';

The resulting execution plan, sure enough, uses the filtered index (even though the filter predicate in the query does not match the index definition exactly):

Plan using the filtered index

Now, December 1st rolls around, and it's time to swap out our November data and replace it with December. We can just re-create the filtered index with a new filter predicate, and use the DROP_EXISTING option:

CREATE INDEX FilteredIndex 
  ON dbo.Orders(OrderDate) 
  INCLUDE(OrderTotal)
  WHERE OrderDate >= '20151201'
    AND OrderDate <  '20160101'
  WITH (DROP_EXISTING = ON)
  ON HotData;

Now, we can add a few more rows, check the partition stats, and run our previous query and a new one to check the indexes used:

INSERT dbo.Orders(OrderDate) VALUES('20151202'),('20151205');
GO

SELECT index_id, rows
  FROM sys.partitions
  WHERE object_id = OBJECT_ID(N'dbo.Orders');

/*
  Results:

   index_id    rows
   --------    ----
          1       5
          2       2
*/

SELECT OrderID, OrderDate, OrderTotal
  FROM dbo.Orders
  WHERE OrderDate >= '20151102'
    AND OrderDate <  '20151106';

SELECT OrderID, OrderDate, OrderTotal
  FROM dbo.Orders
  WHERE OrderDate >= '20151202'
    AND OrderDate <  '20151204';

In this case we get a clustered index scan with the November query:

Clustered Index Scan

(But that would be different if we had a separate, non-filtered index with OrderDate as the key.)

And I won't show it again, but with the December query, we get the same filtered index seek as before.

You could also maintain multiple indexes, one for the current month, one for the previous month, and so on, and you can just manage them separately (on December 1st you just drop the index from October, and leave November's alone, for example). You could also maintain multiple indexes of shorter or longer time spans (current and previous week, current and previous quarter), etc. The solution is pretty flexible.

Due to the limitations of filtered indexes, I will not try to push this as a perfect solution, nor a complete replacement for table partitioning or partitioned views. Switching out a partition, for example, is a metadata operation, while re-creating an index with DROP_EXISTING can have a lot of logging (and since you're not on Enterprise Edition, can't be run online). You may also find that partitioned views are more your speed - there is more work around maintaining separate physical tables and the constraints that make the partitioned view possible, but the payoff in terms of query performance might be better in some cases.

Automation

The act of re-creating the index can be automated quite easily, using a simple job that does something like this once a month (or whatever your "hot" window size is):

DECLARE @sql NVARCHAR(MAX), 
  @dt DATE = DATEADD(DAY, 1-DAY(GETDATE()), GETDATE());

SET @sql = N'CREATE INDEX FilteredIndex 
  ON dbo.Orders(OrderDate) 
  INCLUDE(OrderTotal)
  WHERE OrderDate >= ''' + CONVERT(CHAR(8), @dt, 112)  + N'''
  WITH (DROP_EXISTING = ON)
  ON HotData;';

EXEC PoorManPartition.sys.sp_executesql @sql;

You could also be creating multiple indexes months in advance, much like creating future partitions in advance - after all, the future indexes won't occupy any space until there is data relevant to their predicates. And you can just drop the indexes that were segmenting the older data that you now want to become cold.

Hindsight

After I finished this article, of course, I came across another of Kimberly Tripp's posts, that you should read before proceeding with anything I'm advocating here (and which I had read before I started):

For multiple reasons, Kimberly is much more in favor of partitioned views to implement something similar to partitioning in Standard Edition; however, for certain scenarios, the use of filtered indexes still intrigues me enough to continue with my experimentation. One of the areas where filtered indexes can be beneficial is when your "hot" data has multiple criteria - not just sliced by date, but also by other attributes (maybe you want quick queries against all orders from this month that are for a specific tier of customer or above a certain dollar amount).

Up Next...

In a future post, I'll play with this concept on a higher-end system, with some real-world volume and workload. I want to discover performance differences between this solution, a non-filtered covering index, a partitioned view, and a partitioned table. Inside a VM on a laptop with only SSDs available would probably not yield realistic or fair tests at scale.