Aaron Bertrand

Performance Myths: Clustered vs. Non-Clustered Indexes

I was recently scolded for suggesting that, in some cases, a non-clustered index will perform better for a particular query than the clustered index. This person stated that the clustered index is always best because it is always covering by definition, and that any non-clustered index with some or all of the same key columns was always redundant.

I will happily agree that the clustered index is always covering (and to avoid any ambiguity here, we're going to stick to disk-based tables with traditional B-tree indexes).

Warehouse.StockItemTransactions column list

I disagree, though, that a clustered index is always faster than a non-clustered index. I also disagree that it is always redundant to create a non-clustered index or unique constraint consisting of the same (or some of the same) columns as the clustering key.

Let's take this example, Warehouse.StockItemTransactions, from WideWorldImporters. The clustered index is implemented through a primary key on just the StockItemTransactionID column (pretty typical when you have some kind of surrogate ID generated by an IDENTITY or a SEQUENCE).
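
If you want to verify which indexes already exist on the table before following along, a quick catalog sketch like the following (nothing here is specific to WideWorldImporters beyond the table name) lists each index and its key columns:

SELECT i.name AS index_name, i.type_desc, i.is_unique,
       c.name AS key_column, ic.key_ordinal
  FROM sys.indexes AS i
  INNER JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
  INNER JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
  WHERE i.object_id = OBJECT_ID(N'Warehouse.StockItemTransactions')
    AND ic.is_included_column = 0 -- key columns only
  ORDER BY i.index_id, ic.key_ordinal;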

It's a pretty common thing to require a count of the whole table (though in many cases there are better ways). This can be for casual inspection or as part of a pagination procedure. Most people will do it this way:

SELECT COUNT(*) 
  FROM Warehouse.StockItemTransactions;

With the current schema, this will use a non-clustered index:

Plan for COUNT query

We know that the non-clustered index does not contain all of the columns in the clustered index. The count operation only needs to be sure that all rows are included, without caring about which columns are present, so SQL Server will usually choose the index with the smallest number of pages (in this case, the index chosen has ~414 pages).
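
You can check those page counts for yourself; as a rough sketch (the exact numbers will vary a bit by environment and by version of the sample database), sys.dm_db_partition_stats reports how many pages each index uses:

SELECT i.name AS index_name, i.index_id,
       ps.used_page_count, ps.row_count
  FROM sys.dm_db_partition_stats AS ps
  INNER JOIN sys.indexes AS i
    ON i.object_id = ps.object_id AND i.index_id = ps.index_id
  WHERE ps.object_id = OBJECT_ID(N'Warehouse.StockItemTransactions')
  ORDER BY ps.used_page_count;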

Now let's run the query again, this time comparing it to a hinted query that forces the use of the clustered index.

SELECT COUNT(*) 
  FROM Warehouse.StockItemTransactions;
  
SELECT COUNT(*) /* with hint */
  FROM Warehouse.StockItemTransactions WITH (INDEX(1));

We get an almost identical plan shape, but we can see a huge difference in reads (414 for the chosen index vs. 1,982 for the clustered index):

Plan and runtime metrics for COUNT queries (with and without index hints)

Duration is slightly higher for the clustered index, but the difference is negligible when we're dealing with a small amount of cached data on a fast disk. That discrepancy would be much more pronounced with more data, on a slow disk, or on a system with memory pressure.
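
If you would rather measure the reads directly than pull them from the plan, one minimal approach is to wrap the same two queries in STATISTICS IO and compare the logical reads reported for each scan:

SET STATISTICS IO ON;

SELECT COUNT(*) 
  FROM Warehouse.StockItemTransactions;

SELECT COUNT(*) /* with hint */
  FROM Warehouse.StockItemTransactions WITH (INDEX(1));

SET STATISTICS IO OFF; -- logical reads appear on the Messages tab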

If we look at the tooltips for the scan operations, we can see that while the number of rows and estimated CPU costs are identical, the big difference comes from the estimated I/O cost (because SQL Server knows that there are more pages in the clustered index than the non-clustered index):

Tooltips for COUNT queries (with and without index hints)

We can see this difference even more clearly if we create a new, unique index on just the ID column (making it "redundant" with the clustered index, right?):

CREATE UNIQUE INDEX NewUniqueNCIIndex 
  ON Warehouse.StockItemTransactions(StockItemTransactionID);

Tooltip for scan of unique non-clustered index

Running a similar query with an explicit index hint produces the same plan shape, but an even lower estimated I/O cost (and even lower durations); see the tooltip above. And if you run the original query without the hint, you'll see that SQL Server now chooses this index too.
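
For reference, the hinted version of the query against the new index looks something like this (the only change from the earlier hint is forcing the index by name rather than by ID):

SELECT COUNT(*) /* with hint */
  FROM Warehouse.StockItemTransactions WITH (INDEX(NewUniqueNCIIndex));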

This might seem obvious, but a lot of people still believe that the clustered index is the best choice here. SQL Server is almost always going to heavily favor whatever method provides the cheapest way to perform all of the I/O, and in the case of a full scan, that's going to be the "skinniest" index. This can also happen with both types of seeks (singleton and range scans), at least when the index is covering.

Now, as always, that does not in any way mean you should go and create additional indexes on all of your tables just to satisfy count queries. Not only is that an inefficient way to check table size (again, see this article), but an index created solely to support it only makes sense if you're running that query more often than you're updating the data. Remember that every index requires space on disk and space in memory, and every write against the table must also touch every index (filtered indexes aside).
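
If all you really need is an approximate row count, a metadata-only sketch like this avoids scanning any index at all (the counts come from the catalog views, so they can lag slightly behind in-flight transactions):

SELECT SUM(p.rows) AS approximate_row_count
  FROM sys.partitions AS p
  WHERE p.object_id = OBJECT_ID(N'Warehouse.StockItemTransactions')
    AND p.index_id IN (0, 1); -- heap or clustered index only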

Summary

I could come up with many other examples that show when a non-clustered index can be useful and worth the cost of maintenance, even when it duplicates the key column(s) of the clustered index. Non-clustered indexes can be created with the same key columns but in a different key order, or with different ASC/DESC on the columns themselves, to better support an alternate presentation order. You can also have non-clustered indexes that only carry a small subset of the rows through the use of a filter. Finally, if you can satisfy your most common queries with skinnier, non-clustered indexes, that is better for memory consumption as well.
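
As a rough illustration of those last two points (the index names below are made up, and the filter predicate is just an assumed example built on WideWorldImporters column names), a different, descending key order and a filtered index might look like this:

-- Alternate presentation order: newest transactions first
CREATE INDEX IX_StockItemTransactions_OccurredWhen_Desc
  ON Warehouse.StockItemTransactions(TransactionOccurredWhen DESC);

-- Filtered index carrying only a small subset of the rows
CREATE INDEX IX_StockItemTransactions_CustomerID_Filtered
  ON Warehouse.StockItemTransactions(CustomerID, StockItemID)
  WHERE CustomerID IS NOT NULL;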

But really, my point in this series is merely to show a counter-example that illustrates the folly of making blanket statements like this one. I'll leave you with an explanation from Paul White who, in a DBA.SE answer, explains why such a non-clustered index can in fact perform much better than a clustered index. This is true even when both use either type of seek: