Nov 202012
 

Primary and foreign keys are fundamental characteristics of relational databases, as originally noted in E.F. Codd’s paper, “A Relational Model of Data for Large Shared Data Banks”, published in 1970. The quote often repeated is, "The key, the whole key, and nothing but the key, so help me Codd."

Background : Primary Keys

A primary key is a constraint in SQL Server, which acts to uniquely identify each row in a table. The key can be defined as a single non-NULL column, or a combination of non-NULL columns which generates a unique value, and is used to enforce entity integrity for a table. A table can only have one primary key, and when a primary key constraint is defined for a table, a unique index is created. That index will be a clustered index by default, unless specified as a nonclustered index when the primary key constraint is defined.

Consider the Sales.SalesOrderHeader table in the AdventureWorks2012 database. This table holds basic information about a sales order, including order date and customer ID, and each sale is uniquely identified by a SalesOrderID, which is the primary key for the table. Every time a new row is added to the table, the primary key constraint (named PK_SalesOrderHeader_SalesOrderID) is checked to ensure that no row already exists with the same value for SalesOrderID.

Foreign Keys

Separate from primary keys, but very much related, are foreign keys. A foreign key is a column or combination of columns that is the same as the primary key, but in a different table. Foreign keys are used to define a relationship and enforce integrity between two tables.

To continue using the aforementioned example, the SalesOrderID column exists as a foreign key in the Sales.SalesOrderDetail table, where additional information about the sale is stored, such as product ID and price. When a new sale is added to the SalesOrderHeader table, it is not required to add a row for that sale to the SalesOrderDetail table  However, when adding a row to the SalesOrderDetail table, a corresponding row for the SalesOrderID must exist in the SalesOrderHeader table.

Conversely, when deleting data, a row for a specific SalesOrderID can be deleted at any time from the SalesOrderDetail table, but in order for a row to be deleted from the SalesOrderHeader table, associated rows from SalesOrderDetail will need to be deleted first.

Unlike primary key constraints, when a foreign key constraint is defined for a table, an index is not created by default by SQL Server. However, it's not uncommon for developers and database administrators to add them manually. The foreign key may be part of a composite primary key for the table, in which case a clustered index would exist with the foreign key as part of the clustering key. Alternatively, queries may require an index that includes the foreign key and one or more additional columns in the table, so a nonclustered index would be created to support those queries. Further, indexes on foreign keys can provide performance benefits for table joins involving the primary and foreign key, and they can impact performance when the primary key value is updated, or if the row is deleted.

In the AdventureWorks2012 database, there is one table, SalesOrderDetail, with SalesOrderID as a foreign key. For the SalesOrderDetail table, SalesOrderID and SalesOrderDetailID combine to form the primary key, supported by a clustered index. If the SalesOrderDetail table did not have an index on the SalesOrderID column, then when a row is deleted from SalesOrderHeader, SQL Server would have to verify that no rows for the same SalesOrderID value exist. Without any indexes that contain the SalesOrderID column, SQL Server would need to perform a full table scan of SalesOrderDetail.  As you can imagine, the larger the referenced table, the longer the delete will take.

An Example

We can see this in the following example, which uses copies of the aforementioned tables from the AdventureWorks2012 database that have been expanded using a script which can be found here. The script was developed by Jonathan Kehayias (blog | @SQLPoolBoy) and creates a SalesOrderHeaderEnlarged table with 1,258,600 rows, and a SalesOrderDetailEnlarged table with 4,852,680 rows. After the script was run, the foreign key constraint was added using the statements below. Note that the constraint is created with the ON DELETE CASCADE option. With this option, when an update or delete is issued against the SalesOrderHeaderEnlarged table, rows in the corresponding table(s) – in this case just SalesOrderDetailEnlarged – are updated or deleted.

In addition, the default, clustered index for SalesOrderDetailEnglarged was dropped and recreated to just have SalesOrderDetailID as the primary key, as it represents a typical design.

USE [AdventureWorks2012];
GO
 
/* remove original clustered index */
ALTER TABLE [Sales].[SalesOrderDetailEnlarged] 
  DROP CONSTRAINT [PK_SalesOrderDetailEnlarged_SalesOrderID_SalesOrderDetailID];
GO
 
/* re-create clustered index with SalesOrderDetailID only */
ALTER TABLE [Sales].[SalesOrderDetailEnlarged] 
  ADD CONSTRAINT [PK_SalesOrderDetailEnlarged_SalesOrderDetailID] PRIMARY KEY CLUSTERED
  (
    [SalesOrderDetailID] ASC
  )
  WITH
  (
     PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, 
     IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
  ) ON [PRIMARY];
GO
 
/* add foreign key constraint for SalesOrderID */
ALTER TABLE [Sales].[SalesOrderDetailEnlarged] WITH CHECK 
  ADD CONSTRAINT [FK_SalesOrderDetailEnlarged_SalesOrderHeaderEnlarged_SalesOrderID] 
  FOREIGN KEY([SalesOrderID])
  REFERENCES [Sales].[SalesOrderHeaderEnlarged] ([SalesOrderID])
  ON DELETE CASCADE;
GO
 
ALTER TABLE [Sales].[SalesOrderDetailEnlarged] 
  CHECK CONSTRAINT [FK_SalesOrderDetailEnlarged_SalesOrderHeaderEnlarged_SalesOrderID];
GO

With the foreign key constraint and no supporting index, a single delete was issued against the SalesOrderHeaderEnlarged table, which resulted in the removal one row from SalesOrderHeaderEnlarged and 72 rows from SalesOrderDetailEnlarged:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
 
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
 
USE [AdventureWorks2012];
GO
 
DELETE FROM [Sales].[SalesOrderHeaderEnlarged] WHERE [SalesOrderID] = 292104;

The statistics IO and timing information showed the following:

SQL Server parse and compile time:

CPU time = 8 ms, elapsed time = 8 ms.

Table 'SalesOrderDetailEnlarged'. Scan count 1, logical reads 50647, physical reads 8, read-ahead reads 50667, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeaderEnlarged'. Scan count 0, logical reads 15, physical reads 14, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 1045 ms,  elapsed time = 1898 ms.

Using SQL Sentry Plan Explorer, the execution plan shows a clustered index scan against SalesOrderDetailEnlarged as there is no index on SalesOrderID:

Query Plan with No Index on the Foreign Key
Query Plan with No Index on the Foreign Key

The nonclustered index to support SalesOrderDetailEnlarged was then created using the following statement:

USE [AdventureWorks2012];
GO
 
/* create nonclustered index */
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetailEnlarged_SalesOrderID] ON [Sales].[SalesOrderDetailEnlarged]
(
  [SalesOrderID] ASC
)
WITH
(
  PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, 
  ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
)
ON [PRIMARY];
 
GO

Another delete was executed for a SalesOrderID that affected one row in SalesOrderHeaderEnlarged and 72 rows in SalesOrderDetailEnlarged:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
 
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
 
USE [AdventureWorks2012];
GO
 
DELETE FROM [Sales].[SalesOrderHeaderEnlarged] WHERE [SalesOrderID] = 697505;

The statistics IO and timing information showed a dramatic improvement:

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 7 ms.

Table 'SalesOrderDetailEnlarged'. Scan count 1, logical reads 48, physical reads 13, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeaderEnlarged'. Scan count 0, logical reads 15, physical reads 15, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms,  elapsed time = 27 ms.

And the query plan showed an index seek of the nonclustered index on SalesOrderID, as expected:

Query Plan with Index on the Foreign Key
Query Plan with Index on the Foreign Key

The query execution time dropped from 1898 ms to 27 ms – a 98.58% reduction, and reads for the SalesOrderDetailEnlarged table decreased from 50647 to 48 – a 99.9% improvement. Percentages aside, consider the I/O alone generated by the delete. The SalesOrderDetailEnlarged table is only 500 MB in this example, and for a system with 256 GB of available memory, a table taking up 500 MB in the buffer cache doesn’t seem like a terrible situation. But a table of 5 million rows is relatively small; most large OLTP systems have tables with hundreds of millions rows. In addition, it is not uncommon for multiple foreign key references to exist for a primary key, where a delete of the primary key requires deletes from multiple related tables. In that case, it is possible to see extended durations for deletes which is not only a performance issue, but a blocking issue as well, depending on isolation level.

Conclusion

It is generally recommended to create an index which leads on the foreign key column(s), to support not only joins between the primary and foreign keys, but also updates and deletes. Note that this is a general recommendation, as there are edge case scenarios where the additional index on the foreign key was not used due to extremely small table size, and the additional index updates actually negatively impacted performance. As with any schema modifications, index additions should be tested and monitored after implementation. It is important to ensure that the additional indexes produce the desired effects and do not negatively impact solution performance. It is also worth noting how much additional space is required by the indexes for the foreign keys. This is essential to consider before creating the indexes, and if they do provide a benefit, must be considered for capacity planning going forward.

  12 Responses to “The Benefits of Indexing Foreign Keys”

  1. Excellent post Erin. Very clear and concise. Thanks.

  2. We had a detailed discussion on this here: http://sqlblog.com/blogs/greg_low/archive/2008/07/29/indexing-foreign-keys-should-sql-server-do-that-automatically.aspx
    I've come to the conclusion that SQL Server would be better if it automatically indexed foreign keys unless you used the WITH I_KNOW_WHAT_IM_DOING option. Too many problems are caused by the lack of these indexes.

    • I remember reading that Greg, thanks for including the link! I think an option (say, instance-level) to automatically index foreign keys would be an interesting enhancement for the SQL Server to consider. Thanks for your comment!

  3. Very nice and detailed sample on this common indexing problem. Want to read more :)

  4. Nice post, Erin.
    I'm always creating FK-indexes ( exactly matching the PK they refer to (col-order/sort) ) on all dbs that have FK-constraints until it's proven they hurt the system.
    Then the will be disabled ( to keep it documented they have been in place and were disabled for a specific reason )

    Good job on the article. :-)

    • Thanks Johan! I see why you keep the indexes around, but disabled. But, question for you…do you ever worry about the size the disabled FK-index consumes? Obviously it won't affect performance, but just wondering if you've ever found that they take up too much space.

      • Ha, yes, indeed they keep their space, wasted in the db and of course included with every full backup.
        Should we submit a connect item for that ? If it's disabled, just handle the index like a table is being handled using a truncate statement ?

        I've not been fortunate enough to work with TB db systems, as the matter of facts, I've only been asked to "remove" the FK-index on very few cases and those cases weren't on our largest db systems.
        It took me some time ( couple of years ) to persuade the DBA of our ERP-software vendor to actually implement FKs and the corresponding indexes to support their data system. That actually got their software that much performance gain, they didn't drop any of the FK-indexes at all. :-)

  5. Yep, same here. I suggest people always create them during the design phase and only disable them when there is real proof that they are doing more harm than good.

    As part of any health check, we look for a number of "code smells". One of these is any declared foreign keys where the key columns aren't the left-most columns in at least one non-clustered index.

  6. [...] The Benefits of Indexing Foreign Keys:http://www.sqlperformance.com/2012/11/t-sql-queries/benefits-indexing-foreign-keys [...]

 Leave a Reply

(required)

(required)