Guest Author : Bert Wagner (@bertwagner)
Join elimination is one of the many techniques that the SQL Server query optimizer uses to create efficient query plans. Specifically, join elimination occurs when SQL Server can establish equality by using query logic or trusted database constraints to eliminate unnecessary joins. See a full video version of this post on my YouTube channel.
Join Elimination In Action
The simplest way to explain join elimination is through a series of demos. For these examples I’ll be using the WideWorldImporters demo database.
To start things off, we’ll look at how join elimination works when a foreign key is present:
SELECT
il.*
FROM
Sales.InvoiceLines il
INNER JOIN Sales.Invoices i
ON il.InvoiceID = i.InvoiceID;
In this example, we are returning data only from Sales.InvoiceLines where a matching InvoiceID is found in Sales.Invoices. While you might expect the execution plan to show a join operator on the Sales.InvoiceLines and Sales.Invoices tables, SQL Server never bothers looking at Sales.Invoices at all:
SQL Server avoids joining to the Sales.Invoices table because it trusts the referential integrity maintained by the foreign key constraint defined on InvoiceID between Sales.InvoiceLines and Sales.Invoices; if a row exists in Sales.InvoiceLines, a row with the matching value for InvoiceID must exist in Sales.Invoices. And since we are only returning data from the Sales.InvoiceLines table, SQL Server doesn’t need to read any pages from Sales.Invoices at all.
We can verify that SQL Server is using the foreign key constraint to eliminate the join by dropping the constraint and running our query again:
ALTER TABLE [Sales].[InvoiceLines]
DROP CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices];
With no information about the relationship between our two tables, SQL Server is forced to perform a join, scanning an index on our Sales.Invoices table to find matching InvoiceIDs.
From an I/O standpoint, SQL Server must read an extra 124 pages from an index on the Sales.Invoices table, and that’s only because it is able to use a narrow (single column) index created by a different foreign key constraint. This scenario could play out much worse on larger tables or tables that are not indexed appropriately.
Limitations
While the previous example shows the basics of how join elimination works, we need to be aware of a few caveats.
First, let’s add back our foreign key constraint:
ALTER TABLE [Sales].[InvoiceLines]
WITH NOCHECK ADD CONSTRAINT
[FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices] FOREIGN KEY([InvoiceID])
REFERENCES [Sales].[Invoices] ([InvoiceID]);
If we run our sample query again, we’ll notice that we don’t get a plan that exhibits join elimination; instead we get a plan that scans both of our joined tables.
The reason this occurs is because, when we re-added our foreign key constraint, SQL Server doesn’t know if any data has been modified in the meantime. Any new or changed data may not adhere to this constraint, so SQL Server can’t trust the validity of our data:
SELECT
f.name AS foreign_key_name
,OBJECT_NAME(f.parent_object_id) AS table_name
,COL_NAME(fc.parent_object_id, fc.parent_column_id) AS constraint_column_name
,OBJECT_NAME (f.referenced_object_id) AS referenced_object
,COL_NAME(fc.referenced_object_id, fc.referenced_column_id) AS referenced_column_name
,f.is_not_trusted
FROM
sys.foreign_keys AS f
INNER JOIN sys.foreign_key_columns AS fc
ON f.object_id = fc.constraint_object_id
WHERE
f.parent_object_id = OBJECT_ID('Sales.InvoiceLines');
To re-establish SQL Server’s trust of this constraint, we must check its validity:
ALTER TABLE [Sales].[InvoiceLines]
WITH CHECK CHECK CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices];
On large tables, this operation may take some time, not to mention the overhead of SQL Server validating this data during every insert/update/delete modification going forward.
Another limitation is that SQL Server cannot eliminate joined tables when the query needs to return any data from those potential elimination candidates:
SELECT
il.*,
i.InvoiceDate
FROM
Sales.InvoiceLines il
INNER JOIN Sales.Invoices i
ON il.InvoiceID = i.InvoiceID;
Join elimination doesn’t occur in the query above because we are requesting that data from Sales.Invoices is returned, forcing SQL Server to read data from that table.
Finally, it’s important to note that join elimination will not occur when the foreign key has multiple columns, or if the tables are in tempdb. The latter is one of several reasons you shouldn’t try to solve optimization issues by copying your tables into tempdb.
Additional Scenarios
Multiple Tables
Join elimination isn’t only limited to two-table inner joins and tables with foreign key constraints.
For example, we can create an additional table that references our Sales.Invoices.InvoiceID column:
CREATE TABLE Sales.InvoiceClickTracking
(
InvoiceClickTrackingID bigint IDENTITY PRIMARY KEY,
InvoiceID int
-- other fields would go here
);
GO
ALTER TABLE [Sales].[InvoiceClickTracking] WITH CHECK
ADD CONSTRAINT [FK_Sales_InvoiceClickTracking_InvoiceID_Sales_Invoices]
FOREIGN KEY([InvoiceID])
REFERENCES [Sales].[Invoices] ([InvoiceID]);
Joining this table into our original sample query will also allow SQL Server to eliminate our Sales.Invoices table:
SELECT
il.InvoiceID,
ict.InvoiceID
FROM
Sales.InvoiceLines il
INNER JOIN Sales.Invoices i
ON il.InvoiceID = i.InvoiceID
INNER JOIN Sales.InvoiceClickTracking ict
ON i.InvoiceID = ict.InvoiceID;
SQL Server can eliminate the Sales.Invoices table because of the transitive association between these tables’ relationships.
Unique Constraints
Instead of a foreign key constraint, SQL Server will also perform join elimination if it can trust the data relationship with a unique constraint:
ALTER TABLE [Sales].[InvoiceClickTracking]
DROP CONSTRAINT [FK_Sales_InvoiceClickTracking_InvoiceID_Sales_Invoices];
GO
ALTER TABLE Sales.InvoiceClickTracking
ADD CONSTRAINT UQ_InvoiceID UNIQUE (InvoiceID);
GO
SELECT
i.InvoiceID
FROM
Sales.InvoiceClickTracking ict
RIGHT JOIN Sales.Invoices i
ON ict.InvoiceID = i.InvoiceID;
Outer Joins
As long as SQL Server can infer relationship constraints, other types of joins can experience table elimination as well. For example:
SELECT
il.InvoiceID
FROM
Sales.InvoiceLines il
LEFT JOIN Sales.Invoices i
ON il.InvoiceID = i.InvoiceID
Since we still have our foreign key constraint enforcing that every InvoiceID in Sales.InvoiceLines must have a corresponding InvoiceID in Sales.Invoices, SQL Server has no problem returning everything from Sales.InvoiceLInes without the need to join to Sales.Invoices:
No Constraint Required
If SQL Server can guarantee that it won’t need data from a certain table, it can potentially eliminate a join.
No join elimination occurs in this query because SQL Server can’t identify if the relationship between Sales.Invoices and Sales.InvoiceLines is 1-to-1, 1-to-0, or 1-to-many. It is forced to read Sales.InvoiceLines to determine if any matching rows are found:
SELECT
i.InvoiceID
FROM
Sales.InvoiceLines il
RIGHT JOIN Sales.Invoices i
ON il.InvoiceID = i.InvoiceID;
However, if we specify that we want a DISTINCT set of i.InvoiceIDs, every unique value from Sales.Invoices returns from SQL Server regardless of what relationship those rows have with Sales.InvoiceLines.
-- Just to prove no foreign key is at play here
ALTER TABLE [Sales].[InvoiceLines]
DROP CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices];
GO
-- Our distinct result set
SELECT DISTINCT
i.InvoiceID
FROM
Sales.InvoiceLines il
RIGHT JOIN Sales.Invoices i
ON il.InvoiceID = i.InvoiceID;
Views
One advantage of join elimination is that it can work with views, even if the underlying view query is not able to use join elimination:
-- Add back our FK
ALTER TABLE [Sales].[InvoiceLines]
WITH CHECK ADD CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices]
FOREIGN KEY([InvoiceID])
REFERENCES [Sales].[Invoices] ([InvoiceID]);
GO
-- Create our view using a query that cannot use join elimination
CREATE VIEW Sales.vInvoicesAndInvoiceLines
AS
SELECT
i.InvoiceID,
i.InvoiceDate,
il.Quantity,
il.TaxRate
FROM
Sales.InvoiceLines il
INNER JOIN Sales.Invoices i
ON il.InvoiceID = i.InvoiceID;
GO
-- Join elimination works because we do not select any
-- columns from the underlying Sales.Invoices table
SELECT Quantity, TaxRate FROM Sales.vInvoicesAndInvoiceLines;
Conclusion
Join elimination is an optimization that SQL Server performs when it determines it can provide an accurate result set without needing to read data from all tables specified in the submitted query. This optimization can provide significant performance improvements by reducing the number of pages SQL Server has to read, however it often comes at the expense of needing to maintain certain database constraints. We can refactor queries to achieve the simpler execution plans that join elimination provides, however having the query optimizer automatically simplify our plans by removing unnecessary joins is a nice benefit.
Again, I invite you to watch the full video version of this post.