Itzik Ben-Gan

NULL complexities – Part 4, Missing standard unique constraint

March 11, 2020 by in T-SQL Queries | No Comments
SentryOne Monitor : Save 40% for a Limited Time!
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Register to Download

Featured Author

Jonathan Kehayias is a Principal Consultant with SQLskills and the youngest MCM ever.

Jonathan’s Posts

This article is Part 4 in a series about NULL complexities. In the previous articles (Part 1, Part 2, and Part 3), I covered the meaning of the NULL as a marker for a missing value, how NULLs behave in comparisons and in other query elements, and standard NULL handling features that are not yet available in T-SQL. This month I cover the difference between the way a unique constraint is defined in the ISO/IEC SQL standard and the way it works in T-SQL. I'll also provide customized solutions that you can implement if you need the standard functionality.

Standard UNIQUE constraint

SQL Server handles NULLs just like non-NULL values for the purpose of enforcing a unique constraint. That is, a unique constraint on T is satisfied if and only if there do not exist two rows R1 and R2 of T such that R1 and R2 have the same combination of NULLs and non-NULL values in the unique columns. For example, suppose that you define a unique constraint on col1, which is a NULLable column of an INT datatype. An attempt to modify the table in a way that would result in more than one row with a NULL in col1 will be rejected, just like a modification that would result in more than one row with the value 1 in col1 will be rejected.

NULL is the empty set.

Suppose that you define a composite unique constraint on the combination of NULLable INT columns col1 and col2. An attempt to modify the table in a way that would result in more than one occurrence of any of the following combinations of (col1, col2) values will be rejected: (NULL, NULL), (3, NULL), (NULL, 300), (1, 100).

So as you can see, the T-SQL implementation of the unique constraint treats NULLs just like non-NULL values for the purpose of enforcement of uniqueness.

If you want to define a foreign key on some table X referencing some table Y, you must enforce uniqueness on the referenced column(s) with one of the following options:

  • Primary key
  • Unique constraint
  • Nonfiltered unique index

A primary key is not allowed on NULLable columns. Both a unique constraint (which creates an index beneath the covers) and an explicitly created unique index are allowed on NULLable columns, and enforce their uniqueness in T-SQL using the aforementioned logic. The referencing table is allowed to have rows with a NULL in the referencing column, irrespective of whether the referenced table has a row with a NULL in the referenced column. The idea is to support an optional relationship. Some rows in the referencing table could be ones that are not related to any rows in the referenced table. You'll implement this by using a NULL in the referencing column.

To demonstrate the T-SQL implementation of a unique constraint, run the following code, which creates a table called T3 with a unique constraint defined on the NULLable INT column col1, and populates it with a few sample rows:

USE tempdb;
GO
 
DROP TABLE IF EXISTS dbo.T3;
GO
 
CREATE TABLE dbo.T3(col1 INT NULL, col2 INT NULL, CONSTRAINT UNQ_T3 UNIQUE(col1));
 
INSERT INTO dbo.T3(col1, col2) VALUES(1, 100),(2, -1),(NULL, -1),(3, 300);

Use the following code to query table:

SELECT * FROM dbo.T3;

This query generates the following output:

col1        col2
----------- -----------
1           100
2           -1
NULL        -1
3           300

Attempt to insert a second row with a NULL in col1:

INSERT INTO dbo.T3(col1, col2) VALUES(NULL, 400);

This attempt is rejected and you get the following error:

Msg 2627, Level 14, State 1
Violation of UNIQUE KEY constraint 'UNQ_T3'. Cannot insert duplicate key in object 'dbo.T3'. The duplicate key value is (<NULL>).

The standard unique constraint definition is a bit different than the T-SQL version. The main difference has to do with the NULL handling. Here's the unique constraint definition from the standard:

"A unique constraint on T is satisfied if and only if there do not exist two rows R1 and R2 of T such that R1 and R2 have the same non-NULL values in the unique columns."

So, a table T with a unique constraint on col1 will allow multiple rows with a NULL in col1, but disallow multiple rows with the same non-NULL value in col1.

What's a bit trickier to explain is what happens according to the standard with a composite unique constraint. Say that you have a unique constraint defined on (col1, col2). You can have multiple rows with (NULL, NULL), but you cannot have multiple rows with (3, NULL), just like you cannot have multiple rows with (1, 100). Similarly, you cannot have multiple rows with (NULL, 300). The point is that you're not allowed to have multiple rows with the same non-NULL values in the unique columns. As for a foreign key, you can have any number of rows in the referencing table with NULLs in all referencing columns, irrespective of what exists in the referenced table. Such rows are not related to any rows in the referenced table (optional relationship). However, if you have any non-NULL value in any of the referencing columns, there must exist a row in the referenced table with the same non-NULL values in the referenced columns.

Suppose that you have a database in a platform that supports the standard unique constraint and you need to migrate that database to SQL Server. You may face issues with the enforcement of unique constraints in SQL Server if the unique columns support NULLs. Data that was considered valid in the source system may be considered invalid in SQL Server. In the following sections I'll explore a number of possible workarounds in SQL Server.

Solution 1, using filtered index or indexed view

A common workaround in T-SQL for enforcing the standard unique constraint functionality when there's only one target column involved is to use a unique filtered index that filters only the rows where the target column is not NULL. The following code drops the existing unique constraint from T3 and implements such an index:

ALTER TABLE dbo.T3 DROP CONSTRAINT UNQ_T3;
 
CREATE UNIQUE NONCLUSTERED INDEX idx_col1_notnull ON dbo.T3(col1) WHERE col1 IS NOT NULL;

Since the index filters only rows where col1 is not NULL, its UNIQUE property is enforced only on the non-NULL col1 values.

Recall that T3 already has a row with a NULL in col1. To test this solution, use the following code to add a second row with a NULL in col1:

INSERT INTO dbo.T3(col1, col2) VALUES(NULL, 400);

This code runs successfully.

Recall that T3 already has a row with the value 1 in col1. Run the following code to attempt to add a second row with 1 in col1:

INSERT INTO dbo.T3(col1, col2) VALUES(1, 500);

As expected, this attempt fails with the following error:

Msg 2601, Level 14, State 1
Cannot insert duplicate key row in object 'dbo.T3' with unique index 'idx_col1_notnull'. The duplicate key value is (1).

Use the following code to query T3:

SELECT * FROM dbo.T3;

This code generates the following output showing two rows with a NULL in col1:

col1        col2
----------- -----------
1           100
2           -1
NULL        -1
3           300
NULL        400

This solution works well when you need to enforce uniqueness on only one column, and when you don't need to enforce referential integrity with a foreign key pointing to that column.

The issue with the foreign key is that SQL Server requires a primary key or a unique constraint or a unique nonfiltered index defined on the referenced column. It doesn't work when there's only a unique filtered index defined on the referenced column. Let's try creating a table with a foreign key referencing T3.col1. First, use the following code to create the table T3:

DROP TABLE IF EXISTS dbo.T3FK;
GO
 
CREATE TABLE dbo.T3FK
(
  id INT NOT NULL IDENTITY CONSTRAINT PK_T3FK PRIMARY KEY,
  col1 INT NULL, 
  col2 INT NULL, 
  othercol VARCHAR(10) NOT NULL
);

Then try running the following code in attempt to add a foreign key pointing from T3FK.col1 to T3.col1:

ALTER TABLE dbo.T3FK ADD CONSTRAINT FK_T3_T3FK
  FOREIGN KEY(col1) REFERENCES dbo.T3(col1);

This attempt fails with the following error:

Msg 1776, Level 16, State 0
There are no primary or candidate keys in the referenced table 'dbo.T3' that match the referencing column list in the foreign key 'FK_T3_T3FK'.

Msg 1750, Level 16, State 1
Could not create constraint or index. See previous errors.

At this point, drop the existing filtered index for cleanup:

DROP INDEX idx_col1_notnull ON dbo.T3;

Don't drop the table T3FK, since you will use it in later examples.

The other issue with the filtered index solution, assuming you don't need a foreign key, is that it doesn't work when you need to enforce the standard unique constraint functionality on multiple columns, for instance on the combination (col1, col2). Remember that the standard unique constraint disallows duplicate non-NULL combinations of values in the unique columns. To implement this logic with a filtered index, you need to filter only rows where any of the unique columns is not NULL. Phrased differently, you need to filter only rows that don't have NULLs in all unique columns. Unfortunately, filtered indexes allow only very simple expressions. They don't support OR, NOT or manipulation on the columns. So none of the following index definitions is currently supported:

CREATE UNIQUE NONCLUSTERED INDEX idx_customunique ON dbo.T3(col1, col2)
  WHERE col1 IS NOT NULL OR col2 IS NOT NULL;
 
CREATE UNIQUE NONCLUSTERED INDEX idx_customunique ON dbo.T3(col1, col2)
  WHERE NOT (col1 IS NULL AND col2 IS NULL);
 
CREATE UNIQUE NONCLUSTERED INDEX idx_customunique ON dbo.T3(col1, col2)
  WHERE COALESCE(col1, col2) IS NOT NULL;

The workaround in such a case is to create an indexed view based on a query that returns col1 and col2 from T3 with one of the WHERE clauses above, with a unique clustered index on (col1, col2), like so:

CREATE VIEW dbo.T3CustomUnique WITH SCHEMABINDING
AS
  SELECT col1, col2 FROM dbo.T3 WHERE col1 IS NOT NULL OR col2 IS NOT NULL;
GO
 
CREATE UNIQUE CLUSTERED INDEX idx_col1_col2 ON dbo.T3CustomUnique(col1, col2);
GO

You will be allowed to add multiple rows with (NULL, NULL) in (col1, col2), but you won't be allowed to add multiple occurrences of non-NULL combinations of values in (col1, col2), such as (3, NULL) or (NULL, 300) or (1, 100). Still, this solution does not support a foreign key.

At this point, run the following code for cleanup:

DROP VIEW IF EXISTS dbo.T3CustomUnique;

Solution 2, using surrogate key and computed column

The solutions with the filtered index and the indexed view are good as long as you don't need to support a foreign key. But what if you do need to enforce referential integrity? One option is to keep using the filtered index or indexed view solution to enforce uniqueness, and use triggers to enforce referential integrity. However, this option is quite expensive.

Another option is to use a completely different solution for the uniqueness part that does support a foreign key. The solution involves adding two columns to the referenced table (T3 in our case). One column called id is a surrogate key with an identity property. Another column called flag is a persisted computed column that returns id when col1 is NULL and 0 when it's not NULL. You then enforce a unique constraint on the combination of col1 and flag. Here's the code to add the two columns and the unique constraint:

ALTER TABLE dbo.T3
  ADD id INT NOT NULL IDENTITY,
      flag AS CASE WHEN col1 IS NULL THEN id ELSE 0 END PERSISTED,
      CONSTRAINT UNQ_T3_col1_flag UNIQUE(col1, flag);

Use the following code to query T3:

SELECT * FROM dbo.T3;

This code generates the following output:

col1        col2        id          flag
----------- ----------- ----------- -----------
1           100         1           0
2           -1          2           0
NULL        -1          3           3
3           300         4           0
NULL        400         5           5

As for the referencing table (T3FK in our case), you add a computed column called flag that is always set to 0, and a foreign key defined on (col1, flag) pointing to T3's unique columns (col1, flag), like so:

ALTER TABLE dbo.T3FK
  ADD flag AS 0 PERSISTED,
      CONSTRAINT FK_T3_T3FK
        FOREIGN KEY(col1, flag) REFERENCES dbo.T3(col1, flag);

Let's test this solution.

Try to add the following rows:

INSERT INTO dbo.T3FK(col1, col2, othercol) VALUES
  (1, 100, 'A'),
  (2, -1, 'B'),
  (3, 300, 'C');

These rows are added successfully, as they should, since all have corresponding referenced rows.

Query the table T3FK:

SELECT * FROM dbo.T3FK;

You get the following output:

id          col1        col2        othercol   flag
----------- ----------- ----------- ---------- -----------
1           1           100         A          0
2           2           -1          B          0
3           3           300         C          0

Try to add a row that doesn't have a corresponding row in the referenced table:

INSERT INTO dbo.T3FK(col1, col2, othercol) VALUES
  (4, 400, 'D');

The attempt is rejected, as it should be, with the following error:

Msg 547, Level 16, State 0
The INSERT statement conflicted with the FOREIGN KEY constraint "FK_T3_T3FK". The conflict occurred in database "TSQLV5", table "dbo.T3".

Try adding a row to T3FK with a NULL in col1:

INSERT INTO dbo.T3FK(col1, col2, othercol) VALUES
  (NULL, NULL, 'E');

This row is considered to not be related to any row in T3FK (optional relationship) and, according to the standard, should be allowed irrespective of whether a NULL exists in the referenced table in col1. T-SQL does support this scenario, and the row is added successfully.

Query the table T3FK:

SELECT * FROM dbo.T3FK;

This code generates the following output:

id          col1        col2        othercol   flag
----------- ----------- ----------- ---------- -----------
1           1           100         A          0
2           2           -1          B          0
3           3           300         C          0
5           NULL        NULL        E          0

The solution works well when you need to enforce the standard uniqueness functionality on a single column. But it has a problem when you need to enforce uniqueness on multiple columns. To demonstrate the problem, first drop the tables T3 and T3FK:

DROP TABLE IF EXISTS dbo.T3FK, dbo.T3;

Use the following code to recreate T3 with a composite unique constraint on (col1, col2, flag):

CREATE TABLE dbo.T3
(
  col1 INT NULL,
  col2 INT NULL,
  id INT NOT NULL IDENTITY,
  flag AS CASE WHEN col1 IS NULL AND col2 IS NULL THEN id ELSE 0 END PERSISTED,
  CONSTRAINT UNQ_T3 UNIQUE(col1, col2, flag)
);

Notice that the flag is set to id when both col1 and col2 are NULLs and 0 otherwise.

The unique constraint itself works well.

Run the following code to add a few rows to T3, including multiple occurrences of (NULL, NULL) in (col1, col2):

INSERT INTO dbo.T3(col1, col2) VALUES(1, 100),(1, 200),(NULL, NULL),(NULL, NULL);

These rows are added successfully as they should.

Try adding two occurrences of (1, NULL) in (col1, col2):

INSERT INTO dbo.T3(col1, col2) VALUES(1, NULL),(1, NULL);

This attempt fails as it should with the following error:

Msg 2627, Level 14, State 1
Violation of UNIQUE KEY constraint 'UNQ_T3'. Cannot insert duplicate key in object 'dbo.T3'. The duplicate key value is (1, <NULL>, 0).

Try adding two occurrences of (NULL, 100) in (col1, col2):

INSERT INTO dbo.T3(col1, col2) VALUES(NULL, 100),(NULL, 100);

This attempt also fails as it should with the following error:

Msg 2627, Level 14, State 1
Violation of UNIQUE KEY constraint 'UNQ_T3'. Cannot insert duplicate key in object 'dbo.T3'. The duplicate key value is (<NULL>, 100, 0).

Try adding the following two rows, where no violation should occur:

INSERT INTO dbo.T3(col1, col2) VALUES(3, NULL),(NULL, 300);

These rows are added successfully.

Query the table T3 at this point:

SELECT * FROM dbo.T3;

You get the following output:

col1        col2        id          flag
----------- ----------- ----------- -----------
1           100         1           0
1           200         2           0
NULL        NULL        3           3
NULL        NULL        4           4
3           NULL        9           0
NULL        300         10          0

So far so good.

Next, run the following code to create the table T3FK with a composite foreign key referencing T3's unique columns:

CREATE TABLE dbo.T3FK
(
  id INT NOT NULL IDENTITY CONSTRAINT PK_T3FK PRIMARY KEY,
  col1 INT NULL, 
  col2 INT NULL, 
  othercol VARCHAR(10) NOT NULL,
  flag AS 0 PERSISTED,
  CONSTRAINT FK_T3_T3FK
    FOREIGN KEY(col1, col2, flag) REFERENCES dbo.T3(col1, col2, flag)
);

This solution naturally allows adding rows to T3FK with (NULL, NULL) in (col1, col2). The problem is that it also allows adding rows a NULL in either col1 or col2, even when the other column is not NULL, and the referenced table T3 doesn't have such a key combination. For example, try adding the following row to T3FK:

INSERT INTO dbo.T3FK(col1, col2, othercol) VALUES(5, NULL, 'A');

This row is added successfully even though there's no related row in T3. According to the standard, this row should not be allowed.

Back to the drawing board…

Solution 3, using surrogate key and computed column

The problem with the previous solution (Solution 2) arises when you need to support a composite foreign key. It allows rows in the referencing table that have a NULL in at list one referencing column, even when there are non-NULL values in other referencing columns, and no related row in the referenced table. To address this, you can use a variation of the previous solution, which we will call Solution 3.

First, use the following code to drop the existing tables:

DROP TABLE IF EXISTS dbo.T3FK, dbo.T3;

In the new solution in the referenced table (T3 in our case), you still use the identity-based id surrogate key column. You also use a persisted computed column called unqpath. When all of the unique columns (col1 and col2 in our example) are NULL, you set unqpath to a character string representation of id (no separators). When any of the unique columns is not NULL, you set unqpath to a character string representation of a separated list of the unique column values using the CONCAT function. This function substitutes a NULL with an empty string. What's important is to make sure to use a separator that cannot normally appear in the data itself. For example, with integer col1 and col2 values you have only digits, so any separator other than a digit would work. In my example I'll use a dot (.). You then enforce a unique constraint on unqpath. You will never have a conflict between the unqpath value when all unique columns are NULL (set to id) versus when any of the unique columns is not NULL because in the former case unqpath does not contain a separator, and in the latter case it does. Remember that you will use Solution 3 when you have a composite key case, and likely prefer Solution 2, which is simpler, when you have a single column key case. If you want to use Solution 3 also with a single-column key and not Solution 2, just make sure that you do add the separator when the unique column is not NULL even though there's only one value involved. This way you won't have a conflict when id in a row where col1 is NULL is equal to col1 in another row, since the former will have no separator and the latter will.

Here's the code to create T3 with the aforementioned additions:

CREATE TABLE dbo.T3
(
  col1 INT NULL,
  col2 INT NULL,
  id INT NOT NULL IDENTITY,
  unqpath AS CASE WHEN col1 IS NULL AND col2 IS NULL THEN CAST(id AS VARCHAR(10)) 
                  ELSE CONCAT(CAST(col1 AS VARCHAR(11)), '.', CAST(col2 AS VARCHAR(11)))
             END PERSISTED,
  CONSTRAINT UNQ_T3 UNIQUE(unqpath)
);

Before dealing with a foreign key and the referencing table, let's test the unique constraint. Remember, it's supposed to disallow duplicate combinations of non-NULL values in the unique columns, but it's supposed to allow multiple occurrences of all-NULLs in the unique columns.

Run the following code to add a few rows, including two occurrences of (NULL, NULL) in (col1, col2):

INSERT INTO dbo.T3(col1, col2) VALUES(1, 100),(1, 200),(NULL, NULL),(NULL, NULL);

This code completes successfully as it should.

Try to add two occurrences of (1, NULL) in (col1, col2):

INSERT INTO dbo.T3(col1, col2) VALUES(1, NULL),(1, NULL);

This code fails with the following error as it should:

Msg 2627, Level 14, State 1
Violation of UNIQUE KEY constraint 'UNQ_T3'. Cannot insert duplicate key in object 'dbo.T3'. The duplicate key value is (1.).

Similarly, the following attempt is also rejected:

INSERT INTO dbo.T3(col1, col2) VALUES(NULL, 100),(NULL, 100);

You get the following error:

Msg 2627, Level 14, State 1
Violation of UNIQUE KEY constraint 'UNQ_T3'. Cannot insert duplicate key in object 'dbo.T3'. The duplicate key value is (.100).

Run the following code to add a couple more rows:

INSERT INTO dbo.T3(col1, col2) VALUES(3, NULL),(NULL, 300);

This code runs successfully as it should.

At this point, query T3:

SELECT * FROM dbo.T3;

You get the following output:

col1        col2        id          unqpath
----------- ----------- ----------- -----------------------
1           100         1           1.100
1           200         2           1.200
NULL        NULL        3           3
NULL        NULL        4           4
3           NULL        9           3.
NULL        300         10          .300

Observe the unqpath values and make sure you understand the logic behind their construction, and the difference between a case where all unique columns are NULL (no separator) versus when at least one is not NULL (separator exists).

As for the referencing table, T3FK; you also define a computed column called unqpath, but in the case where all referencing columns are NULL you set the column to NULL—not to id. When any of the referencing columns is not NULL, you construct the same separated list of values like you did in T3. You then define a foreign key on T3FK.unqpath pointing to T3.unqpath, like so:

CREATE TABLE dbo.T3FK
(
  id INT NOT NULL IDENTITY CONSTRAINT PK_T3FK PRIMARY KEY,
  col1 INT NULL, 
  col2 INT NULL, 
  othercol VARCHAR(10) NOT NULL,
  unqpath AS CASE WHEN col1 IS NULL AND col2 IS NULL THEN NULL
                  ELSE CONCAT(CAST(col1 AS VARCHAR(11)), '.', CAST(col2 AS VARCHAR(11)))
             END PERSISTED,
  CONSTRAINT FK_T3_T3FK
    FOREIGN KEY(unqpath) REFERENCES dbo.T3(unqpath)
);

This foreign key will reject rows in T3FK where any of the referencing columns is not NULL, and there's no related row in the referenced table T3, as the following attempt shows:

INSERT INTO dbo.T3FK(col1, col2, othercol) VALUES(5, NULL, 'A');

This code generates the following error:

Msg 547, Level 16, State 0
The INSERT statement conflicted with the FOREIGN KEY constraint "FK_T3_T3FK". The conflict occurred in database "TSQLV5", table "dbo.T3", column 'unqpath'.

This solution will rows in T3FK where any of the referencing columns is not NULL as long as a related row in T3 exists, as well as rows with NULLs in all referencing columns, since such rows are considered to be unrelated to any rows in T3. The following code adds such valid rows to T3FK:

INSERT INTO dbo.T3FK(col1, col2, othercol) VALUES
  (1   , 100 , 'A'),
  (1   , 200 , 'B'),
  (3   , NULL, 'C'),
  (NULL, 300 , 'D'),
  (NULL, NULL, 'E'),
  (NULL, NULL, 'F');

This code completes successfully.

Run the following code to query T3FK:

SELECT * FROM dbo.T3FK;

You get the following output:

id          col1        col2        othercol   unqpath
----------- ----------- ----------- ---------- -----------------------
2           1           100         A          1.100
3           1           200         B          1.200
4           3           NULL        C          3.
5           NULL        300         D          .300
6           NULL        NULL        E          NULL
7           NULL        NULL        F          NULL

So it took a little bit of creativity, but now you have a workaround for the standard unique constraint, including foreign key support.

Conclusion

You would think that a unique constraint is a straightforward feature, but it can get a bit tricky when you need to support NULLs in the unique columns. It gets more complex when you need to implement the standard unique constraint functionality in T-SQL, since the two use different rules in terms of how they handle NULLs. In this article I explained the difference between the two and provided workarounds that do work in T-SQL. You can use a simple filtered index when you need to enforce uniqueness on only one NULLable column, and you don't need to support a foreign key that references that column. However, if you either need to support a foreign key or a composite unique constraint with the standard functionality, you'll need a more complex implementation with a surrogate key and a computed column.