Aaron Bertrand

Minimizing impact of widening an IDENTITY column – part 2

Pragmatic Works Software is now part of SentryOne
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Register to Download

Featured Author

Jonathan Kehayias is a Principal Consultant with SQLskills and the youngest MCM ever.

Jonathan’s Posts

In the first part of this series, I showed what happens to a physical page when changing an IDENTITY column from an int to a bigint. To keep things simple, I created a very simple heap with no indexes or constraints. Unfortunately, most of us don't have that kind of luxury – an important table that needs to change but can't simply be re-created from scratch likely has multiple attributes standing directly in our way. In this post, I wanted to show the more common ones, without even getting into exotic things like In-Memory OLTP and Columnstore.

Primary Key

Hopefully all of your tables have a primary key; if the IDENTITY column is involved, however, it won't be so easy to alter the underlying data type. Take these simple examples, both clustered and nonclustered primary keys:

CREATE TABLE dbo.Test1
(
  ID INT IDENTITY(1,1),
  CONSTRAINT PK_1 PRIMARY KEY NONCLUSTERED (ID)
);
 
CREATE TABLE dbo.Test2
(
  ID INT IDENTITY(1,1),
  CONSTRAINT PK_2 PRIMARY KEY CLUSTERED (ID)
);

If I try to change the column:

ALTER TABLE dbo.Test1 ALTER COLUMN ID BIGINT;
GO
ALTER TABLE dbo.Test2 ALTER COLUMN ID BIGINT;

I get a pair of error messages for each ALTER (just showing the first pair):

Msg 5074, Level 16, State 1
The object 'PK_1' is dependent on column 'ID'.
Msg 4922, Level 16, State 9
ALTER TABLE ALTER COLUMN ID failed because one or more objects access this column.

Summary: We will need to drop the primary key, whether or not is is clustered.
 

Indexes

First let's take a couple of tables like above, and using a unique index instead of a primary key:

CREATE TABLE dbo.Test3
(
  ID INT IDENTITY(1,1),
  INDEX IX_3 UNIQUE NONCLUSTERED (ID)
);
 
CREATE TABLE dbo.Test4
(
  ID INT IDENTITY(1,1),
  INDEX IX_4 UNIQUE CLUSTERED (ID) 
);

Running similar ALTER commands above, leads to the same error messages. This remains true even if I disable the indexes:

ALTER INDEX IX_3 ON dbo.Test3 DISABLE;
GO
ALTER INDEX IX_4 ON dbo.Test4 DISABLE;

Similar results for various other types of index combinations, such as an included column or a filter:

CREATE TABLE dbo.Test5
(
  ID INT IDENTITY(1,1),
  x CHAR(1)
);
CREATE INDEX IX_5 ON dbo.Test5(x) INCLUDE(ID);
 
CREATE TABLE dbo.Test6
(
  ID INT IDENTITY(1,1),
  x CHAR(1)
);
CREATE INDEX IX_6 ON dbo.Test6(x) WHERE ID > 0;

Summary: We will need to drop and re-create any indexes, clustered or not, that reference the IDENTITY column – in the key or the INCLUDE. If the IDENTITY column is part of the clustered index, this means all indexes, since they will all reference the clustering key by definition. And disabling them isn't enough.

Computed Columns

While this should be relatively rare, I have seen computed columns based on the IDENTITY column. For example:

CREATE TABLE dbo.Test7
(
  ID INT IDENTITY(1,1),
  NextID AS (ID + 1)
);

This time, when we try to alter, we get the same pair of errors, but with slightly different text:

Msg 5074, Level 16, State 1
The column 'NextID' is dependent on column 'ID'.
Msg 4922, Level 16, State 9
ALTER TABLE ALTER COLUMN ID failed because one or more objects access this column.

This is even true if we change the computed column definition to match the target data type:

CREATE TABLE dbo.Test8
(
  ID INT IDENTITY(1,1),
  NextID AS (CONVERT(BIGINT, ID) + 1)
);

Summary: We will need to change the definitions of computed columns, or drop them altogether.

Indexed Views

Indexed views also see their fair share of usage. Let's construct an indexed view that doesn't even reference the IDENTITY column (note no other indexes or constraints on the base table):

CREATE TABLE dbo.Test9
(
  ID INT IDENTITY(1,1),
  x CHAR(1)
);
GO
 
CREATE VIEW dbo.vTest9A
WITH SCHEMABINDING
AS
  SELECT x, c = COUNT_BIG(*)
    FROM dbo.Test9
    GROUP BY x;
GO
 
CREATE UNIQUE CLUSTERED INDEX IX_9A ON dbo.vTest9A(x);

Once again, we'll try the ALTER, and this time it succeeds. I'll confess I was surprised by this, since SCHEMABINDING is supposed to prevent any changes to the underlying table, but in this case it only applies to columns explicitly referenced in the view. If we create a slightly different view:

CREATE VIEW dbo.vTest9B
WITH SCHEMABINDING
AS
  SELECT ID, c = COUNT_BIG(*)
    FROM dbo.Test9
    GROUP BY ID;
GO
CREATE UNIQUE CLUSTERED INDEX IX_9B ON dbo.vTest9B(ID);

Now we will fail due to the column dependency:

Msg 5074, Level 16, State 1
The object 'vTest9B' is dependent on column 'ID'.
Msg 4922, Level 16, State 9
ALTER TABLE ALTER COLUMN ID failed because one or more objects access this column.

Summary: We will need to drop all indexes on any views that explicitly reference the IDENTITY column, as well as all indexes on any view that references the IDENTITY column in its clustered index.

Inbound Foreign Keys

Probably the most problematic aspect of IDENTITY primary keys is that by the very nature of surrogates the whole point is often to use this surrogate key in multiple related tables. Now, I am not about to advocate avoiding referential integrity, but it is potentially going to stand in our way a little here, too. We know from above that we can't change a column that is part of a primary key or unique constraint, and in order for another table to point here with a foreign key constraint, one of those two things has to exist. So let's say we have the following two tables:

CREATE TABLE dbo.TestParent
(
  ID INT IDENTITY(1,1),
  CONSTRAINT PK_Parent PRIMARY KEY CLUSTERED(ID)
);
GO
 
CREATE TABLE dbo.TestChild
(
  ParentID INT NOT NULL,
  CONSTRAINT FK_Parent FOREIGN KEY(ParentID) REFERENCES dbo.TestParent(ID)
);

Before we can even consider changing the column's data type, we need to drop the constraint:

ALTER TABLE dbo.TestParent DROP CONSTRAINT PK_Parent;

And of course we can't, without also dropping the foreign key constraint, because this yields the following error message:

Msg 3725, Level 16, State 0
The constraint 'PK_Parent' is being referenced by table 'TestChild', foreign key constraint 'FK_Parent'.
Msg 3727, Level 16, State 0
Could not drop constraint. See previous errors.

This error remains even if we first disable the foreign key constraint:

ALTER TABLE dbo.TestChild NOCHECK CONSTRAINT FK_Parent;

On top of this, consider that you'll need the referencing columns to change their data type too. And further, those columns likely participate in some of the above elements that might similarly prevent the change on the child tables. To get things completely copacetic and in sync, we're going to have to:

  • drop the relevant constraints and indexes on the parent table
  • drop the relevant foreign key constraints on the child tables
  • drop any indexes on child tables that reference the FK column (and deal with any relevant computed columns / indexed views)
  • alter the data type on parent and all child tables
  • re-create everything

Summary: We will need to drop incoming foreign keys and, potentially, this will have a whole slew of cascading effects. Simply disabling the foreign keys isn't enough, and wouldn't be a permanent solution anyway, because the data type will have to change in the child tables eventually as well.

Conclusion

I know it seems like we're moving slowly, and I concede that in this post I seem to be pulling away from a solution rather than toward one. I will get there, there's just a lot of information to present first, including the things that make this type of change hard. Scraped from the summaries above, we will need to:

  • drop and re-create relevant indexes on the main table
  • change or drop computed columns that involve the IDENTITY column
  • drop indexes on indexed views that reference the IDENTITY column
  • deal with inbound foreign keys that point to the IDENTITY column

Unfortunately, a lot of these things are catch-22. You can't change a column because an index relies on it, and you can't change the index until the column has changed. Wouldn't it be great if ALTER INDEX supported REBUILD WITH (ONLINE = ON, CHANGE_COLUMN (COLUMN = ID, NEW_TYPE = BIGINT))? And CASCADE_CHANGE_TO_REFERENCING_KEYS,COLUMNS,INDEXES,VIEWS,ETC? Well, it doesn't (I checked). So, we need to find ways to make these things easier. Please stay tuned for Part 3.

[ Part 1 | Part 2 | Part 3 | Part 4 ]