Aaron Bertrand

Minimizing impact of widening an IDENTITY column – part 3

SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Itzik is a T-SQL trainer, a co-founder of SolidQ, and blogs about T-SQL fundamentals and query tuning.

Itzik’s Posts

[ Part 1 | Part 2 | Part 3 | Part 4 ]

So far in this series, I have demonstrated the direct physical impact to the page when upsizing from int to bigint, and then iterated through several of the common blockers to this operation. In this post, I wanted to examine two potential workarounds: one simple, and one incredibly convoluted.

The Easy Way

I was robbed of my thunder a little bit in a comment on my previous post – Keith Monroe suggested that you could just reseed the table to the lower negative bound of the integer data type, doubling your capacity for new values. You can do this with DBCC CHECKIDENT:

DBCC CHECKIDENT(N'dbo.TableName', RESEED, -2147483648);

This could work, assuming the surrogate values don't have meaning to end users (or, if they do, that users won't be freaked out by suddenly getting negative numbers). I suppose you could fool them with a view:

CREATE VIEW dbo.ViewName
AS
  SELECT ID = CONVERT(bigint, CASE WHEN ID < 0 
    THEN (2147483648*2) - 1 + CONVERT(bigint, ID)
    ELSE ID END) 
  FROM dbo.TableName;

This means that the user who added ID = -2147483648 would actually see +2147483648, the user who added ID = -2147483647 would see +2147483649, and so on. You would have to adjust other code though to be sure to make the reverse calculation when the user passes in that ID, e.g.

ALTER PROCEDURE dbo.GetRowByID
  @ID bigint
AS
BEGIN
  SET NOCOUNT ON;
 
  DECLARE @RealID bigint;

  SET @RealID = CASE WHEN @ID > 2147483647 
    THEN @ID - (2147483648*2) + 1
    ELSE @ID END;
 
  SELECT ID, @ID /*, other columns */ 
   FROM dbo.TableName 
   WHERE ID = @RealID;
END
GO

I'm not crazy about this obfuscation. At all. It's messy, misleading, and error-prone. And it encourages having visibility into surrogate keys - generally, IDENTITY values should not be exposed to end users, so they really shouldn't care if they are customer 24, 642, -376, or much larger numbers on either side of zero.

This "solution" also assumes that you don't have code anywhere that orders by the IDENTITY column in order to present the most recently inserted rows first, or infers that the highest IDENTITY value must be the newest row. Code that does rely on the sort order of the IDENTITY column, either explicitly or implicitly (which might be more than you think if it's the clustered index), will no longer present the rows in expected order - it will show all the rows created after the RESEED, starting with the first, and then it will show all the rows created before the RESEED, starting with the first.

The primary benefit of this approach is that it doesn't require you to change the data type, and as a result, the RESEED change does not require any changes to indexes, constraints, or inbound foreign keys.

The downside - in addition to the code changes mentioned above, of course - is that this only buys you time in the short term. Eventually you'll exhaust all of the negative integers available, too. And don't think this doubles the useful life of the current version of the table in terms of time - in a lot of cases, data growth is accelerating, not staying constant, so you'll use up the next 2 billion rows a lot faster than the first 2 billion.

A Harder Way

Another approach you could take is to stop using an IDENTITY column altogether; instead you could convert to using a SEQUENCE. You could create a new bigint column, set the default to the next value from a SEQUENCE, update all of those values with the values from the original column (in batches if necessary), drop the original column, and rename the new column. Let's create this fictitious table and insert a single row:

CREATE TABLE dbo.SequenceDemo
(
  ID int IDENTITY(1,1),
  x char(1),
  CONSTRAINT PK_SD_Identity PRIMARY KEY CLUSTERED (ID)
);
GO

INSERT dbo.SequenceDemo(x) VALUES('x');

Next, we'll create a SEQUENCE that starts just beyond the upper bound of an int:

CREATE SEQUENCE dbo.BeyondInt
AS bigint
START WITH 2147483648 INCREMENT BY 1;

Next, the changes in the table necessary to switch to using the SEQUENCE for the new column:

BEGIN TRANSACTION;

-- add a new "identity" column:
ALTER TABLE dbo.SequenceDemo ADD ID2 bigint;
GO

-- set the new column equal to the existing identity values
-- for large tables, may need to do this in batches:
UPDATE dbo.SequenceDemo SET ID2 = ID;

-- now make it not nullable and add the default from our SEQUENCE:
ALTER TABLE dbo.SequenceDemo ALTER COLUMN ID2 bigint NOT NULL;
ALTER TABLE dbo.SequenceDemo ADD CONSTRAINT DF_SD_Identity DEFAULT NEXT VALUE FOR dbo.BeyondInt FOR ID2;

-- need to drop the existing PK (and any indexes):
ALTER TABLE dbo.SequenceDemo DROP CONSTRAINT PK_SD_Identity;

-- drop the old column and rename the new one:
ALTER TABLE dbo.SequenceDemo DROP COLUMN ID;
EXEC sys.sp_rename N'dbo.SequenceDemo.ID2', N'ID', 'COLUMN';

-- now put the PK back up:
ALTER TABLE dbo.SequenceDemo ADD CONSTRAINT PK_SD_Identity PRIMARY KEY CLUSTERED (ID);

COMMIT TRANSACTION;

In this case, the next insert would yield the following results (note that SCOPE_IDENTITY() no longer returns a valid value):

INSERT dbo.SequenceDemo(x) VALUES('y');
SELECT Si = SCOPE_IDENTITY();
SELECT ID, x FROM dbo.SequenceDemo;

/* results

Si
----
NULL

ID           x
----------   -
1            x
2147483648   y           */

If the table is large and you need to update the new column in batches instead of the above one-shot transaction, as I've described here - allowing users to interact with the table in the meantime - you'll need to have a trigger in place to override the SEQUENCE value for any new rows that are inserted, so that they continue to match what is output to any calling code. (This also assumes that you still have some room in the integer range to continue accepting some updates; otherwise, if you've already exhausted the range, you'll have to take some downtime - or use the easy solution above in the short term.)

Let's drop everything and start over, then just add the new column:

DROP TABLE dbo.SequenceDemo;
DROP SEQUENCE dbo.BeyondInt;
GO

CREATE TABLE dbo.SequenceDemo
(
  ID int IDENTITY(1,1),
  x char(1),
  CONSTRAINT PK_SD_Identity PRIMARY KEY CLUSTERED (ID)
);
GO

INSERT dbo.SequenceDemo(x) VALUES('x');
GO

CREATE SEQUENCE dbo.BeyondInt
AS bigint
START WITH 2147483648 INCREMENT BY 1;
GO

ALTER TABLE dbo.SequenceDemo ADD ID2 bigint;
GO

And here is the trigger we'll add:

CREATE TRIGGER dbo.After_SequenceDemo
ON dbo.SequenceDemo
AFTER INSERT
AS
BEGIN
  UPDATE sd SET sd.ID2 = sd.ID
    FROM dbo.SequenceDemo AS sd
    INNER JOIN inserted AS i
    ON sd.ID = i.ID;
END

This time, the next insert will continue to generate rows in the lower range of integers for both columns, until all of the pre-existing values have been updated and the rest of the changes have been committed:

INSERT dbo.SequenceDemo(x) VALUES('y');
SELECT Si = SCOPE_IDENTITY();
SELECT ID, ID2, x FROM dbo.SequenceDemo;

/* results

Si
----
2

ID    ID2   x
----  ----  --
1     NULL  x
2     2     y          */

Now, we can continue updating the existing ID2 values while new rows continue getting inserted within the lower range:

SET NOCOUNT ON;
 
DECLARE @r INT = 1;
 
WHILE @r > 0
BEGIN
  BEGIN TRANSACTION;
 
  UPDATE TOP (10000)
    dbo.SequenceDemo
    SET ID2 = ID WHERE ID2 IS NULL;
 
  SET @r = @@ROWCOUNT;
 
  COMMIT TRANSACTION;
 
  -- CHECKPOINT;    -- if simple
  -- BACKUP LOG ... -- if full
END

Once we have updated all of the existing rows, we can continue with the rest of the changes, and then drop the trigger:

BEGIN TRANSACTION;
ALTER TABLE dbo.SequenceDemo ALTER COLUMN ID2 BIGINT NOT NULL;
ALTER TABLE dbo.SequenceDemo ADD CONSTRAINT DF_SD_Identity DEFAULT NEXT VALUE FOR dbo.BeyondInt FOR ID2;
ALTER TABLE dbo.SequenceDemo DROP CONSTRAINT PK_SD_Identity;
ALTER TABLE dbo.SequenceDemo DROP COLUMN ID;
EXEC sys.sp_rename N'dbo.SequenceDemo.ID2', N'ID', 'COLUMN';
ALTER TABLE dbo.SequenceDemo ADD CONSTRAINT PK_SD_Identity PRIMARY KEY CLUSTERED (ID);
DROP TRIGGER dbo.InsteadOf_SequenceDemo
COMMIT TRANSACTION;

Now, the next insert will generate these values:

INSERT dbo.SequenceDemo(x) VALUES('z');
SELECT Si = SCOPE_IDENTITY();
SELECT ID, x FROM dbo.SequenceDemo;

/* results

Si
----
NULL

ID            x
----------    -
1             x
2             y
2147483648    z          */

If you have code that relies on SCOPE_IDENTITY(), @@IDENTITY, or IDENT_CURRENT(), it would also have to change, as those values are no longer be populated after an insert - though the OUTPUT clause should continue to work correctly in most scenarios. If you need your code to continue believing the table generates an IDENTITY value, then you could use a trigger to fake this out - however it would only be able to populate @@IDENTITY on insert, not SCOPE_IDENTITY(). This may still require changes, because in most cases, you don't want to rely on @@IDENTITY for anything (so, if you're going to make changes, remove all assumptions about an IDENTITY column at all).

CREATE TRIGGER dbo.FakeIdentity
ON dbo.SequenceDemo
INSTEAD OF INSERT
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @lowestID bigint = (SELECT MIN(id) FROM inserted);
  DECLARE @sql nvarchar(max) = N'DECLARE @foo TABLE(ID bigint IDENTITY(' 
    + CONVERT(varchar(32), @lowestID) + N',1));';
  SELECT @sql += N'INSERT @foo DEFAULT VALUES;' FROM inserted;
  EXEC sys.sp_executesql @sql;
  INSERT dbo.SequenceDemo(ID, x) SELECT ID, x FROM inserted;
END

Now, the next insert will generate these values:

INSERT dbo.SequenceDemo(x) VALUES('a');
SELECT Si = SCOPE_IDENTITY(), Ident = @@IDENTITY;
SELECT ID, x FROM dbo.SequenceDemo;

/* results

Si      Ident
----    -----
NULL    2147483649

ID            x
----------    -
1             x
2             y
2147483648    z
2147483649    a         */

With this workaround, you would still need to deal with other constraints, indexes, and tables with inbound foreign keys. Local constraints and indexes are pretty straightforward, but I'll deal with the more complex situation with foreign keys in the next part of this series.

One That Won't Work, But I Wish It Would

ALTER TABLE SWITCH can be a very powerful way to make some metadata changes that are difficult to accomplish otherwise. And contrary to popular belief, this doesn't just involve partitioning, and isn't restricted to Enterprise Edition. The following code will work on Express, and is a method people have used to add or remove the IDENTITY property on a table (again, not accounting for foreign keys and all those other pesky blockers).

CREATE TABLE dbo.WithIdentity
(
  ID int IDENTITY(1,1) NOT NULL
);

CREATE TABLE dbo.WithoutIdentity
(
  ID int NOT NULL
);

ALTER TABLE dbo.WithIdentity SWITCH TO dbo.WithoutIdentity;
GO

DROP TABLE dbo.WithIdentity;
EXEC sys.sp_rename N'dbo.WithoutIdentity', N'dbo.WithIdentity', 'OBJECT';

This works because the data types and nullability match exactly, and no attention is paid to the IDENTITY attribute. Try to mix data types, though, and things don't work so well:

CREATE TABLE dbo.SourceTable
(
  ID int IDENTITY(1,1) NOT NULL
);

CREATE TABLE dbo.TrySwitch
(
  ID bigint IDENTITY(1,1) NOT NULL
);

ALTER TABLE dbo.SourceTable SWITCH TO dbo.TrySwitch;

This results in:

Msg 4944, Level 16, State 1
ALTER TABLE SWITCH statement failed because column 'ID' has data type int in source table 'dbo.SourceTable' which is different from its type bigint in target table 'dbo.TrySwitch'.

It would be fantastic if a SWITCH operation could be used in a scenario like this, where the only difference in schema didn't actually *require* any physical changes to accommodate (again, as I showed in part 1, the data is re-written onto new pages, even though there is no need to do so).

Conclusion

This post investigated two potential workarounds to either buy you time before changing your existing IDENTITY column, or abandoning IDENTITY altogether right now in favor of a SEQUENCE. If neither of these workarounds are acceptable to you, please watch for part 4, where we'll tackle this problem head-on.

--

[ Part 1 | Part 2 | Part 3 | Part 4 ]