[ Part 1 | Part 2 | Part 3 | Part 4 ]
So far in this series, I have demonstrated the direct physical impact to the page when upsizing from
bigint, and then iterated through several of the common blockers to this operation. In this post, I wanted to examine two potential workarounds: one simple, and one incredibly convoluted.
The Easy Way
I was robbed of my thunder a little bit in a comment on my previous post – Keith Monroe suggested that you could just reseed the table to the lower negative bound of the integer data type, doubling your capacity for new values. You can do this with
DBCC CHECKIDENT(N'dbo.TableName', RESEED, -2147483648);
This could work, assuming the surrogate values don't have meaning to end users (or, if they do, that users won't be freaked out by suddenly getting negative numbers). I suppose you could fool them with a view:
CREATE VIEW dbo.ViewName AS SELECT ID = CONVERT(bigint, CASE WHEN ID < 0 THEN (2147483648*2) - 1 + CONVERT(bigint, ID) ELSE ID END) FROM dbo.TableName;
This means that the user who added
ID = -2147483648 would actually see
+2147483648, the user who added
ID = -2147483647 would see
+2147483649, and so on. You would have to adjust other code though to be sure to make the reverse calculation when the user passes in that
ALTER PROCEDURE dbo.GetRowByID @ID bigint AS BEGIN SET NOCOUNT ON; DECLARE @RealID bigint; SET @RealID = CASE WHEN @ID > 2147483647 THEN @ID - (2147483648*2) + 1 ELSE @ID END; SELECT ID, @ID /*, other columns */ FROM dbo.TableName WHERE ID = @RealID; END GO
I'm not crazy about this obfuscation. At all. It's messy, misleading, and error-prone. And it encourages having visibility into surrogate keys – generally,
IDENTITY values should not be exposed to end users, so they really shouldn't care if they are customer 24, 642, -376, or much larger numbers on either side of zero.
This "solution" also assumes that you don't have code anywhere that orders by the
IDENTITY column in order to present the most recently inserted rows first, or infers that the highest
IDENTITY value must be the newest row. Code that does rely on the sort order of the
IDENTITY column, either explicitly or implicitly (which might be more than you think if it's the clustered index), will no longer present the rows in expected order – it will show all the rows created after the
RESEED, starting with the first, and then it will show all the rows created before the
RESEED, starting with the first.
The primary benefit of this approach is that it doesn't require you to change the data type, and as a result, the
RESEED change does not require any changes to indexes, constraints, or inbound foreign keys.
The downside – in addition to the code changes mentioned above, of course – is that this only buys you time in the short term. Eventually you'll exhaust all of the negative integers available, too. And don't think this doubles the useful life of the current version of the table in terms of time – in a lot of cases, data growth is accelerating, not staying constant, so you'll use up the next 2 billion rows a lot faster than the first 2 billion.
A Harder Way
Another approach you could take is to stop using an
IDENTITY column altogether; instead you could convert to using a
SEQUENCE. You could create a new
bigint column, set the default to the next value from a
SEQUENCE, update all of those values with the values from the original column (in batches if necessary), drop the original column, and rename the new column. Let's create this fictitious table and insert a single row:
CREATE TABLE dbo.SequenceDemo ( ID int IDENTITY(1,1), x char(1), CONSTRAINT PK_SD_Identity PRIMARY KEY CLUSTERED (ID) ); GO INSERT dbo.SequenceDemo(x) VALUES('x');
Next, we'll create a
SEQUENCE that starts just beyond the upper bound of an int:
CREATE SEQUENCE dbo.BeyondInt AS bigint START WITH 2147483648 INCREMENT BY 1;
Next, the changes in the table necessary to switch to using the
SEQUENCE for the new column:
BEGIN TRANSACTION; -- add a new "identity" column: ALTER TABLE dbo.SequenceDemo ADD ID2 bigint; GO -- set the new column equal to the existing identity values -- for large tables, may need to do this in batches: UPDATE dbo.SequenceDemo SET ID2 = ID; -- now make it not nullable and add the default from our SEQUENCE: ALTER TABLE dbo.SequenceDemo ALTER COLUMN ID2 bigint NOT NULL; ALTER TABLE dbo.SequenceDemo ADD CONSTRAINT DF_SD_Identity DEFAULT NEXT VALUE FOR dbo.BeyondInt FOR ID2; -- need to drop the existing PK (and any indexes): ALTER TABLE dbo.SequenceDemo DROP CONSTRAINT PK_SD_Identity; -- drop the old column and rename the new one: ALTER TABLE dbo.SequenceDemo DROP COLUMN ID; EXEC sys.sp_rename N'dbo.SequenceDemo.ID2', N'ID', 'COLUMN'; -- now put the PK back up: ALTER TABLE dbo.SequenceDemo ADD CONSTRAINT PK_SD_Identity PRIMARY KEY CLUSTERED (ID); COMMIT TRANSACTION;
In this case, the next insert would yield the following results (note that
SCOPE_IDENTITY() no longer returns a valid value):
INSERT dbo.SequenceDemo(x) VALUES('y'); SELECT Si = SCOPE_IDENTITY(); SELECT ID, x FROM dbo.SequenceDemo; /* results Si ---- NULL ID x ---------- - 1 x 2147483648 y */
If the table is large and you need to update the new column in batches instead of the above one-shot transaction, as I've described here – allowing users to interact with the table in the meantime – you'll need to have a trigger in place to override the
SEQUENCE value for any new rows that are inserted, so that they continue to match what is output to any calling code. (This also assumes that you still have some room in the integer range to continue accepting some updates; otherwise, if you've already exhausted the range, you'll have to take some downtime – or use the easy solution above in the short term.)
Let's drop everything and start over, then just add the new column:
DROP TABLE dbo.SequenceDemo; DROP SEQUENCE dbo.BeyondInt; GO CREATE TABLE dbo.SequenceDemo ( ID int IDENTITY(1,1), x char(1), CONSTRAINT PK_SD_Identity PRIMARY KEY CLUSTERED (ID) ); GO INSERT dbo.SequenceDemo(x) VALUES('x'); GO CREATE SEQUENCE dbo.BeyondInt AS bigint START WITH 2147483648 INCREMENT BY 1; GO ALTER TABLE dbo.SequenceDemo ADD ID2 bigint; GO
And here is the trigger we'll add:
CREATE TRIGGER dbo.After_SequenceDemo ON dbo.SequenceDemo AFTER INSERT AS BEGIN UPDATE sd SET sd.ID2 = sd.ID FROM dbo.SequenceDemo AS sd INNER JOIN inserted AS i ON sd.ID = i.ID; END
This time, the next insert will continue to generate rows in the lower range of integers for both columns, until all of the pre-existing values have been updated and the rest of the changes have been committed:
INSERT dbo.SequenceDemo(x) VALUES('y'); SELECT Si = SCOPE_IDENTITY(); SELECT ID, ID2, x FROM dbo.SequenceDemo; /* results Si ---- 2 ID ID2 x ---- ---- -- 1 NULL x 2 2 y */
Now, we can continue updating the existing
ID2 values while new rows continue getting inserted within the lower range:
SET NOCOUNT ON; DECLARE @r INT = 1; WHILE @r > 0 BEGIN BEGIN TRANSACTION; UPDATE TOP (10000) dbo.SequenceDemo SET ID2 = ID WHERE ID2 IS NULL; SET @r = @@ROWCOUNT; COMMIT TRANSACTION; -- CHECKPOINT; -- if simple -- BACKUP LOG ... -- if full END
Once we have updated all of the existing rows, we can continue with the rest of the changes, and then drop the trigger:
BEGIN TRANSACTION; ALTER TABLE dbo.SequenceDemo ALTER COLUMN ID2 BIGINT NOT NULL; ALTER TABLE dbo.SequenceDemo ADD CONSTRAINT DF_SD_Identity DEFAULT NEXT VALUE FOR dbo.BeyondInt FOR ID2; ALTER TABLE dbo.SequenceDemo DROP CONSTRAINT PK_SD_Identity; ALTER TABLE dbo.SequenceDemo DROP COLUMN ID; EXEC sys.sp_rename N'dbo.SequenceDemo.ID2', N'ID', 'COLUMN'; ALTER TABLE dbo.SequenceDemo ADD CONSTRAINT PK_SD_Identity PRIMARY KEY CLUSTERED (ID); DROP TRIGGER dbo.InsteadOf_SequenceDemo COMMIT TRANSACTION;
Now, the next insert will generate these values:
INSERT dbo.SequenceDemo(x) VALUES('z'); SELECT Si = SCOPE_IDENTITY(); SELECT ID, x FROM dbo.SequenceDemo; /* results Si ---- NULL ID x ---------- - 1 x 2 y 2147483648 z */
If you have code that relies on
IDENT_CURRENT(), it would also have to change, as those values are no longer be populated after an insert – though the
OUTPUT clause should continue to work correctly in most scenarios. If you need your code to continue believing the table generates an
IDENTITY value, then you could use a trigger to fake this out – however it would only be able to populate
@@IDENTITY on insert, not
SCOPE_IDENTITY(). This may still require changes, because in most cases, you don't want to rely on
@@IDENTITY for anything (so, if you're going to make changes, remove all assumptions about an
IDENTITY column at all).
CREATE TRIGGER dbo.FakeIdentity ON dbo.SequenceDemo INSTEAD OF INSERT AS BEGIN SET NOCOUNT ON; DECLARE @lowestID bigint = (SELECT MIN(id) FROM inserted); DECLARE @sql nvarchar(max) = N'DECLARE @foo TABLE(ID bigint IDENTITY(' + CONVERT(varchar(32), @lowestID) + N',1));'; SELECT @sql += N'INSERT @foo DEFAULT VALUES;' FROM inserted; EXEC sys.sp_executesql @sql; INSERT dbo.SequenceDemo(ID, x) SELECT ID, x FROM inserted; END
Now, the next insert will generate these values:
INSERT dbo.SequenceDemo(x) VALUES('a'); SELECT Si = SCOPE_IDENTITY(), Ident = @@IDENTITY; SELECT ID, x FROM dbo.SequenceDemo; /* results Si Ident ---- ----- NULL 2147483649 ID x ---------- - 1 x 2 y 2147483648 z 2147483649 a */
With this workaround, you would still need to deal with other constraints, indexes, and tables with inbound foreign keys. Local constraints and indexes are pretty straightforward, but I'll deal with the more complex situation with foreign keys in the next part of this series.
One That Won't Work, But I Wish It Would
ALTER TABLE SWITCH can be a very powerful way to make some metadata changes that are difficult to accomplish otherwise. And contrary to popular belief, this doesn't just involve partitioning, and isn't restricted to Enterprise Edition. The following code will work on Express, and is a method people have used to add or remove the
IDENTITY property on a table (again, not accounting for foreign keys and all those other pesky blockers).
CREATE TABLE dbo.WithIdentity ( ID int IDENTITY(1,1) NOT NULL ); CREATE TABLE dbo.WithoutIdentity ( ID int NOT NULL ); ALTER TABLE dbo.WithIdentity SWITCH TO dbo.WithoutIdentity; GO DROP TABLE dbo.WithIdentity; EXEC sys.sp_rename N'dbo.WithoutIdentity', N'dbo.WithIdentity', 'OBJECT';
This works because the data types and nullability match exactly, and no attention is paid to the
IDENTITY attribute. Try to mix data types, though, and things don't work so well:
CREATE TABLE dbo.SourceTable ( ID int IDENTITY(1,1) NOT NULL ); CREATE TABLE dbo.TrySwitch ( ID bigint IDENTITY(1,1) NOT NULL ); ALTER TABLE dbo.SourceTable SWITCH TO dbo.TrySwitch;
This results in:
ALTER TABLE SWITCH statement failed because column 'ID' has data type int in source table 'dbo.SourceTable' which is different from its type bigint in target table 'dbo.TrySwitch'.
It would be fantastic if a
SWITCH operation could be used in a scenario like this, where the only difference in schema didn't actually *require* any physical changes to accommodate (again, as I showed in part 1, the data is re-written onto new pages, even though there is no need to do so).
This post investigated two potential workarounds to either buy you time before changing your existing
IDENTITY column, or abandoning
IDENTITY altogether right now in favor of a
SEQUENCE. If neither of these workarounds are acceptable to you, please watch for part 4, where we'll tackle this problem head-on.
6 thoughts on “Minimizing impact of widening an IDENTITY column – part 3”
Yay for sequences! They have control over caching, can be easily modified, and can be shared between several tables. Gotta love them.
(2147483648*2) – 1 + CONVERT(bigint, ID)
This actually is of type numeric(20, 0) because 2147483648 is not an int constant. This is probably unintentional.
Thanks tobi, missed that, you can only really see that if you perform a SELECT INTO (or probably if you overflow 20 digits, which I'll confess I did not think to try to force).
Great article! Hitting this issue at work right now so really looking forward to part 4.
Thank you very much for another great article.
I think that the trigger should use MAX inserted value and not MIN:
DECLARE @HighestID bigint = (SELECT MAX(id) FROM inserted);
DECLARE @sql nvarchar(max) = N'DECLARE @foo TABLE(ID bigint IDENTITY('
+ CONVERT(varchar(32), @HighestID) + N',1)); INSERT @foo DEFAULT VALUES;';
Otherwise if you insert more than one record @@IDENTITYwill be wrong:
INSERT dbo.SequenceDemo(x) VALUES('a'),('b');
SELECT Si = SCOPE_IDENTITY(), Ident = @@IDENTITY;
SELECT ID, x FROM dbo.SequenceDemo;
Thanks Arik, you're right that it didn't handle multi-row inserts, but I solved it in a slightly different way.
Comments are closed.