The ALTER TABLE ... ALTER COLUMN command is very powerful. You can use it to change a column’s data type, length, precision, scale, nullability, collation, and many other things besides.
It is certainly more convenient than the alternative: Creating a new table and migrating the data each time a change is necessary. Nevertheless, there is only so much that can be done to hide the underlying complexity. Along with a large number of restrictions on what is even possible with this command, there is always the question of performance.
Ultimately, tables are stored as a sequence of bytes with some metadata elsewhere in the system to describe what each of those bytes mean, and how they relate to each of the table’s various columns. When we ask SQL Server to change some aspect of a column’s definition, it needs to check that the existing data is compatible with the new definition. It also needs to determine if the current physical layout needs to change.
Depending on the type of change and the configuration of the database, an ALTER COLUMN command will need to perform one of the following actions:
- Change metadata in system tables only.
- Check all the existing data for compatibility, then change metadata.
- Rewrite some or all of the stored data to match the new definition.
Option 1 represents the ideal case from a performance point of view. It requires only a few changes to system tables, and a minimal amount of logging. The operation will still require a restrictive schema modification (Sch-M) lock, but the metadata changes themselves will complete very quickly, regardless of the size of the table.
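For the curious, much of the metadata in question is visible through the catalog views. The sketch below (dbo.Test is a placeholder name) lists the column attributes SQL Server records; a metadata-only ALTER COLUMN essentially just updates entries like these:

-- Inspect the stored column definitions for a table
SELECT
    C.name,
    TYPE_NAME(C.user_type_id) AS type_name,
    C.max_length,
    C.precision,
    C.scale,
    C.collation_name,
    C.is_nullable
FROM sys.columns AS C
WHERE C.object_id = OBJECT_ID(N'dbo.Test');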
Metadata-Only Changes
There are a number of special cases to watch out for, but as a general summary, the following actions only require changes to metadata (both are demonstrated in the sketch below):

- Going from NOT NULL to NULL for the same data type.
- Increasing the maximum size of a varchar, nvarchar, or varbinary column (except to max).
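As a quick illustration (a sketch only, using a hypothetical table dbo.Example), both of the following statements qualify as metadata-only changes:

-- Hypothetical table for illustration
CREATE TABLE dbo.Example
(
    a integer NOT NULL,
    b varchar(10) NOT NULL
);

-- Metadata-only: NOT NULL to NULL, same data type
ALTER TABLE dbo.Example
ALTER COLUMN a integer NULL;

-- Metadata-only: increasing the varchar maximum size (not to max)
ALTER TABLE dbo.Example
ALTER COLUMN b varchar(100) NOT NULL;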
Improvements in SQL Server 2016
The subject of this post is the additional changes that are enabled as metadata-only operations from SQL Server 2016 onward. No changes to syntax are needed, and no configuration settings need to be modified. You get these undocumented improvements for free.
The new capabilities target a subset of the fixed-length data types, and apply to row-store tables in the following circumstances:
- Compression must be enabled:
  - On all indexes and partitions, including the base heap or clustered index.
  - Either ROW or PAGE compression.
  - Indexes and partitions may use a mixture of these compression levels. The important thing is there are no uncompressed indexes or partitions.
- Changing from NULL to NOT NULL is not allowed.
- The following integer type changes are supported:
  - smallint to integer or bigint.
  - integer to bigint.
  - smallmoney to money (uses integer representation internally).
- The following string and binary type changes are supported:
  - char(n) to char(m) or varchar(m)
  - nchar(n) to nchar(m) or nvarchar(m)
  - binary(n) to binary(m) or varbinary(m)
  - All of the above only for n < m and m != max.
- Collation changes are not allowed.
These changes can be metadata-only because the underlying binary data layout does not change when Column Descriptor row format is used (hence the need for compression). Without compression, row store uses the original FixedVar representation, which cannot accommodate these fixed-length data type changes without rewriting the physical layout.
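Since the first requirement is that every index and partition is compressed, it can be worth verifying that before attempting the change. A sketch (using the dbo.Test table from the examples below as a stand-in) that lists the compression setting of each index and partition:

-- Check that no index or partition is uncompressed
SELECT
    I.name AS index_name,
    P.index_id,
    P.partition_number,
    P.data_compression_desc
FROM sys.partitions AS P
LEFT JOIN sys.indexes AS I
    ON I.object_id = P.object_id
    AND I.index_id = P.index_id
WHERE P.object_id = OBJECT_ID(N'dbo.Test')
ORDER BY P.index_id, P.partition_number;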
You may notice that tinyint is omitted from the integer types list. This is because it is unsigned, while the other integer types are all signed, so a metadata-only change is not possible. For example, a value of 255 can fit in one byte for tinyint, but requires two bytes in any of the signed formats. The signed formats can hold -128 to +127 in one byte when compressed.
Integer Example
One very handy application of this improvement is changing the data type of a column with the IDENTITY property.
Say we have the following heap table using row compression (page compression would also work):
DROP TABLE IF EXISTS dbo.Test;
GO
CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL,
    some_value integer NOT NULL
)
WITH (DATA_COMPRESSION = ROW);
Let’s add 5 million rows of data. This will be enough to make it obvious (from a performance standpoint) whether changing the column data type is a metadata-only operation or not:
WITH Numbers AS
(
    SELECT
        n = ROW_NUMBER() OVER (ORDER BY @@SPID)
    FROM sys.all_columns AS AC1
    CROSS JOIN sys.all_columns AS AC2
    ORDER BY n
    OFFSET 0 ROWS
    FETCH FIRST 5 * 1000 * 1000 ROWS ONLY
)
INSERT dbo.Test
    WITH (TABLOCKX)
(
    some_value
)
SELECT
    N.n
FROM Numbers AS N;
Next we will reseed the IDENTITY to make it seem like we are almost at the point of running out of values that will fit in an integer:
DBCC CHECKIDENT
(
    N'dbo.Test',
    RESEED,
    2147483646
);
We can add one more row successfully:
INSERT dbo.Test
(some_value)
VALUES
(123456);
But attempting to add another row:
INSERT dbo.Test
(some_value)
VALUES
(7890);
Results in an error message:
Arithmetic overflow error converting IDENTITY to data type int.
We can fix that by converting the column to bigint:
ALTER TABLE dbo.Test
ALTER COLUMN id bigint NOT NULL;
Thanks to the improvements in SQL Server 2016, this command changes metadata only, and completes immediately. The previous INSERT statement (the one that threw the arithmetic overflow error) now completes successfully.
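One way to see how little work was done is to wrap the command in an explicit transaction and check the log usage before committing. A sketch (the DMV filter here is simplified, and assumes no other active transactions in the database):

BEGIN TRANSACTION;

ALTER TABLE dbo.Test
ALTER COLUMN id bigint NOT NULL;

-- Log usage should be tiny for a metadata-only change
SELECT DT.database_transaction_log_bytes_used
FROM sys.dm_tran_database_transactions AS DT
WHERE DT.database_id = DB_ID();

COMMIT TRANSACTION;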
This new ability does not solve all the issues around changing the type of a column with the IDENTITY property. We will still need to drop and recreate any indexes on the column, recreate any referencing foreign keys, and so on. That is a bit outside the scope of this post (though Aaron Bertrand has written about it before). Being able to change the type as a metadata-only operation certainly doesn’t hurt. With careful planning, the other steps required can be made as efficient as possible, for example using minimally-logged or ONLINE operations.
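For instance, if the column had a nonclustered index on it, the outline might look like the following. This is only a sketch: the index name is hypothetical, and ONLINE index builds require an edition that supports them:

-- Hypothetical index on the column being altered
DROP INDEX [IX dbo.Test id] ON dbo.Test;

ALTER TABLE dbo.Test
ALTER COLUMN id bigint NOT NULL;    -- metadata-only

CREATE UNIQUE NONCLUSTERED INDEX [IX dbo.Test id]
ON dbo.Test (id)
WITH (ONLINE = ON, DATA_COMPRESSION = ROW);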
Be Careful with Syntax
Be sure to always specify NULL or NOT NULL when changing data types with ALTER COLUMN. Say for example we wanted to also change the data type of the some_value column in our test table from integer NOT NULL to bigint NOT NULL.
When we write the command, we omit the NULL or NOT NULL qualifier:
ALTER TABLE dbo.Test
ALTER COLUMN some_value bigint;
This command completes successfully as a metadata-only change, but also removes the NOT NULL constraint. The column is now bigint NULL, which is not what we intended. This behaviour is documented, but it is easy to overlook.
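A quick way to catch this kind of accident is to check the column’s nullability in the catalog views after the change:

-- is_nullable = 1 reveals the accidental change
SELECT C.name, C.is_nullable
FROM sys.columns AS C
WHERE C.object_id = OBJECT_ID(N'dbo.Test')
    AND C.name = N'some_value';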
We might try to fix our mistake with:
ALTER TABLE dbo.Test
ALTER COLUMN some_value bigint NOT NULL;
This is not a metadata-only change. We are not allowed to change from NULL to NOT NULL (refer back to the earlier list of conditions if you need a refresher). SQL Server will need to check all the existing values to ensure no nulls are present. It will then physically rewrite every row of the table. As well as being slow in itself, these actions generate a great deal of transaction log, which can have knock-on effects.
As a side note, this same mistake is not possible for columns with the IDENTITY property. If we write an ALTER COLUMN statement without NULL or NOT NULL in that case, the engine helpfully assumes we meant NOT NULL, because the identity property is not allowed on nullable columns. It is still a great idea not to rely on this behaviour.
Always specify NULL or NOT NULL with ALTER COLUMN.
Collation
Particular care is needed when altering a string column that has a collation not matching the default for the database.
For example, say we have a table with a case- and accent-sensitive collation (assume the database default is different):
DROP TABLE IF EXISTS dbo.Test2;
GO
CREATE TABLE dbo.Test2
(
    id integer IDENTITY NOT NULL,
    some_string char(8) COLLATE Latin1_General_100_CS_AS NOT NULL
)
WITH (DATA_COMPRESSION = ROW);
Add 5 million rows of data:
WITH Numbers AS
(
    SELECT
        n = ROW_NUMBER() OVER (ORDER BY @@SPID)
    FROM sys.all_columns AS AC1
    CROSS JOIN sys.all_columns AS AC2
    ORDER BY n
    OFFSET 0 ROWS
    FETCH FIRST 5 * 1000 * 1000 ROWS ONLY
)
INSERT dbo.Test2
    WITH (TABLOCKX)
(
    some_string
)
SELECT
    CONVERT(char(8), N.n) COLLATE Latin1_General_100_CS_AS
FROM Numbers AS N;
Double the length of the string column using the following command:
ALTER TABLE dbo.Test2
ALTER COLUMN some_string char(16) NOT NULL;
We remembered to specify NOT NULL, but forgot about the non-default collation. SQL Server assumes we meant to change collation to the database default (Latin1_General_CI_AS for my test database). Changing collation prevents the operation from being metadata-only, and so the operation runs for several minutes, generating heaps of log.
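To avoid the mistake in the first place, it can help to look up the column’s current collation before writing the ALTER statement. A quick sketch:

-- Columns with a collation different from the database default
SELECT C.name, C.collation_name
FROM sys.columns AS C
WHERE C.object_id = OBJECT_ID(N'dbo.Test2')
    AND C.collation_name IS NOT NULL
    AND C.collation_name <> CONVERT(nvarchar(128),
            DATABASEPROPERTYEX(DB_NAME(), 'Collation'));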
Recreate the table and data using the previous script, then try the ALTER COLUMN command again, but specifying the existing non-default collation as part of the command:
ALTER TABLE dbo.Test2
ALTER COLUMN some_string
    char(16) COLLATE Latin1_General_100_CS_AS NOT NULL;
The change now completes immediately, as a metadata-only operation. As with the NULL and NOT NULL syntax, it pays to be explicit to avoid accidents. This is good advice in general, not just for ALTER COLUMN.
Compression
Please be aware that compression needs to be explicitly specified for each index, and separately for the base table if it is a heap. This is another example where using abbreviated syntax or shortcuts can prevent the desired outcome.
For example, the following table does not specify explicit compression for either the primary key or in-line index definition:
DROP TABLE IF EXISTS dbo.Test;
GO
CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL PRIMARY KEY,
    some_value integer NOT NULL
        INDEX [IX dbo.Test some_value]
)
WITH (DATA_COMPRESSION = PAGE);
The PRIMARY KEY will have a name assigned, default to CLUSTERED, and be PAGE compressed. The in-line index will be NONCLUSTERED and not compressed at all. This table will not be enabled for any of the new optimizations because not all indexes and partitions are compressed.
A much better, more explicit table definition would be:
DROP TABLE IF EXISTS dbo.Test;
GO
CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL
        CONSTRAINT [PK dbo.Test id]
        PRIMARY KEY CLUSTERED
        WITH (DATA_COMPRESSION = PAGE),
    some_value integer NOT NULL
        INDEX [IX dbo.Test some_value]
        NONCLUSTERED
        WITH (DATA_COMPRESSION = ROW)
);
This table will qualify for the new optimizations because all indexes and partitions are compressed. As noted previously, mixing compression types is fine.
There are a variety of ways to write this CREATE TABLE statement in an explicit way, so there is an element of personal preference. The important takeaway point is to always be explicit about what you want. This applies to separate CREATE INDEX statements as well, as shown in the sketch below.
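For example, the in-line index above could equally be created as a separate statement, again with the compression level spelled out:

-- Explicit compression on a standalone index definition
CREATE NONCLUSTERED INDEX [IX dbo.Test some_value]
ON dbo.Test (some_value)
WITH (DATA_COMPRESSION = ROW);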
Extended Events and Trace Flag
There is an extended event specifically for the new metadata-only ALTER COLUMN operations supported in SQL Server 2016 onward.
The extended event is compressed_alter_column_is_md_only in the Debug channel. Its event fields are object_id, column_id, and is_md_only (true/false).
This event only indicates whether an operation is metadata-only due to the new abilities of SQL Server 2016. Column alterations that were metadata-only before 2016 will show is_md_only = false, despite still being metadata-only.
Other extended events useful for tracking ALTER COLUMN operations include metadata_ddl_alter_column and alter_column_event, both in the Analytic channel.
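To watch for the new event, a minimal event session might look like the following sketch (the session name and ring buffer target are arbitrary choices; the event name is the one given above):

-- Capture the metadata-only indicator for compressed alters
CREATE EVENT SESSION [TrackAlterColumnMDOnly] ON SERVER
ADD EVENT sqlserver.compressed_alter_column_is_md_only
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION [TrackAlterColumnMDOnly] ON SERVER
STATE = START;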
Should you need to disable the new SQL Server 2016 capabilities for any reason, undocumented global (or start-up) trace flag 3618 can be used. This trace flag is not effective when used at the session level, and there is no way to specify a query-level trace flag with an ALTER COLUMN command.
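For completeness, enabling the flag globally would look like this (it is undocumented, so best confined to test systems):

-- Disable the 2016+ metadata-only optimizations globally
DBCC TRACEON (3618, -1);

-- ...testing...

DBCC TRACEOFF (3618, -1);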
Final Thoughts
Being able to change some fixed-length integer data types as a metadata-only operation is a very welcome product improvement. It does require that the table is already fully compressed, but that is becoming more common anyway. This is especially true since data compression became available in all editions starting with SQL Server 2016 Service Pack 1.
Fixed-length string type columns are probably much less common. Some of this may be due to somewhat out-of-date considerations like space usage. When compressed, fixed-length string columns do not store trailing blanks, making them just as efficient as variable-length string columns from a storage point of view. It can be annoying to trim spaces for manipulation or display, but if the data usually occupies most of the maximum length, fixed-length types can have important advantages, not least regarding memory grants for things like sorting and hashing.
It’s not all good news with compression enabled. I mentioned earlier that SQL Server can sometimes perform a metadata-only change after checking that all existing values will convert successfully to the new type. This is the case when using ALTER COLUMN to change from integer to smallint, for example. Unfortunately, these operations are not currently metadata-only for compressed objects, as the sketch below illustrates.
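For example (a sketch using a hypothetical compressed table, where every stored value fits comfortably in the smaller type):

DROP TABLE IF EXISTS dbo.Shrink;
GO
CREATE TABLE dbo.Shrink
(
    x integer NOT NULL
)
WITH (DATA_COMPRESSION = ROW);

INSERT dbo.Shrink (x) VALUES (1), (2), (3);

-- Every value fits in smallint, so the existing data passes
-- the check, but the rows are still rewritten on a compressed
-- table; this is not a metadata-only change
ALTER TABLE dbo.Shrink
ALTER COLUMN x smallint NOT NULL;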
Acknowledgements
Special thanks to Panagiotis Antonopoulos (Principal Software Engineer) and Mirek Sztajno (Senior Program Manager) from the SQL Server product team for their assistance and guidance during the research and writing of this article.
None of the details given in this work should be regarded as official Microsoft documentation or product statements.
Comments

A quick tip: if you enable STATISTICS IO, a metadata-only operation will not report any output.
This is an excellent mechanism on one hand…
However, it's completely useless when the column you want to change has a CLUSTERED INDEX defined on it.
Such a scenario would especially be common with IDENTITY columns.
First, changing a column's data type is not allowed while any index is defined on it.
Secondly, dropping and re-creating a CLUSTERED index would be equivalent to (or even worse than) rebuilding the ENTIRE table. It would rebuild all non-clustered indexes in order to replace their CLUSTERED key with RID. Then, when you eventually re-create the clustered index, it would mean rebuilding all non-clustered indexes AGAIN in order to replace RID with the CLUSTERED key again.
Worst of all: NONE of these operations can truly be ONLINE. All of them are BLOCKING operations.
So, what are we left with? What would be our best course of action?
1. Drop all non-clustered indexes.
2. Drop clustered index.
3. Compress the table.
4. Change the column data type.
5. Re-create the clustered index.
6. Re-create the non-clustered indexes.
But, now that it has come to this, we have basically lost any benefit that this "meta-data change only" feature could give us.
We're back to square one again.
Might as well do it the "old-fashioned" way, which would be much better: create a new table with the new data type, gradually migrate the old data into it, and then rename the tables to switch places.
This would be a much more "ONLINE" operation than the "new" method.