The ALTER TABLE ... ALTER COLUMN command is very powerful. You can use it to change a column’s data type, length, precision, scale, nullability, collation, and many other things besides.
It is certainly more convenient than the alternative: Creating a new table and migrating the data each time a change is necessary. Nevertheless, there is only so much that can be done to hide the underlying complexity. Along with a large number of restrictions on what is even possible with this command, there is always the question of performance.
Ultimately, tables are stored as a sequence of bytes with some metadata elsewhere in the system to describe what each of those bytes means, and how they relate to each of the table’s various columns. When we ask SQL Server to change some aspect of a column’s definition, it needs to check that the existing data is compatible with the new definition. It also needs to determine whether the current physical layout needs to change.
Depending on the type of change and the configuration of the database, an ALTER COLUMN command will need to perform one of the following actions:
- Change metadata in system tables only.
- Check all the existing data for compatibility, then change metadata.
- Rewrite some or all of the stored data to match the new definition.
Option 1 represents the ideal case from a performance point of view. It requires only a few changes to system tables, and a minimal amount of logging. The operation will still require a restrictive schema modification (Sch-M) lock, but the metadata changes themselves will complete very quickly, regardless of the size of the table.
There are a number of special cases to watch out for, but as a general summary, the following actions only require changes to metadata:
- Going from NOT NULL to NULL for the same data type.
- Increasing the maximum size of a varchar, nvarchar, or varbinary column (except to max).
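As a quick sketch of one such metadata-only change (the table and column names here are illustrative, not from the examples later in this post):

```sql
-- Hypothetical table: increasing a varchar column's maximum length
-- (to anything other than max) is a metadata-only change:
CREATE TABLE dbo.Example
(
    note varchar(50) NOT NULL
);

ALTER TABLE dbo.Example
    ALTER COLUMN note varchar(100) NOT NULL;
```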
Improvements in SQL Server 2016
The subject of this post is the additional changes that are enabled for metadata-only from SQL Server 2016 onward. No changes to syntax are needed, and no configuration settings need to be modified. You get these undocumented improvements for free.
The new capabilities target a subset of the fixed-length data types. They apply to row-store tables in the following circumstances:
- Compression must be enabled:
- On all indexes and partitions, including the base heap or clustered index.
- Indexes and partitions may use a mixture of row and page compression. The important thing is that there are no uncompressed indexes or partitions.
- Changing from NULL to NOT NULL is not allowed.
- The following integer type changes are supported:
- smallint to integer or bigint.
- integer to bigint.
- smallmoney to money (uses integer representation internally).
- The following string and binary type changes are supported:
- char(n) to char(m) or varchar(m).
- nchar(n) to nchar(m) or nvarchar(m).
- binary(n) to binary(m) or varbinary(m).
- All of the above only for n < m and m != max.
- Collation changes are not allowed.
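To sketch one qualifying change (the names here are illustrative, assuming SQL Server 2016 or later):

```sql
-- Row compression on the heap means every structure is compressed:
CREATE TABLE dbo.CharDemo
(
    code char(4) NOT NULL
)
WITH (DATA_COMPRESSION = ROW);

-- char(4) to char(8): n < m and m is not max, so this completes
-- as a metadata-only change:
ALTER TABLE dbo.CharDemo
    ALTER COLUMN code char(8) NOT NULL;
```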
These changes can be metadata-only because the underlying binary data layout does not change when Column Descriptor row format is used (hence the need for compression). Without compression, row store uses the original FixedVar representation, which cannot accommodate these fixed-length data type changes without rewriting the physical layout.
You may notice that tinyint is omitted from the integer types list. This is because it is unsigned, while the other integer types are all signed, so a metadata-only change is not possible. For example, a value of 255 can fit in one byte for tinyint, but requires two bytes in any of the signed formats. The signed formats can hold -128 to +127 in one byte when compressed.
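The point about signed and unsigned single-byte ranges can be seen directly by inspecting the binary representations:

```sql
-- 255 fits in the single unsigned byte of tinyint (0xFF), but the
-- smallest signed type that can hold 255 needs two bytes (0x00FF):
SELECT
    CONVERT(binary(1), CONVERT(tinyint, 255))  AS tinyint_bytes,   -- 0xFF
    CONVERT(binary(2), CONVERT(smallint, 255)) AS smallint_bytes;  -- 0x00FF
```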
One very handy application of this improvement is changing the data type of a column with the IDENTITY property when it is close to running out of values. Say we have the following heap table using row compression (page compression would also work):
DROP TABLE IF EXISTS dbo.Test;
GO
CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL,
    some_value integer NOT NULL
)
WITH (DATA_COMPRESSION = ROW);
Let’s add 5 million rows of data. This will be enough to make it obvious (from a performance standpoint) whether changing the column data type is a metadata-only operation or not:
WITH Numbers AS
(
    SELECT n = ROW_NUMBER() OVER (ORDER BY @@SPID)
    FROM sys.all_columns AS AC1
    CROSS JOIN sys.all_columns AS AC2
    ORDER BY n
    OFFSET 0 ROWS
    FETCH FIRST 5 * 1000 * 1000 ROWS ONLY
)
INSERT dbo.Test WITH (TABLOCKX)
    (some_value)
SELECT N.n
FROM Numbers AS N;
Next we will reseed the IDENTITY to make it seem like we are almost at the point of running out of values that will fit in an integer:
DBCC CHECKIDENT ( N'dbo.Test', RESEED, 2147483646 );
We can add one more row successfully:
INSERT dbo.Test (some_value) VALUES (123456);
But attempting to add another row:
INSERT dbo.Test (some_value) VALUES (7890);
Results in an error message:
Arithmetic overflow error converting IDENTITY to data type int.
We can fix that by converting the column to bigint:
ALTER TABLE dbo.Test ALTER COLUMN id bigint NOT NULL;
Thanks to the improvements in SQL Server 2016, this command changes metadata only, and completes immediately. The previous
INSERT statement (the one that threw the arithmetic overflow error) now completes successfully.
This new ability does not solve all the issues around changing the type of a column with the
IDENTITY property. We will still need to drop and recreate any indexes on the column, recreate any referencing foreign keys, and so on. That is a bit outside the scope of this post (though Aaron Bertrand has written about it before). Being able to change the type as a metadata-only operation certainly doesn’t hurt. With careful planning, the other steps required can be made as efficient as possible, for example by using minimally logged or online operations where available.
Be Careful with Syntax
Be sure to always specify
NOT NULL when changing data types with
ALTER COLUMN. Say for example we wanted to also change the data type of the
some_value column in our test table from
integer NOT NULL to
bigint NOT NULL.
When we write the command, we omit the
NOT NULL qualifier:
ALTER TABLE dbo.Test ALTER COLUMN some_value bigint;
This command completes successfully as a metadata-only change, but also removes the
NOT NULL constraint. The column is now
bigint NULL, which is not what we intended. This behaviour is documented, but it is easy to overlook.
We might try to fix our mistake with:
ALTER TABLE dbo.Test ALTER COLUMN some_value bigint NOT NULL;
This is not a metadata-only change. We are not allowed to change from NULL to NOT NULL (refer back to the earlier list if you need a refresher on the conditions). SQL Server will need to check all the existing values to ensure no nulls are present. It will then physically rewrite every row of the table. As well as being slow in itself, these actions generate a great deal of transaction log, which can have knock-on effects.
As a side note, this same mistake is not possible for columns with the
IDENTITY property. If we write an
ALTER COLUMN statement without
NOT NULL in that case, the engine helpfully assumes we meant
NOT NULL because the identity property is not allowed on nullable columns. It is still a great idea not to rely on this behaviour.
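As a quick sketch using the dbo.Test table from earlier, both of the following commands leave the id column NOT NULL:

```sql
-- Omitting NOT NULL here does not make the column nullable; identity
-- columns cannot be nullable, so the engine infers NOT NULL:
ALTER TABLE dbo.Test ALTER COLUMN id bigint;

-- Being explicit remains the safer habit:
ALTER TABLE dbo.Test ALTER COLUMN id bigint NOT NULL;
```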
Be Careful with Collation
Particular care is needed when altering a string column that has a collation not matching the default for the database.
For example, say we have a table with a case- and accent-sensitive collation (assume the database default is different):
DROP TABLE IF EXISTS dbo.Test2;
GO
CREATE TABLE dbo.Test2
(
    id integer IDENTITY NOT NULL,
    some_string char(8) COLLATE Latin1_General_100_CS_AS NOT NULL
)
WITH (DATA_COMPRESSION = ROW);
Add 5 million rows of data:
WITH Numbers AS
(
    SELECT n = ROW_NUMBER() OVER (ORDER BY @@SPID)
    FROM sys.all_columns AS AC1
    CROSS JOIN sys.all_columns AS AC2
    ORDER BY n
    OFFSET 0 ROWS
    FETCH FIRST 5 * 1000 * 1000 ROWS ONLY
)
INSERT dbo.Test2 WITH (TABLOCKX)
    (some_string)
SELECT CONVERT(char(8), N.n) COLLATE Latin1_General_100_CS_AS
FROM Numbers AS N;
Double the length of the string column using the following command:
ALTER TABLE dbo.Test2 ALTER COLUMN some_string char(16) NOT NULL;
We remembered to specify
NOT NULL, but forgot about the non-default collation. SQL Server assumes we meant to change collation to the database default (
Latin1_General_CI_AS for my test database). Changing collation prevents the operation from being metadata-only, and so the operation runs for several minutes, generating heaps of log.
Recreate the table and data using the previous script, then try the
ALTER COLUMN command again, but specifying the existing non-default collation as part of the command:
ALTER TABLE dbo.Test2 ALTER COLUMN some_string char(16) COLLATE Latin1_General_100_CS_AS NOT NULL;
The change now completes immediately, as a metadata-only operation. As with the NOT NULL syntax, it pays to be explicit to avoid accidents. This is good advice in general, not just for ALTER COLUMN.
Please be aware that compression needs to be explicitly specified for each index, and separately for the base table if it is a heap. This is another example where using abbreviated syntax or shortcuts can prevent the desired outcome.
For example, the following table does not specify explicit compression for either the primary key or in-line index definition:
CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL
        PRIMARY KEY,
    some_value integer NOT NULL
        INDEX [IX dbo.Test some_value]
)
WITH (DATA_COMPRESSION = PAGE);
The PRIMARY KEY constraint will be given a system-generated name, default to CLUSTERED, and be PAGE compressed. The in-line index will be NONCLUSTERED and not compressed at all. This table will not be enabled for any of the new optimizations because not all indexes and partitions are compressed.
A much better, and more explicit table definition would be:
CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL
        CONSTRAINT [PK dbo.Test id]
        PRIMARY KEY CLUSTERED
        WITH (DATA_COMPRESSION = PAGE),
    some_value integer NOT NULL
        INDEX [IX dbo.Test some_value]
        NONCLUSTERED
        WITH (DATA_COMPRESSION = ROW)
);
This table will qualify for the new optimizations because all indexes and partitions are compressed. As noted previously, mixing compression types is fine.
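One way to verify the compression state of every index and partition is to query the catalog views (the object name is assumed; adapt as needed):

```sql
-- Any row showing NONE in data_compression_desc means the table
-- will not qualify for the new metadata-only optimizations:
SELECT
    index_name = i.name,
    p.partition_number,
    p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
    AND i.index_id = p.index_id
WHERE p.object_id = OBJECT_ID(N'dbo.Test');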
There are a variety of ways to write this
CREATE TABLE statement in an explicit way, so there is an element of personal preference. The important takeaway point is to always be explicit about what you want. This applies to separate
CREATE INDEX statements as well.
Extended Events and Trace Flag
There is an extended event specifically for the new metadata-only
ALTER COLUMN operations supported in SQL Server 2016 onward.
The extended event is compressed_alter_column_is_md_only, in the Debug channel. Its event fields include is_md_only, which reports whether the column alteration was metadata-only.
This event only indicates if an operation is metadata-only due to the new abilities of SQL Server 2016. Column alterations that were metadata-only before 2016 will show
is_md_only = false despite still being metadata-only.
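A minimal event session to capture the event might look like the following (the session name and ring buffer target are illustrative choices):

```sql
CREATE EVENT SESSION [AlterColumnMDOnly] ON SERVER
ADD EVENT sqlserver.compressed_alter_column_is_md_only
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION [AlterColumnMDOnly] ON SERVER
    STATE = START;
```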
Other extended events useful for tracking ALTER COLUMN operations include alter_column_event, in the Analytic channel.
Should you need to disable the new SQL Server 2016 capabilities for any reason, undocumented global (or start-up) trace flag 3618 can be used. This trace flag is not effective when used at the session level. There is no way to specify a query-level trace flag with an
ALTER COLUMN command.
Being able to change some fixed-length integer data types with a metadata-only change is a very welcome product improvement. It does require that the table is already fully compressed, but that is becoming increasingly common anyway. This is especially true since data compression became available in all editions starting with SQL Server 2016 Service Pack 1.
Fixed-length string type columns are probably much less common. Some of this may be due to somewhat out-of-date considerations like space usage. When compressed, fixed-length string columns do not store trailing blanks, making them just as efficient as variable-length string columns from a storage point of view. It can be annoying to trim spaces for manipulation or display, but if the data usually occupies most of the maximum length, fixed-length types can have important advantages, not least regarding memory grants for things like sorting and hashing.
It’s not all good news with compression enabled. I mentioned earlier that SQL Server can sometimes perform a metadata-only change after checking that all existing values will convert successfully to the new type. This is the case when using
ALTER COLUMN to change from integer to smallint, for example. Unfortunately, these operations are not currently metadata-only for compressed objects.
Special thanks to Panagiotis Antonopoulos (Principal Software Engineer) and Mirek Sztajno (Senior Program Manager) from the SQL Server product team for their assistance and guidance during the research and writing of this article.
None of the details given in this work should be regarded as official Microsoft documentation or product statements.
3 thoughts on “New Metadata-Only Column Changes in SQL Server 2016”
That's a pretty cool and important improvement!
A quick tip: if you enable STATISTICS IO, a metadata-only operation will not report any output.
Excellent write-up, thanks!
This is an excellent mechanism on one hand…
However, it's completely useless when the column you want to change has a CLUSTERED INDEX defined on it.
Such a scenario would especially be common with IDENTITY columns.
First, changing a column's data type is not allowed as long as there's any index defined for it.
Secondly, dropping and re-creating a CLUSTERED index would be equivalent to (or even worse than) rebuilding the ENTIRE table. It would rebuild all non-clustered indexes in order to replace their CLUSTERED key with RID. Then, when you eventually re-create the clustered index, it would mean rebuilding all non-clustered indexes AGAIN in order to replace RID with the CLUSTERED key again.
Worst of all: NONE of these operations can truly be ONLINE. All of them are BLOCKING operations.
So, what are we left with? What would be our best method of operations?
1. Drop all non-clustered indexes.
2. Drop clustered index.
3. Compress the table.
4. Change the column data type.
5. Re-create the clustered index.
6. Re-create the non-clustered indexes.
But, now that it has come to this, we have basically lost any benefit that this "meta-data change only" feature could give us.
We're back to square one again.
Might as well do it the "old-fashioned" way, which would be much better. i.e. create a new table with the new data type, gradually migrate the old data into it, and then rename the tables to switch places.
This would be a much more "ONLINE" operation than the "new" method.