New Metadata-Only Column Changes in SQL Server 2016

The ALTER TABLE ... ALTER COLUMN command is very powerful. You can use it to change a column’s data type, length, precision, scale, nullability, collation…and many other things besides.

It is certainly more convenient than the alternative: Creating a new table and migrating the data each time a change is necessary. Nevertheless, there is only so much that can be done to hide the underlying complexity. Along with a large number of restrictions on what is even possible with this command, there is always the question of performance.

Ultimately, tables are stored as a sequence of bytes with some metadata elsewhere in the system to describe what each of those bytes mean, and how they relate to each of the table’s various columns. When we ask SQL Server to change some aspect of a column’s definition, it needs to check that the existing data is compatible with the new definition. It also needs to determine if the current physical layout needs to change.

Depending on the type of change and the configuration of the database, an ALTER COLUMN command will need to perform one of the following actions:

Change metadata in system tables only.
Check all the existing data for compatibility, then change metadata.
Rewrite some or all of the stored data to match the new definition.

Option 1 represents the ideal case from a performance point of view. It requires only a few changes to system tables, and a minimal amount of logging. The operation will still require a restrictive schema modification Sch-M lock, but the metadata changes themselves will complete very quickly, regardless of the size of the table.

Metadata-Only Changes

There are a number of special cases to watch out for, but as a general summary, the following actions only require changes to metadata:

Going from NOT NULL to NULL for the same data type.
Increasing the maximum size of a varchar, nvarchar, or varbinary column (except to max).

Improvements in SQL Server 2016

The subject of this post is the additional changes that are enabled for metadata-only from SQL Server 2016 onward. No changes to syntax are needed, and no configuration settings need to be modified. You get these undocumented improvements for free.

The new capabilities target a subset of the fixed-length data types. The new abilities apply to row-store tables in the following circumstances:

Compression must be enabled:
- On all indexes and partitions, including the base heap or clustered index.
- Either ROW or PAGE compression.
- Indexes and partitions may use a mixture of these compression levels. The important thing is there are no uncompressed indexes or partitions.
Changing from NULL to NOT NULL is not allowed.
The following integer type changes are supported:
- smallint to integer or bigint.
- integer to bigint.
- smallmoney to money (uses integer representation internally).
The following string and binary type changes are supported:
- char(n) to char(m) or varchar(m)
- nchar(n) to nchar(m) or nvarchar(m)
- binary(n) to binary(m) or varbinary(m)
- All of the above only for n < m and m != max
- Collation changes are not allowed

These changes can be metadata-only because the underlying binary data layout does not change when Column Descriptor row format is used (hence the need for compression). Without compression, row store uses the original FixedVar representation, which cannot accommodate these fixed-length data type changes without rewriting the physical layout.

You may notice that tinyint is omitted from the integer types list. This is because it is unsigned, while the other integer types are all signed, so a metadata-only change is not possible. For example, a value of 255 can fit in one byte for tinyint, but requires two bytes in any of the signed formats. The signed formats can hold -128 to +127 in one byte when compressed.

Integer Example

One very handy application of this improvement is changing the data type of a column with the IDENTITY property.

Say we have the following heap table using row compression (page compression would also work):

DROP TABLE IF EXISTS dbo.Test;
GO
CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL,
    some_value integer NOT NULL
)
WITH (DATA_COMPRESSION = ROW);

Let’s add 5 million rows of data. This will be enough to make it obvious (from a performance standpoint) whether changing the column data type is a metadata-only operation or not:

WITH Numbers AS
(
    SELECT 
        n = ROW_NUMBER() OVER (ORDER BY @@SPID) 
    FROM sys.all_columns AS AC1
    CROSS JOIN sys.all_columns AS AC2
    ORDER BY n
    OFFSET 0 ROWS
    FETCH FIRST 5 * 1000 * 1000 ROWS ONLY
)
INSERT dbo.Test
    WITH (TABLOCKX)
(
    some_value
)
SELECT
    N.n
FROM Numbers AS N;

Next we will reseed the IDENTITY to make it seem like we are almost at the point of running out of values that will fit in an integer:

DBCC CHECKIDENT
(
    N'dbo.Test',
    RESEED,
    2147483646
);

We can add one more row successfully:

INSERT dbo.Test
    (some_value)
VALUES
    (123456);

But attempting to add another row:

INSERT dbo.Test
    (some_value)
VALUES
    (7890);

Results in an error message:

Msg 8115, Level 16, State 1, Line 1
Arithmetic overflow error converting IDENTITY to data type int.

We can fix that by converting the column to bigint:

ALTER TABLE dbo.Test
ALTER COLUMN id bigint NOT NULL;

Thanks to the improvements in SQL Server 2016, this command changes metadata only, and completes immediately. The previous INSERT statement (the one that threw the arithmetic overflow error) now completes successfully.

This new ability does not solve all the issues around changing the type of a column with the IDENTITY property. We will still need to drop and recreate any indexes on the column, recreate any referencing foreign keys, and so on. That is a bit outside the scope of this post (though Aaron Bertrand has written about it before). Being able to change the type as a metadata-only operation certainly doesn’t hurt. With careful planning, the other steps required can be made as efficient as possible, for example using minimally-logged or ONLINE operations.

Be Careful with Syntax

Be sure to always specify NULL or NOT NULL when changing data types with ALTER COLUMN. Say for example we wanted to also change the data type of the some_value column in our test table from integer NOT NULL to bigint NOT NULL.

When we write the command, we omit the NULL or NOT NULL qualifier:

ALTER TABLE dbo.Test
ALTER COLUMN some_value bigint;

This command completes successfully as a metadata-only change, but also removes the NOT NULL constraint. The column is now bigint NULL, which is not what we intended. This behaviour is documented, but it is easy to overlook.

We might try to fix our mistake with:

ALTER TABLE dbo.Test
ALTER COLUMN some_value bigint NOT NULL;

This is not a metadata-only change. We are not allowed to change from NULL to NOT NULL (refer back to the earlier table if you need a refresher on the conditions). SQL Server will need to check all the existing values to ensure no nulls are present. It will then physically rewrite every row of the table. As well as being slow in itself, these actions generate a great deal of transaction log, which can have knock-on effects.

As a side note, this same mistake is not possible for columns with the IDENTITY property. If we write an ALTER COLUMN statement without NULL or NOT NULL in that case, the engine helpfully assumes we meant NOT NULL because the identity property is not allowed on nullable columns. It is still a great idea not to rely on this behaviour.

Always specify NULL or NOT NULL with ALTER COLUMN.

Collation

Particular care is needed when altering a string column that has a collation not matching the default for the database.

For example, say we have a table with a case- and accent-sensitive collation (assume the database default is different):

DROP TABLE IF EXISTS dbo.Test2;
GO
CREATE TABLE dbo.Test2
(
    id integer IDENTITY NOT NULL,
    some_string char(8) COLLATE Latin1_General_100_CS_AS NOT NULL
)
WITH (DATA_COMPRESSION = ROW);

Add 5 million rows of data:

WITH Numbers AS
(
    SELECT 
        n = ROW_NUMBER() OVER (ORDER BY @@SPID) 
    FROM sys.all_columns AS AC1
    CROSS JOIN sys.all_columns AS AC2
    ORDER BY n
    OFFSET 0 ROWS
    FETCH FIRST 5 * 1000 * 1000 ROWS ONLY
)
INSERT dbo.Test2
    WITH (TABLOCKX)
(
    some_string
)
SELECT
    CONVERT(char(8), N.n) COLLATE Latin1_General_100_CS_AS
FROM Numbers AS N;

Double the length of the string column using the following command:

ALTER TABLE dbo.Test2
ALTER COLUMN some_string char(16) NOT NULL;

We remembered to specify NOT NULL, but forgot about the non-default collation. SQL Server assumes we meant to change collation to the database default (Latin1_General_CI_AS for my test database). Changing collation prevents the operation from being metadata-only, and so the operation runs for several minutes, generating heaps of log.

Recreate the table and data using the previous script, then try the ALTER COLUMN command again, but specifying the existing non-default collation as part of the command:

ALTER TABLE dbo.Test2
ALTER COLUMN some_string 
    char(16) COLLATE Latin1_General_100_CS_AS NOT NULL;

The change now completes immediately, as a metadata-only operation. As with the NULL and NOT NULL syntax, it pays to be explicit to avoid accidents. This is good advice in general, not just for ALTER COLUMN.

Compression

Please be aware that compression needs to be explicitly specified for each index, and separately for the base table if it is a heap. This is another example where using abbreviated syntax or shortcuts can prevent the desired outcome.

For example, the following table does not specify explicit compression for either the primary key or in-line index definition:

CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL PRIMARY KEY,
    some_value integer NOT NULL
        INDEX [IX dbo.Test some_value]
)
WITH (DATA_COMPRESSION = PAGE);

The PRIMARY KEY will have a name assigned, default to CLUSTERED ,and be PAGE compressed. The in-line index will be NONCLUSTERED and not compressed at all. This table will not be enabled for any of the new optimizations because not all indexes and partitions are compressed.

A much better, and more explicit table definition would be:

CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL
        CONSTRAINT [PK dbo.Test id]
        PRIMARY KEY CLUSTERED
        WITH (DATA_COMPRESSION = PAGE),
    some_value integer NOT NULL
        INDEX [IX dbo.Test some_value]
        NONCLUSTERED
        WITH (DATA_COMPRESSION = ROW)        
);

This table will qualify for the new optimizations because all indexes and partitions are compressed. As noted previously, mixing compression types is fine.

There are a variety of ways to write this CREATE TABLE statement in an explicit way, so there is an element of personal preference. The important takeaway point is to always be explicit about what you want. This applies to separate CREATE INDEX statements as well.

Extended Events and Trace Flag

There is an extended event specifically for the new metadata-only ALTER COLUMN operations supported in SQL Server 2016 onward.

The extended event is compressed_alter_column_is_md_only in the Debug channel. Its event fields are object_id, column_id, and is_md_only (true/false).

This event only indicates if an operation is metadata-only due to the new abilities of SQL Server 2016. Column alterations that were metadata-only before 2016 will show is_md_only = false despite still being metadata-only.

Other extended events useful for tracking ALTER COLUMN operations include metadata_ddl_alter_column and alter_column_event, both in the Analytic channel.

Should you need to disable the new SQL Server 2016 capabilities for any reason, undocumented global (or start-up) trace flag 3618 can be used. This trace flag is not effective when used at the session level. There is no way to specify a query-level trace flag with an ALTER COLUMN command.

Final Thoughts

Being able to change some fixed-length integer data types with a metadata-only change is a very welcome product improvement. It does require that the table is already fully compressed, but that is becoming more of a common thing anyway. This is especially true since compression was enabled in all editions starting with SQL Server 2016 Service Pack 1.

Fixed-length string type columns are probably much less common. Some of this may be due to somewhat out-of-date considerations like space usage. When compressed, fixed-length string columns do not store trailing blanks, making them just as efficient as variable-length string columns from a storage point of view. It can be annoying to trim spaces for manipulation or display, but if the data usually occupies most of the maximum length, fixed-length types can have important advantages, not least regarding memory grants for things like sorting and hashing.

It’s not all good news with compression enabled. I mentioned earlier that SQL Server can sometimes perform a metadata-only change after checking that all existing values will convert successfully to the new type. This is the case when using ALTER COLUMN to change from integer to smallint for example. Unfortunately, these operations are not currently metadata-only for compressed objects.

Acknowledgements

Special thanks to Panagiotis Antonopoulos (Principal Software Engineer) and Mirek Sztajno (Senior Program Manager) from the SQL Server product team for their assistance and guidance during the research and writing of this article.

None of the details given in this work should be regarded as official Microsoft documentation or product statements.

3 thoughts on “New Metadata-Only Column Changes in SQL Server 2016”

Itzik Ben-Gan says:

April 20, 2020 at 10:28 PM

That's a pretty cool and important improvement!

A quick tip: if you enable STATISTICS IO, a metadata-only operation will not report any output.
Alex Friedman says:

April 27, 2020 at 10:36 AM

Excellent write-up, thanks!
Eitan Blumin says:

May 4, 2020 at 6:07 AM

This is an excellent mechanism on one hand…

However, it's completely useless when the column you want to change has a CLUSTERED INDEX defined on it.

Such a scenario would especially be common with IDENTITY columns.

First, changing a column's data type is not allowed as long as there's any index defined for it.

Secondly, dropping and re-creating a CLUSTERED index would be equivalent to (or even worse than) rebuilding the ENTIRE table. It would rebuild all non-clustered indexes in order to replace their CLUSTERED key with RID. Then, when you eventually re-create the clustered index, it would mean rebuilding all non-clustered indexes AGAIN in order to replace RID with the CLUSTERED key again.

Worst of all: NONE of these operations can truly be ONLINE. All of them are BLOCKING operations.

So, what are we left with? What would be our best method of operations?

1. Drop all non-clustered indexes.
2. Drop clustered index.
3. Compress the table.
4. Change the column data type.
5. Re-create the clustered index.
6. Re-create the non-clustered indexes.

But, now that it has come to this, we have basically lost any benefit that this "meta-data change only" feature could give us.

We're back to square one again.

Might as well do it the "old-fashioned" way, which would be much better. i.e. create a new table with the new data type, gradually migrate the old data into it, and then rename the tables to switch places.

This would be a much more "ONLINE" operation than the "new" method.

Comments are closed.