Aaron Bertrand

T-SQL Tuesday #65 : Teach Something New

SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

https://sqlmd.wordpress.com/2015/04/07/t-sql-tuesday-065-teach-something-new/

This month's T-SQL Tuesday is being hosted by Mike Donnelly (@SQLMD), and he sums up the topic as follows:

The topic this month is straight forward, but very open ended. You must learn something new and then write a blog post explaining it.

Well, from the moment Mike announced the topic, I didn't really set out to learn anything new, and as the weekend approached and I knew Monday was going to assault me with jury duty, I thought I was going to have to sit this month out.

Then, Martin Smith taught me something I either never knew, or knew long ago but have forgotten (sometimes you don't know what you don't know, and sometimes you can't remember what you never knew and what you can't remember). My recollection was that changing a column from NOT NULL to NULL should be a metadata-only operation, with writes to any page being deferred until that page is updated for other reasons, since the NULL bitmap wouldn't really need to exist until at least one row could become NULL.

On that same post, @ypercube also reminded me of this pertinent quote from Books Online (typo and all):

Altering a column from NOT NULL to NULL is not supported as an online operation when the altered column is references by nonclustered indexes.

"Not an online operation" can be interpreted as "not a metadata-only operation" – meaning it will actually be a size-of-data operation (the larger your index, the longer it will take).

I set out to prove this with a pretty simple (but lengthy) experiment against a specific target column to convert from NOT NULL to NULL. I would create 3 tables, all with a clustered primary key, but each one with a different non-clustered index. One would have the target column as a key column, the second as an INCLUDE column, and the third wouldn't reference the target column at all.

Here are my tables and how I populated them:

CREATE TABLE dbo.test1
(
  a INT NOT NULL, b INT NOT NULL, c BIGINT NOT NULL,
  CONSTRAINT pk_t1 PRIMARY KEY (a,b)
);
GO
CREATE NONCLUSTERED INDEX ix1 ON dbo.test1(b,c);
GO

CREATE TABLE dbo.test2
(
  a INT NOT NULL, b INT NOT NULL, c BIGINT NOT NULL,
  CONSTRAINT pk_t2 PRIMARY KEY (a,b)
);
GO
CREATE NONCLUSTERED INDEX ix2 ON dbo.test2(b) INCLUDE(c);
GO

CREATE TABLE dbo.test3
(
  a INT NOT NULL, b INT NOT NULL, c BIGINT NOT NULL,
  CONSTRAINT pk_t3 PRIMARY KEY (a,b)
);
GO
CREATE NONCLUSTERED INDEX ix3 ON dbo.test3(b);
GO

INSERT dbo.test1(a,b,c) -- repeat for test2 / test3
  SELECT n1, n2, ABS(n2)-ABS(n1)
  FROM 
  (
    SELECT TOP (100000) s1.[object_id], s2.[object_id]
      FROM master.sys.all_objects AS s1
      CROSS JOIN master.sys.all_objects AS s2
      GROUP BY s1.[object_id], s2.[object_id]
  ) AS n(n1, n2);

Each table had 100,000 rows, the clustered indexes had 310 pages, and the non-clustered indexes had either 272 pages (test1 and test2) or 174 pages (test3). (These values are easy to obtain from sys.dm_db_index_physical_stats.)

Next, I needed a simple way to capture operations that were logged at the page level – I chose sys.fn_dblog(), though I could have dug deeper and looked at pages directly. I didn't bother messing with LSN values to pass to the function, since I wasn't running this in production and didn't care much about performance, so after the tests I just dumped the results of the function, excluding any data that was logged prior to the ALTER TABLE operations.

-- establish an exclusion set
SELECT * INTO #x FROM sys.fn_dblog(NULL, NULL);

Now I could run my tests, which were a lot simpler than the setup.

ALTER TABLE dbo.test1 ALTER COLUMN c BIGINT NULL;

ALTER TABLE dbo.test2 ALTER COLUMN c BIGINT NULL;

ALTER TABLE dbo.test3 ALTER COLUMN c BIGINT NULL;

Now I could examine the operations that were logged in each case:

SELECT AllocUnitName, [Operation], Context, c = COUNT(*) 
  FROM 
  (
    SELECT * FROM sys.fn_dblog(NULL, NULL)
    WHERE [Operation] = N'LOP_FORMAT_PAGE'
      AND AllocUnitName LIKE N'dbo.test%'
    EXCEPT 
    SELECT * FROM #x
  ) AS x
  GROUP BY AllocUnitName, [Operation], Context
  ORDER BY AllocUnitName, [Operation], Context;

The results seem to suggest that every leaf page of the non-clustered index is touched for the cases where the target column was mentioned in the index in any way, but no such operations occur for the case where the target column is not mentioned in any non-clustered index:

Operations logged

In fact, in the first two cases, new pages are allocated (you can validate that with DBCC IND, as Spörri did in his answer), so the operation can occur online, but that doesn't mean it's fast (since it still has to write out a copy of all that data, and make the NULL bitmap change as part of writing out each new page, and log all of that activity).

I think most people would suspect that changing a column from NOT NULL to NULL would be metadata-only in all scenarios, but I have shown here that this is not true if the column is referenced by a non-clustered index (and similar things happen whether it is a key or INCLUDE column). Perhaps this operation can also be forced to be ONLINE in Azure SQL Database today, or it will be possible in the next major version? This won't necessarily make the actual physical operations happen any faster, but it will prevent blocking as a result.

I didn't test that scenario (and analysis of whether it is really online is tougher in Azure anyway), nor did I test it on a heap. Something I can revisit in a future post. In the meantime, be careful about any assumptions you might make about metadata-only operations.