Paul Randal

The SQL Server Transaction Log, Part 4: Log Records

July 19, 2022 by in Transaction Log | 6 Comments
SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

Previously I’ve talked about why logging is required and the architecture and circular nature of the log, so now it’s time to look at the real heart of the logging system—the log records themselves. As always, I recommend you read the previous posts in the series before reading this one.

What Are Log Records?

The simplest definition of a log record: it describes a single change to a database. A single operation in the database may cause multiple changes, but each change will usually have its own log record to describe it. An example of this is updating a fixed-width column in a single row—it will do the following:

  • Perform the update, which generates one log record.
  • If this is the first time the extent that contains the page (containing the updated row) has been changed since the most recent full backup, a bit must be set in the relevant differential bitmap, which generates another log record.

Each log record has a fixed overhead so, sometimes, the Storage Engine breaks this rule by combining multiple changes into a single log record to reduce the total number of log records written to the log. Some examples of this are:

  • When an update happens to multiple columns in a single table row, a single log record can contain all the column changes for one row instead of one log record per column change. Note if any nonclustered indexes need to be updated as well, those changes will be in separate log records.
  • When an index is being built or rebuilt, the Storage Engine will fill an entire page with rows and then use a single log record to log the whole page image instead of one log record per row inserted into the new page.

Each log record has a Log Sequence Number (LSN), which I defined back in part two as:

<VLF sequence number>:<log block ID>:<log record ID> (4 bytes: 4 bytes: 2 bytes)

Each LSN is unique, is ever increasing, and identifies the physical location of a log record within the log. Log records never move in the log, so their LSNs never change once created. The page header for each data file page in the database contains the LSN of the most recent log record describing a change to the page and is used during crash recovery, which I’ll cover in a future post.

Log Record Contents

You can think of a log record as similar to a table record—it has a defined structure comprised of a number of fields. All log records share a common header with metadata about the log record and will then have varying additional fields depending on the log record type. Some of the header fields include:

  • The log record LSN
  • The log record type
  • The change context, if any (e.g., heap or clustered index)
  • The transaction ID the change is part of, if any. Some log records are non-transactional and can’t be rolled back—for instance, changes to PFS pages and differential bitmaps
  • The length of the log record in bytes
  • The LSN of the previous log record generated by the transaction, if any
  • The amount of free space reserved in the log in case the log record needs to be rolled back

The log manager must always ensure there’s enough free space in the log to generate any log records required to roll back a transaction without having to grow the size of the log, making log space reservation critical. If this were not the case, and a transaction rollback required the log to grow, which then failed (e.g., out of space on the log drive), the database must be set offline in the SUSPECT state. You can read a bit more about rollbacks in my article A Complete Introduction to SQL Server Transactions on the SentryOne blog, and I’ll cover them in depth in the next article.

Log records describing changes to table/index pages also include:

  • The allocation unit ID that owns the page
  • The page ID and the slot ID of the row being changed
  • The key values of the row being changed, if any
  • A list of what locks were held to protect the change
  • The offset of the change in the row
  • The before and after images of the change, or an array of before and after images if multiple parts of the row are changed

The before and after images are the set of bytes changed and what they were changed to, respectively, and these allow the change to be rolled back or replayed. Some log records only include the after image since they can’t be rolled back, making the before image unnecessary—for example, a log record generated as part of rolling back another log record or a non-transactional log record.

Many more fields may be present depending on the type of log record, and we’ll encounter some of them as we progress through the rest of this series.

Log Record Types

There are between 70 and 100 log record types, depending on the version of SQL Server (there’s no documented list). All log record names start with LOP, which stands for Log Operation, and some common types include:

  • LOP_FORMAT_PAGE – creating a new page image
  • LOP_MODIFY_ROW – modifying a row from 1 to 16 contiguous bytes, potentially involving multiple adjacent columns
  • LOP_MODIFY_COLUMNS – modifying multiple disjoint columns in a row
  • LOP_SET_BITS – setting some bits in an allocation bitmap to 1 or 0
  • LOP_INSERT_ROWS – inserting one or more rows on a page
  • LOP_DELETE_ROWS – deleting one or more rows on a page
  • LOP_EXPUNGE_ROWS – used by the ghost cleanup task when removing deleted rows from a page
  • LOP_BEGIN_XACT – always the first log record generated by a transaction
  • LOP_COMMIT_XACT – the log record signifying a transaction has successfully committed
  • LOP_ABORT_XACT – the log record signifying a transaction has finished being rolled back

Examining Log Records

The way to examine log records is via the undocumented fn_dblog() function. It requires two LSN parameters specifying the starting and ending points in the log, but I seldom use those and instead specify NULL for both, which means scan all available log records in active VLFs. The great thing about this function is it lets you limit which fields it outputs (otherwise, you get all 130), and you can specify predicates.

Note:

  1. If you want to examine all possible log records, including those in VLFs marked inactive by log truncation, you can enable trace flag 2537.
  2. If you’re playing around and want to see what log records an operation does, use a test database in simple recovery mode, and issue a checkpoint to truncate the log. This means you can run your operation, and then fn_dblog() will only show you the log records since the checkpoint.

Let’s look at a simple example. I’ve created a database, a table with a single integer column, inserted one row, and performed a backup and a checkpoint. Now I’ll do the following:

UPDATE TestTable SET Column1 = 2;
GO

SELECT
  	[Current LSN],
  	[Operation],
  	[Context],
  	[Transaction ID],
  	[Log Record Length]
  FROM fn_dblog (NULL, NULL)
  WHERE [Operation] NOT LIKE '%CKPT';
Current LSN             Operation        Context       Transaction ID  Log Record Length
----------------------  ---------------  ------------  --------------  -----------------
00000030:000000af:0001  LOP_BEGIN_XACT   LCX_NULL      0000:000004be   124
00000030:000000af:0002  LOP_SET_BITS     LCX_DIFF_MAP  0000:00000000   56
00000030:000000af:0003  LOP_MODIFY_ROW   LCX_HEAP      0000:000004be   104
00000030:000000af:0004  LOP_COMMIT_XACT  LCX_NULL      0000:000004be   84

The LSN is in hex, and you can see all four log records for the transaction are in the same log block (the 000000AF). Here’s what we’re seeing:

  1. The transaction begins with the LOP_BEGIN_XACT with transaction ID 000004be and no context.
  2. As the update will change an extent after a backup, the differential bitmap bit for it must be set, using the LOP_SET_BITS with the context of LCX_DIFF_MAP. Notice the log record doesn’t have a transaction ID as diff map changes are non-transactional.
  3. The actual modification is logged with the LOP_MODIFY_ROW with context LCX_HEAP (because the table is a heap) and is part of the transaction.
  4. The transaction ends with the LOP_COMMIT_XACT.

It can be fascinating to spelunk around in the log to see what SQL Server is doing for some operations. You might think this doesn’t have any use in a production environment, but here’s a post showing the function being utilized (and the fn_dump_dblog function for examining log records in backups) to find out who performed a DROP TABLE and the restore point to get it back.

Until Next Time…

Since I’ve now laid the groundwork for understanding log records, in the next post of this series, I’ll show some deeper examples, including how transaction rollback works. And feel free to post a comment with any questions you might have.