Previously I’ve talked about why logging is required and the architecture and circular nature of the log, so now it’s time to look at the real heart of the logging system—the log records themselves. As always, I recommend you read the previous posts in the series before reading this one.
What Are Log Records?
The simplest definition of a log record: it describes a single change to a database. A single operation in the database may cause multiple changes, but each change will usually have its own log record to describe it. An example of this is updating a fixed-width column in a single row—it will do the following:
- Perform the update, which generates one log record.
- If this is the first time the extent that contains the page (containing the updated row) has been changed since the most recent full backup, a bit must be set in the relevant differential bitmap, which generates another log record.
Each log record has a fixed overhead, so sometimes the Storage Engine breaks this one-log-record-per-change rule by combining multiple changes into a single log record to reduce the total number of log records written to the log. Some examples of this are:
- When an update happens to multiple columns in a single table row, a single log record can contain all the column changes for one row instead of one log record per column change. Note if any nonclustered indexes need to be updated as well, those changes will be in separate log records.
- When an index is being built or rebuilt, the Storage Engine will fill an entire page with rows and then use a single log record to log the whole page image instead of one log record per row inserted into the new page.
Each log record has a Log Sequence Number (LSN), which I defined back in part two as:
Each LSN is unique, is ever increasing, and identifies the physical location of a log record within the log. Log records never move in the log, so their LSNs never change once created. The page header for each data file page in the database contains the LSN of the most recent log record describing a change to the page and is used during crash recovery, which I’ll cover in a future post.
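As a quick illustration of that last point, you can see the LSN stored in a page header using the (also undocumented) DBCC PAGE command. This is just a sketch; the database name and page ID below are placeholders:

```sql
-- Trace flag 3604 sends DBCC output to the console instead of the error log
DBCC TRACEON (3604);
DBCC PAGE (N'TestDB', 1, 256, 0);  -- file ID 1, page ID 256, header only
-- Look for the m_lsn field in the page header output, e.g. m_lsn = (48:175:3)
```

The three parts of the m_lsn value correspond to the three parts of the LSN of the most recent log record that changed the page.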
Log Record Contents
You can think of a log record as similar to a table record—it has a defined structure comprised of a number of fields. All log records share a common header with metadata about the log record and will then have varying additional fields depending on the log record type. Some of the header fields include:
- The log record LSN
- The log record type
- The change context, if any (e.g., heap or clustered index)
- The transaction ID the change is part of, if any. Some log records are non-transactional and can’t be rolled back—for instance, changes to PFS pages and differential bitmaps
- The length of the log record in bytes
- The LSN of the previous log record generated by the transaction, if any
- The amount of free space reserved in the log in case the log record needs to be rolled back
The log manager must always ensure there's enough free space in the log to generate any log records required to roll back a transaction, without having to grow the log file, which is what makes log space reservation critical. If this were not the case, and a transaction rollback required the log to grow but the growth failed (e.g., the log drive ran out of space), the database would have to be taken offline in the SUSPECT state. You can read a bit more about rollbacks in my article A Complete Introduction to SQL Server Transactions on the SentryOne blog, and I’ll cover them in depth in the next article.
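If you want to see this reservation in action, the documented sys.dm_tran_database_transactions DMV shows, for each active transaction, how much log space has been used and how much is reserved for potential rollback. A minimal sketch, with TestDB as a placeholder database name:

```sql
SELECT
    [transaction_id],
    [database_transaction_log_record_count],
    [database_transaction_log_bytes_used],
    [database_transaction_log_bytes_reserved]
FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID (N'TestDB');
```

Run this from another connection while a large transaction is in flight and you'll see the reserved-bytes number climb alongside the used-bytes number.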
Log records describing changes to table/index pages also include:
- The allocation unit ID that owns the page
- The page ID and the slot ID of the row being changed
- The key values of the row being changed, if any
- A list of what locks were held to protect the change
- The offset of the change in the row
- The before and after images of the change, or an array of before and after images if multiple parts of the row are changed
The before and after images are the set of bytes changed and what they were changed to, respectively, and these allow the change to be rolled back or replayed. Some log records only include the after image since they can’t be rolled back, making the before image unnecessary—for example, a log record generated as part of rolling back another log record or a non-transactional log record.
Many more fields may be present depending on the type of log record, and we’ll encounter some of them as we progress through the rest of this series.
Log Record Types
There are between 70 and 100 log record types, depending on the version of SQL Server (there’s no documented list). All log record names start with LOP, which stands for Log Operation, and some common types include:
- LOP_FORMAT_PAGE – creating a new page image
- LOP_MODIFY_ROW – modifying 1 to 16 contiguous bytes in a row, potentially spanning multiple adjacent columns
- LOP_MODIFY_COLUMNS – modifying multiple disjoint columns in a row
- LOP_SET_BITS – setting some bits in an allocation bitmap to 1 or 0
- LOP_INSERT_ROWS – inserting one or more rows on a page
- LOP_DELETE_ROWS – deleting one or more rows on a page
- LOP_EXPUNGE_ROWS – used by the ghost cleanup task when removing deleted rows from a page
- LOP_BEGIN_XACT – always the first log record generated by a transaction
- LOP_COMMIT_XACT – the log record signifying a transaction has successfully committed
- LOP_ABORT_XACT – the log record signifying a transaction has finished being rolled back
Examining Log Records
The way to examine log records is via the undocumented fn_dblog() function. It requires two LSN parameters specifying the starting and ending points in the log, but I seldom use those and instead specify NULL for both, which means scan all available log records in active VLFs. The great thing about this function is it lets you limit which fields it outputs (otherwise, you get all 130), and you can specify predicates.
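For instance, here's a quick way to see which log record types dominate the active portion of the log. A sketch only: because fn_dblog() is undocumented, its output columns can vary between versions.

```sql
SELECT [Operation], COUNT (*) AS [Log Record Count]
FROM fn_dblog (NULL, NULL)
GROUP BY [Operation]
ORDER BY [Log Record Count] DESC;
```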
- If you want to examine all possible log records, including those in VLFs marked inactive by log truncation, you can enable trace flag 2537.
- If you’re playing around and want to see what log records an operation generates, use a test database in the simple recovery model, and issue a checkpoint to truncate the log. This means you can run your operation, and then fn_dblog() will show you only the log records generated since the checkpoint.
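Putting those tips together, a minimal setup sketch might look like this (TestDB is a placeholder name):

```sql
ALTER DATABASE [TestDB] SET RECOVERY SIMPLE;
GO
USE [TestDB];
GO
CHECKPOINT;   -- in the simple recovery model, this truncates the log
GO
-- ... run the operation you want to investigate ...
SELECT * FROM fn_dblog (NULL, NULL);  -- only log records since the checkpoint
GO
```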
Let’s look at a simple example. I’ve created a database, a table with a single integer column, inserted one row, and performed a backup and a checkpoint. Now I’ll do the following:
UPDATE TestTable SET Column1 = 2;
GO

SELECT [Current LSN], [Operation], [Context], [Transaction ID], [Log Record Length]
FROM fn_dblog (NULL, NULL)
WHERE [Operation] NOT LIKE '%CKPT';
Current LSN            Operation        Context       Transaction ID  Log Record Length
---------------------- ---------------  ------------  --------------  -----------------
00000030:000000af:0001 LOP_BEGIN_XACT   LCX_NULL      0000:000004be   124
00000030:000000af:0002 LOP_SET_BITS     LCX_DIFF_MAP  0000:00000000   56
00000030:000000af:0003 LOP_MODIFY_ROW   LCX_HEAP      0000:000004be   104
00000030:000000af:0004 LOP_COMMIT_XACT  LCX_NULL      0000:000004be   84
The LSN is in hex, and you can see all four log records for the transaction are in the same log block (the block with ID 000000AF). Here’s what we’re seeing:
- The transaction begins with the LOP_BEGIN_XACT with transaction ID 000004be and no context.
- As the update will change an extent after a backup, the differential bitmap bit for it must be set, using the LOP_SET_BITS with the context of LCX_DIFF_MAP. Notice the log record doesn’t have a transaction ID as diff map changes are non-transactional.
- The actual modification is logged with the LOP_MODIFY_ROW with context LCX_HEAP (because the table is a heap) and is part of the transaction.
- The transaction ends with the LOP_COMMIT_XACT.
It can be fascinating to spelunk around in the log to see what SQL Server is doing for some operations. You might think this doesn’t have any use in a production environment, but here’s a post showing the function being used (along with the fn_dump_dblog function for examining log records in backups) to find out who performed a DROP TABLE and the restore point needed to get the table back.
Until Next Time…
Since I’ve now laid the groundwork for understanding log records, in the next post of this series, I’ll show some deeper examples, including how transaction rollback works. And feel free to post a comment with any questions you might have.
6 thoughts on “The SQL Server Transaction Log, Part 4: Log Records”
How are you, Paul Randal?
As always, great articles!
A question: can we capture the types of records from the transaction log and relate them to an active session, in the sequence they were generated?
For example, session 100 was opened, and it ran the following commands:
1. Insertion in table 1
2. Update on table 1
3. select on table1
Note: the session remains open (BEGIN TRANSACTION) and is generating blocking. I need to recover all the actions/commands that were executed, but when I run DBCC INPUTBUFFER, it only returns the SELECT (the last command).
How can I get the information for this session only, in order to identify what caused the blocking?
No questions, just want to say that I'm really enjoying the series and looking forward to the next installment.
You could. If you know which lock things are blocking on, and the associated transaction ID, you could write a script using fn_dblog to look for all log records for that transaction and find the log record that shows the lock being held, then make an educated guess at which SQL statement did it. But you'll only be able to figure out a simple statement; recovering columns, predicates, etc. is very difficult.
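The core of such a script could look like this sketch, with a placeholder transaction ID (and again, since fn_dblog() is undocumented, column names may differ between versions):

```sql
SELECT [Current LSN], [Operation], [Context], [Lock Information]
FROM fn_dblog (NULL, NULL)
WHERE [Transaction ID] = N'0000:000004be';  -- placeholder transaction ID
```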
Thanks for the good article about the transaction log.
I wonder how we can confirm whether group commit is working on our database server.
We recently noticed that log flushes are smaller than on older versions of SQL Server, but there's nothing we can check to see whether group commit is enabled in the Standard edition.
And following this article, it also appears that group commit is disabled in the Standard edition of SQL Server.
I don't have any more info on group commit other than what it is, and I only deal with Enterprise edition. I suggest you open a case with Microsoft to help you. Thanks
I am thinking I should open a case too