In many SQL Server workloads, especially OLTP, the database's transaction log can be a bottleneck that adds to the time it takes a transaction to complete. Most people assume that the I/O subsystem is the real bottleneck, with it not being able to keep up with the amount of transaction log being generated by the workload.
Transaction Log Write Latency
The latency of write operations to the transaction log can be monitored using the
sys.dm_io_virtual_file_stats DMV and correlated with the
WRITELOG waits that are occurring on the system. I recorded a demo video of analyzing transaction log I/O back in 2011 so I won't repeat all of that in this post. You can get the video here and the demo code here (suitable for running in production right away).
If the write latency is higher than you'd expect for your I/O subsystem then the I/O subsystem cannot keep up, as is the general supposition. Does that mean that the I/O subsystem needs to be improved though? Not necessarily.
On many client systems I've found that a significant proportion of log records being generated are unnecessary, and if you can reduce the number of log records being generated, you reduce the amount of transaction log being written to disk. This should translate into a reduction in write latency, thus reducing transaction completion time.
There are two main causes of extraneous log records being generated: unused nonclustered indexes, and indexes becoming fragmented.
Unused Nonclustered Indexes
Whenever a record is inserted into a table, a record must be inserted into each nonclustered index defined on the table (with the exception of filtered indexes with appropriate filters, which I'll ignore from this point). This means that extra log records are generated, at least one per nonclustered index, for each table insert. The same thing applies to deleting a record in a table – the matching records must be deleted from all the nonclustered indexes. For an update to a table record, nonclustered index records are only updated if the nonclustered index key column(s) or included column(s) were part of the update.
These operations are necessary, of course, to keep each nonclustered index correct with respect to the table, but if the nonclustered index is unused by the workload, then the operations and the log records produced by them are unnecessary overhead. Furthermore, if these unused indexes become fragmented (which I'll discuss later in this post), then the regular index maintenance tasks will also operate on them, generating even more log records (from the index
REORGANIZE operations) completely unnecessarily.
Unused indexes come from a variety of sources such as someone mistakenly creating an index per table column, someone creating every index suggested by the missing index DMVs, or someone creating all indexes suggested by the Database Tuning Advisor. It could also be that the workload characteristics have changed and so what used to be useful indexes are no longer being used.
Wherever they came from, unused indexes should be removed to reduce their overhead. You can determine which indexes are unused using the sys.dm_db_index_usage_stats DMV, and I recommend you read posts by my colleagues Kimberly L. Tripp (here), and Joe Sack (here and here), as they explain how to use the DMV correctly.
Most people think of index fragmentation as a problem that affects queries that have to read large amounts of data. While this is one of the problems that fragmentation can cause, fragmentation is also a problem because of how it occurs.
Fragmentation is caused by an operation called a page split. The simplest cause of a page split is when an index record must be inserted on a particular page (because of its key value) and the page does not have enough free space. In this scenario, the following operations will take place:
- A new index page is allocated and formatted
- Some of the records from the full page are moved to the new page, thus creating free space in the required page
- The new page is linked into the index structure
- The new record is inserted on the required page
All of these operations generate log records, and as you might imagine, this can be significantly more than is required to insert a new record on a page that does not require a page split. Back in 2009 I blogged an analysis of page split cost in terms of the transaction log and found some cases where a page split generated over 40 times more transaction log than a regular insert!
The first step in reducing the extra cost is to remove unused indexes, as I described above, so that they're not generating page splits. The second step is to identify remaining indexes that are becoming fragmented (and so must be suffering page splits) using the
sys.dm_db_index_physical_stats DMV (or the new SQL Sentry Fragmentation Manager) and proactively creating free space in them using an index fillfactor. A fillfactor instructs SQL Server to leave empty space on index pages when the index is built, rebuilt, or reorganized so that there is space to allow new records to be inserted without requiring a page split, hence cutting down on the extra log records generated.
Of course nothing comes for free – the trade-off when using fillfactors is that you are proactively provisioning extra space in the indexes to prevent more log records being generated – but that's usually a good trade-off to make. Choosing a fillfactor is relatively easy and I blogged about that here.
Reducing the write latency of a transaction log file does not always mean moving to a faster I/O subsystem, or segregating the file into its own portion of the I/O subsystem. With some simple analysis of the indexes in your database, you may be able to significantly reduce the amount of transaction log records being generated, leading to a commensurate reduction in write latency.
There are other, more subtle issues that can affect transaction log performance, and I'll explore those in a future post.