Over the last year I've blogged several times on SQLPerformance.com about transaction log performance issues (see here) and I promised to discuss transaction log monitoring, which I'll do in this post. I'm going to present some of the metrics you can track, why you should care, and any advice for dealing with the indicated problem.
The easiest way to monitor transaction log I/O latencies is to use the
sys.dm_io_virtual_file_stats DMV. You'll need to perform some math to get useful results and you can get some example code in the VirtualFileStats.sql script in this demo zip file. You really want to see write latencies of less than 5ms for the transaction log.
Earlier in November I blogged the results of a survey showing transaction log and tempdb data file latencies for more than 25,000 databases around the world (see here), with 80% of databases hitting the 5ms or less mark for transaction log write latency.
If your write latency is higher than 5ms, you can:
- Work to reduce the amount of log being generated and/or the amount of log flushes occurring from tiny transactions, as I described in earlier posts.
- Follow some of the troubleshooting steps I describe in the survey blog post above.
- Move to a faster I/O subsystem, remembering that if you decide to use an SSD, you need to use two in a RAID-1 configuration.
Another thing you can is watch to make sure you aren't hitting the hard limit of 32 outstanding write I/Os for each database's transaction log in 2008 R2 and before (raised to 2012 from SQL Server 2012 onwards). You can try to do this by looking at the Physical Disk/Avg. Disk Write Queue Length, but that's for an entire volume, not per file so if there's anything else on the volume apart from the log file you're interested in, that won't give you a valid number. A better way is to aggregate the results of the
sys.dm_io_pending_io_requests DMV, which lists all outstanding I/Os. Here's some code to do that:
SELECT COUNT (*) AS [PendingIOs], DB_NAME ([vfs].[database_id]) AS [DBName], [mf].[name] AS [FileName], [mf].[type_desc] AS [FileType], SUM ([pior].[io_pending_ms_ticks]) AS [TotalStall] FROM sys.dm_io_pending_io_requests AS [pior] JOIN sys.dm_io_virtual_file_stats (NULL, NULL) AS [vfs] ON [vfs].[file_handle] = [pior].[io_handle] JOIN sys.master_files AS [mf] ON [mf].[database_id] = [vfs].[database_id] AND [mf].[file_id] = [vfs].[file_id] WHERE [pior].[io_pending] = 1 GROUP BY [vfs].[database_id], [mf].[name], [mf].[type_desc] ORDER BY [vfs].[database_id], [mf].[name];
You can easily modify this to only display results for log files (filter on
type_desc ='LOG') and just for the database ID you're interested in.
If you find that you're hitting the 32 limit for a particular database, you can:
- Reduce the amount of log flushes happening by reducing the number of small transactions and watching out for things like page splits and unused/duplicate indexes being changed during data modification operations. You can read more about optimizing log flushes in this blog post
- Try using a faster I/O subsystem
- Upgrade to SQL Server 2012 or higher, where the limit is 112
- Try the
delayed durability featureDMV that was added in SQL Server 2014
- As a last resort, split the workload over multiple databases or servers
If you're interested to see how much transaction log is being generated by your transactions, you can use the
sys.dm_tran_database_transactions DMV, in code similar to that below:
BEGIN TRAN; GO -- Do something you want to evaluate GO SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions WHERE [database_id] = DB_ID (N'YourTestDB'); GO
You might be very surprised at how much log is being generated, especially in tempdb for code that makes use of temporary objects. And of course, tempdb's transaction log can be a bottleneck just like for a user database.
Performance Monitor Counters
The log-related performance counters are all in the Databases performance object. Here are some of the main ones to watch (either with Performance Monitor itself, or using SQL Agent alerts, or using the sys.dm_os_performance_counters DMV, or in your favorite 3rd-party monitoring tool):
You don't want to see this counter increasing as that says there's something happening in the database that's causing more transaction log to be generated than there is current space. It implies that the transaction log isn't able to clear so you should investigate the cause by querying the log_reuse_wait_desc field of sys.databases and take whatever action is required (see the Books Online topic Factors That Can Delay Log Truncation for more details). Some example code would be:
SELECT [log_reuse_wait_desc] FROM sys.databases WHERE [name] = N'YourDB'; GO
Whenever a log growth occurs, the newly allocated portion of the transaction log has to be zeroed out, plus more Virtual Log Files are added – both of which can cause problems as I blogged about previously.
Unless you are the person performing the shrink operation to bring an out-of-control transaction log back under control, you don't want to see this counter increasing. If someone just shrinks the transaction log for no good reason, it will likely grow again, causing problems as I blogged about previously.
Percent Log Used
You should monitor this counter and be concerned if the value goes higher than 90%, as that indicates that a log growth might be imminent and the transaction log is not able to clear correctly, as I discussed above.
Log Flush Waits/sec
You want this value to remain the same or decrease. If it increases, it means you have an I/O subsystem bottleneck or a bottleneck inside the log flush mechanism because you're flushing many small portions of log. An increase here may also correlate with hitting the 32 outstanding I/Os for the log. See the discussion of
sys.dm_io_pending_io_requests above for more details.
Log Bytes Flushed/sec and Log Flushes/sec
These two counters allow you to figure out the average log flush size, by dividing the first counter by the second. The result will be a value between 512 and 61440 (the minimum and maximum sizes of a log flush, respectively). A lower value is more likely to correlate with increasing Log Flush Waits/sec. Again, see the discussion of
sys.dm_io_pending_io_requests above for more details.
For the more advanced among you, there are some Extended Events that you can use to watch what's going on with the log. I recommend you start by using the Database Log File IO Tracking template in SQL Server 2012 SSMS. You can get to this by going to Object Explorer and then your instance -> Management -> Extended Events and right-clicking Sessions to select New Session Wizard. In the Window that comes up, type a session name and select the tracking template from the drop down. Then hit Ctrl+Shift+N and the session will be scripted to a query window. The details of everything in there are beyond the scope of this post, unfortunately, but the template description is pretty explanatory:
There's also a new Extended Event in SQL Server 2012 called transaction_log that can be used to do all sorts of interesting analysis of what log records are being generated. That's definitely a subject I'll cover in a future post.
Given all the information above, you should be able to come up with a pretty good transaction log monitoring system. Transaction log health is of paramount importance for ensuring your workload is performing as it should be and I hope the four posts in this series (plus all the other links) have helped you to improve the overall performance of your SQL Server environment.