You’ve probably heard many times before that SQL Server provides a guarantee for ACID transaction properties. This article focuses on the D part, which of course stands for durability. More specifically, this article focuses on an aspect of the SQL Server logging architecture that enforces transaction durability—log buffer flushes. I talk about the function that the log buffer serves, the conditions that force SQL Server to flush the log buffer to disk, what you can do to optimize transaction performance, as well as recently added related technologies like delayed durability and nonvolatile storage class memory.
Log buffer flushes
The D part in the ACID transaction properties stands for durability. At the logical level it means that when an application sends SQL Server an instruction to commit a transaction (explicitly, or with an auto-commit transaction), SQL Server normally returns control to the caller only after it can guarantee that the transaction is durable. In other words, once the caller gets back control after committing a transaction, it can trust that even if a moment later the server experiences a power failure, the transaction's changes made it to the database. As long as the server restarts successfully and the database files were not corrupted, you will find that all of the transaction's changes have been applied.
The way SQL Server enforces transaction durability, in part, is by ensuring that all of the transaction's changes are written to the database's transaction log on disk before returning control to the caller. In the case of a power failure after a transaction's commit was acknowledged, you know that all of those changes were at least written to the on-disk transaction log. That's the case even if the related data pages were modified only in the data cache (the buffer pool) but not yet flushed to the data files on disk. When you restart SQL Server, during the redo phase of the recovery process, SQL Server uses the information recorded in the log to replay changes that were applied after the last checkpoint and that haven't made it to the data files. There's a bit more to the story depending on the recovery model that you're using and on whether bulk operations were applied after the last checkpoint, but for the purposes of our discussion, it suffices to focus on the part that involves hardening the changes to the transaction log.
The tricky part in SQL Server's logging architecture is that log writes are sequential. Had SQL Server not used some sort of a log buffer to reduce the frequency of log writes to disk, write-intensive systems—especially ones that involve lots of small transactions—would quickly run into terrible log-write-related performance bottlenecks.
To alleviate the negative performance impact of frequent sequential log writes to disk, SQL Server uses a log buffer in memory. Log writes are first done to the log buffer, and certain conditions cause SQL Server to flush, or harden, the log buffer to disk. The hardened unit (aka log block) can range from a minimum of a sector size (512 bytes) to a maximum of 60 KB. Following are conditions that trigger a log buffer flush (ignore the parts that appear in square brackets for now):
- SQL Server gets a commit request of a [fully durable] transaction that changes data [in a database other than tempdb]
- The log buffer fills up, reaching its 60 KB capacity
- SQL Server needs to harden dirty data pages, e.g., during a checkpoint process, and the log records representing the changes to those pages were not yet hardened (write-ahead logging, or WAL for short)
- You manually request a log buffer flush by executing the procedure sys.sp_flush_log
- SQL Server writes a new sequence-cache-related recovery value [in a database other than tempdb]
The first four conditions should be pretty clear, if you ignore for now the information in square brackets. The last one perhaps isn’t clear yet, but I’ll explain it in detail later in the article.
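As a quick illustration of the fourth condition, here's how you would force a log buffer flush yourself (the sys.sp_flush_log procedure is available starting with SQL Server 2014):

-- Manually flush (harden) the current database's log buffer to disk
EXEC sys.sp_flush_log;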
The time SQL Server waits for an I/O operation handling a log buffer flush to complete is reflected by the WRITELOG wait type.
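To see how much accumulated wait time your instance has spent on such flushes, you can query the wait statistics; here's a minimal sketch (the counters accumulate from the last instance restart or stats clearing):

-- Accumulated waits for log buffer flush I/Os since instance restart
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'WRITELOG';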
So, why is this information so interesting, and what do we do with it? Understanding the conditions that trigger log buffer flushes can help you figure out why certain workloads experience related bottlenecks. Also, in some cases there are actions you can take to reduce or eliminate such bottlenecks. I'll cover a number of examples: one large transaction versus many small transactions, fully durable versus delayed durable transactions, user database versus tempdb, and sequence object caching.
One large transaction versus many small transactions
As mentioned, one of the conditions that triggers a log buffer flush is when you commit a transaction, to guarantee the transaction's durability. This means that workloads that involve lots of small transactions, like OLTP workloads, can potentially experience log-write-related bottlenecks.
Often you don't have this option, but if you do have a single session submitting lots of small changes, a simple and effective way to optimize the work is to apply the changes in a single big transaction instead of multiple small ones.
Consider the following simplified example (download PerformanceV3 here: http://tsql.solidq.com/SampleDatabases/PerformanceV3.zip):
SET NOCOUNT ON;
USE PerformanceV3;
ALTER DATABASE PerformanceV3 SET DELAYED_DURABILITY = Disabled; -- default
DROP TABLE IF EXISTS dbo.T1;
CREATE TABLE dbo.T1(col1 INT NOT NULL);
DECLARE @i AS INT = 1;
WHILE @i <= 1000000
BEGIN
BEGIN TRAN
INSERT INTO dbo.T1(col1) VALUES(@i);
COMMIT TRAN;
SET @i += 1;
END;
This code executes 1,000,000 small transactions that change data in a user database. This work will trigger at least 1,000,000 log buffer flushes. You could get a few additional ones due to the log buffer filling up. You can use the following test template to count the number of log buffer flushes and measure the time it took the work to complete:
-- Test template
-- ... Preparation goes here ...
-- Count log flushes and measure time
DECLARE @logflushes AS INT, @starttime AS DATETIME2, @duration AS INT;
-- Stats before
SET @logflushes = ( SELECT cntr_value FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log Flushes/sec'
AND instance_name = @db );
SET @starttime = SYSDATETIME();
-- ... Actual work goes here ...
-- Stats after
SET @duration = DATEDIFF(second, @starttime, SYSDATETIME());
SET @logflushes = ( SELECT cntr_value FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log Flushes/sec'
AND instance_name = @db ) - @logflushes;
SELECT
@duration AS durationinseconds,
@logflushes AS logflushes;
Even though the performance counter name is Log Flushes/sec, it actually keeps accumulating the number of log buffer flushes thus far. So, the code subtracts the before-work count from the after-work count to figure out the count of log flushes generated by the work. This code also measures the time in seconds it took the work to complete. Even though I don’t do this here, you could, if you wanted to, similarly figure out the number of log records and the size written to the log by the work by querying the before-work and after-work states of the fn_dblog function.
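For example, here's one way to capture such a before/after snapshot with the undocumented fn_dblog function (a sketch intended for test environments; the column name is the one the function exposes):

-- Snapshot of log record count and total size (undocumented function)
SELECT COUNT(*) AS numlogrecords,
       SUM([Log Record Length]) AS totallogsize
FROM sys.fn_dblog(NULL, NULL);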
For our example above, following is the part you need to place in the preparation section of the test template:
-- Preparation
SET NOCOUNT ON;
USE PerformanceV3;
ALTER DATABASE PerformanceV3 SET DELAYED_DURABILITY = Disabled;
DROP TABLE IF EXISTS dbo.T1;
CREATE TABLE dbo.T1(col1 INT NOT NULL);
DECLARE @db AS sysname = N'PerformanceV3';
DECLARE @logflushes AS INT, @starttime AS DATETIME2, @duration AS INT;
And following is the part that you need to place in the actual work section:
-- Actual work
DECLARE @i AS INT = 1;
WHILE @i <= 1000000
BEGIN
BEGIN TRAN
INSERT INTO dbo.T1(col1) VALUES(@i);
COMMIT TRAN;
SET @i += 1;
END;
Altogether, you get the following code:
-- Example test with many small fully durable transactions in user database
-- ... Preparation goes here ...
-- Preparation
SET NOCOUNT ON;
USE PerformanceV3;
ALTER DATABASE PerformanceV3 SET DELAYED_DURABILITY = Disabled;
DROP TABLE IF EXISTS dbo.T1;
CREATE TABLE dbo.T1(col1 INT NOT NULL);
DECLARE @db AS sysname = N'PerformanceV3';
DECLARE @logflushes AS INT, @starttime AS DATETIME2, @duration AS INT;
-- Stats before
SET @logflushes = ( SELECT cntr_value FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log Flushes/sec'
AND instance_name = @db );
SET @starttime = SYSDATETIME();
-- ... Actual work goes here ...
-- Actual work
DECLARE @i AS INT = 1;
WHILE @i <= 1000000
BEGIN
BEGIN TRAN
INSERT INTO dbo.T1(col1) VALUES(@i);
COMMIT TRAN;
SET @i += 1;
END;
-- Stats after
SET @duration = DATEDIFF(second, @starttime, SYSDATETIME());
SET @logflushes = ( SELECT cntr_value FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log Flushes/sec'
AND instance_name = @db ) - @logflushes;
SELECT
@duration AS durationinseconds,
@logflushes AS logflushes;
This code took 193 seconds to complete on my system and triggered 1,000,036 log buffer flushes. That's very slow, but can be explained by the large number of log flushes.
In typical OLTP workloads, different sessions submit small changes in different small transactions concurrently, so it's not like you really have the option to encapsulate lots of small changes in a single big transaction. However, if your situation is that all of the small changes are submitted from the same session, a simple way to optimize the work is to encapsulate it in a single transaction. This will give you two main benefits. One is that your work will write fewer log records. With 1,000,000 small transactions, each transaction actually writes three log records: one for beginning the transaction, one for the change, and one for committing the transaction. So, you're looking at around 3,000,000 transaction log records versus a bit over 1,000,000 when executed as one big transaction. But more importantly, with one big transaction most of the log flushes are triggered only when the log buffer fills up, plus one more log flush at the very end of the transaction when it commits. The performance difference can be quite significant. To test the work in one big transaction, use the following code in the actual work part of the test template:
-- Actual work
BEGIN TRAN;
DECLARE @i AS INT = 1;
WHILE @i <= 1000000
BEGIN
INSERT INTO dbo.T1(col1) VALUES(@i);
SET @i += 1;
END;
COMMIT TRAN;
On my system, this work completed in 7 seconds, and triggered 1,758 log flushes. Here’s a comparison between the two options:
#transactions  log flushes  duration in seconds
-------------  -----------  -------------------
1000000        1000036      193
1              1758         7
But again, in typical OLTP workloads, you don’t really have the option to replace many small transactions submitted from different sessions with one big transaction submitted from the same session.
Fully durable versus delayed durable transactions
Starting with SQL Server 2014, you can use a feature called delayed durability that allows you to improve the performance of workloads with many small transactions, even if submitted by different sessions, by sacrificing the normal full durability guarantee. When committing a delayed durable transaction, SQL Server acknowledges the commit as soon as the commit log record is written to the log buffer, without triggering a log buffer flush. The log buffer is still flushed due to any of the other aforementioned conditions, like when it fills up, but not when a delayed durable transaction commits.
Before using this feature, you need to think very carefully about whether it's appropriate for you. In terms of performance, its impact is significant only in workloads with lots of small transactions. If your workload mainly involves large transactions to begin with, you will probably not see any performance advantage. More importantly, you need to realize the potential for data loss. Say the application commits a delayed durable transaction. A commit record is written to the log buffer and immediately acknowledged (control is given back to the caller). If SQL Server experiences a power failure before the log buffer is flushed, then after restart, the recovery process undoes all the changes that were made by the transaction, even though the application thinks that it was committed.
So, when is it OK to use this feature? One obvious case is when data loss is not a problem, like this example from SentryOne's Melissa Connors. Another is when after a restart you have the means to identify which changes didn’t make it to the database, and you’re able to reproduce them. If your situation does not fall into one of these two categories, don’t use this feature despite the temptation.
To work with delayed durable transactions, you need to set a database option called DELAYED_DURABILITY. This option can be set to one of three values:
- Disabled (default): all transactions in the database are fully durable, and therefore every commit triggers a log buffer flush
- Forced: all transactions in the database are delayed durable, and therefore commits do not trigger a log buffer flush
- Allowed: unless otherwise mentioned, transactions are fully durable and committing them triggers a log buffer flush; however, if you use the option DELAYED_DURABILITY = ON in either a COMMIT TRAN statement or an atomic block (of a natively compiled proc), that particular transaction is delayed durable and therefore committing it does not trigger a log buffer flush
As a test, use the following code in the preparation section of our test template (notice the database option is set to Forced):
-- Preparation
SET NOCOUNT ON;
USE PerformanceV3; -- http://tsql.solidq.com/SampleDatabases/PerformanceV3.zip
ALTER DATABASE PerformanceV3 SET DELAYED_DURABILITY = Forced;
DROP TABLE IF EXISTS dbo.T1;
CREATE TABLE dbo.T1(col1 INT NOT NULL);
DECLARE @db AS sysname = N'PerformanceV3';
And use the following code in the actual work section (notice, 1,000,000 small transactions):
-- Actual work
DECLARE @i AS INT = 1;
WHILE @i <= 1000000
BEGIN
BEGIN TRAN
INSERT INTO dbo.T1(col1) VALUES(@i);
COMMIT TRAN;
SET @i += 1;
END;
Alternatively, you could use the Allowed mode at the database level, and then in the COMMIT TRAN command, add WITH (DELAYED_DURABILITY = ON).
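Here's what that combination looks like (a sketch using our sample database and table):

-- Database allows a per-transaction choice
ALTER DATABASE PerformanceV3 SET DELAYED_DURABILITY = Allowed;

-- This particular transaction opts in to delayed durability
BEGIN TRAN;
INSERT INTO dbo.T1(col1) VALUES(1);
COMMIT TRAN WITH (DELAYED_DURABILITY = ON);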
On my system, the work took 22 seconds to complete and triggered 95,407 log flushes. That’s longer than running the work as one big transaction (7 seconds) since more log records are generated (remember, per transaction, one for beginning the transaction, one for the change, and one for committing the transaction); however, it’s much faster than the 193 seconds that it took the work to complete using 1,000,000 fully durable transactions since the number of log flushes dropped from over 1,000,000 to fewer than 100,000. Plus, with delayed durability, you would get the performance gain even if the transactions are submitted from different sessions where it’s not an option to use one big transaction.
To demonstrate that there’s no benefit using delayed durability when doing the work as big transactions, keep the same code in the preparation part of the last test, and use the following code in the actual work part:
-- Actual work
BEGIN TRAN;
DECLARE @i AS INT = 1;
WHILE @i <= 1000000
BEGIN
INSERT INTO dbo.T1(col1) VALUES(@i);
SET @i += 1;
END;
COMMIT TRAN;
I got 8 seconds run time (compared to 7 for one big fully durable transaction) and 1,759 log flushes (compared to 1,758). The numbers are essentially the same, but with the delayed durable transaction you do have the risk of data loss.
Here’s a summary of the performance numbers for all four tests:
durability  #transactions  log flushes  duration in seconds
----------  -------------  -----------  -------------------
full        1000000        1000036      193
full        1              1758         7
delayed     1000000        95407        22
delayed     1              1759         8
Storage class memory
The delayed durability feature can significantly improve the performance of OLTP-style workloads that involve a large number of small update transactions and require high frequency and low latency. The problem is that you are risking data loss. What if you can't allow any data loss, but you still want delayed-durability-like performance gains, where the log buffer doesn't get flushed for every commit, rather only when it fills up? We all like to have our cake and eat it too, right?
You can achieve this in SQL Server 2016 SP1 or later by using storage class memory, aka NVDIMM-N nonvolatile storage. This hardware is essentially a memory module that gives you memory-grade performance, but the information there is persisted, and therefore not lost when the power goes. The addition in SQL Server 2016 SP1 lets you configure the log buffer as a persisted one on such hardware. To do this, you set up the SCM as a volume in Windows and format it as a Direct Access Mode (DAX) volume. You then add a log file to the database using the normal ALTER DATABASE <db> ADD LOG FILE command, with the file path residing on the DAX volume, and set the size to 20 MB. SQL Server, in turn, recognizes that it's a DAX volume, and from that moment treats the log buffer as a persisted one on that volume. Transaction commit events do not trigger log buffer flushes anymore; rather, once the commit is recorded in the log buffer, SQL Server knows that it's actually persisted, and therefore returns control to the caller. When the log buffer fills up, SQL Server flushes it to the transaction log files on the traditional storage.
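As an illustration, the command would look roughly like the following (the logical file name and the DAX volume path are assumptions for the example):

-- Add a small log file on a DAX-formatted volume (path and name are hypothetical)
ALTER DATABASE PerformanceV3
  ADD LOG FILE
  (
    NAME = PerformanceV3_dax_log,
    FILENAME = 'D:\DAXVolume\PerformanceV3_dax_log.ldf',
    SIZE = 20MB
  );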
For more details on this feature, including performance numbers, see Transaction Commit latency acceleration using Storage Class Memory in Windows Server 2016/SQL Server 2016 SP1 by Kevin Farlee.
Notably, SQL Server 2019 enhances support for storage class memory beyond just the persisted log cache scenario. It supports placing data files, log files, and In-Memory OLTP checkpoint files on such hardware. All you need to do is expose it as a volume at the OS level and format it as a DAX drive. SQL Server 2019 automatically recognizes this technology and works in an enlightened mode, directly accessing the device and bypassing the OS's storage stack. Welcome to the future!
User database versus tempdb
The tempdb database is of course created from scratch as a fresh copy of the model database every time you restart SQL Server. As such, there’s never a need to recover any data that you write to tempdb, whether you write it to temporary tables, table variables, or user tables. It’s all gone after restart. Knowing this, SQL Server can relax a lot of the logging-related requirements. For instance, irrespective of whether you enable the delayed durability option or not, commit events do not trigger a log buffer flush. Furthermore, the amount of information that needs to be logged is reduced since SQL Server needs only enough information to support rolling back transactions, or undoing work, if needed, but not rolling transactions forward, or redoing work. As a result, transaction log records representing changes to an object in tempdb tend to be smaller compared to when the same change is applied to an object in a user database.
To demonstrate this, you will run the same tests that you ran earlier in PerformanceV3, only this time in tempdb. We’ll start with the test of many small transactions when the database option DELAYED_DURABILITY is set to Disabled (default). Use the following code in the preparation section of the test template:
-- Preparation
SET NOCOUNT ON;
USE tempdb;
ALTER DATABASE tempdb SET DELAYED_DURABILITY = Disabled;
DROP TABLE IF EXISTS dbo.T1;
CREATE TABLE dbo.T1(col1 INT NOT NULL);
DECLARE @db AS sysname = N'tempdb';
Use the following code in the actual work section:
-- Actual work
DECLARE @i AS INT = 1;
WHILE @i <= 1000000
BEGIN
BEGIN TRAN
INSERT INTO dbo.T1(col1) VALUES(@i);
COMMIT TRAN;
SET @i += 1;
END;
This work generated 5,095 log flushes, and it took 19 seconds to complete. That’s compared with over a million log flushes and 193 seconds in a user database with full durability. That’s even better than with delayed durability in a user database (95,407 log flushes and 22 seconds) due to the reduced size of the log records.
To test one large transaction, leave the preparation section unchanged, and use the following code in the actual work section:
-- Actual work
BEGIN TRAN;
DECLARE @i AS INT = 1;
WHILE @i <= 1000000
BEGIN
INSERT INTO dbo.T1(col1) VALUES(@i);
SET @i += 1;
END;
COMMIT TRAN;
I got 1,228 log flushes and a 9-second run time. That's compared with 1,758 log flushes and a 7-second run time in the user database. The run time is similar, even a bit faster in the user database, but that could be due to small variations between the tests. The sizes of the log records in tempdb are reduced, and hence you get fewer log flushes compared to the user database.
You can also try running the tests with the DELAYED_DURABILITY option set to Forced, but this will have no impact in tempdb since, as mentioned, commit events don't trigger a log flush in tempdb anyway.
Here are the performance measures for all of the tests, both in the user database and in tempdb:
database       durability  #transactions  log flushes  duration in seconds
-------------  ----------  -------------  -----------  -------------------
PerformanceV3  full        1000000        1000036      193
PerformanceV3  full        1              1758         7
PerformanceV3  delayed     1000000        95407        22
PerformanceV3  delayed     1              1759         8
tempdb         full        1000000        5095         19
tempdb         full        1              1228         9
tempdb         delayed     1000000        5091         18
tempdb         delayed     1              1226         9
Sequence object caching
Perhaps a surprising case that triggers log buffer flushes is related to the sequence object cache option. Consider as an example the following sequence definition:
CREATE SEQUENCE dbo.Seq1 AS BIGINT MINVALUE 1 CACHE 50; -- the default cache size is 50
Every time you need a new sequence value, you use the NEXT VALUE FOR function, like so:
SELECT NEXT VALUE FOR dbo.Seq1;
The CACHE property is a performance feature. Without it, every time a new sequence value was requested, SQL Server would have to write the current value to disk for recovery purposes. Indeed, that's the behavior that you get when using the NO CACHE mode. Instead, when the option is set to a value greater than zero, SQL Server writes a recovery value to disk only once every cache-size number of requests. SQL Server also maintains two members in memory, sized as the sequence type: one holding the current value, and one holding the number of values left before the next disk write of the recovery value is needed. In case of a power failure, upon restart, SQL Server sets the current sequence value to the recovery value.
This is probably much easier to explain with an example. Consider the above sequence definition with the CACHE option set to 50 (default). You request a new sequence value for the first time by running the above SELECT statement. SQL Server sets the aforementioned members to the following values:
On disk recovery value: 50, In-memory current value: 1, In-memory values left: 49, You get: 1
49 more requests are not going to touch the disk, rather only update the memory members. After 50 requests in total, the members are set to the following values:
On disk recovery value: 50, In-memory current value: 50, In-memory values left: 0, You get: 50
Make another request for a new sequence value, and this triggers a disk write of the recovery value 100. The members are then set to the following values:
On disk recovery value: 100, In-memory current value: 51, In-memory values left: 49, You get: 51
If at this point the system experiences a power failure, then after restart, the current sequence value is set to 100 (the value recovered from disk). The next request for a sequence value produces 101 (writing the recovery value 150 to disk). You lost all the values in the range 52 to 100. The most you can lose due to an unclean termination of the SQL Server process, like in a case of a power failure, is as many values as the cache size. The tradeoff is clear: the larger the cache size, the fewer the disk writes of the recovery value, and hence the better the performance; at the same time, the larger the gap that can be generated between two sequence values in case of a power failure.
All of this is pretty straightforward, and perhaps you are very well familiar with how it works. What might be surprising is that every time SQL Server writes a new recovery value to disk (every 50 requests in our example), it also hardens the log buffer. That's not the case with the identity column property. Even though SQL Server internally uses the same caching feature for identity as it does for the sequence object, it just doesn't let you control its size. Identity caching is on by default with a size of 10,000 for BIGINT and NUMERIC, 1,000 for INT, 100 for SMALLINT, and 10 for TINYINT. If you want, you can turn it off with trace flag 272 or the IDENTITY_CACHE database scoped configuration option (SQL Server 2017 and later). The reason SQL Server does not need to flush the log buffer when writing the identity-cache-related recovery value to disk is that a new identity value can only be created when inserting a row into a table. In case of a power failure, a row inserted into a table by a transaction that did not commit will be pulled out of the table as part of the database recovery process when the system restarts. So, even if after the restart SQL Server generates the same identity value as the one created in the transaction that didn't commit, there's no chance for duplicates since the row was pulled out of the table. Had the transaction committed, this would have triggered a log flush, which would also persist the cache-related recovery value. Hence, Microsoft didn't feel compelled to flush the log buffer every time an identity-cache-related disk write of the recovery value takes place.
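For example, here's how you would turn identity caching off at the database level in SQL Server 2017 and later (only worth doing if identity gaps after an unclean shutdown are a real problem for you, since the cache exists to speed up inserts):

-- Disable the identity cache for the current database (SQL Server 2017+)
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;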
With the sequence object, the situation is different. An application can request a new sequence value and not store it in the database. In case of a power failure after the creation of a new sequence value in a transaction that didn't commit, after restart, there's no way for SQL Server to tell the application not to rely on that value. Therefore, to avoid creating a new sequence value after restart that is equal to a previously generated sequence value, SQL Server forces a log flush every time a new sequence-cache-related recovery value is written to disk. One exception to this rule is when the sequence object is created in tempdb; of course, there's no need for such log flushes there since after a system restart tempdb is created anew anyway.
The negative performance impact of the frequent log flushes is especially noticeable when using a very small sequence cache size, and when one transaction generates lots of sequence values, e.g., when inserting lots of rows into a table. Without the sequence, such a transaction would mostly harden the log buffer when it fills up, plus one more time when the transaction commits. But with the sequence, you get a log flush every time a disk write of a recovery value takes place. That's why you want to avoid using a small cache size, let alone the NO CACHE mode.
To demonstrate this, use the following code in the preparation section of our test template:
-- Preparation
SET NOCOUNT ON;
USE PerformanceV3; -- try PerformanceV3, tempdb
ALTER DATABASE PerformanceV3 -- try PerformanceV3, tempdb
SET DELAYED_DURABILITY = Disabled; -- try Disabled, Forced
DROP TABLE IF EXISTS dbo.T1;
DROP SEQUENCE IF EXISTS dbo.Seq1;
CREATE SEQUENCE dbo.Seq1 AS BIGINT MINVALUE 1 CACHE 50; -- try NO CACHE, CACHE 50, CACHE 10000
DECLARE @db AS sysname = N'PerformanceV3'; -- try PerformanceV3, tempdb
And the following code in the actual work section:
-- Actual work
SELECT
-- n -- to test without seq
NEXT VALUE FOR dbo.Seq1 AS n -- to test sequence
INTO dbo.T1
FROM PerformanceV3.dbo.GetNums(1, 1000000) AS N;
This code uses one transaction to write 1,000,000 rows into a table using the SELECT INTO statement, generating as many sequence values as the number of inserted rows.
As instructed in the comments, run the test with NO CACHE, CACHE 50 and CACHE 10000, both in PerformanceV3 and in tempdb, and try both fully durable transactions and delayed durable ones.
Here are the performance numbers that I got on my system:
database       durability  cache     log flushes  duration in seconds
-------------  ----------  --------  -----------  -------------------
PerformanceV3  full        NO CACHE  1000047      171
PerformanceV3  full        50        20008        4
PerformanceV3  full        10000     339          < 1
tempdb         full        NO CACHE  96           4
tempdb         full        50        74           1
tempdb         full        10000     8            < 1
PerformanceV3  delayed     NO CACHE  1000045      166
PerformanceV3  delayed     50        20011        4
PerformanceV3  delayed     10000     334          < 1
tempdb         delayed     NO CACHE  91           4
tempdb         delayed     50        74           1
tempdb         delayed     10000     8            < 1
There are quite a few interesting things to notice.
With NO CACHE you get a log flush for every single sequence value generated. Therefore, it’s strongly recommended to avoid it.
With a small sequence cache size, you still get lots of log flushes. Perhaps the situation is not as bad as with NO CACHE, but observe that the workload took 4 seconds to complete with the default cache size of 50 compared to less than a second with the size 10,000. I personally use 10,000 as my preferred value.
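If you already have a sequence with a small cache size, you don't need to drop and recreate it; you can alter it in place, like so with our demo sequence:

-- Increase the cache size of an existing sequence
ALTER SEQUENCE dbo.Seq1 CACHE 10000;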
In tempdb you don’t get log flushes when a sequence cache-related recovery value is written to disk, but the recovery value is still written to disk every cache-sized number of requests. That’s perhaps surprising since such a value would never need to be recovered. Therefore, even when using a sequence object in tempdb, I’d still recommend using a large cache size.
Also notice that delayed durability doesn’t prevent the need for log flushes every time the sequence cache-related recovery value is written to disk.
Conclusion
This article focused on log buffer flushes. Understanding this aspect of SQL Server's logging architecture is important especially in order to be able to optimize OLTP-style workloads that require high frequency and low latency, including, of course, workloads that use In-Memory OLTP. You have more options with newer features like delayed durability and a persisted log buffer on storage class memory. Make sure you're very careful with the former, though, since unlike the latter it does carry the potential for data loss.
Be careful not to use the sequence object with a small cache size, let alone the NO CACHE mode. I find the default size of 50 too small and prefer to use 10,000. I've heard people express concerns that with a cache size of 10,000, after multiple power failures they might exhaust all the values in the type. However, even with a four-byte INT type, using only the positive range, 10,000 fits 214,748 times. If your system experiences that many power failures, you have a completely different problem to worry about. Therefore, I feel very comfortable with a cache size of 10,000.