SQL Server has a cost-based optimizer that uses knowledge about the various tables involved in a query to produce what it decides is the most optimal plan in the time available to it during compilation. This knowledge includes whatever indexes exist and their sizes and whatever column statistics exist. Part of what goes into finding the optimal query plan is trying to minimize the number of physical reads needed during plan execution.
One thing I’ve been asked a few times is why the optimizer doesn’t consider what’s in the SQL Server buffer pool when compiling a query plan, as surely that could make a query execute faster. In this post, I’ll explain why.
Figuring Out Buffer Pool Contents
The first reason why the optimizer ignores the buffer pool is that it’s a non-trivial problem to figure out what’s in the buffer pool because of the way the buffer pool is organized. Data file pages are controlled in the buffer pool by small data structures called buffers, which track things like (non-exhaustive list):
- The page’s ID (file number: page-number-in-file)
- The last time the page was referenced (used by the lazy writer to help implement the least-recently-used algorithm that creates free space when needed)
- The memory location of the 8KB page in the buffer pool
- Whether the page is dirty or not (a dirty page has changes on it that haven’t yet been written back to durable storage)
- The allocation unit the page belongs to (explained here) and the allocation unit ID can be used to figure out what table and index the page is part of
For each database that has pages in the buffer pool, there is a hash list of pages, in page ID order, that’s quickly searchable to determine whether a page is already in memory or whether a physical read has to be performed. However, nothing easily allows SQL Server to determine what percentage of the leaf level for each index of a table is already in memory. Code would have to scan the entire list of buffers for the database, looking for buffers that map pages for the allocation unit in question. And the more pages in memory for a database, the longer the scan would take. It would be prohibitively expensive to do as part of query compilation.
If you’re interested, I wrote a post a while back with some T-SQL code that scans the buffer pool and gives some metrics, using the DMV sys.dm_os_buffer_descriptors.
Why Using Buffer Pool Contents Would Be Dangerous
Let’s pretend there *is* a highly efficient mechanism to determine buffer pool contents the optimizer can use to help it choose which index to use in a query plan. The hypothesis I’m going to explore is if the optimizer knows enough of a less efficient (larger) index is already in memory, compared to the most efficient (smaller) index to use, it should pick the in-memory index because it will reduce the number of physical reads required and the query will run faster.
The scenario I’m going to use is as follows: a table BigTable has two nonclustered indexes, Index_A and Index_B, both completely covering a particular query. The query requires a complete scan of the leaf level of the index to retrieve the query results. The table has 1 million rows. Index_A has 200,000 pages at its leaf level, and Index_B has 1 million pages at its leaf level, so a complete scan of Index_B requires processing five times more pages.
I created this contrived example on a laptop running SQL Server 2019 with 8 processor cores, 32GB of memory, and solid-state disks. The code is as follows:
CREATE TABLE BigTable (
c1 BIGINT IDENTITY,
c2 AS (c1 * 2),
c3 CHAR (1500) DEFAULT 'a',
c4 CHAR (5000) DEFAULT 'b'
);
GO
INSERT INTO BigTable DEFAULT VALUES;
GO 1000000
CREATE NONCLUSTERED INDEX Index_A ON BigTable (c2) INCLUDE (c3);
-- 5 records per page = 200,000 pages
GO
CREATE NONCLUSTERED INDEX Index_B ON BigTable (c2) INCLUDE (c4);
-- 1 record per page = 1 million pages
GO
CHECKPOINT;
GO
And then I timed the contrived queries:
DBCC DROPCLEANBUFFERS;
GO
-- Index_A not in memory
SELECT SUM (c2) FROM BigTable WITH (INDEX (Index_A));
GO
-- CPU time = 796 ms, elapsed time = 764 ms
-- Index_A in memory
SELECT SUM (c2) FROM BigTable WITH (INDEX (Index_A));
GO
-- CPU time = 312 ms, elapsed time = 52 ms
DBCC DROPCLEANBUFFERS;
GO
-- Index_B not in memory
SELECT SUM (c2) FROM BigTable WITH (INDEX (Index_B));
GO
-- CPU time = 2952 ms, elapsed time = 2761 ms
-- Index_B in memory
SELECT SUM (c2) FROM BigTable WITH (INDEX (Index_B));
GO
-- CPU time = 1219 ms, elapsed time = 149 ms
You can see when neither index is in memory, Index_A is easily the most efficient index to use, with an elapsed query time of 764ms against 2,761ms using Index_B, and the same is true when both indexes are in memory. However, if Index_B is in memory, and Index_A is not, if the query uses Index_B (149ms) it’s going to run faster than if it uses Index_A (764ms).
Now let’s allow the optimizer to base plan choice on what’s in the buffer pool…
If Index_A is mostly not in memory and Index_B is mostly in memory, it would be more efficient to compile the query plan to use Index_B, for a query running at that instant. Even though Index_B is larger and would need more CPU cycles to scan through, physical reads are much slower than the extra CPU cycles so a more efficient query plan minimizes the number of physical reads.
This argument only holds, and a “use Index_B” query plan is only more efficient than a “use Index_A” query plan, if Index_B remains mostly in memory, and Index_A remains mostly not in memory. As soon as most of Index_A is in memory, the “use Index_A” query plan would be more efficient, and the “use Index_B” query plan is the wrong choice.
The situations when the compiled “use Index_B” plan is less efficient than the cost-based “use Index_A” plan are (generalizing):
- Index_A and Index_B are both in memory: the compiled plan will take almost three times longer
- Neither index is memory resident: the compiled plan take over 3.5 times longer
- Index_A is memory resident and Index_B isn’t: all physical reads performed by the plan are extraneous, AND it will take a whopping 53 times longer
Summary
Although in our thought exercise, the optimizer can use buffer pool knowledge to compile the most efficient query at a single instant, it would be a dangerous way to drive plan compilation because of the potential volatility of the buffer pool contents, making the future efficiency of the cached plan highly unreliable.
Remember, the optimizer’s job is to find a good plan fast, not necessarily the single best plan for 100% of all situations. In my opinion, the SQL Server optimizer does the right thing by ignoring the actual contents of the SQL Server buffer pool, and instead relies on the various costing rules instead to produce a query plan that is likely to be the most efficient most of the time.
Shows how powerful the optimizer already is. For end of month or payroll processing there are popular indexes that could be pinned in memory but on a daily basis – users and queries are way too varied.
When there are multiple databases but only one buffer pool – this gets harder.
DB 2 on z/OS (mainframe) is easier. We have fewer tables than most windows applications making index re-use easier.
Does sql server 2019 do adjustment of estimated IO costs based on the speed of the hardware disks? Ie new SSD vs spinning hard drives.
I often see the problem of a high IO cost estimate being a scan and hash join when a forced loop join is faster.
Not that I know of.
I see also an other "danger":
Lets say index a not in memeory, index b in memory. Now execute your query 10 times.
If optimizier "correctly" chooses index b situation will not change. You will have 10*149ms = 1490ms.
If optimizer chooses "wrongly" index a than after the first execution index a will be in memory. You get 1*764ms + 9*52ms = 1232ms
Optimizer cannot predict what users will do next