Rob Farley

Estimated Number of Rows to be Read

January 10, 2017 by in SQL Performance, SQL Plan | 2 Comments
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Jonathan Kehayias is a Principal Consultant with SQLskills and the youngest MCM ever.

Jonathan’s Posts

T-SQL TuesdayDon’t get me wrong – I love the Actual Rows Read property that we saw arrive in SQL Server’s execution plans in late 2015. But in SQL Server 2016 SP1, less than two months ago (and considering we’ve had Christmas in between, I don’t think much of the time since then counts), we got another exciting addition – Estimated Number of Rows to be Read (oh, and this is somewhat down to the Connect item I submitted, both demonstrating that Connect Items are worth submitting and making this post eligible for this month’s T-SQL Tuesday, hosted by Brent Ozar (@brento) on the topic of Connect items).

Let’s recap a moment… when the SQL Engine access data in a table, it uses either a Scan operation or a Seek operation. And unless that Seek has a Seek Predicate that can access at most one row (because it’s looking for an equality match on a set of columns – could be just a single column – which are known to be unique), then the Seek will perform a RangeScan, and behaves just like a Scan, just across the subset of rows that are satisfied by the Seek Predicate.

The rows satisfied by a Seek Predicate (in the case of a Seek operation’s RangeScan) or all the rows in the table (in the case of a Scan operation) are treated in essentially the same way. Both might get terminated early if no more rows are requested from the operator to its left, for example if a Top operator somewhere has already grabbed enough rows, or if a Merge Operator has no more rows to match against. And both might be filtered further by a Residual Predicate (shown as the ‘Predicate’ property) before the rows even get served up by the Scan/Seek operator. The “Number of Rows” and “Estimated Number of Rows” properties would tell us how many rows were expected to be produced by the operator, but we didn’t have any information about how may rows would be filtered by just the Seek Predicate. We could see the TableCardinality, but this was only really useful for Scan operators, where there was a chance that the Scan might look through the whole table for the rows it needed. It wasn’t useful at all for Seeks.

Table Cardinality

The query that I’m running here is against the WideWorldImporters database, and is:

SELECT COUNT(*)
FROM Sales.Orders
WHERE SalespersonPersonID = 7
AND YEAR(OrderDate) = 2013
AND MONTH(OrderDate) = 4;

Furthermore, I have an index in play:

CREATE NONCLUSTERED INDEX rf_Orders_SalesPeople_OrderDate 
  ON Sales.Orders (SalespersonPersonID, OrderDate);

This index is covering – the query doesn’t need any other columns to get its answer – and has been designed so that a Seek Predicate can be used on SalespersonPersonID, quickly filtering the data down to a smaller range. The functions on OrderDate mean that those last two predicates can’t be used within the Seek Predicate, so they are relegated to the Residual Predicate instead. A better query would filter those dates using OrderDate >= '20130401' AND OrderDate < '20130501', but I’m imagining a scenario here which is all too common…

Now, if I run the query, I can see the impact of the Residual Predicates. Plan Explorer even gives that useful warning that I’d written about before.

rf-tsql-plan

I can see very clearly that the RangeScan is 7,276 rows, and that the Residual Predicate filters this down to 149. Plan Explorer shows more information about this on the tooltip:

rf-tsql2-2

But without running the query, I can’t see that information. It’s simply not there. The properties in the estimated plan don’t have it:

rf-tsql2-3

And I’m sure I don’t need to remind you – this information is not present in the plan cache either. Having grabbed the plan from the cache using:

SELECT p.query_plan, t.text
FROM sys.dm_exec_cached_plans c
CROSS APPLY sys.dm_exec_query_plan(c.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) t
WHERE t.text LIKE '%YEAR%';

I opened it up, and sure enough, no sign of that 7,276 value. It looks just the same as the estimated plan I just showed.

Getting plans out of the cache is where the estimated values come into their own. It’s not just that I’d prefer to not actually run potentially-expensive queries on customer databases. Querying the plan cache is one thing, but running queries to get the actuals – that’s a lot harder.

With SQL 2016 SP1 installed, thanks to that Connect item, I can now see the Estimated Number of Rows to be Read property in estimated plans, and in the plan cache. The operator tooltip shown here is taken from the cache, and I can easily see that Estimated property showing 7,276, as well as the residual warning:

rf-ertbr-pe1

This is something which I could do on a customer box, looking in the cache for situations in problematic plans where the ratio of Estimated Number of Rows to be Read and Estimated Number of Rows isn’t great. Potentially, someone could make a process that checked every plan in the cache, but it’s not something that I’ve done.

Astute reading will have noticed that the Actual Rows that came out of this operator was 149, which was much smaller than the estimated 1382.56. But when I’m looking for Residual Predicates that are having to check too many rows, the ratio of 1,382.56 : 7,276 is still significant.

Now that we’ve found that this query is ineffective without even needing to run it, the way to fix it is to make sure that the Residual Predicate is sufficiently SARGable. This query…

SELECT COUNT(*) 
FROM Sales.Orders
WHERE SalespersonPersonID = 7 
AND OrderDate >= '20130401' 
AND OrderDate <  '20130501';

...gives the same results, and doesn’t have a Residual Predicate. In this situation, the Estimated Number of Rows to be Read value is identical to the Estimated Number of Rows, and the inefficiency is gone:

rf-ertbr-pe-2

As mentioned earlier, this post is part of this month’s T-SQL Tuesday. Why not head over there to see what other feature requests have been granted recently?