The Zombie PerfMon Counters That Never Die!

One of the things that's simultaneously great and horrible about the Internet is that, once something gets posted out in the ether, it basically never goes away. (Some day, politicians will realize this. We can easily fact check their consistency.) Because of longevity of content posted to the Internet, a lot of performance tuning topics become "zombies." We shoot 'em dead, but they keep coming back!

In other words, those old recommendations were a suggested best practice long ago, for a specific version of SQL Server, but are now inappropriate for the newer version. It's not uncommon for me, when speaking at a conference, to encounter someone who's still clinging to settings and techniques which haven't been good practice since the days of SQL Server 2000. The SQL Server 2000 Operations Guide on Capacity/Storage contains many "best practice" recommendations that were very version-specific and no longer apply today.

So here's an example. The % Disk Time and Disk Queue Length PerfMon counters were heavily recommended as key performance indicators for I/O performance. SQL Server throws a lot of I/O at the disks using scatter/gather to maximize the utilization of the disk-based I/O subsystem. This approach leads to short bursts of long queue depths during checkpoints and read-aheads for an instance of SQL Server. Sometimes the server workload is such that your disk can't keep up with the I/O shoved at it and, when that happens, you'll see long queue lengths too. The short burst scenario isn't a problem. The lengthening queue length scenario usually is a problem. So is that a good practice?

In a word, not-so-much.

Those counters can still be of some use on an instance of SQL Server which only has one hard disk (though that's exceedingly rare these days). Why?

The PerfMon counter % Disk Time is a bogus performance metric for several reasons. It does not take into account asynchronous I/O requests. It can't tell what the real performance profile for an underlying RAID set may be, since they contain multiple disk drives. The PerfMon counter Disk Queue Length is also mostly useless, except on SQL Servers with a single physical disk, because the hard disk controller cache obfuscates how many I/O operations are actually pending on the queue or not. In fact, some hard disks even have tiny write caches as well, which further muddies the water was to whether the I/O is truly queued, in a cache somewhere between the operating system and the disk, or has finally made it all the way to the CMOS on the disk.

Better I/O PerfMon Counters

Instead of using those PerfMon counters, use the Avg Disk Reads/sec, Avg Disk Writes/sec, and Avg Disk Transfers/sec to track the performance of disk subsystems. These counters track the average number of read I/Os, write I/Os, and combined read and write I/Os that occured in the last second. Occassionally, I like to track the same metrics by volume of data rather than the rate of I/O operations. So, to get that data, you may wish to give these volume-specific PerfMon counters a try: Avg Disk Transfer Bytes/sec, Avg Disk Read Bytes/sec, and Avg Disk Write Bytes/sec.

For SQL Server I/O Performance, Use Dynamic Management Views (DMV)

And unless you've been living in a cave, you should make sure to use SQL Server's Dynamic Management Views (DMVs) to check on I/O performance for recent versions of SQL Server. Some of my favorite DMVs for I/O include:

So how are you tracking I/O performance metrics? Which ones are you using?

I look forward to hearing back from you!

Enjoy,
-Kev
–Follow me on Twitter!