In this continuation of my ‘knee-jerk performance tuning’ series, I’d like to discuss Solid State Disks (SSDs) and some of the problems I see with their use in a SQL Server environment. For an in-depth description of SSDs, check out this Wikipedia article.
What Do SSDs do For SQL Server Performance?
SSDs don’t have any moving parts so when a read or write occurs, there’s almost no I/O latency. The latency in a spinning drive comes from two things:
- Moving the disk head to the right track on the disk surface (known as the seek time)
- Waiting for the disk to spin to the right point on the track (known as the rotational latency)
This means that SSDs provide a big performance boost when there’s an I/O bottleneck.
It’s that simple.
There’s a bit of complication that’s worth mentioning but beyond the scope of this article to go into in depth: SSD performance can start to degrade as the drive gets really full (investigated and explained in detail in this article from AnandTech). There may also be some system memory required for the SSD driver to help with wear leveling (prolonging the life of the NAND cells in the SSD), and that’s going to vary by vendor. Enough of that – back to the SQL Server stuff.
Avoid Bad Internet Advice
There are two bits of very poor advice I see on the Internet around SQL Server and SSDs.
The first is around what to put on the SSD, where the advice is to always put tempdb and your transaction logs on SSDs. At first glance that sounds like good advice, as transaction logs and tempdb are commonly bottlenecks in the system.
But what if they’re not?
Your workload may be read-mostly, in which case the transaction log likely won’t be a workload bottleneck and so putting it on an SSD may be a waste of an expensive SSD.
Your tempdb may not be used very much by your workload, so putting it on an SSD may be a waste of an expensive SSD.
When you’re considering which portion of the SQL Server environment to move to the SSD, you want to investigate where the I/O bottlenecks are. This can be done very easily using the code I posted last week that uses the sys.dm_io_virtual_file_stats DMV to provide a snapshot of the I/O latencies for all files in all databases on the instance. To make sense of your latency numbers, and compare them against good/bad values, read through this long post I did specifically around tempdb and transaction log I/O latencies.
And then even if you do have high latencies, don’t knee-jerk and think that the only solution is to move the poorly-performing file(s) to an SSD:
- For data file read latencies, investigate why there are so many reads occurring. I cover that here.
- For log file write latencies, consider all the ways to tune the performance of the log and what’s being logged. I cover that here, here, and here.
The worst possible case is where you’re given a bunch of SSDs, follow the Internet advice to move tempdb and your log files to them, and then there’s no workload performance gain. That’s not going to encourage your management to provide you with more expensive SSDs.
The second piece of poor advice is around index fragmentation, where the advice is that because SSDs are so fast, you don’t need to worry about index fragmentation when using SSDs.
There are three ways I refute that bad advice:
- SSDs in no way stop the cause of index fragmentation: page splits from pages needing free space for a random insert or row size increase. A page split generates the same amount of transaction log, resource usage, and potential thread waits regardless of where the data/log files are stored.
- Index fragmentation includes having many data/index pages with low page density (i.e. lots of empty, free space). Do you really want your expensive SSDs storing lots of empty space? SSDs don’t help here at all.
- My colleague Jonathan Kehayias did an in-depth investigation, using Extended Events, of I/O patterns around index fragmentation specifically to address this bad advice and found that there is still a performance hit from having index fragmentation when using SSDs. You can read his long post here.
The only thing that SSDs do around index fragmentation is make the reads go faster, so there’s less of a performance penalty for index range scans when index fragmentation exists, but point 3 above shows that there’s still a penalty.
SSDs do not change how you deal with and/or prevent index fragmentation in your SQL Server environment.
Make Sure to Protect Your Data
One of the cardinal sins I see people committing around using SSDs is only using one of them. With only one drive, what RAID level are you using? Zero. RAID-0 provides no redundancy at all.
If you’re going to use an SSD, then you need to use at least two, in a RAID-1 (mirroring) configuration. There’s no point having a performance boost if you’re sacrificing the availability of the system as the trade-off.
One push back I sometimes get to using at least two SSDs is that the SSD card provides two drives to Windows, and so surely creating a Windows mirrored volume over the two drives is the same as RAID-1 across two physically separate SSDs?
No, it’s not. It’s still one physical SSD, with no redundancy. Have you ever seen half of an SSD card fail? No, neither have I. Do it right and use two of them and get real redundancy for your data.
The other push back I get is that they’re SSDs, not spinning drives, so aren’t going to fail. That’s wrong. SSDs can and do fail just like spinning drives. I’ve personally seen two enterprise-grade SSDs fail during testing in our lab environment. According to this article on StorageReview.com, consumer-grade SSDs have an MTBF of 2 million hours vs. 1.5 million hours for consumer-grade spinning drives, and I’d expect similar results for enterprise-grade drives, but SSDs do fail.
Don’t fall into the trap of thinking that whatever you put on the SSD means that you’ll get a boost in performance – you have to pick and choose carefully. And don’t believe the nonsense out there about ignoring index fragmentation when using SSDs either.
SSDs are a very useful way to increase performance, but for their cost, you want to make sure you’re maximizing the return on your company’s investment by using them correctly and only where appropriate.
In the next article in the series, I’ll discuss another common cause of knee-jerk performance tuning. Until then, happy troubleshooting!