In this continuation of my ‘knee-jerk performance tuning’ series, I’d like to discuss Solid State Disks (SSDs) and some of the problems I see with their use in a SQL Server environment. For an in-depth description of SSDs, check out this Wikipedia article.
What Do SSDs do For SQL Server Performance?
SSDs don’t have any moving parts, so when a read or write occurs there’s almost no I/O latency. The latency in a spinning drive comes from two things:
- Moving the disk head to the right track on the disk surface (known as the seek time)
- Waiting for the disk to spin to the right point on the track (known as the rotational latency)
This means that SSDs provide a big performance boost when there’s an I/O bottleneck.
It’s that simple.
There’s a bit of complication that’s worth mentioning but beyond the scope of this article to go into in depth: SSD performance can start to degrade as the drive gets really full (investigated and explained in detail in this article from AnandTech). There may also be some system memory required for the SSD driver to help with wear leveling (prolonging the life of the NAND cells in the SSD), and that’s going to vary by vendor. Enough of that – back to the SQL Server stuff.
Avoid Bad Internet Advice
There are two bits of very poor advice I see on the Internet around SQL Server and SSDs.
The first is around what to put on the SSD, where the advice is to always put tempdb and your transaction logs on SSDs. At first glance that sounds like good advice, as transaction logs and tempdb are commonly bottlenecks in the system.
But what if they’re not?
Your workload may be read-mostly, in which case the transaction log likely won’t be a workload bottleneck and so putting it on an SSD may be a waste of an expensive SSD.
Your tempdb may not be used very much by your workload, so putting it on an SSD may be a waste of an expensive SSD.
When you’re considering which portion of the SQL Server environment to move to the SSD, you want to investigate where the I/O bottlenecks are. This can be done very easily using the code I posted last week that uses the sys.dm_io_virtual_file_stats DMV to provide a snapshot of the I/O latencies for all files in all databases on the instance. To make sense of your latency numbers, and compare them against good/bad values, read through this long post I did specifically around tempdb and transaction log I/O latencies.
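For illustration, here’s a minimal sketch of that kind of query – not the exact script from that post, just the core idea of turning the DMV’s cumulative counters into average latencies per file:

-- Minimal sketch: average I/O latency per database file, derived from the
-- cumulative counters in sys.dm_io_virtual_file_stats. The numbers accumulate
-- from instance startup, so for a true picture of the current workload,
-- diff two snapshots taken some time apart.
SELECT
    DB_NAME(vfs.database_id) AS [Database],
    mf.physical_name AS [File],
    CASE WHEN vfs.num_of_reads = 0 THEN 0
         ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS [AvgReadLatencyMs],
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS [AvgWriteLatencyMs]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY [AvgReadLatencyMs] DESC;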
And then even if you do have high latencies, don’t knee-jerk and think that the only solution is to move the poorly-performing file(s) to an SSD:
- For data file read latencies, investigate why there are so many reads occurring (see the sketch after this list for a starting point). I cover that here.
- For log file write latencies, consider all the ways to tune the performance of the log and what’s being logged. I cover that here, here, and here.
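As a starting point for the first bullet, a minimal sketch like the following lists the cached queries doing the most physical reads – this is my assumption about where you’d begin looking, and the posts linked above cover the investigation in far more depth:

-- Minimal sketch: top 10 cached queries by physical reads, as a first clue
-- to why a data file is being read so heavily. Plans age out of the cache,
-- so treat the results as indicative rather than complete.
SELECT TOP (10)
    qs.total_physical_reads,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_physical_reads DESC;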
The worst possible case is where you’re given a bunch of SSDs, follow the Internet advice to move tempdb and your log files to them, and then there’s no workload performance gain. That’s not going to encourage your management to provide you with more expensive SSDs.
The second piece of poor advice is around index fragmentation, where the advice is that because SSDs are so fast, you don’t need to worry about index fragmentation when using SSDs.
What nonsense!
There are three ways I refute that bad advice:
- SSDs in no way stop the cause of index fragmentation: page splits from pages needing free space for a random insert or row size increase. A page split generates the same amount of transaction log, resource usage, and potential thread waits regardless of where the data/log files are stored.
- Index fragmentation includes having many data/index pages with low page density (i.e. lots of empty, free space). Do you really want your expensive SSDs storing lots of empty space? SSDs don’t help here at all. (The sketch after this list shows how to measure page density.)
- My colleague Jonathan Kehayias did an in-depth investigation, using Extended Events, of I/O patterns around index fragmentation specifically to address this bad advice and found that there is still a performance hit from having index fragmentation when using SSDs. You can read his long post here.
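To put numbers on points 1 and 2, here’s a minimal sketch using sys.dm_db_index_physical_stats to report both logical fragmentation and page density for indexes in the current database. SAMPLED mode is used because LIMITED mode doesn’t return page density; be aware the scan itself can generate significant I/O on large databases:

-- Minimal sketch: fragmentation and page density per index in the current
-- database. Low avg_page_space_used_in_percent means lots of empty space
-- on disk and in the buffer pool - SSDs don't help with that at all.
SELECT
    OBJECT_NAME(ips.object_id) AS [Table],
    i.name AS [Index],
    ips.avg_fragmentation_in_percent,
    ips.avg_page_space_used_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id
WHERE ips.page_count > 1000  -- ignore trivially small indexes
ORDER BY ips.avg_page_space_used_in_percent ASC;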
The only thing that SSDs do around index fragmentation is make the reads go faster, so there’s less of a performance penalty for index range scans when index fragmentation exists, but point 3 above shows that there’s still a penalty.
SSDs do not change how you deal with and/or prevent index fragmentation in your SQL Server environment.
Make Sure to Protect Your Data
One of the cardinal sins I see people committing around using SSDs is only using one of them. With only one drive, what RAID level are you using? Zero. RAID-0 provides no redundancy at all.
If you’re going to use an SSD, then you need to use at least two, in a RAID-1 (mirroring) configuration. There’s no point having a performance boost if you’re sacrificing the availability of the system as the trade-off.
One piece of pushback I sometimes get to using at least two SSDs is that the SSD card presents two drives to Windows, so surely creating a Windows mirrored volume over the two drives is the same as RAID-1 across two physically separate SSDs?
No, it’s not. It’s still one physical SSD, with no redundancy. Have you ever seen half of an SSD card fail? No, neither have I. Do it right and use two of them and get real redundancy for your data.
The other pushback I get is that they’re SSDs, not spinning drives, so they aren’t going to fail. That’s wrong. SSDs can and do fail just like spinning drives. I’ve personally seen two enterprise-grade SSDs fail during testing in our lab environment. According to this article on StorageReview.com, consumer-grade SSDs have an MTBF of 2 million hours vs. 1.5 million hours for consumer-grade spinning drives, and I’d expect similar results for enterprise-grade drives, but SSDs do fail.
Summary
Don’t fall into the trap of thinking that whatever you put on the SSD means that you’ll get a boost in performance – you have to pick and choose carefully. And don’t believe the nonsense out there about ignoring index fragmentation when using SSDs either.
SSDs are a very useful way to increase performance, but for their cost, you want to make sure you’re maximizing the return on your company’s investment by using them correctly and only where appropriate.
In the next article in the series, I’ll discuss another common cause of knee-jerk performance tuning. Until then, happy troubleshooting!
I would like to point out the following which is often overlooked or even falsely contradicted:
SSDs are much slower with random I/O than with sequential I/O – by about a factor of 10. That’s 10x better than magnetic disks manage, but there is still a gap of an order of magnitude. For that reason alone, the usual "never defrag an SSD" advice is false.
I would like to add that I've seen significant gains from placing tempdb *data* files on SSDs. The write activity specifically on the data files seems to be very high in highly transactional environments, even when there's no snapshot isolation and no index rebuilds sorting in tempdb. This comes after the environment has been tuned. By "tuned" I mean I've eliminated patching issues and SAN issues, created multiple data files appropriate to the environment, ensured correct indexing, updated stats, rebuilt indexes, etc.
I don't know the reason why, but if you look at the write stalls on tempdb data files across the board, I'm guessing many will see the same thing. Lots of spills, and possibly an excessive use of temp tables? I use Paul's approach to capturing the stalls with sys.dm_io_virtual_file_stats, which he blogged about in April 2011 (How to examine IO subsystem latencies from within SQL Server).
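For anyone who wants to check for the same pattern, a minimal sketch along those lines (relying on tempdb always being database_id 2) is:

-- Minimal sketch: cumulative write stalls on the tempdb data files only.
-- As with any sys.dm_io_virtual_file_stats query, the counters accumulate
-- from startup, so compare two snapshots for the current workload.
SELECT
    df.name AS [File],
    vfs.num_of_writes,
    vfs.io_stall_write_ms,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS [AvgWriteLatencyMs]
FROM sys.dm_io_virtual_file_stats(2, NULL) AS vfs
JOIN tempdb.sys.database_files AS df
    ON df.file_id = vfs.file_id
WHERE df.type_desc = 'ROWS'  -- data files only, not the log
ORDER BY vfs.io_stall_write_ms DESC;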
I would like to add that I've been in several environments lately where SAN guys and DBAs have said, "We don't have to defrag because we are on SSDs." I always ask, "Where did you hear that?" They never know, but it sure is making the rounds.
One issue that I have never seen anyone in the real world consider (or test for – because faster hardware can NEVER be bad, right?!?) is deadlocking. Query A and query B try to read and write the same resources in opposite order. But on slow rotating media, with 172ms read I/O stalls on average, your two queries hardly ever bump into each other. So you drop in SSD storage and BAM – queries A and B no longer stay out of each other's way, and you can get deadlocks. I had one client instantly make their system UNUSABLE under load because of a BLIZZARD of deadlocks (and yes, I gave them an earful for not involving me in their planned hardware upgrade!!). I have had other clients hit lesser issues with it too.
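For anyone wanting to verify whether deadlocks spiked after a hardware change, a minimal sketch that pulls recent deadlock reports from the built-in system_health Extended Events session (assuming SQL Server 2012 or later, where the session writes to .xel files) might look like this:

-- Minimal sketch: recent deadlock reports captured by system_health.
-- Each row is a full xml_deadlock_report you can open and inspect.
SELECT
    xed.value('@timestamp', 'datetime2') AS deadlock_time,
    xed.query('.') AS deadlock_report
FROM (
    SELECT CAST(event_data AS XML) AS event_xml
    FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
) AS src
CROSS APPLY src.event_xml.nodes('/event[@name="xml_deadlock_report"]') AS x(xed)
ORDER BY deadlock_time DESC;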
Best,
Kevin
"You can read his long post here.
The only thing that SSDs do around index fragmentation is make the reads go faster, so there’s less of a performance penalty for index range scans when index fragmentation exists, but point 3 above shows that there’s still a penalty"
I do not agree, read:
https://www.vcritical.com/2010/01/vmware-esxi-4-sd-flash-in-bl460c-g6/
You're welcome to disagree.
However, that link doesn't say anything about SQL Server.
SSDs do not stop page splits happening, do not stop extra log being written, and do not prevent wasted space on disk and in memory. They only make reads faster, so there's less of an effect from smaller I/Os because of less efficient readahead when doing large scans on fragmented indexes. Can you point me at a link that explains why you disagree?
Thanks
One thing to consider: if you're a consultant, the hours needed to monitor and analyse the I/O might quickly add up to more than the cost of the SSD. Should the choice of an SSD for tempdb then be made lightly? I don't know.
What RAID configuration do you recommend for 2TB of storage for non-database use in a small business environment?
Thank you
I don't – I only do databases.