The primary focus of this blog site is performance in a SQL Server environment. One could argue that performance starts with database and application design. But you can also make a case that having the right resources available is essential for good performance. For all the discussion about the right resources (what CPU model, how much memory, what kind of storage), we sometimes neglect the act of capacity planning: using the data we have to make informed decisions about what we need. Capacity planning isn't just figuring out how much disk space we need, it also involves the resources that a SQL Server instance must have available to handle the workload.
New or Existing?
Capacity planning for a new solution is really tricky. You have to come up with estimates about workload based on information you collect from the business. This means you have to ask hard questions about how much data they will expect in the first month, the first six months, and the first year. When a new solution is coming in, this is often the LAST thing the business is thinking about, so very often you're going to get vague answers. In the case of a new solution, you really have make a best guess effort. Don't pull your hair out trying to get exact numbers.
If the solution is from a vendor, you must ask the vendor for planning recommendations about both space needed and the resources needed. I admit, they may not have that data, but you don't get what you don't ask for. It never hurts to try.
[Bonus: If the vendor doesn't have any information to provide, wouldn't it be helpful if, after the system is up and running for a few months, you send them your data…such as what hardware you have, and what the workload looks like? It doesn't need to be a 20-page write-up, but the feedback might nudge them in the direction of being more proactive going forward.]
In terms of an existing solution, if you're having a performance issue or you're looking to upgrade hardware, you'll want to capture information about the current environment in order to plan for a new one.
Planning for the amount of storage needed is fairly simple, it just requires some planning up front. In my proactive SQL Server health check articles I discuss monitoring disk space and include a query to capture file information. This query captures the size of the database files for the instance as well as the space used. It's imperative to trend this data over time, and that doesn't mean a few weeks. You're looking to see how the files change over months, possibly for up to one to two years, because usage patterns for an application can change. This information is easy to capture, requires little space to store, and is invaluable to have as a reference when you are procuring storage. If you can provide quantitative data about the system's growth then you have a much better chance of getting the space you need up front rather than having to ask for it later. And when you're asking for space, make sure to include tempdb in your calculations.
Optimizing your CPU performance is not just about the number of CPUs that you have, you also have to consider the model and the workload (e.g. data warehouse with large parallel queries vs. OLTP with serial queries). With this information, and a little help from Glenn, you can determine the best processor for your server. Don't forget to consider licensing costs and limitations based on your edition of SQL Server!
Memory is relatively inexpensive and it's our recommendation to always purchase the maximum amount of memory that a server can hold. Reading data from memory is significantly faster than reading it from disk, so the more data that fits into memory the better. Note that the entire database does not have to fit in memory. You just need the working set of data to fit into memory. Consider a 2TB database. It's unlikely that, in an OLTP scenario, all 2TB is accessed every day. Typically only recent data is accessed – perhaps just the last 30 or 60 days. That is the data that needs to fit in memory. But of course, rarely do we see a pure OLTP environment, often it's a mixed environment because users like to run reports over large sets of data, and there's no data warehouse or reporting copy of the database so they have to run the reports against production. This complicates the memory requirement. Now, sometimes you need that older data in memory, but sometimes you don't. It's important to understand the workload; what types of queries are executing against the database?
If you are using Standard Edition verify that you have more memory in the server than the maximum memory supported. For example, with SQL Server 2014 and higher, in Standard Edition the maximum amount of memory that you can allocate to the buffer pool (via the max server memory setting) is 128GB. Therefore, you want to have more memory in the server (e.g. 160GB) so you can set max server memory at the highest possible value of 128GB, and still have memory available for the OS and other SQL Server processes. Further, with SQL Server 2016 SP1 Standard Edition you can use In-Memory OLTP, with a limit of 32GB per database. This is above the max server memory value, so if you plan to use this feature, purchase memory accordingly.
When we talk about performance requirements for storage, you often hear people talk about IOPS (input/output operations per second). In fact, this article was inspired by a question from a viewer who watched my Pluralsight course on Benchmarking and Baselining. The question was: "How do you correlate the Performance Monitor counters for reads and writes per second to user connections to estimate the number of IOs per user?" Reads and writes per second are the input/output operations, so we have this data available through PerfMon at the instance level, and this is what you use to define the IOPS requirements for an instance.
However, if you know reads and writes and user connections, then you can do some math and figure out IOPS per user. This is useful if you're planning to grow the solution and add more users. You want to make sure the solution will scale, and one option you have is to take your calculated IOPS per user, based on X number of users, and then estimate instance IOPS for Y number of users. Now, we make a lot of assumptions with this calculation. We assume that the way new connections use the system is the same as it is today – that may or may not be the case in the end, you won't know until the system is in place. When you understand this value (reads + writes / user connections = average IOPS per user), then you know how to estimate IOPS for a solution based on anticipated user connections.
You then take this information to your storage person to discuss the potential configurations available. You can calculate the maximum IOPS for a disk configuration, provided you have information about the disks (e.g. the number of disks, the speed, the size, and the RAID configuration). You can test IO throughput for a drive using CrystalDiskMark, although this may not be possible if the storage hasn't been decided. Once it is in place, however, you should go through this testing to ensure that the IOPS for a given drive can meet the expected workload.
IOPS are just one way to look at storage performance. Understand that this data tells you how much IO is occurring, and ideally, if you know IOPS and you have the storage to meet the requirements, then latency should be minimal. But, latency is what affects performance. In order to determine what latency exists, you'll need to use a tool like DiskSpd to benchmark the storage. Glenn has a great article that explains how to analyze IO performance, and then another article on how use DiskSpd to test it to understand the latency. I highly recommend reviewing both articles if you haven't looked at storage and performance previously.
Capacity planning is about more than just knowing how much space you need for database files. You have to understand the workload and what it requires in terms of CPU, memory, and disk resources. To do this, you need data…which means you need capture baselines. My very first session in the SQL Server community was in December of 2010, and it was on the topic of baselines. Six years later I am still talking about their importance, and I am still hearing from people that they don't have these numbers. If you want to do intelligent, targeted capacity planning, then you need to collect the appropriate data…otherwise you're just guessing.