Practical Processor Selection for OLTP Workloads
SentryOne - SQL Sentry
Sep 152015
 
PentiumProbably not the CPU you're looking for.

Imagine that you have just gotten management approval for a new on-premises, database server that will run a mission critical OLTP workload with SQL Server 2014 Enterprise Edition running on Windows Server 2012 R2 Standard Edition. This new server will replace a legacy Dell PowerEdge R910 server that is running SQL Server 2008 R2 Enterprise Edition on Windows Server 2008 R2 Enterprise Edition. This existing server has four, 45nm Intel Xeon X7560 Nehalem-EX processors (giving you 32 total ”physical cores in the system) and has 512GB of RAM, which was a premium setup back when it was purchased in 2010.

For this example, you have a somewhat limited hardware and software license budget available for the platform upgrade. You want to move to SQL Server 2014 Enterprise Edition, on a new server with much better single-threaded CPU performance (since you have an OLTP workload), and higher overall CPU capacity, while still minimizing your hardware and SQL Server license costs.

Given this information, how would you go about deciding which model server to choose and which exact processor to select? Would you let someone in the I.T. department make these decisions, or would you get involved in the evaluation and decision-making process? Rather than letting "Shon the server guy" decide, I want to show you an easy, practical method for figuring this out, using TPC-E benchmark scores, some simple arithmetic and some common sense.

The first step is to check out the TPC-E benchmark result scores online, looking for the nearest equivalent system to your existing system. If you are lucky, you will find an official benchmark submission that is an exact match to to your system. In our case, I found a submission from April 10, 2010 that just happens to exactly match our legacy system. This Dell PowerEdge R910 system had an actual TPC-E Throughput score of 1,933.96, with four Intel Xeon X7560 eight-core processors. The actual TPC-E Throughput score is a good measurement of the overall CPU capacity of the system, giving you a rough indication of how many concurrent queries it can handle (assuming you don’t have storage or memory-related bottlenecks).

The second step is to take the actual TPC-E Throughput score for this system, and divide it by the number of physical processor cores in the system, to get an idea of the single-threaded processor performance for the particular processor in the benchmarked system. For this system, we would simply divide 1,933.96 by 32 physical cores, to come up with a "score per core" of 60.44. The Intel Xeon X7560 has a base clock speed of 2.26GHz, with a Turbo clock speed of 2.66GHz. It has a 24MB shared L3 cache, and a 6.4 GT/s Intel QPI speed. Quick Path Interconnect (QPI) is a point-to-point interconnect between the CPU and the integrated memory controller, so having higher QPI speeds means faster memory access.

The third step is to look for TPC-E benchmark results for current generation systems and processors, trying to find a result for a system that is roughly equivalent to the type of new system that you are considering. Unfortunately, this will sometimes be be difficult because the server vendors have not been submitting as many new systems as they used to, and because they always submit systems that use the "flagship", highest core count processor for a given processor generation and family. This makes the actual TPC-E Throughput score higher, but also makes a system using that processor much more expensive to license for SQL Server because of the higher core counts. Another problem with these "flagship" processor models is that they typically trade higher physical core counts for lower base clock speeds, so you end up paying much higher licensing costs and actually getting lower single-threaded CPU performance.

Conventional wisdom would be that you would want a new four-socket server to replace your existing four-socket server. With modern two-socket servers, that may not be necessary anymore, depending on the magnitude of your workload. There was a period, from early 2012 through early 2014, when new two-socket Xeon E5 family server processors were much faster for single-threaded performance than new four-socket Xeon E7 family server processors. This performance gap closed up quite a bit when the Xeon E7 v2 Ivy Bridge-EX family was released in early 2014. Since then, the E5 and E7 families are even closer now since both are using the Haswell microarchitecture.

Given this near CPU performance parity, your server model choice comes down to overall memory, CPU and IO capacity. Current two-socket servers are practically limited to 768GB of RAM with 32GB DIMMs, while current four-socket servers can have 3TB of RAM with 32GB DIMMs. Most customers that I run into can comfortably run their workload on 768GB of RAM, but a few actually do need more than that. Both two-socket and four-socket servers can use Haswell processors with 18 physical cores each, so you have a limit of either 36 or 72 physical cores in your system, depending on your socket count.

For this exercise, there is a TPC-E benchmark result for a two-socket Fujitsu Primergy RX2540 M1 system with an actual TPC-E Throughput score of 3,772.08, using two Intel Xeon E5-2699 v3 eighteen core Haswell-EP processors and 512GB of RAM. The Xeon E5-2699 v3 processor has a base clock speed of 2.3GHz, with a Turbo clock speed of 3.6GHz. It has a 45MB shared L3 cache, and a 9.6 GT/s Intel QPI speed. For this system, we would divide the actual 3,772.08 score by 36 physical cores, to come up with a "score per core" of 104.78.

This new two-socket system would have nearly double the overall CPU capacity of the legacy system, with about 1.57 times the single-threaded performance. From a CPU capacity and single-threaded performance perspective, I would be quite confident that this new two-socket system would be a good replacement for the legacy system. This new system would also have PCIe 3.0 expansion slots, so it will have more total IO capacity than the old system. Unfortunately, it does have 36 physical cores that will require SQL Server 2014 Enterprise Edition licenses, which will be rather expensive.

A better processor choice might be the Intel Xeon E5-2667 v3, which has eight physical cores, with a base clock speed of 3.2GHz, a Turbo clock speed of 3.6GHz, a 20MB shared L3 cache, and a 9.6 GT/s Intel QPI speed. Using some simple arithmetic, we can adjust for the differences between the Xeon E5-2699 v3 and the Xeon E5-2667 v3, (since they are both from the same processor generation and family).

First, I adjust for the difference in physical core counts. If you divide 8 by 18, the result is 0.44, so you multiply 3,772.08 times 0.44, we get a result of 1,676.48. Next, we have to adjust for the difference in base clock speed between these two processors. The E5-2667 v3 has a 39% higher base clock speed than the E5-2699 v3 processor, so we would multiply 1,676.48 times 1.39 to get a final estimated TPC-E score of 2330.31. If we divide this estimated score by 16 physical cores, we get a "score per core" result of 145.64.

So, according to this calculation, we would still have more overall CPU capacity than the legacy system, and much better single-threaded CPU performance using the lower core count, "frequency optimized" Xeon E5-2667 v3 processor in the new two-socket system. The final bonus from this processor choice is much lower SQL Server 2014 licensing costs, since we would only need 16 core licenses instead of 36 core licenses, which is a savings of about $136,000.00.

Hopefully, it is a little more clear how you can do some simple analysis to help you decide the exact processor that you should choose for a new database server, given your workload and budget constraints. This analysis is a little more complicated when you have to compare processors from different generations and families, but the same basic principles can be used. I will dive deeper into this subject in a future article.

  2 Responses to “Practical Processor Selection for SQL Server 2014/2016 OLTP Workloads”

  1. Thanks for the thorough analysis. For anyone responsible or interested in this topic, check out Glenn's Pluralsight recording: "SQL Server 2012: Evaluating and Sizing Hardware." He does an excellent job with presenting information about hardware and how it impacts performance and overall cost. Another useful resource that focuses on hardware (CPU, RAM, I/O) is http://www.anandtech.com. Lastly, Intel provides details on its processors at http://ark.intel.com that allows you to compare processors at a granular level. The details matter significantly and as Glenn points out in this article and his videos, the processor selection impacts scalability and cost. Before upgrading, understand what the workload is today and what it may be in the future. Also, engage the licensing team (internal to your organization or Microsoft) with the architecture you and/or your team is contemplating especially when considering collocating other SQL Server services (i.e. SSRS, SSIS, SSAS).

  2. Stupid Question, sort of related.

    In aa virtual environment I'm looking at deploying a new SQL 2014 Server that requires 8 Cores. Should I go with: 4 Sockets with 2 Cores or 2 sockets with 4 Cores? And would there be any tangible difference?

 Leave a Reply

(required)

(required)