Jonathan Kehayias

Performance Problems with SQL Server 2012 Enterprise Edition Under CAL Licensing

SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Itzik is a T-SQL trainer, a co-founder of SolidQ, and blogs about T-SQL fundamentals and query tuning.

Itzik’s Posts

Numerous licensing changes were introduced in SQL Server 2012; the most significant was the move from socket-based licensing to core-based licensing for Enterprise Edition. One of the challenges that Microsoft faced with this change was providing a migration path for customers that previously used Server+CAL based licensing for Enterprise Edition prior to SQL Server 2012. Customers under Software Assurance can upgrade to SQL Server 2012 Enterprise Edition and still use Server+CAL licensing (also known as "grandfathering") but with a limitation to 20 logical processors, as documented in the SQL Server 2012 Licensing Guide. This licensing also extends to VMs with a limit of 4 VMs being covered by the Enterprise Server+CAL license, but still with the same 20 logical processor limitation as documented in the SQL Server 2012 Virtualization Licensing Guide.

A lot of people have been caught off guard by the 20 logical processor limitation, even though it is documented in the licensing guides.

An entry is made in the ERRORLOG file when the instance starts up, specifying the number of logical processors and that the 20 processor limitation is being enforced:

Date    11/14/2012 8:15:08 PM
Log     SQL Server (Current – 11/14/2012 8:17:00 PM)
Source  Server
Message
SQL Server detected 2 sockets with 16 cores per socket and 16 logical processors per socket, 32 total logical processors; using 20 logical processors based on SQL Server licensing. This is an informational message; no user action is required.

With the default configuration that SQL Server applies under the 20 logical processor limitation using Server+CAL, the first 20 schedulers are VISIBLE ONLINE and any remaining schedulers are VISIBLE OFFLINE. As a result, performance problems can occur for the instance, due to NUMA node scheduler imbalances. To demonstrate this I created a VM on our Dell R720 test server which has two sockets and Intel Xeon E5-2670 processors installed, each with 8 cores and Hyperthreading enabled, providing a total of 32 logical processors available under Windows Server 2012 Datacenter Edition. The VM was configured to have 32 virtual CPUs with 16 virtual processors allocated in two vNUMA nodes.

vNUMA settings
Figure 1 – vNUMA settings

In SQL Server under the Enterprise Server+CAL licensing model, this results in a scheduler configuration that is similar to the following:

SELECT 
  parent_node_id,
  [status], 
  scheduler_id, 
  [cpu_id], 
  is_idle, 
  current_tasks_count, 
  runnable_tasks_count, 
  active_workers_count, 
  load_factor
FROM sys.dm_os_schedulers;

Scheduler assignment under Enterprise Server+CAL
Figure 2 – Scheduler assignment under Enterprise Server+CAL

As you can see, all 16 of the logical processors in the first NUMA node and only four of the logical processors in the second NUMA node are used by the instance. This results in a significant imbalance of schedulers between the two NUMA nodes that can lead to significant performance problems under load. To demonstrate this, I spun up 300 connections running the AdventureWorks Books Online workload against the instance and then captured the scheduler information for the VISIBLE ONLINE schedulers in the instance using the following query:

SELECT 
  parent_node_id,
  scheduler_id, 
  [cpu_id], 
  is_idle, 
  current_tasks_count, 
  runnable_tasks_count, 
  active_workers_count, 
  load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';

An example output of this query under load is shown in Figure 3 below.

Schedulers under load with Enterprise Server + CAL
Figure 3 – Schedulers under load with Enterprise Server+CAL

You can also see this symptom visually in monitoring tools such as SQL Sentry Performance Advisor:


Figure 4 – NUMA imbalance as shown in SQL Sentry Performance Advisor

This information shows a significant imbalance and performance is going to be affected as a result. This is clearly evident in the runnable tasks counts for the four schedulers in the second NUMA node, which are three to four times the size of those for the schedulers in the first NUMA node. So what exactly is the problem and why does this occur?

At first glance you might think that this is a bug in SQL Server, but it isn't. This is something that occurs by design, though I doubt that this scenario was expected when the 20 logical processor limitation was originally implemented. On NUMA-based systems, new connections are assigned to the NUMA nodes in a round-robin fashion, and then inside of the NUMA node the connection is assigned to a scheduler based on load. If we change the way that we are looking at this data and aggregate the data based on parent_node_id we'll see that the tasks are actually being balanced across the NUMA nodes. To do this we'll use the following query, the output of which is shown in Figure 5.

SELECT 
  parent_node_id, 
  SUM(current_tasks_count) AS current_tasks_count, 
  SUM(runnable_tasks_count) AS runnable_tasks_count, 
  SUM(active_workers_count) AS active_workers_count, 
  AVG(load_factor) AS avg_load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE'
GROUP BY parent_node_id;

NUMA node round-robin balance
Figure 5 – NUMA node round-robin balance

This behavior is documented in Books Online for SQL Server (http://msdn.microsoft.com/en-us/library/ms180954(v=sql.105).aspx). Knowing what I know about SQLOS, SQL Server, and hardware, this makes sense. Prior to the 20 logical processor limitation in SQL Server 2012 Enterprise Edition with Server+CAL licensing, it was a rare scenario that SQL Server would have a scheduler imbalance between NUMA nodes in a production server. One of the problems in this specific case is the way that the virtual NUMA was passed through to the VM. Performing the exact same installation on the physical hardware allows all of the schedulers to be ONLINE VISIBLE since the additional logical processors presented by the hyperthreads are distinguishable by SQL and free.

In other words, the 20-logical processor limit actually results in 40 schedulers ONLINE if (a) it is not a virtual machine, (b) the processors are Intel, and (c) hyper-threading is enabled.

So we see this message in the error log:

Date    11/14/2012 10:36:18 PM
Log     SQL Server (Current – 11/14/2012 10:36:00 PM)
Source  Server
Message
SQL Server detected 2 sockets with 8 cores per socket and 16 logical processors per socket, 32 total logical processors; using 32 logical processors based on SQL Server licensing. This is an informational message; no user action is required.

And the same query as above results in all 32 processors being VISIBLE ONLINE:

SELECT 
  parent_node_id,
  [status], 
  scheduler_id, 
  [cpu_id], 
  is_idle, 
  current_tasks_count, 
  runnable_tasks_count, 
  active_workers_count, 
  load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';

Same configuration on physical hardware
Figure 6 – Same configuration on physical hardware

In this case, since there are only 32 logical processors, the 20 (well, 40) core limit does not impact us at all, and work is distributed evenly across all of the cores.

In scenarios where the 20 processor limitation affects the NUMA balance of schedulers it is possible to manually change the server configuration to balance the number of VISIBLE ONLINE schedulers in each of the NUMA nodes through the use of ALTER SERVER CONFIGURATION. In the VM example provided, the following command will configure the first 10 logical processors in each NUMA node to VISIBLE ONLINE.

ALTER SERVER CONFIGURATION
SET PROCESS AFFINITY CPU = 0 TO 9, 16 TO 25;

With this new configuration, running the same workload of 300 sessions and the AdventureWorks Books Online workload, we can see that the load imbalance no longer occurs.

Balance restored with manual configuration
Figure 7 – Balance restored with manual configuration

And again using SQL Sentry Performance Advisor we can see the CPU load distributed more evenly across both NUMA nodes:


Figure 8 – NUMA balance as shown in SQL Sentry Performance Advisor

This problem is not strictly limited to VMs and the way that virtual CPUs are presented to the OS. It is also possible to run into this problem with physical hardware. For example, an older Dell R910 with four sockets and eight cores per socket, or even an AMD Opteron 6200 Interlagos based server with two sockets and 16 cores per socket, which presents itself as four NUMA nodes with eight cores each. Under either of these configurations, the process imbalance can also result in one of the NUMA nodes being set offline entirely. Consequently, other side effects such as memory from that node being distributed across the online nodes leading to foreign memory access issues can also degrade performance.

Summary

The default configuration of SQL Server 2012 using the Enterprise Edition licensing for Server+CAL is not ideal under all NUMA configurations that might exist for SQL Server. Whenever Enterprise Server+CAL licensing is being used, the NUMA configuration and scheduler statuses per NUMA node needs to be reviewed to ensure that SQL Server is configured for optimum performance. This problem does not occur under core-based licensing since all of the schedulers are licensed and VISIBLE ONLINE.