Numerous licensing changes were introduced in SQL Server 2012; the most significant was the move from socket-based licensing to core-based licensing for Enterprise Edition. One of the challenges that Microsoft faced with this change was providing a migration path for customers that previously used Server+CAL based licensing for Enterprise Edition prior to SQL Server 2012. Customers under Software Assurance can upgrade to SQL Server 2012 Enterprise Edition and still use Server+CAL licensing (also known as "grandfathering") but with a limitation to 20 logical processors, as documented in the SQL Server 2012 Licensing Guide. This licensing also extends to VMs with a limit of 4 VMs being covered by the Enterprise Server+CAL license, but still with the same 20 logical processor limitation as documented in the SQL Server 2012 Virtualization Licensing Guide.
A lot of people have been caught off guard by the 20 logical processor limitation, even though it is documented in the licensing guides.
An entry is made in the ERRORLOG file when the instance starts up, specifying the number of logical processors and that the 20 processor limitation is being enforced:
Log SQL Server (Current – 11/14/2012 8:17:00 PM)
SQL Server detected 2 sockets with 16 cores per socket and 16 logical processors per socket, 32 total logical processors; using 20 logical processors based on SQL Server licensing. This is an informational message; no user action is required.
With the default configuration that SQL Server applies under the 20 logical processor limitation using Server+CAL, the first 20 schedulers are VISIBLE ONLINE and any remaining schedulers are VISIBLE OFFLINE. As a result, performance problems can occur for the instance, due to NUMA node scheduler imbalances. To demonstrate this I created a VM on our Dell R720 test server which has two sockets and Intel Xeon E5-2670 processors installed, each with 8 cores and Hyperthreading enabled, providing a total of 32 logical processors available under Windows Server 2012 Datacenter Edition. The VM was configured to have 32 virtual CPUs with 16 virtual processors allocated in two vNUMA nodes.
In SQL Server under the Enterprise Server+CAL licensing model, this results in a scheduler configuration that is similar to the following:
SELECT parent_node_id, [status], scheduler_id, [cpu_id], is_idle, current_tasks_count, runnable_tasks_count, active_workers_count, load_factor FROM sys.dm_os_schedulers;
Figure 2 – Scheduler assignment under Enterprise Server+CAL
As you can see, all 16 of the logical processors in the first NUMA node and only four of the logical processors in the second NUMA node are used by the instance. This results in a significant imbalance of schedulers between the two NUMA nodes that can lead to significant performance problems under load. To demonstrate this, I spun up 300 connections running the AdventureWorks Books Online workload against the instance and then captured the scheduler information for the VISIBLE ONLINE schedulers in the instance using the following query:
SELECT parent_node_id, scheduler_id, [cpu_id], is_idle, current_tasks_count, runnable_tasks_count, active_workers_count, load_factor FROM sys.dm_os_schedulers WHERE [status] = N'VISIBLE ONLINE';
An example output of this query under load is shown in Figure 3 below.
Figure 3 – Schedulers under load with Enterprise Server+CAL
You can also see this symptom visually in monitoring tools such as SQL Sentry Performance Advisor:
Figure 4 – NUMA imbalance as shown in SQL Sentry Performance Advisor
This information shows a significant imbalance and performance is going to be affected as a result. This is clearly evident in the runnable tasks counts for the four schedulers in the second NUMA node, which are three to four times the size of those for the schedulers in the first NUMA node. So what exactly is the problem and why does this occur?
At first glance you might think that this is a bug in SQL Server, but it isn't. This is something that occurs by design, though I doubt that this scenario was expected when the 20 logical processor limitation was originally implemented. On NUMA-based systems, new connections are assigned to the NUMA nodes in a round-robin fashion, and then inside of the NUMA node the connection is assigned to a scheduler based on load. If we change the way that we are looking at this data and aggregate the data based on parent_node_id we'll see that the tasks are actually being balanced across the NUMA nodes. To do this we'll use the following query, the output of which is shown in Figure 5.
SELECT parent_node_id, SUM(current_tasks_count) AS current_tasks_count, SUM(runnable_tasks_count) AS runnable_tasks_count, SUM(active_workers_count) AS active_workers_count, AVG(load_factor) AS avg_load_factor FROM sys.dm_os_schedulers WHERE [status] = N'VISIBLE ONLINE' GROUP BY parent_node_id;
Figure 5 – NUMA node round-robin balance
This behavior is documented in Books Online for SQL Server (http://msdn.microsoft.com/en-us/library/ms180954(v=sql.105).aspx). Knowing what I know about SQLOS, SQL Server, and hardware, this makes sense. Prior to the 20 logical processor limitation in SQL Server 2012 Enterprise Edition with Server+CAL licensing, it was a rare scenario that SQL Server would have a scheduler imbalance between NUMA nodes in a production server. One of the problems in this specific case is the way that the virtual NUMA was passed through to the VM. Performing the exact same installation on the physical hardware allows all of the schedulers to be ONLINE VISIBLE since the additional logical processors presented by the hyperthreads are distinguishable by SQL and free.
In other words, the 20-logical processor limit actually results in 40 schedulers ONLINE if (a) it is not a virtual machine, (b) the processors are Intel, and (c) hyper-threading is enabled.
So we see this message in the error log:
Log SQL Server (Current – 11/14/2012 10:36:00 PM)
SQL Server detected 2 sockets with 8 cores per socket and 16 logical processors per socket, 32 total logical processors; using 32 logical processors based on SQL Server licensing. This is an informational message; no user action is required.
And the same query as above results in all 32 processors being VISIBLE ONLINE:
SELECT parent_node_id, [status], scheduler_id, [cpu_id], is_idle, current_tasks_count, runnable_tasks_count, active_workers_count, load_factor FROM sys.dm_os_schedulers WHERE [status] = N'VISIBLE ONLINE';
Figure 6 – Same configuration on physical hardware
In this case, since there are only 32 logical processors, the 20 (well, 40) core limit does not impact us at all, and work is distributed evenly across all of the cores.
In scenarios where the 20 processor limitation affects the NUMA balance of schedulers it is possible to manually change the server configuration to balance the number of VISIBLE ONLINE schedulers in each of the NUMA nodes through the use of
ALTER SERVER CONFIGURATION. In the VM example provided, the following command will configure the first 10 logical processors in each NUMA node to VISIBLE ONLINE.
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 9, 16 TO 25;
With this new configuration, running the same workload of 300 sessions and the AdventureWorks Books Online workload, we can see that the load imbalance no longer occurs.
Figure 7 – Balance restored with manual configuration
And again using SQL Sentry Performance Advisor we can see the CPU load distributed more evenly across both NUMA nodes:
Figure 8 – NUMA balance as shown in SQL Sentry Performance Advisor
This problem is not strictly limited to VMs and the way that virtual CPUs are presented to the OS. It is also possible to run into this problem with physical hardware. For example, an older Dell R910 with four sockets and eight cores per socket, or even an AMD Opteron 6200 Interlagos based server with two sockets and 16 cores per socket, which presents itself as four NUMA nodes with eight cores each. Under either of these configurations, the process imbalance can also result in one of the NUMA nodes being set offline entirely. Consequently, other side effects such as memory from that node being distributed across the online nodes leading to foreign memory access issues can also degrade performance.
The default configuration of SQL Server 2012 using the Enterprise Edition licensing for Server+CAL is not ideal under all NUMA configurations that might exist for SQL Server. Whenever Enterprise Server+CAL licensing is being used, the NUMA configuration and scheduler statuses per NUMA node needs to be reviewed to ensure that SQL Server is configured for optimum performance. This problem does not occur under core-based licensing since all of the schedulers are licensed and VISIBLE ONLINE.
20 thoughts on “Performance Problems with SQL Server 2012 Enterprise Edition Under CAL Licensing”
If you are in this scenario, and you are using a physical machine (not a VM), you can allegedly demand that Microsoft license all of your cores. From a comment on my blog post:
"Notwithstanding anything to the contrary in the Product Use Rights, if SQL Server 2012 Enterprise is run in a physical Operating System Environment (OSE) on the licensed server, that physical OSE may access any number of physical cores."
That is nasty – thanks for the insight Jonathan!
This is great information,thanks for sharing this Jonathan !
This is an enlightening great post from Jonathan. Thank you very much.
Jonathan, that's great work! I would guess Microsoft will want to do something about that in a future version. Meanwhile – and this is probably a thought you've already had but I'll ask anyway – is it possible to design a query to determine this imbalance? The group by parent_node_id shows the symptom, but not the NUMA/scheduler imbalance directly.
The 3rd query listed (results in Figure 5) does this, but maybe it will be more obvious if we take the average of the task/worker counts?
Of course this will only yield useful information when there is actually enough load on the server to show a difference.
Aaron – have you heard anybody having success pursuing this?
Just the one comment on my blog. Can't hurt to send that request back up through your rep, though. And really your rep should have told you about it anyway, and been fully aware of your licensing details *before* allowing you to use SA to buy CAL for 2012 Enterprise… IMHO, anyway.
JR and Aaron,
A minor change to Jonathan’s 3rd query that Aaron references may further highlight the allocated online scheduler imbalance across NUMA nodes.
Add the following line as the second column in the SELECT statement:
COUNT(*) As [# Schedulers],
The problem documented in this post is indicated when the “# Schedulers” column values are different across the NUMA nodes (instead of equal values) in this revised query results.
Great post, Jonathan!
Yes, makes perfect sense, thanks Scott.
Just noticed a typo, the query to display the original CAL licensing should not have the where clause:
WHERE [status] = N'VISIBLE ONLINE';
Corrected, thanks, good catch!
I'm late for every party :) Also commented on Aaron Bertrand's blog post about the ENT + CAL limit. With the dawn of 12 core, 24 Lcpu per socket servers we wanted get geared up for performance testing MSSQL 2012 on 48 Lcpu. MSDN license was ENT + CAL, took a little while to determine this was why the server would only get 82% CPU busy. Once we looked at per-cpu stats it was ALMOST obvious :)
Great information. Thanks for the article.
I see this in one of my SQL Server error logs:
SQL Server detected 1 sockets with 32 cores per socket and 32 logical processors per socket, 32 total logical processors; using 20 logical processors based on SQL Server licensing. This is an informational message; no user action is required.
This is a VM with only one NUMA node assigned. So all procs are on same numa node and I see that cpu_id >19 are in VISIBLE_OFFLINE state.
Does this mean I need to set affinity as follows:
ALTER SERVER CONFIGURATION
SET PROCESS AFFINITY CPU = 0 TO 31
Please ignore my previous question. I went on wrong path there :)
I figured my way out since we already upgraded to Core licensing. We just needed a Edition upgrade from CAL to core based.
I was able to perform a edition upgrade by running the same SQL 2012 setup:
Run the SQL Server setup
Goto Maintenance Tab and run through the wizard
On the licensing page, enter the new product key (based on your core licensing model).
Accept the agreement
Pretty much run through defaults unless you are running any Always On databases (which may need some additional considerations)
It will take SQL Server services or any other instances like SSRS, SSIS, etc down
Upgrade the relevant Binaries (pretty quick in my case, about 2-3 minutes)
All the services are started automatically
Connect to SQL server and run the following query:
Enterprise Edition: Core-based Licensing (64-bit)
Previously I did not have "Core-based Licensing"
SQL Errorlog will also indicate the number of CPUs detected and being used.
References for more detailed info:
I got that situation with the Dell R910:
SQL Server detected 4 sockets with 8 cores per socket and 16 logical processors per socket, 64 total logical processors; using 40 logical processors based on SQL Server licensing. This is an informational message; no user action is required.
How can I fix it?
Either change your licensing to Enterprise Edition Core based licensing with the right number of core licenses or use ALTER SERVER CONFIGURATION as explained in the blog post to map your 40 logical ONLINE schedulers 10 to each of your 4 NUMA nodes to balance the system out.
Comments are closed.