Nov 272012
 

Numerous licensing changes were introduced in SQL Server 2012; the most significant was the move from socket-based licensing to core-based licensing for Enterprise Edition. One of the challenges that Microsoft faced with this change was providing a migration path for customers that previously used Server+CAL based licensing for Enterprise Edition prior to SQL Server 2012. Customers under Software Assurance can upgrade to SQL Server 2012 Enterprise Edition and still use Server+CAL licensing (also known as "grandfathering") but with a limitation to 20 logical processors, as documented in the SQL Server 2012 Licensing Guide. This licensing also extends to VMs with a limit of 4 VMs being covered by the Enterprise Server+CAL license, but still with the same 20 logical processor limitation as documented in the SQL Server 2012 Virtualization Licensing Guide.

A lot of people have been caught off guard by the 20 logical processor limitation, even though it is documented in the licensing guides.

An entry is made in the ERRORLOG file when the instance starts up, specifying the number of logical processors and that the 20 processor limitation is being enforced:

Date    11/14/2012 8:15:08 PM
Log     SQL Server (Current – 11/14/2012 8:17:00 PM)
Source  Server
Message
SQL Server detected 2 sockets with 16 cores per socket and 16 logical processors per socket, 32 total logical processors; using 20 logical processors based on SQL Server licensing. This is an informational message; no user action is required.

With the default configuration that SQL Server applies under the 20 logical processor limitation using Server+CAL, the first 20 schedulers are VISIBLE ONLINE and any remaining schedulers are VISIBLE OFFLINE. As a result, performance problems can occur for the instance, due to NUMA node scheduler imbalances. To demonstrate this I created a VM on our Dell R720 test server which has two sockets and Intel Xeon E5-2670 processors installed, each with 8 cores and Hyperthreading enabled, providing a total of 32 logical processors available under Windows Server 2012 Datacenter Edition. The VM was configured to have 32 virtual CPUs with 16 virtual processors allocated in two vNUMA nodes.

vNUMA settings
Figure 1 – vNUMA settings

In SQL Server under the Enterprise Server+CAL licensing model, this results in a scheduler configuration that is similar to the following:

SELECT 
  parent_node_id,
  [status], 
  scheduler_id, 
  [cpu_id], 
  is_idle, 
  current_tasks_count, 
  runnable_tasks_count, 
  active_workers_count, 
  load_factor
FROM sys.dm_os_schedulers;

Scheduler assignment under Enterprise Server+CAL
Figure 2 – Scheduler assignment under Enterprise Server+CAL

As you can see, all 16 of the logical processors in the first NUMA node and only four of the logical processors in the second NUMA node are used by the instance. This results in a significant imbalance of schedulers between the two NUMA nodes that can lead to significant performance problems under load. To demonstrate this, I spun up 300 connections running the AdventureWorks Books Online workload against the instance and then captured the scheduler information for the VISIBLE ONLINE schedulers in the instance using the following query:

SELECT 
  parent_node_id,
  scheduler_id, 
  [cpu_id], 
  is_idle, 
  current_tasks_count, 
  runnable_tasks_count, 
  active_workers_count, 
  load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';

An example output of this query under load is shown in Figure 3 below.

Schedulers under load with Enterprise Server + CAL
Figure 3 – Schedulers under load with Enterprise Server+CAL

You can also see this symptom visually in monitoring tools such as SQL Sentry Performance Advisor:


Figure 4 – NUMA imbalance as shown in SQL Sentry Performance Advisor

This information shows a significant imbalance and performance is going to be affected as a result. This is clearly evident in the runnable tasks counts for the four schedulers in the second NUMA node, which are three to four times the size of those for the schedulers in the first NUMA node. So what exactly is the problem and why does this occur?

At first glance you might think that this is a bug in SQL Server, but it isn't. This is something that occurs by design, though I doubt that this scenario was expected when the 20 logical processor limitation was originally implemented. On NUMA-based systems, new connections are assigned to the NUMA nodes in a round-robin fashion, and then inside of the NUMA node the connection is assigned to a scheduler based on load. If we change the way that we are looking at this data and aggregate the data based on parent_node_id we'll see that the tasks are actually being balanced across the NUMA nodes. To do this we'll use the following query, the output of which is shown in Figure 5.

SELECT 
  parent_node_id, 
  SUM(current_tasks_count) AS current_tasks_count, 
  SUM(runnable_tasks_count) AS runnable_tasks_count, 
  SUM(active_workers_count) AS active_workers_count, 
  AVG(load_factor) AS avg_load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE'
GROUP BY parent_node_id;

NUMA node round-robin balance
Figure 5 – NUMA node round-robin balance

This behavior is documented in Books Online for SQL Server (http://msdn.microsoft.com/en-us/library/ms180954(v=sql.105).aspx). Knowing what I know about SQLOS, SQL Server, and hardware, this makes sense. Prior to the 20 logical processor limitation in SQL Server 2012 Enterprise Edition with Server+CAL licensing, it was a rare scenario that SQL Server would have a scheduler imbalance between NUMA nodes in a production server. One of the problems in this specific case is the way that the virtual NUMA was passed through to the VM. Performing the exact same installation on the physical hardware allows all of the schedulers to be ONLINE VISIBLE since the additional logical processors presented by the hyperthreads are distinguishable by SQL and free.

In other words, the 20-logical processor limit actually results in 40 schedulers ONLINE if (a) it is not a virtual machine, (b) the processors are Intel, and (c) hyper-threading is enabled.

So we see this message in the error log:

Date    11/14/2012 10:36:18 PM
Log     SQL Server (Current – 11/14/2012 10:36:00 PM)
Source  Server
Message
SQL Server detected 2 sockets with 8 cores per socket and 16 logical processors per socket, 32 total logical processors; using 32 logical processors based on SQL Server licensing. This is an informational message; no user action is required.

And the same query as above results in all 32 processors being VISIBLE ONLINE:

SELECT 
  parent_node_id,
  [status], 
  scheduler_id, 
  [cpu_id], 
  is_idle, 
  current_tasks_count, 
  runnable_tasks_count, 
  active_workers_count, 
  load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';

Same configuration on physical hardware
Figure 6 – Same configuration on physical hardware

In this case, since there are only 32 logical processors, the 20 (well, 40) core limit does not impact us at all, and work is distributed evenly across all of the cores.

In scenarios where the 20 processor limitation affects the NUMA balance of schedulers it is possible to manually change the server configuration to balance the number of VISIBLE ONLINE schedulers in each of the NUMA nodes through the use of ALTER SERVER CONFIGURATION. In the VM example provided, the following command will configure the first 10 logical processors in each NUMA node to VISIBLE ONLINE.

ALTER SERVER CONFIGURATION
SET PROCESS AFFINITY CPU = 0 TO 9, 16 TO 25;

With this new configuration, running the same workload of 300 sessions and the AdventureWorks Books Online workload, we can see that the load imbalance no longer occurs.

Balance restored with manual configuration
Figure 7 – Balance restored with manual configuration

And again using SQL Sentry Performance Advisor we can see the CPU load distributed more evenly across both NUMA nodes:


Figure 8 – NUMA balance as shown in SQL Sentry Performance Advisor

This problem is not strictly limited to VMs and the way that virtual CPUs are presented to the OS. It is also possible to run into this problem with physical hardware. For example, an older Dell R910 with four sockets and eight cores per socket, or even an AMD Opteron 6200 Interlagos based server with two sockets and 16 cores per socket, which presents itself as four NUMA nodes with eight cores each. Under either of these configurations, the process imbalance can also result in one of the NUMA nodes being set offline entirely. Consequently, other side effects such as memory from that node being distributed across the online nodes leading to foreign memory access issues can also degrade performance.

Summary

The default configuration of SQL Server 2012 using the Enterprise Edition licensing for Server+CAL is not ideal under all NUMA configurations that might exist for SQL Server. Whenever Enterprise Server+CAL licensing is being used, the NUMA configuration and scheduler statuses per NUMA node needs to be reviewed to ensure that SQL Server is configured for optimum performance. This problem does not occur under core-based licensing since all of the schedulers are licensed and VISIBLE ONLINE.

  16 Responses to “Performance Problems with SQL Server 2012 Enterprise Edition Under CAL Licensing”

  1. If you are in this scenario, and you are using a physical machine (not a VM), you can allegedly demand that Microsoft license all of your cores. From a comment on my blog post:

    If you had SA on your Server + Cal model before April 1st, 2012 then you can ask Microsoft for Amendment ID M145. Here is the part of the Amendment that you want:

    "Notwithstanding anything to the contrary in the Product Use Rights, if SQL Server 2012 Enterprise is run in a physical Operating System Environment (OSE) on the licensed server, that physical OSE may access any number of physical cores."

    • Aaron – have you heard anybody having success pursuing this?

      • Just the one comment on my blog. Can't hurt to send that request back up through your rep, though. And really your rep should have told you about it anyway, and been fully aware of your licensing details *before* allowing you to use SA to buy CAL for 2012 Enterprise… IMHO, anyway.

  2. That is nasty – thanks for the insight Jonathan!

  3. This is great information,thanks for sharing this Jonathan !

  4. This is an enlightening great post from Jonathan. Thank you very much.

  5. Jonathan, that's great work! I would guess Microsoft will want to do something about that in a future version. Meanwhile – and this is probably a thought you've already had but I'll ask anyway – is it possible to design a query to determine this imbalance? The group by parent_node_id shows the symptom, but not the NUMA/scheduler imbalance directly.

    • What about:
      The 3rd query listed (results in Figure 5) does this, but maybe it will be more obvious if we take the average of the task/worker counts?

      SELECT 
        parent_node_id, 
        AVG(current_tasks_count) AS avg_tasks_count, 
        AVG(runnable_tasks_count) AS avg_tasks_count, 
        AVG(active_workers_count) AS avg_workers_count, 
        AVG(load_factor) AS avg_load_factor
      FROM sys.dm_os_schedulers
      WHERE [status] = N'VISIBLE ONLINE'
      GROUP BY parent_node_id;

      Of course this will only yield useful information when there is actually enough load on the server to show a difference.

      • JR and Aaron,

        A minor change to Jonathan’s 3rd query that Aaron references may further highlight the allocated online scheduler imbalance across NUMA nodes.

        Add the following line as the second column in the SELECT statement:

        COUNT(*) As [# Schedulers],

        The problem documented in this post is indicated when the “# Schedulers” column values are different across the NUMA nodes (instead of equal values) in this revised query results.

        Comments?

        Great post, Jonathan!

        Scott R.

  6. […] Performance Problems with SQL Server 2012 Enterprise Edition Under CAL Licensing – Jonathan Kehayias (Blog|Twitter) […]

  7. Just noticed a typo, the query to display the original CAL licensing should not have the where clause:
    SELECT
    parent_node_id,
    [status],
    scheduler_id,
    [cpu_id],
    is_idle,
    current_tasks_count,
    runnable_tasks_count,
    active_workers_count,
    load_factor
    FROM sys.dm_os_schedulers
    WHERE [status] = N'VISIBLE ONLINE';

  8. I'm late for every party :) Also commented on Aaron Bertrand's blog post about the ENT + CAL limit. With the dawn of 12 core, 24 Lcpu per socket servers we wanted get geared up for performance testing MSSQL 2012 on 48 Lcpu. MSDN license was ENT + CAL, took a little while to determine this was why the server would only get 82% CPU busy. Once we looked at per-cpu stats it was ALMOST obvious :)

  9. Great information. Thanks for the article.

 Leave a Reply

(required)

(required)