Memory/Storage Technology Hierarchy and SQL Server

September 21, 2016 by Glenn Berry in SQL Performance

There is an old networking saying: “Bandwidth problems can be cured with money. Latency problems are harder because the speed of light is fixed.”

Traditionally, there have been four primary layers in the overall memory and storage hierarchy of a database server. Starting at the top of the pyramid, you have static random-access memory (SRAM) caching, which is typically split between relatively small, fast L1 and L2 caches for each physical core of a processor and a larger, slower, on-die, shared L3 cache for the entire processor. For example, the 14nm Intel Xeon E7-8890 v4 processor (Broadwell-EX) has a 64KB per core L1 cache and a 256KB per core L2 cache, along with a much larger but slower 60MB L3 cache that is shared across the entire physical processor.

The second layer is dynamic random-access memory (DRAM), with licensed capacities of up to 4TB with Windows Server 2012 R2 (rising to 24TB with Windows Server 2016) and latencies in the nanosecond range. Both SRAM and DRAM are volatile, which means they hold data only while electrical power is applied.

The third layer is SATA/SAS (NAND) SSDs, with individual capacities up to about 4TB and latencies in the microsecond range. One limiting factor for these older SSDs is the SATA/SAS interface itself, which caps their total bandwidth depending on the SATA/SAS version in use. They are also held back by the legacy AHCI protocol, which has far more I/O overhead and higher latency than the newer NVMe protocol. The fourth layer is legacy magnetic spinning hard disk drives, with individual capacities up to 10TB and latencies in the millisecond range.

Each of these traditional layers has higher latency, but lower cost per gigabyte and higher total capacity, as you move down the memory/storage hierarchy. The relative latency of these four traditional layers is shown in Table 1.

Interface                     Type              Relative Latency (reads)
On core/On Die                CPU SRAM Cache    1x
Direct Attach                 DDR4 DRAM         10x
PCIe NVMe or SATA/SAS AHCI    SSD               100,000x
SAS/SATA                      HDD               10,000,000x

Table 1: Traditional Memory/Storage Layers
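
To translate those multipliers into rough absolute numbers, assume a cached read costs on the order of 1 ns (an illustrative baseline of my own, not a measured figure); the table then implies roughly:

    \begin{align*}
    t_{\text{SRAM cache}} &\approx 1\ \text{ns} \\
    t_{\text{DDR4 DRAM}}  &\approx 10 \times 1\ \text{ns} = 10\ \text{ns} \\
    t_{\text{SSD}}        &\approx 100{,}000 \times 1\ \text{ns} = 100\ \mu\text{s} \\
    t_{\text{HDD}}        &\approx 10{,}000{,}000 \times 1\ \text{ns} = 10\ \text{ms}
    \end{align*}

which lines up with the nanosecond, microsecond, and millisecond ranges quoted above.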

New Memory/Storage Layers

Over the past couple of years, we have seen the introduction and increasing usage of NVM Express (NVMe) PCIe SSDs based on existing NAND flash technology. These typically have latencies in the 50-100 microsecond range. Because they use the much more efficient NVMe protocol over the PCIe interface, they deliver much better performance than older SAS/SATA SSDs that use the legacy AHCI protocol.

Currently, Hewlett Packard Enterprise (HPE) is selling 8GB NVDIMM modules for its HPE ProLiant DL360 Gen9 and HPE ProLiant DL380 Gen9 servers. Each module pairs 8GB of DRAM with 8GB of flash backup and costs $899.00, which is pretty expensive per gigabyte. These two-socket servers have 24 memory slots that each support up to 128GB traditional DDR4 DIMMs, and any slots you use for NVDIMM modules won’t be available for regular memory. You can use a maximum of 16 memory slots for NVDIMMs, which gives you a maximum NVDIMM capacity of 128GB. You must use Intel Xeon E5-2600 v4 series processors in order to get official NVDIMM support. Micron is scheduled to release larger capacity 16GB NVDIMMs in October of 2016.
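
As a quick sanity check on those capacity numbers (ignoring any platform-specific DIMM population rules), the trade-off looks like this:

    \begin{align*}
    \text{Max NVDIMM capacity}  &= 16 \times 8\ \text{GB} = 128\ \text{GB} \\
    \text{Remaining DRAM slots} &= 24 - 16 = 8 \\
    \text{Max remaining DRAM}   &= 8 \times 128\ \text{GB} = 1\ \text{TB}
    \end{align*}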

Device               IOPS         Avg Latency (ns)    MB/sec
NVM Express SSD      14,553       66,632              56.85
Block Mode NVDIMM    148,567      6,418               580.34
DAX Mode NVDIMM      1,112,007    828                 4,343.78

Table 2: 4K Random Write Performance Comparison (1 thread, QD1)

The performance figures in Table 2 are from a Microsoft/Intel presentation (Persistent Memory in Windows) at IDF16 in San Francisco, using this performance testing methodology:

  • Workload: 4KB random writes, 1 thread, 1 outstanding I/O, synchronous I/O, 1GB file, NTFS, 3s warm-up, 7s measurement time
  • Hardware: HPE ProLiant DL380 Gen9, 2x Intel Xeon E5-2650L v3 @ 1.8GHz, 96GB RAM, 2x 8GB NVDIMM-N, 1x 1600GB NVMe SSD
  • Software: Pre-released WS 2016 build, Microsoft internal I/O tool
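
Those numbers are internally consistent: at a 4KB block size, the MB/sec column is just IOPS multiplied by the block size, and with a single thread at queue depth 1, IOPS cannot exceed the reciprocal of the average latency. Checking the DAX Mode NVDIMM row (using 1MB = 1,048,576 bytes, which appears to be what the table uses):

    \begin{align*}
    \text{MB/sec}       &= \frac{1{,}112{,}007 \times 4{,}096\ \text{bytes}}{1{,}048{,}576\ \text{bytes/MB}} \approx 4{,}343.78 \\
    \text{IOPS ceiling} &\approx \frac{1\ \text{s}}{828\ \text{ns}} \approx 1{,}207{,}700 \quad \text{(vs. 1,112,007 measured)}
    \end{align*}

The small gap between the latency-implied ceiling and the measured IOPS is the remaining software overhead per operation.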

There is a good Channel 9 video called Accelerating SQL Server 2016 Performance with Persistent Memory in Windows Server 2016, featuring Lindsey Allen and Tobias Klima. A demo in the video shows SQL Server 2016 running on Windows Server 2016 using DAX mode (enabled with an unspecified trace flag). Once Windows Server 2016 is generally available, a future version of SQL Server is expected to make the trace flag unnecessary.

These new memory/storage layers will slot in between traditional DRAM and SATA/SAS SSDs.

A storage volume that has been formatted in DAX mode could be used to host a SQL Server 2016 transaction log file in a scenario where you need the absolute best write performance possible and do not want to use Delayed Durability or In-Memory OLTP. Another possible scenario would be using a DAX mode volume to host your tempdb data files if you have a workload that places extreme stress on tempdb.
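
For example, if you had a DAX mode volume mounted as drive P: (a hypothetical drive letter), placing a log file or tempdb files there uses ordinary ALTER DATABASE syntax; nothing about the statements themselves is DAX-specific, the difference is simply where the files live. The database name, logical file names, and P:\ paths below are illustrative assumptions, not something from the article.

    -- Hypothetical example: relocate a transaction log to a DAX mode volume (P:).
    -- For a user database, take it offline and move the physical file
    -- before bringing it back online at the new location.
    ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_log, FILENAME = N'P:\SQLLog\SalesDB_log.ldf');

    -- For tempdb, the new file location takes effect the next time
    -- the SQL Server service is restarted.
    ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = N'P:\TempDB\tempdb.mdf');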

Future Memory/Storage Layers

In late 2016/early 2017, we should see the introduction of Intel Optane SSDs using the NVMe protocol. These will be non-volatile storage devices that use Intel/Micron 3D XPoint Technology (pronounced as cross-point) rather than NAND flash, and they will work in existing servers. In Intel internal testing, these devices are showing about 10X lower latency and about 10X higher IOPS than existing, very high performance Intel DC P3700 Series PCIe NVMe SSD devices.

In mid/late 2017, we should also see the rollout of a new form of persistent DIMM that can be used as very large capacity persistent memory, or as extremely high performance, lower capacity storage. These will be electrically and physically compatible with current DDR4 DIMMs, and will be supported in the next-generation Intel Xeon processor platform (the Skylake “Purley” platform). These products will also be based on Intel/Micron 3D XPoint Technology. These Intel DIMMs will offer up to twice the system memory capacity at a significantly lower cost than traditional DDR4 DRAM, assuming your processor/server platform supports them (and Microsoft raises the licensed memory limit after Windows Server 2016 is released). These DIMMs won’t have the same latency as traditional DRAM (they will be slower), but they will offer much lower latency than PCIe NVMe storage devices.

From a broader, industry-wide perspective, this new category of Persistent Memory (PM) devices will offer non-volatile storage with near DRAM-like performance. PM devices reside directly on the memory bus, giving them very low latency and high bandwidth. Microsoft supports PM devices in Windows 10 Anniversary Update and Windows Server 2016. Both add support for a new class of storage volume, called a Direct Access (DAX) volume. DAX volumes use memory-mapped files to provide applications with direct access to PM devices for the absolute best performance.

One reason why DAX mode is so much faster is that once you have memory-mapped a region of NVDIMM on a DAX volume, further interactions with that storage completely bypass the storage stack. Making the data persistent is literally just a memory copy, which means a whole lot of code does not have to execute on every interaction with the storage. That software overhead is another very significant contributor to latency (along with the speed of light). DAX volumes are supported on NTFS, and you must choose DAX mode when you format the volume. Applications must be modified by their vendors in order to support and use DAX mode (as Microsoft has done for SQL Server 2016), and you will also need to enable a trace flag.

For backward compatibility on PM hardware, there will also be Block mode volumes, which maintain all existing storage semantics: all I/O operations traverse the storage stack to the PM disk driver, which makes Block mode fully compatible with existing applications. If you have a PM device on a supported operating system, Block mode gives you a significant storage performance improvement without any application modification. This means that down-level versions of SQL Server will be able to use Block mode storage volumes.

The bottom line on all of this is that you will have a lot more flexibility and many new options for how you design and configure your memory and storage subsystem over the next 12-18 months, as long as you are using SQL Server 2016 on Windows Server 2016 and have new enough hardware to support PM devices. Older versions of SQL Server will be able to use Block mode PM volumes if they are running on Windows Server 2016, and older hardware and older versions of Windows Server will be able to use Intel Optane SSDs. This will give us many additional choices for improving storage performance!