by Tom Fenton

In the datacenter, there is a never-ending race between processors and storage. Over the past decade, processors have increased the number of cores they contain from one or two to 8, 16, 32 or even 64. Storage technologies over this time have dramatically decreased their latency and increased data throughput. However, the reality is that far too many CPU cycles are being wasted waiting for data residing on PCIe devices or on network storage. We are reaching an inflection point with the emergence of a new type of storage hardware: Storage Class Memory (SCM). SCM resides on the memory channel, which is not hampered by latency and data bandwidth limitations of peripheral storage. Being on the memory channel places SCM far closer to the CPU, meaning it has more concurrent “swim lanes” for shuttling data to modern CPUs. This significantly increases the speed at which data can be accessed by the processor.

Because SCM is far faster than common datacenter storage, it has the potential to reshape how we use storage. SCM could put storage in the lead of the datacenter technology race, but only if applications can actually take advantage of it, and software now exists that harnesses SCM’s power for data storage. In this article, we will lay out what SCM is, the promise of this technology, our process for testing it, and why we think it may be transformative in the datacenter.

One of the problems with many new hardware technologies is that they can require a rewrite or re-architecture of applications or services in order to take advantage of their capabilities. A prime example is multi-core processors: when they first came out, developers had to write multithreaded code to benefit from them. SCM was in the same situation until one company, Formulus Black, devised a way to let existing, unmodified applications use SCM to speed up application performance. Formulus Black developed memory management software that presents a standard POSIX-compliant block device, so applications can leverage SCM without any modification. Early testing by Formulus Black shows that applications that make use of SCM-backed storage demonstrate substantial performance gains. We will verify and quantify these gains in our lab.

Before digging into the particulars of Formulus Black, let us offer a brief refresher on SCM and its background. SCM is far different from any other server storage we have seen in that processors access it over the memory bus through DIMM slots, rather than over a peripheral bus (as is the case with NVMe and SSD/HDD), and this method of access translates to a substantial decrease in latency. Unlike DRAM (which isn’t persistent), SCM retains information after a power loss or a reboot. Although SCM has other advantages over SSD/HDD technologies, its access speed and its persistence are by far the most important ones.

The technology to make SCM a reality took a long time to develop. Because you can’t simply plug NAND (which is currently used in SSD devices) into DIMM slots and expect it to perform well, a new type of semiconductor had to be developed. Intel was at the forefront of SCM technology with its 3D XPoint chip, which it uses in its Optane DC Persistent Memory product line.

Intel Optane DC Persistent Memory Module

Early testing by Intel shows that 3D XPoint is 100 times faster than NAND, but only 10 times slower than DRAM. Despite being an order of magnitude slower than DRAM, 3D XPoint supports higher-capacity devices, costs less, and, as noted, offers data persistence, which DRAM does not. Although there are different SCM persistent memory (PMEM) products on the market, for simplicity’s sake, and because Intel appears to be the leader in the field at this time, we will focus on its SCM offering in this article.

Once SCM products became available, companies needed to figure out the best way to exploit this technology, and Formulus Black did just that with Forsa. Forsa is a software stack that allows the creation and management of a block-level device called a logical extended memory (LEM), using SCM or DRAM as the physical memory media. Because a LEM is POSIX-compliant, an application can use it directly, you can mount a standard filesystem on it, or it can be used by a virtual machine (VM). To be clear, although Forsa can also be used with DRAM, the testing we will perform in our lab will use Optane DC Persistent Memory (DCPMM).
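To illustrate what "an application can use it directly" means in practice, here is a minimal Python sketch of block-aligned reads and writes through ordinary POSIX file calls. It is not Forsa code: the 4096-byte block size is an assumption, the function names are hypothetical, and a zero-filled temporary file stands in for the device, since the device path a LEM would appear under is not given in this article.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed logical block size for this sketch


def write_block(path, block_index, payload):
    """Write one zero-padded block at the given index using plain POSIX I/O."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload larger than one block")
    data = payload.ljust(BLOCK_SIZE, b"\0")
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, data, block_index * BLOCK_SIZE)
        os.fsync(fd)  # ask the OS to flush the write to the backing media
    finally:
        os.close(fd)


def read_block(path, block_index):
    """Read one full block at the given index."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, BLOCK_SIZE, block_index * BLOCK_SIZE)
    finally:
        os.close(fd)


def make_demo_device(num_blocks=8):
    """Create a zero-filled temp file that stands in for a block device."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, num_blocks * BLOCK_SIZE)
    finally:
        os.close(fd)
    return path
```

The point of the sketch is that nothing in it is specific to persistent memory: any program that already reads and writes a block device or file this way would, in principle, work unchanged against a POSIX-compliant block device backed by SCM.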

There are other block-level device drivers for DCPMM, but unlike those, Formulus Black endowed the LEM with enterprise-storage features such as data integrity, real-time data reduction, clones, snapshots, and high availability. These features can be used regardless of whether the LEM is being used by a VM, as a filesystem, or directly by an application. Furthermore, Forsa deals with the complexities that are unique to using the memory channel. For example, stock DCPMM does not have NUMA awareness, whereas Forsa LEMs are NUMA aware due to their NURA architecture. Instead of having to provision and manage four separate SCM storage regions on the Lenovo SR950 server in our test lab, Forsa maps all SCM memory regions across all NUMA nodes on a multi-socket server and enables you to provision and manage SCM-based LEMs using the total SCM capacity of all of them.

Creating and enabling LEMs with the above-mentioned enterprise features is very straightforward, as Forsa has a slick web-based user interface. However, as Formulus Black has an API-first mindset, all LEM management features can be accessed via their RESTful API. 

To ensure data integrity, Forsa has a Central Fault Tolerance Manager (CFTM) that does memory-error checking and Bad Block Replacement (BBR).

To improve data efficiency, Formulus Black also offers a data reduction feature: a real-time inline algorithm that uses its proprietary Formulus Bit Marker (FbM) technology to reduce duplicate data. Early testing by Formulus Black indicates that FbM can increase the amount of raw data that can be stored in the same physical memory media and decrease the effective cost per GB of using memory as a fast tier of storage. In a niche test case in which the company deployed many RHEL VM instances, it claims that FbM increased the effective storage capacity of memory by over 20x. This is due to FbM’s ability to detect data patterns, such as golden-image instances of RHEL and other application data, that repeat across multiple virtual machine instances.
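To see why many near-identical VM images can multiply effective capacity, consider the following toy sketch. It is not FbM, whose algorithm is proprietary and not described here; it swaps in simple fixed-size block deduplication with SHA-256 fingerprints, and the block size, function names, and synthetic images are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; not a documented FbM parameter


def dedup_ratio(images):
    """Return (logical_blocks, unique_blocks, ratio) for a list of bytes objects."""
    unique = set()
    logical = 0
    for image in images:
        for offset in range(0, len(image), BLOCK_SIZE):
            block = image[offset:offset + BLOCK_SIZE]
            unique.add(hashlib.sha256(block).digest())
            logical += 1
    ratio = logical / len(unique) if unique else 1.0
    return logical, len(unique), ratio


def demo_images(vm_count, golden_blocks=16):
    """Build synthetic VM images: a shared golden image plus one unique block each."""
    golden = b"".join(bytes([i]) * BLOCK_SIZE for i in range(golden_blocks))
    return [
        golden + f"vm-{n}".encode().ljust(BLOCK_SIZE, b"\0")
        for n in range(vm_count)
    ]
```

With 20 synthetic VMs, `dedup_ratio(demo_images(20))` counts 340 logical blocks but only 36 unique ones, a ratio of roughly 9.4. The ratio climbs as more instances share the same golden image, which is the general effect behind the large capacity gain claimed for the RHEL test case.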

For data protection, Forsa can be used in High Availability (HA) mode in which it creates a mirror image of the LEM you want to protect on a second node. We see HA mode being extremely useful with high-value LEMs, or when using DRAM as storage backing, as it is non-persistent. 

StorageReview Formulus Black Protected LEMs

You can also protect LEMs by backing them up to an SSD storage device. Forsa’s backup feature, BLINK, can be applied to all, or just some, of the LEMs on a system. Like HA mode, we see BLINK being extremely useful with high-value LEMs, or when using DRAM as storage backing, as it is non-persistent.

StorageReview Formulus Black Selective Blink

There may be cases in which the LEM you want to create exceeds the capacity of the DRAM or SCM on a single server. To accommodate these situations, you can use Forsa to create a LEM that spans two servers that are running Forsa.

StorageReview Formulus Black Expanded LEM

The requirements to run Forsa are rather loose, and you can find them on the Formulus Black website. The requirements for Intel Optane DC Persistent Memory are more restrictive, as it is only supported on certain motherboards and certain models of Intel’s latest processors. For our testing, we will be using a well-equipped Lenovo SR950 server with 768 GB of RAM, four 8280M CPUs, an onboard SATA M.2 SSD for boot, and twelve 1.6 TB Intel P4610 NVMe SSDs. In our previous testing of this server, we saw impressive performance results: in VDBench workloads, it delivered over 5 million IOPS in 4K read and 3.2 million IOPS in 4K write. This is the perfect system to test Forsa, as it will not be bottlenecked by any CPU performance issues. A full review of the SR950 is available on StorageReview.
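For a sense of scale, the IOPS figures quoted above can be converted into approximate bandwidth. The small helper below is only a back-of-the-envelope sketch: it assumes "4K" means 4,096 bytes per operation and reports decimal gigabytes per second, and the function name is ours.

```python
def iops_to_gb_per_s(iops, block_bytes=4096):
    """Convert an IOPS figure at a fixed block size to decimal GB/s."""
    return iops * block_bytes / 1e9


read_gb_per_s = iops_to_gb_per_s(5_000_000)   # 4K read figure from the text
write_gb_per_s = iops_to_gb_per_s(3_200_000)  # 4K write figure from the text
```

Under those assumptions, 5 million 4K read IOPS works out to about 20.5 GB/s, and 3.2 million 4K write IOPS to about 13.1 GB/s.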

A few applications, such as SAP HANA, have been rewritten or modified to take advantage of DCPMM technology, but the vast majority have not. Besides giving unmodified applications access to this extremely fast storage, Forsa extends the capabilities of DCPMM by supporting features that enterprise customers demand, such as HA, backup, and data reduction via FbM. Formulus Black Forsa has a lot of promise, and we are looking forward to working with it in our lab. The ability to take advantage of SCM technology without rewriting or re-architecting applications could make Forsa the killer application for DCPMM.

Formulus Black has made some bold claims: that Forsa is the fastest block storage interface for persistent memory on the market, and that Forsa LEMs have even outperformed persistent-memory-native filesystems. At StorageReview, we are looking forward to putting these claims to the test.

Formulus Black Free Trial

Formulus Black Product Brief (PDF)


This report is sponsored by Formulus Black. All views and opinions expressed in this report are based on our unbiased view of the product(s) under consideration.