Industry Outlook: NVMe and NVMe-oF For Storage

This week, Industry Outlook asks David Woolf, Senior Engineer of Datacenter Technologies at the University of New Hampshire InterOperability Laboratory (UNH-IOL), about NVMe and NVMe-oF and their role in storage and the data center. David has developed dozens of industry-reviewed test procedures and implementations as part of the team that has built the UNH-IOL into a world-class center for interoperability and conformance testing. He has also helped to organize numerous industry interoperability test events, both at the UNH-IOL facility and at off-site locations. David has been an active participant in a number of industry forums and committees addressing conformance and interoperability, including the SAS Plugfest Committee and the SATA-IO Logo Workgroup. He has also served as co-chair of the MIPI Alliance Testing Workgroup, in addition to coordinating the NVMe Integrators List and Plugfests.

Industry Outlook: Why is the storage industry moving toward NVMe and NVMe-oF? What’s driving this shift?  

David Woolf: Before diving into why storage is shifting to Non-Volatile Memory Express (NVMe), it's important to understand what NVMe and NVMe over Fabrics (NVMe-oF) are. NVMe is a storage interface designed specifically for solid-state drives. NVMe-oF extends NVMe across fabric transports such as Fibre Channel and Ethernet, allowing remote access to flash storage with latency similar to local access.

In the high-end client space, NVMe has taken over the spot previously held by Serial AT Attachment (SATA). Even high-end smartphones now use an NVMe interface where they had previously used embedded Multi-Media Controller (eMMC). A similar shift is happening in the enterprise space. For applications where top-end performance is necessary, NVMe is becoming the industry standard.

The fact that NVMe was designed specifically for flash memory, unlike SATA and other storage interfaces, is important. SATA inherited the AT Attachment (ATA) command set, which has been under development for decades and includes many features specific to handling spinning-disk media. The first solid-state drives (SSDs) to hit the market with SATA interfaces did indeed outperform their spinning counterparts. But they were unable to fully reap the performance benefit of flash media, because their controllers had to use the ATA command set, which had become fairly complicated. It’s easy to see why a flash-memory medium would be faster than a mechanical spinning-disk medium. What gives NVMe its advantage over earlier SSD interfaces is that its command set is optimized and simplified for flash memory. This approach allows the design of a streamlined controller that understands the less complicated NVMe protocol rather than the more complicated ATA protocol.
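To make the contrast concrete, here is a minimal sketch of the fixed 64-byte submission queue entry that carries every NVMe command. The field layout and the opcode values follow the NVMe base specification, but the struct itself is only an illustration, not code taken from any particular driver.

```c
#include <stdint.h>

/* Illustrative layout of the 64-byte NVMe submission queue entry.
 * Field positions follow the NVMe base specification. */
struct nvme_sqe {
    uint32_t cdw0;   /* bits 7:0 opcode, 9:8 fused op, 31:16 command ID */
    uint32_t nsid;   /* namespace identifier */
    uint64_t rsvd;   /* reserved */
    uint64_t mptr;   /* metadata pointer */
    uint64_t prp1;   /* data pointer: first PRP entry (or SGL descriptor) */
    uint64_t prp2;   /* data pointer: second PRP entry */
    uint32_t cdw10;  /* command-specific, e.g. starting LBA (low) for reads */
    uint32_t cdw11;  /* command-specific, e.g. starting LBA (high) */
    uint32_t cdw12;  /* command-specific, e.g. number of logical blocks */
    uint32_t cdw13;
    uint32_t cdw14;
    uint32_t cdw15;
};

/* A few NVM command set opcodes, per the specification. */
enum {
    NVME_CMD_FLUSH = 0x00,
    NVME_CMD_WRITE = 0x01,
    NVME_CMD_READ  = 0x02,
};
```

Every I/O, whether a read, a write or a flush, travels in this one fixed-size structure, which is a large part of why an NVMe controller can stay simpler than one that must parse the full ATA command set.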

NVMe was originally designed to use the PCI Express (PCIe) transport. PCIe is recognized as a robust communication channel from the CPU to any peripheral device. It has low latency and high bandwidth, and it can aggregate multiple lanes, all critical features for a storage application. But PCIe is primarily an interconnect inside a single server.
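As a rough sense of scale (an illustrative calculation, not a figure from the interview), the snippet below works out the usable per-direction bandwidth of a PCIe 3.0 x4 link, a width commonly used for NVMe SSDs, from the 8 GT/s line rate and 128b/130b encoding.

```c
#include <stdio.h>

int main(void) {
    /* PCIe 3.0: 8 GT/s per lane with 128b/130b line encoding. */
    double gtps     = 8.0;
    double encoding = 128.0 / 130.0;
    int    lanes    = 4;    /* a typical NVMe SSD link width */

    double usable_gbps  = gtps * encoding * lanes;  /* ~31.5 Gb/s usable        */
    double gbytes_per_s = usable_gbps / 8.0;        /* ~3.9 GB/s per direction  */

    printf("x%d PCIe 3.0 link: ~%.1f GB/s per direction\n", lanes, gbytes_per_s);
    return 0;
}
```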

NVMe-oF allows the NVMe protocol to be mapped onto a fabric technology, so storage nodes containing PCIe-attached NVMe SSDs can be connected over that fabric. The result is that much larger pools of storage, connected through high-performance storage-area networks, become possible. Applications that use large data sets will be able to run with much better performance.
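Conceptually, NVMe-oF wraps the same command structure into "capsules" that travel over the fabric instead of over PCIe. The sketch below is a simplified rendering of that idea based on the capsule definitions in the NVMe-oF specification; it is not drawn from any shipping implementation.

```c
#include <stdint.h>

/* Simplified view of NVMe-oF capsules. A command capsule carries the
 * same 64-byte submission queue entry used over PCIe, optionally
 * followed by in-capsule data; a response capsule carries the 16-byte
 * completion queue entry. Conceptual sketch only. */

struct nvmeof_command_capsule {
    uint8_t sqe[64];   /* the NVMe command itself, unchanged from PCIe */
    uint8_t data[];    /* optional in-capsule data or SGL descriptors */
};

struct nvmeof_response_capsule {
    uint8_t cqe[16];   /* completion: result, SQ head pointer, status, command ID */
    uint8_t data[];    /* optional in-capsule data */
};
```

Because the command format itself does not change, the transport (RoCE, Fibre Channel, InfiniBand or iWARP) only has to deliver capsules reliably; the NVMe semantics stay the same end to end.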

IO: What fabrics are NVMe-oF deployments using? What are the benefits and challenges of each? Is the protocol likely to use other fabrics?

DW: The primary fabrics in use today, and tested at recent NVMe plugfests at UNH-IOL, are RDMA over Converged Ethernet (RoCE) and Fibre Channel. InfiniBand and iWARP are other technologies that enable deployment of NVMe-oF.

Fibre Channel (FC) has a rich history as a storage fabric, and many enterprises that use it are happy to continue to leverage that investment as they bring NVMe into their networks. With Gen 6 Fibre Channel running at 32Gbps per lane (128Gbps for four combined lanes on a single QSFP28 connector), FC is keeping pace at the signaling layer with Ethernet at 25Gbps per lane (100Gbps for four combined lanes on a single QSFP28 connector).

Of course, Ethernet already has massive numbers of ports deployed, numerous IT pros who are fluent in the technology, strong interoperability at the physical and protocol layers, and a low cost per port. In truly massive hyperscale deployments, that low per-port cost can make a real dent in the cost of outfitting a data center. Some traffic-management and congestion-avoidance features that must sit on top of Ethernet to make it a lossless fabric can increase latency. Several companies, however, are working to mitigate that added latency by integrating those features in silicon.

Looking to the future, the NVMe organization is actively working on a binding specification for NVMe-oF over TCP.

IO: What are the challenges in implementing NVMe-oF? 

DW: Interoperability is one challenge, simply owing to the number of components that must work together for a system to function properly, but we’ll address that issue later. A major challenge will be tuning systems for the best performance. NVMe will handle different types of workloads. Streaming audio/video, for example, has a different traffic fingerprint than pushing a data set through a machine-learning algorithm. Configuring fabrics to handle traffic management and congestion avoidance while delivering the best performance for a particular workload will raise challenges.

IO: Is NVMe-oF secure enough for critical data? 

DW: NVMe does provide secure send and receive functions, as well as some cryptographic erase capability. NVMe-oF depends on the security of the fabric it’s deployed on. Since many enterprises employ Fibre Channel and RoCE for critical data, it’s reasonable to say that NVMe-oF can handle critical data as well.
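For reference, the capabilities mentioned here map to ordinary NVMe admin commands. The sketch below lists the relevant opcodes and the Format NVM field that selects a cryptographic erase, as defined in the NVMe specification; it is illustrative only and omits the surrounding command handling.

```c
/* NVMe admin opcodes behind the security features mentioned above
 * (values per the NVMe base specification). */
enum {
    NVME_ADMIN_FORMAT_NVM       = 0x80, /* can request a secure erase of a namespace */
    NVME_ADMIN_SECURITY_SEND    = 0x81, /* tunnels a security protocol payload to the drive */
    NVME_ADMIN_SECURITY_RECEIVE = 0x82, /* retrieves the security protocol response */
};

/* Format NVM, command dword 10: the Secure Erase Settings field sits in
 * bits 11:9; a value of 2 requests a cryptographic erase. */
#define NVME_FORMAT_SES_SHIFT  9
#define NVME_FORMAT_SES_CRYPTO (2u << NVME_FORMAT_SES_SHIFT)
```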

IO: Why is interoperability important for NVMe-oF technology?

DW: NVMe-oF allows creation of large storage pools in a data center. As such, interoperability is about more than components playing nice inside a single server or storage node. Instead, many more pieces must work in concert for a high-performing system.

In a storage node or server, the NVMe driver—whether the in-box open-source driver that comes with the operating system or a proprietary driver—must interoperate with the NVMe SSD and the PCIe chipset. The NVMe-oF driver must interoperate with the Ethernet NIC or FC HBA on the host side. The target side could use an off-the-shelf HBA or NIC that plugs into the PCIe interface in a storage node, or it could be integrated in the silicon. The sheer variety of combinations possible with NVMe-oF demands that the community address interoperability. The NVMe organization is doing so through a series of plugfests.