Revolutionizing Data Center Operations and the Impact of NVMe Host Managed Live Migration

The UNH-IOL has been working with the NVM Express (NVMe) working group for over a decade, beginning with the development of the NVM Express standard, which remains in use today. In close partnership with NVMe collaborators, the IOL designed and built IOL INTERACT™, a software tool that streamlines and automates NVMe compliance testing. Vendors worldwide use it to pre-test storage devices ahead of official testing and to verify product conformance. The IOL also helped establish the Integrators Lists, which feature NVMe products that have met compliance and interoperability requirements in testing with IOL INTERACT™.

Every six months, the UNH-IOL develops and releases NVMe test plans that cover the latest features and changes to the standard. NVMe Technical Proposals (TPs) and Engineering Change Notices (ECNs) keep the IOL informed about new requirements before they are incorporated into the official NVMe specifications. This lets the IOL develop IOL INTERACT test scripts in parallel with the test plans, so vendors can verify the compliance of their devices as early as possible.

Major releases of IOL INTERACT are published alongside the latest test plans every six months, while minor releases follow approximately every two months to fix bugs, improve the UX/UI, and address user feedback. The latest major release of INTERACT (v23.0a) allows vendors to test their implementations of Live Migration. Tests for PCIe Infrastructure for Live Migration can be found in Group 12 of the UNH-IOL NVM Command Set Conformance Test Plan v23.0.

In July 2024, the NVM Express® (NVMe®) working group released TP4159: PCIe Infrastructure for Live Migration, and in August it was officially integrated into the NVMe Base Specification Revision 2.1. This blog discusses how NVMe Host Managed Live Migration (HMLM) works, the groups it affects, and the reasons to implement it, all from the perspective of Austin Snow, a recent UNH Computer Science graduate and UNH-IOL employee.

The NVMe HMLM capability addresses security concerns and management challenges within data centers. It supports the migration of virtual machines (VMs) through cooperation between hosts and controllers, and it also improves a range of data center maintenance processes.

Definitions

An NVM subsystem that supports HMLM contains one or more Migration Management Controllers (MMCs), indicated by the HMLMS bit; all other controllers in the subsystem are considered Migratable Controllers (MCs). The subsystem is hosted by a Migration Management Host (MMH), and controllers can be migrated between a Source MMH (MMHS) and a Destination MMH (MMHD).
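To make the roles concrete, here is a minimal Python sketch that splits a subsystem's controllers into MMCs and MCs. It assumes a simplified controller record in which `hmlms` is True when the controller reports the HMLMS bit; the `Controller` type and `classify_controllers` function are hypothetical illustrations, not fields or functions from the NVMe specification or any driver.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of an HMLM-capable NVM subsystem.
# Field names are illustrative; they are not the NVMe data structure layouts.


@dataclass(frozen=True)
class Controller:
    cntlid: int   # Controller Identifier (CNTLID)
    hmlms: bool   # assumed: True when the controller reports the HMLMS bit


def classify_controllers(controllers):
    """Split a subsystem's controllers into MMCs and MCs by their HMLMS bit."""
    mmcs = [c.cntlid for c in controllers if c.hmlms]
    mcs = [c.cntlid for c in controllers if not c.hmlms]
    if not mmcs:
        raise ValueError("an HMLM-capable subsystem needs at least one MMC")
    return mmcs, mcs


subsystem = [Controller(0, True), Controller(1, False), Controller(2, False)]
print(classify_controllers(subsystem))  # ([0], [1, 2])
```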

Benefits

Because an MMC performs the migration of an MC, the MC is entirely unaware that the migration is occurring. To support this, the MMC can suspend and resume the MC using the Migration Send command, which preserves the MC's static state within the MMHS. The Migration Send command can also explicitly set the Controller State, while the Migration Receive command retrieves it. Data centers can coordinate multiple MMHs to fully migrate the static state of suspended controllers without losing or exposing any user data, thereby preserving customer security. Controller migration also allows systems to be serviced when needed and supports load balancing, helping ensure that server capacity is allocated and used effectively for customers.
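The suspend, resume, and state-transfer operations above can be modeled as a small Python sketch. Everything here is hypothetical: the class names, the operation strings, and the rule that the Controller State is only read or written while the MC is suspended are modeling assumptions that follow the sequence described in this post, not the NVMe command formats.

```python
import copy

# Hypothetical model of an MMC controlling Migratable Controllers (MCs).
# Method and operation names mirror the text; they are not a real driver API.


class MigratableController:
    def __init__(self, cntlid, state=None, suspended=False):
        self.cntlid = cntlid
        self.state = dict(state or {})  # stand-in for the MC's static state
        self.suspended = suspended


class MigrationManagementController:
    def __init__(self):
        self.controllers = {}  # CNTLID -> MigratableController

    def attach(self, mc):
        self.controllers[mc.cntlid] = mc

    def migration_send(self, cntlid, operation, state=None):
        """Suspend, Resume, or Set Controller State for the given MC."""
        mc = self.controllers[cntlid]
        if operation == "suspend":
            mc.suspended = True
        elif operation == "resume":
            mc.suspended = False
        elif operation == "set_controller_state":
            # Modeling assumption: state is only written while the MC is suspended.
            if not mc.suspended:
                raise RuntimeError("MC must be suspended to set its state")
            mc.state = copy.deepcopy(state)
        else:
            raise ValueError(f"unknown operation: {operation}")

    def migration_receive(self, cntlid):
        """Get Controller State; modeled as allowed only while suspended."""
        mc = self.controllers[cntlid]
        if not mc.suspended:
            raise RuntimeError("MC must be suspended to read its state")
        return copy.deepcopy(mc.state)


src_mmc, dst_mmc = MigrationManagementController(), MigrationManagementController()
src_mmc.attach(MigratableController(1, {"num_io_queues": 4}))
dst_mmc.attach(MigratableController(1, suspended=True))  # waiting for state

src_mmc.migration_send(1, "suspend")
saved_state = src_mmc.migration_receive(1)
dst_mmc.migration_send(1, "set_controller_state", saved_state)
dst_mmc.migration_send(1, "resume")
print(dst_mmc.controllers[1].state)  # {'num_io_queues': 4}
```

In this sketch the destination MC starts out suspended so it can accept the saved state before being resumed, mirroring the Set Controller State and Resume steps performed on the destination side.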

Live Migration

You might be wondering why the process is called Live Migration when MMHs must suspend controllers to a somewhat inactive “static state” before migrating them. A suspended controller does not halt all operations. Instead, it continues to process commands already in its submission queues, but it does not accept new commands on those queues. The drive operates as if no new downstream traffic is arriving from the MMH. If any command processed after suspension modifies the static state of the controller (or of any attached namespace), the MMC can report those updates.
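The behavior of a suspended controller can be illustrated with a toy submission queue. In this hypothetical Python model, commands queued before suspension still run, new submissions are refused, and any command flagged as state-modifying that completes after suspension is recorded so the MMC could report it. The command names and the `modifies_state` flag are illustrative assumptions.

```python
from collections import deque

# Hypothetical model of a suspended MC's submission queue, as described above.
# Command names and the "modifies_state" flag are illustrative only.


class SubmissionQueueModel:
    def __init__(self):
        self.pending = deque()
        self.suspended = False
        self.state_changes_after_suspend = []  # updates the MMC could report

    def submit(self, cmd):
        """Accept a new command unless the controller is suspended."""
        if self.suspended:
            return False
        self.pending.append(cmd)
        return True

    def suspend(self):
        self.suspended = True

    def process_all(self):
        """Process commands already queued, even after suspension."""
        processed = []
        while self.pending:
            cmd = self.pending.popleft()
            processed.append(cmd["name"])
            if self.suspended and cmd.get("modifies_state"):
                self.state_changes_after_suspend.append(cmd["name"])
        return processed


sq = SubmissionQueueModel()
sq.submit({"name": "write_A"})
sq.submit({"name": "set_features", "modifies_state": True})
sq.suspend()
accepted = sq.submit({"name": "write_B"})  # refused: controller is suspended
processed = sq.process_all()
print(accepted)                        # False
print(processed)                       # ['write_A', 'set_features']
print(sq.state_changes_after_suspend)  # ['set_features']
```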

The MMC can log changes to user data on namespaces associated with the MC to an area of MMH memory known as a Controller Data Queue (CDQ). CDQs are created and deleted by issuing the Controller Data Queue command to the MMC. Manufacturers can implement their own vendor-specific CDQ types, but NVMe® conformance certification requires support for the User Data Migration Queue (UDMQ) queue type.

To begin logging user data changes on the namespace, the MMH must issue a Track Send command to the MMC that specifies a valid CDQ identifier and the Start Logging action for the Log User Data Changes operation. Once the MMC starts logging, the MMHS can begin copying the existing namespace contents to an MMHD.
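The relationship between a CDQ, the UDMQ queue type, and the Start Logging action can be sketched as follows. This is a hypothetical Python model: the class and method names are invented for illustration, and the log entry layout (namespace ID, starting LBA, number of blocks) is an assumption rather than the UDMQ entry format defined by the specification.

```python
from dataclasses import dataclass, field

# Hypothetical model of a CDQ of type UDMQ in MMH memory and the
# Log User Data Changes operation started by Track Send.
# Entry layout (nsid, slba, nlb) is an illustrative assumption.


@dataclass
class ControllerDataQueue:
    cdqid: int
    queue_type: str = "UDMQ"
    entries: list = field(default_factory=list)


class LoggingMMC:
    def __init__(self):
        self.cdqs = {}
        self.logging_to = None  # CDQ ID receiving user data change entries

    def create_cdq(self, cdqid):
        # Models the create action of the Controller Data Queue command.
        self.cdqs[cdqid] = ControllerDataQueue(cdqid)

    def delete_cdq(self, cdqid):
        # Models the delete action; logging to a deleted CDQ stops.
        if self.logging_to == cdqid:
            self.logging_to = None
        del self.cdqs[cdqid]

    def track_send_start_logging(self, cdqid):
        if cdqid not in self.cdqs:
            raise ValueError("Track Send requires a valid CDQ identifier")
        self.logging_to = cdqid

    def on_mc_write(self, nsid, slba, nlb):
        # Called when the MC writes user data; recorded only while logging.
        if self.logging_to is not None:
            self.cdqs[self.logging_to].entries.append((nsid, slba, nlb))


mmc = LoggingMMC()
mmc.create_cdq(cdqid=1)
mmc.on_mc_write(nsid=1, slba=0, nlb=8)    # before logging: not recorded
mmc.track_send_start_logging(cdqid=1)
mmc.on_mc_write(nsid=1, slba=64, nlb=16)  # after logging: recorded
print(mmc.cdqs[1].entries)  # [(1, 64, 16)]
```

The ordering in the example is the point: only writes made after logging starts are recorded, which is why the MMHS begins copying the namespace once the MMC is logging. Any block that changes during the copy then shows up in the CDQ and can be sent again.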

Now that the MMC is reporting user data changes to host memory, the MMHS can also have the MMC track the changes the MC makes to host memory as it processes commands from its queues. To do this, the MMHS submits another Track Send command to the MMC, specifying the Controller Identifier (CNTLID) of the MC and the Start Tracking action for the Track Memory Changes operation. This command also uses a Track Memory Changes data structure, populated by the MMHS, that specifies the memory ranges to be tracked. To retrieve the tracked memory changes, the MMHS issues Track Receive commands to the MMC specifying the CNTLID of the MC.
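Tracking memory changes can be thought of as similar to dirty-page tracking in a hypervisor. The hypothetical Python sketch below records which pages inside the tracked ranges the MC writes, and returns (then clears) that set each time a Track Receive is issued. The 4 KiB page granularity, the `(start, length)` range format, and all names are assumptions for illustration, not the Track Memory Changes data structure from the specification.

```python
# Hypothetical dirty-page model of Track Memory Changes and Track Receive.
# The 4 KiB granularity and (start, length) range format are assumptions.

PAGE_SIZE = 4096


class MemoryTracker:
    def __init__(self):
        self.ranges = {}  # CNTLID -> list of (start, length) host memory ranges
        self.dirty = {}   # CNTLID -> set of dirty page addresses

    def track_send_start(self, cntlid, ranges):
        """Start Tracking for one MC over the given host memory ranges."""
        self.ranges[cntlid] = list(ranges)
        self.dirty[cntlid] = set()

    def on_mc_dma_write(self, cntlid, addr, length):
        """Record pages the MC writes, but only inside the tracked ranges."""
        for start, size in self.ranges.get(cntlid, []):
            lo, hi = max(addr, start), min(addr + length, start + size)
            if lo < hi:
                first = lo // PAGE_SIZE * PAGE_SIZE
                for page in range(first, hi, PAGE_SIZE):
                    self.dirty[cntlid].add(page)

    def track_receive(self, cntlid):
        """Return and clear the pages changed since the last Track Receive."""
        changed = sorted(self.dirty.get(cntlid, set()))
        if cntlid in self.dirty:
            self.dirty[cntlid].clear()
        return changed


tracker = MemoryTracker()
tracker.track_send_start(cntlid=1, ranges=[(0x10000, 0x4000)])
tracker.on_mc_dma_write(1, addr=0x10800, length=0x1000)  # spans two pages
tracker.on_mc_dma_write(1, addr=0x90000, length=0x100)   # outside tracked range
print([hex(p) for p in tracker.track_receive(1)])  # ['0x10000', '0x11000']
print(tracker.track_receive(1))                    # []
```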

Once the MMHS has finished copying the namespaces to an MMHD, it determines when to suspend the MC. Meanwhile, the MMHS continues to copy the user data changes reported in the UDMQ to the MMHD, using the tracked host memory changes reported by the Track Receive command(s). Next, the MMHS suspends the MC by submitting a Migration Send command specifying the Suspend operation to the MMC, and then migrates the remaining user data reported in the UDMQ (again using the tracked memory changes from Track Receive commands). At this stage, all user data has been migrated from the MMHS to the MMHD, and only the controller state remains. The MMHS issues one or more Migration Receive commands specifying the Get Controller State operation to the MMC and transfers the returned data to the MMHD. The MMHD then issues one or more Migration Send commands specifying the Set Controller State operation to its own MMC. Finally, the MMHD issues a Migration Send command specifying the Resume operation to its MMC, resuming the MC and completing the migration without interruption.
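To tie the steps together, here is a hypothetical end-to-end simulation in Python. Namespaces are plain dictionaries mapping LBA to data, the UDMQ is a list of changed LBAs, and the command names appear only as labels in a step log; host memory tracking is reduced to a single labeled step and not otherwise modeled. It is a sketch of the ordering described above under those simplifying assumptions, not an implementation of the NVMe commands.

```python
# Hypothetical simulation of the HMLM migration ordering described above.
# Namespaces are dicts of LBA -> data; the UDMQ is a list of changed LBAs.
# Command names are labels only; this is not a real NVMe API.


def drain(udmq, source_ns, dest_ns):
    """Copy every LBA reported in the UDMQ from the source to the destination."""
    while udmq:
        lba = udmq.pop(0)
        dest_ns[lba] = source_ns[lba]


def migrate(source_ns, writes_during_copy, writes_before_suspend, mc_state):
    """Return (dest_ns, dest_state, steps). source_ns is mutated in place
    to model the VM's ongoing writes while the MC is still running."""
    udmq, steps = [], []

    def mc_write(lba, data):
        # The MC keeps serving I/O until suspended; logging is active,
        # so every change is reported in the UDMQ.
        source_ns[lba] = data
        udmq.append(lba)

    steps.append("Track Send: Start Logging")
    steps.append("Track Send: Start Tracking")  # memory tracking not modeled

    dest_ns = dict(source_ns)                   # bulk copy of the namespace
    for lba, data in writes_during_copy:        # writes racing with the copy
        mc_write(lba, data)
    drain(udmq, source_ns, dest_ns)             # copy changes reported so far

    for lba, data in writes_before_suspend:     # VM keeps running meanwhile
        mc_write(lba, data)

    steps.append("Migration Send: Suspend")     # no new commands accepted
    drain(udmq, source_ns, dest_ns)             # remaining user data

    steps.append("Migration Receive: Get Controller State")
    dest_state = dict(mc_state)                 # transferred to the MMHD
    steps.append("Migration Send: Set Controller State")
    steps.append("Migration Send: Resume")
    return dest_ns, dest_state, steps


src = {0: "a", 1: "b", 2: "c"}
dest, state, steps = migrate(src, [(1, "B")], [(2, "C"), (5, "new")], {"cntlid": 1})
print(dest == src)  # True
print(steps[-1])    # Migration Send: Resume
```

Because every write made before suspension is either covered by the bulk copy or reported in the UDMQ, draining the queue once more after Suspend is enough for the destination namespace to match the source.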