R&D: uNVMe-TCP, User Space Approach to Optimizing NVMe-oF TCP Transport
Compared with the kernel solution, uNVMe-TCP shows a 15% to 30% latency improvement on average in the FIO benchmark.
This is a Press Release edited by StorageNewsletter.com on February 5, 2020 at 2:12 pm.

Springer Nature Switzerland AG has published, in the IOV 2019: Internet of Vehicles. Technologies and Services Toward Smart Cities proceedings, an article written by Ziye Yang, Qun Wan, and Gang Cao, Intel, Shanghai, China, and Karol Latecki, Intel, Gdansk, Poland.
Abstract: “Recently, NVM Express has released the new TCP transport specification for NVMe over fabrics (NVMe-oF). And there are two kinds of implementations, i.e., one in kernel space and the other in user space. The implementation in the kernel (e.g., Linux kernel) is feasible, but there are several drawbacks such as performance, flexibility, and stability. In this paper, we would like to introduce uNVMe-TCP, which follows the specification and provides the NVMe/TCP transport in user space with improved performance and usage experience. We choose the optimization in user space since it is very difficult to optimize the whole NVMe I/O stack in kernel space through different kernel modules, and the optimization may affect other applications in user space. The idea of uNVMe-TCP is to optimize the whole NVMe I/O stack on TCP transport, i.e., leveraging the lock-free user space NVMe I/O stack and configurable network I/O stack (both kernel and user space TCP stack can be supported). Currently uNVMe-TCP provides the solution on both target and initiator side, and it can be tested against Linux kernel solution with good interoperability. Besides, some experiments are conducted to demonstrate the performance of uNVMe-TCP. Compared with the kernel solution, uNVMe-TCP shows 15% to 30% latency improvement on average with FIO benchmark. And the per CPU core performance of uNVMe-TCP is promising, i.e., it is 2.2 times of the kernel on average with the increasing number of connections. Furthermore, uNVMe-TCP is also scalable in CPU aspect.”