R&D: Leveraging NVMe SSDs for Building Fast, Cost-effective, LSM-Tree-Based KV Store

Adapting RocksDB system to utilize selective deployment of high-speed SSDs

ACM Transactions on Storage has published an article written by Cheng Li, Hao Chen, and Chaoyi Ruan, University of Science and Technology of China, China; Xiaosong Ma, Qatar Computing Research Institute, HBKU, Qatar; and Yinlong Xu, University of Science and Technology of China, China, and Anhui Province Key Laboratory of High Performance Computing, China.

Abstract: Key-value (KV) stores support many crucial applications and services. They perform fast in-memory processing but are still often limited by I/O performance. The recent emergence of high-speed commodity non-volatile memory express solid-state drives (NVMe SSDs) has propelled new KV system designs that take advantage of their ultra-low latency and high bandwidth. Meanwhile, to switch to entirely new data layouts and scale up entire databases to high-end SSDs requires considerable investment. As a compromise, we propose SpanDB, an LSM-tree-based KV store that adapts the popular RocksDB system to utilize selective deployment of high-speed SSDs. SpanDB allows users to host the bulk of their data on cheaper and larger SSDs (and even hard disc drives with certain workloads), while relocating write-ahead logs (WAL) and the top levels of the LSM-tree to a much smaller and faster NVMe SSD. To better utilize this fast disk, SpanDB provides high-speed, parallel WAL writes via SPDK, and enables asynchronous request processing to mitigate inter-thread synchronization overhead and work efficiently with polling-based I/O. To ease the live data migration between fast and slow disks, we introduce TopFS, a stripped-down file system providing familiar file interface wrappers on top of SPDK I/O. Our evaluation shows that SpanDB simultaneously improves RocksDB’s throughput by up to 8.8× and reduces its latency by 9.5–58.3%. Compared with KVell, a system designed for high-end SSDs, SpanDB achieves 96–140% of its throughput, with a 2.3–21.6× lower latency, at a cheaper storage configuration.
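
The tiering idea in the abstract (WAL plus the top LSM-tree levels on a small fast disk, everything else on a larger slow disk) can be illustrated with a minimal sketch. It is not SpanDB's actual placement logic, which the abstract does not specify. The function `place_levels`, the `Placement` type, the fixed WAL reservation, and the purely capacity-based rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    fast_levels: list      # level numbers hosted on the fast NVMe SSD
    slow_levels: list      # level numbers hosted on the slower, larger disk
    fast_bytes_used: int   # WAL reservation plus data in the fast levels


def place_levels(level_sizes, fast_capacity, wal_reserve):
    """Hypothetical static policy: reserve space for the WAL, then put a
    contiguous prefix of the top (lowest-numbered) levels on the fast disk
    for as long as they fit. All remaining levels go to the slow disk."""
    if wal_reserve > fast_capacity:
        raise ValueError("WAL reservation exceeds fast-disk capacity")
    budget = fast_capacity - wal_reserve
    used = 0
    fast, slow = [], []
    for level, size in enumerate(level_sizes):
        # Once one level spills to the slow disk, all deeper levels follow,
        # so the fast disk always holds a prefix L0..Lk of the tree.
        if not slow and used + size <= budget:
            fast.append(level)
            used += size
        else:
            slow.append(level)
    return Placement(fast, slow, wal_reserve + used)


# Example with made-up sizes in MB: L0..L3 and a 4 GB fast disk.
example = place_levels([256, 1024, 10240, 102400],
                       fast_capacity=4096, wal_reserve=512)
```

With these made-up sizes, L0 and L1 fit next to the WAL reservation on the fast disk and L2 and L3 stay on the slow disk. A real system would also have to move levels as they grow, which is the live migration the abstract attributes to TopFS.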