R&D: Four Articles on NAND Flash, SSDs and NVMe

R&D: Page-Overwrite Data Sanitization in 3D NAND Flash, Challenges, Feasibility, and the PULSE Solution

The authors experimentally evaluate page-overwrite-based sanitization on commercial 3D NAND flash memory chips and uncover significant threshold voltage disturbances in erased cells on adjacent pages within the same layer but across different sub-blocks.

ACM Transactions on Embedded Computing Systems has published an article written by Matchima Buddhanoy, Electrical and Computer Engineering, Colorado State University, Fort Collins, USA; Aleksandar Milenkovic, Electrical and Computer Engineering, The University of Alabama in Huntsville, Huntsville, USA; Sudeep Pasricha; and Biswajit Ray, Electrical and Computer Engineering, Colorado State University, Fort Collins, USA.

Abstract: “Instant data deletion (or sanitization) in NAND flash devices is essential for achieving data privacy, but it remains challenging due to the mismatch between erase and write granularities, which leads to high overhead and accelerated wear. While page-overwrite-based instant data sanitization has proven effective for 2D NAND, its applicability to 3D NAND is limited due to the unique sub-block architecture. In this study, we experimentally evaluate page-overwrite-based sanitization on commercial 3D NAND flash memory chips and uncover significant threshold voltage disturbances in erased cells on adjacent pages within the same layer but across different sub-blocks. Our key findings reveal that page-overwrite sanitization increases the median raw bit error rate (RBER) beyond correction limits (exceeding 0.93%) in Floating-Gate (FG) Single-Level Cell (SLC) technology, whereas Charge-Trap (CT) SLC 3D NAND flash memories exhibit higher robustness. In Triple-Level Cell (TLC) 3D NAND, page-overwrite sanitization proves impractical, with the median RBER of ∼13% for FG and ∼5% for CT devices. To overcome these challenges, we propose PULSE, a low-disturbance sanitization technique that balances sanitization efficiency (η_san) and data integrity (RBER). Experimental results show that PULSE eliminates RBER increases in SLC devices and reduces the median RBER to below 0.57% for FG and 0.79% for CT in fresh TLC blocks, demonstrating its practical viability for 3D NAND flash sanitization.”
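For readers unfamiliar with the metric, the raw bit error rate quoted in the abstract is the fraction of bits that read back differently from what was written, measured before any error correction. The sketch below is a minimal illustration of that arithmetic and of taking a median across pages. The function names and the sample page data are hypothetical and are not taken from the paper.

```python
from statistics import median


def raw_bit_error_rate(written: bytes, read_back: bytes) -> float:
    """Fraction of bits that differ between written and read-back data."""
    if len(written) != len(read_back) or not written:
        raise ValueError("pages must be non-empty and of equal length")
    # XOR exposes the flipped bits; count the set bits in each byte.
    bit_errors = sum(bin(a ^ b).count("1") for a, b in zip(written, read_back))
    return bit_errors / (8 * len(written))


def median_rber(pages) -> float:
    """Median RBER over an iterable of (written, read_back) page pairs."""
    return median(raw_bit_error_rate(w, r) for w, r in pages)


# Hypothetical 100-byte pages: 0, 1, and 8 flipped bits out of 800.
clean = bytes([0xFF] * 100)
one_flip = bytes([0xFE]) + bytes([0xFF] * 99)
eight_flips = bytes([0x00]) + bytes([0xFF] * 99)
pages = [(clean, clean), (clean, one_flip), (clean, eight_flips)]

print(f"{median_rber(pages):.5f}")  # 0.00125, i.e. 0.125%
```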

R&D: Fraggle, Reducing File Fragmentation on DRAM-less SSD

The paper presents Fraggle, a host-device co-design that mitigates file fragmentation on DRAM-less SSDs by preserving the efficacy of preallocation within an extended LBA space.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems has published an article written by Weizhou Huang, Hepei Wu, Shuhan Bai, Jian Zhou, Tong Zhang, and Fei Wu, Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China.

Abstract: “Mobile devices adopt DRAM-less SSDs as the flash storage to reduce power consumption and manufacturing costs, yet DRAM-less SSDs are significantly impacted by file fragmentation, thereby degrading the responsiveness of mobile devices. Existing methods to mitigate file fragmentation, including preallocation and defragmentation, face critical limitations. Preallocation prevents file fragmentation by reserving contiguous free LBA space for individual files. However, within the limited LBA space exposed by the SSD, contiguous free space is gradually consumed, and fragmented by scattered data blocks, reducing the effectiveness of preallocation. Defragmentation recovers file access performance by reorganizing fragmented file layouts into contiguous ones. However, when applied to DRAM-less SSD, defragmentation introduces excessive cache eviction of dirty mapping entries to the slow NAND flash, causing significant performance disruption. This paper presents Fraggle, a host-device co-design that mitigates file fragmentation on DRAM-less SSDs by preserving the efficacy of preallocation within an extended LBA space. To effectively manage the extended LBA space, Fraggle introduces two key components: (1) HashFTL, a compact, cache-efficient FTL mapping mechanism specifically optimized for DRAM-less SSDs, and (2) a device-assisted space allocator that simultaneously reduces space allocation overhead and resolves layout imbalance issues in HashFTL caused by sparse LBA distribution. Evaluation results demonstrate that, compared to F2FS and state-of-the-art preallocation methods, Fraggle accelerates execution time by up to 9.26× and 8.07× under SQLite and real-world mobile application workloads.”

R&D: LEPA, Latency Estimation-based Page Allocation for Improving SSD Performance

The proposed method improves SSD I/O performance and reduces request response time by 14.7%.

IEICE Electronics Express has published an article written by Wentian Wu, Qi Wang, Institute of Microelectronics of the Chinese Academy of Sciences, and University of Chinese Academy of Sciences; Qianhui Li, Institute of Microelectronics of the Chinese Academy of Sciences; Tong Qu, Institute of Microelectronics of the Chinese Academy of Sciences, and University of Chinese Academy of Sciences; and Zongliang Huo, Yangtze Memory Technologies Co., Ltd., Wuhan 430205, China.

Abstract: “Modern solid state drives (SSDs) based on NAND Flash Memory (NFM) have multi-level parallel resources to enhance their I/O performance. The page allocation policy, which is responsible for allocating logical pages to physical parallel resources, directly affects the efficiency of SSD parallelism utilization. Traditionally, the load-balancing page allocation policy relies on the number of commands rather than their actual latency. However, this policy fails to effectively balance the execution latency skew among the various parallel units of SSD, leading to parallelism loss.”

“To address these problems, we propose a load-balancing method based on latency estimation called LEPA. Instead of relying on the number of commands, LEPA estimates the waiting latency of commands in pending to determine the load. Our experimental results indicate that our latency estimation-based load balancing page allocation significantly reduces the die load skew (by 85.9% on average) caused by the inaccurate estimations of traditional methods, with minimal overhead. Moreover, LEPA demonstrates an 8.0% improvement in plane-level parallelism. As a result, the proposed method effectively improves SSD system I/O performance and reduces request response time by 14.7%.”
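The difference between counting pending commands and estimating their waiting latency can be illustrated with a small sketch. This is not the LEPA algorithm as published. The per-command latencies, queue contents, and function names below are assumptions chosen only to show how the two policies can pick different dies.

```python
# Assumed per-command latencies in microseconds (illustrative values only).
LATENCY_US = {"read": 50, "program": 600, "erase": 3000}


def pick_die_by_count(queues):
    """Baseline policy: choose the die with the fewest pending commands."""
    return min(range(len(queues)), key=lambda d: len(queues[d]))


def estimated_wait_us(queue):
    """Estimated waiting time: sum of assumed latencies of pending commands."""
    return sum(LATENCY_US[cmd] for cmd in queue)


def pick_die_by_latency(queues):
    """Latency-estimation policy: choose the die with the shortest estimated wait."""
    return min(range(len(queues)), key=lambda d: estimated_wait_us(queues[d]))


# Hypothetical pending-command queues for three dies.
queues = [
    ["erase"],                 # die 0: 1 command, 3000 us estimated
    ["read", "read", "read"],  # die 1: 3 commands, 150 us estimated
    ["program", "read"],       # die 2: 2 commands, 650 us estimated
]

print(pick_die_by_count(queues))    # 0
print(pick_die_by_latency(queues))  # 1
```

In this example the count-based policy sends the next page to die 0, which is busy with a long erase, while the latency-based policy picks die 1, whose three short reads finish much sooner.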

R&D: SAKER, Software Accelerated Key-value Service via NVMe Interface

The authors propose a software-based facility, named SAKER, at the host side to remove or alleviate the performance bottleneck at the NVMe-KV interface.

ACM Digital Library has published, in SYSTOR ’25: Proceedings of the 18th ACM International Systems and Storage Conference, an article written by Chen Zhong, Wenguang Wang, and Song Jiang, University of Texas at Arlington, USA.

Abstract: “The NVMe Key Value (NVMe-KV) Command Set has been standardized to enable access to an NVMe device with a key rather than a block address and make an NVMe device a KV service provider. This new interface opens an exciting opportunity of offloading extensive data management chores to an external KV device and streamlining the KV-based data processing at the host. However, the interface itself may become a major performance bottleneck with small KV access and make the technology hard to be deployed in diverse application scenarios. In this paper we proposed a software-based facility, named SAKER, at the host side to remove or alleviate the performance bottleneck at the interface. SAKER, which was prototyped in an NVMe-KV SSD emulator, demonstrates that it can effectively keep the NVMe-KV interface from becoming the performance bottleneck even with small KV requests in most workloads.”