R&D: Integrating De-Dupe Function Into SSDs
Helps avoid writing duplicate contents to NAND flash chips.
This is a Press Release edited by StorageNewsletter.com on December 19, 2019 at 2:10 pm
IEEE Xplore has published, in the proceedings of the 2019 35th Symposium on Mass Storage Systems and Technologies (MSST), an article written by Zhichao Yan, Hong Jiang, Song Jiang, University of Texas at Arlington, Yujuan Tan, Chongqing University, and Hao Luo, Twitter.
Abstract: “Integrating the data deduplication function into SSDs helps avoid writing duplicate contents to NAND flash chips, which will not only effectively reduce the number of Program/Erase (P/E) operations to extend the device’s lifespan but also proportionally enlarge the effective capacity of SSD to improve the performance of its behind-the-scenes maintenance tasks such as wear-leveling (WL) and garbage-collection (GC). However, these benefits of deduplication come at a non-trivial computational cost incurred by the embedded SSD controller to compute cryptographic hashes. To address this overhead problem, some researchers have suggested replacing cryptographic hashes with error correction codes (ECCs) already embedded in the SSD chips to detect the duplicate contents. However, all existing attempts have ignored the impact of the data randomization (scrambler) module that is widely used in modern SSDs, thus making it impractical to directly integrate ECC-based deduplication into commercial SSDs. In this work, we revisit SSD’s internal structure and propose the first deduplicatable SSD that can bypass the data scrambler module to enable the low-cost ECC-based data deduplication. Specifically, we propose two design solutions, one on the host side and the other on the device side, to enable ECC-based deduplication. Based on our approach, we can effectively exploit SSD’s built-in ECC module to calculate the hash values of stored data for data deduplication. We have evaluated our SES-Dedup approach by replaying data traces in an SSD simulator and found that it can remove up to 30.8% redundant data with up to 17.0% write performance improvement over the baseline SSD.”
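
To make the scrambler problem described in the abstract concrete, the following is a minimal Python sketch and not the authors' SES-Dedup design. It rests on several assumptions that do not come from the paper: CRC-32 stands in for the ECC parity as a cheap fingerprint, the scrambler is modeled as an XOR with a keystream seeded by the logical page number, and the names `fingerprint`, `scramble` and `DedupStore` are hypothetical. The byte-for-byte comparison on a fingerprint match is likewise an illustrative safeguard against collisions.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def fingerprint(page: bytes) -> int:
    """Cheap checksum used here as a stand-in for ECC parity (assumption)."""
    return zlib.crc32(page)


def scramble(page: bytes, lpn: int) -> bytes:
    """Toy scrambler: XOR the page with a keystream seeded by the page number.

    Real scramblers differ; this only models the address-dependent output.
    Applying it twice with the same lpn restores the original page.
    """
    state = (lpn * 2654435761 + 1) & 0xFFFFFFFF
    out = bytearray(len(page))
    for i, b in enumerate(page):
        state = (state * 1103515245 + 12345) & 0xFFFFFFFF
        out[i] = b ^ ((state >> 16) & 0xFF)
    return bytes(out)


class DedupStore:
    """Fingerprint-indexed page store that skips writes of duplicate pages."""

    def __init__(self) -> None:
        self.by_fp = {}        # fingerprint -> list of physical page ids
        self.pages = []        # physical pages (stand-in for NAND)
        self.l2p = {}          # logical page number -> physical page id
        self.nand_writes = 0   # number of pages actually programmed

    def write(self, lpn: int, page: bytes) -> bool:
        """Store a page; return True if it had to be written to 'NAND'."""
        fp = fingerprint(page)
        for ppn in self.by_fp.get(fp, []):
            if self.pages[ppn] == page:  # confirm match, guard against collisions
                self.l2p[lpn] = ppn      # remap only, no new write
                return False
        self.pages.append(page)
        ppn = len(self.pages) - 1
        self.by_fp.setdefault(fp, []).append(ppn)
        self.l2p[lpn] = ppn
        self.nand_writes += 1
        return True


page = b"A" * PAGE_SIZE

# Fingerprinting before the scrambler: the second copy is deduplicated.
plain = DedupStore()
plain.write(0, page)
plain.write(1, page)

# Fingerprinting after the scrambler: identical content looks different.
scrambled = DedupStore()
scrambled.write(0, scramble(page, 0))
scrambled.write(1, scramble(page, 1))
```

In this toy model, `plain.nand_writes` ends at 1 while `scrambled.nand_writes` ends at 2: once the same content is scrambled with different seeds, a fingerprint computed downstream no longer reveals the duplicate. This is one way to read the abstract's point that the scrambler has to be bypassed before ECC-based deduplication can work.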











