IEEE Xplore has published, in the proceedings of the 2019 IEEE 37th International Conference on Computer Design (ICCD), an article written by Chunxue Zuo and Fang Wang (Huazhong University of Science and Technology, China), Ping Huang (Temple University, USA), and Yuchong Hu and Dan Feng (Huazhong University of Science and Technology, China).
Abstract: “Data deduplication is a widely deployed technique to remove duplicate content to save storage space, which is however incapable of eliminating the redundancy between nonidentical but similar data blocks. To achieve further space savings in deduplicated storage systems, delta compression is employed to compress post-deduplication data. Both deduplication and delta compression introduce content references among blocks, which inevitably undermines the reliability of deduplicated and delta compressed storage systems. To ensure better reliability, existing approaches utilize either replication or erasure codes to redundantly distribute data across multiple nodes. In deduplicated and delta compressed storage systems, we observe that delta compressed chunks (DCCs) are far smaller than regular chunks called non-DCCs. Motivated by this observation, we suggest a straightforward approach in which replication is used to protect DCCs and erasure code is deployed to protect non-DCCs. However, we need to address two critical challenges to ensure this solution effective. First, the random placement of DCCs replicas destroys cache locality. Second, the separate and individual recovery and restore cache could cause storage containers to be accessed repeatedly. To address these two challenges, in this paper, we propose RepEC-Duet which employs both replication and erasure codes to ensure high reliability and performance for deduplicated and delta-compressed storage systems. RepEC-Duet introduces a delta-utilization-aware filter to select and replicate containers based on the percentage of DCCs in the containers to maintain cache locality. Moreover, to avoid unnecessary container reads, we design a cooperative cache scheme that is aware of both failure recovery and regular restore cache. 
Our experimental results based on three real-world datasets demonstrate that RepEC-Duet significantly improves the restore performance by 26%-59%, and reduces the storage overhead by 54%-98% than the existing approaches.”
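
To make the idea of a delta-utilization-aware filter more concrete, here is a minimal Python sketch of how containers might be routed to replication or erasure coding according to their share of DCCs. It is not the authors' implementation: the abstract does not give the selection rule, so the names `Chunk`, `Container`, `dcc_fraction`, and `choose_protection`, the 0.5 threshold, and the choice to measure the share by chunk count rather than by bytes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    size: int      # chunk size in bytes
    is_dcc: bool   # True if the chunk is delta compressed (a DCC)


@dataclass
class Container:
    container_id: int
    chunks: list = field(default_factory=list)


def dcc_fraction(container):
    """Share of chunks in a container that are DCCs (assumed: counted per chunk)."""
    if not container.chunks:
        return 0.0
    return sum(1 for c in container.chunks if c.is_dcc) / len(container.chunks)


def choose_protection(containers, threshold=0.5):
    """Map each container id to a protection scheme.

    Hypothetical rule: containers whose DCC share reaches the threshold are
    replicated; all others are erasure coded. The threshold is illustrative.
    """
    plan = {}
    for container in containers:
        if dcc_fraction(container) >= threshold:
            plan[container.container_id] = "replicate"
        else:
            plan[container.container_id] = "erasure_code"
    return plan


# Example: one DCC-heavy container and one dominated by regular chunks.
containers = [
    Container(1, [Chunk(100, True), Chunk(120, True), Chunk(4096, False)]),
    Container(2, [Chunk(4096, False), Chunk(4096, False), Chunk(90, True)]),
]
print(choose_protection(containers))
```

Deciding per container rather than per chunk keeps each container's chunks stored together, which is in line with the abstract's stated goal of maintaining cache locality; the actual policy in RepEC-Duet may differ.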