
R&D: Preventing Data Popularity Concentration in HDFS-Based Cloud Storage

Simulation results demonstrate the promising benefits of the proposed popularity-aware balancer, which distributes popular data uniformly across nodes without compromising the amount of data transferred or the variance in disk space.

ACM Digital Library has published, in UCC ’19 Companion Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing Companion, an article written by Thanda Shwe, Mandalay Technological University, Mandalay, Myanmar, and Masayoshi Aritsugi, Kumamoto University, Kumamoto, Japan.

Abstract: Hadoop Distributed File System (HDFS) often experiences skew in data storage over time, mainly because of its random data block allocation policy, datanode failure, replica reconstruction, and client activity, leading to utilization and load imbalance in the system. Although HDFS provides tools to rebalance the data in the cluster, the balancer considers only disk space utilization, re-allocating data from highly utilized nodes to less utilized nodes. Thus, data access skew, which is caused by piling a large amount of popular data on one node, is not addressed by the default HDFS balancer. To address this issue, we present a popularity-aware balancer based on a node popularity score, which spreads popular data uniformly among datanodes, balancing future access load and reducing hot spots in the cloud storage system. Simulation results demonstrate the promising benefits of the proposed popularity-aware balancer by evaluating the uniform distribution of popular data across nodes without compromising the amount of data transfers and variance in disk space.
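
To make the idea in the abstract concrete, here is a minimal sketch of a balancer driven by a per-node popularity score. It is an illustration only, not the authors' algorithm: the abstract does not define the score, so this sketch assumes a node's score is the sum of the access counts of its blocks, and the names Block, Node, rebalance, and threshold are all hypothetical. It also ignores disk space limits and replica placement rules, both of which a real HDFS balancer would have to respect.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: str
    accesses: int  # assumed popularity measure: observed access count


@dataclass
class Node:
    name: str
    blocks: list = field(default_factory=list)

    def popularity(self) -> int:
        # Assumed node popularity score: sum of block access counts.
        return sum(b.accesses for b in self.blocks)


def rebalance(nodes, threshold=0.1, max_moves=100):
    """Greedily move blocks from the most popular node to the least popular.

    Stops when the hottest node is within `threshold` (a fraction of the
    mean score) of the mean, or when no single move would narrow the gap.
    Returns the list of (block_id, source, target) moves performed.
    """
    moves = []
    for _ in range(max_moves):
        mean = sum(n.popularity() for n in nodes) / len(nodes)
        hot = max(nodes, key=Node.popularity)
        cold = min(nodes, key=Node.popularity)
        if mean == 0 or hot.popularity() - mean <= threshold * mean:
            break
        gap = hot.popularity() - cold.popularity()
        # Only blocks with 0 < accesses < gap strictly narrow the gap.
        candidates = [b for b in hot.blocks if 0 < b.accesses < gap]
        if not candidates:
            break
        # Prefer the block that brings the two nodes closest together.
        block = min(candidates, key=lambda b: abs(b.accesses - gap / 2))
        hot.blocks.remove(block)
        cold.blocks.append(block)
        moves.append((block.block_id, hot.name, cold.name))
    return moves


if __name__ == "__main__":
    cluster = [
        Node("dn1", [Block("b1", 50), Block("b2", 30), Block("b3", 20)]),
        Node("dn2", [Block("b4", 5)]),
        Node("dn3", [Block("b5", 5)]),
    ]
    for move in rebalance(cluster):
        print(move)
    print({n.name: n.popularity() for n in cluster})
```

A greedy pairwise move is used here because it is easy to reason about: each move strictly narrows the gap between the two nodes involved, so the loop cannot oscillate. The published balancer may select blocks and targets quite differently.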
