R&D: Dynamically Updated Hash Index Clustering Method for DNA Storage
Method solves high redundancy problem after DNA sequence clustering, improves accuracy of data reading, and promotes development of DNA storage.
This is a Press Release edited by StorageNewsletter.com on October 31, 2023 at 2:00 pm
Computers in Biology and Medicine has published an article written by Penghao Wang, Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, 116622, Dalian, China, Ben Cao, School of Computer Science and Technology, Dalian University of Technology, 116024, Dalian, China, Tao Ma, Brain Function Research Section, The First Hospital of China Medical University, 110001, Shenyang, China, Bin Wang, Qiang Zhang, Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, 116622, Dalian, China, and Pan Zheng, Department of Accounting and Information Systems, University of Canterbury, 8140, Christchurch, New Zealand.
Abstract: “The exponential growth of global data leads to the problem of insufficient data storage capacity. DNA storage can be an ideal storage method due to its high storage density and long storage time. However, the DNA storage process is subject to unavoidable errors that can lead to increased cluster redundancy during data reading, which in turn affects the accuracy of the data reads. This paper proposes a dynamically updated hash index (DUHI) clustering method for DNA storage, which clusters sequences by constructing a dynamic core index set and using hash lookup. The proposed clustering method is analyzed in terms of overall reliability evaluation and visualization evaluation. The results show that the DUHI clustering method can reduce the redundancy of more than 10% of the sequences within the cluster and increase the reconstruction rate of the sequences to more than 99%. Therefore, our method solves the high redundancy problem after DNA sequence clustering, improves the accuracy of data reading, and promotes the development of DNA storage.”
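The abstract does not spell out how the index is built or updated, so the following is only a rough illustration of the general idea of clustering reads through a hash index that grows as reads arrive. It is not the DUHI algorithm from the paper. The k-mer indexing scheme, the function names kmers and cluster_reads, and the parameters k and min_hits are all assumptions made for this sketch.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=6, min_hits=2):
    """Greedy clustering of reads via a k-mer hash index (illustrative only).

    Assumption: a read joins the cluster that owns most of its indexed
    k-mers, provided at least min_hits of them match; otherwise it
    starts a new cluster.
    """
    index = {}     # hash index: k-mer -> id of the cluster that owns it
    clusters = []  # cluster id -> list of reads in that cluster

    for read in reads:
        read_kmers = kmers(read, k)

        # Hash lookup: count how many of this read's k-mers each cluster owns.
        votes = Counter(index[km] for km in read_kmers if km in index)
        best = votes.most_common(1)

        if best and best[0][1] >= min_hits:
            cid = best[0][0]
            clusters[cid].append(read)
        else:
            cid = len(clusters)
            clusters.append([read])

        # Dynamic update: k-mers not seen before now point to this cluster.
        for km in read_kmers:
            index.setdefault(km, cid)

    return clusters


if __name__ == "__main__":
    reads = [
        "ACGTACGTTAGC",
        "ACGTACGATAGC",  # one substitution relative to the first read
        "TTTTGGGGCCCC",
        "TTTTGGGGCCCA",  # one substitution relative to the third read
    ]
    for cid, members in enumerate(cluster_reads(reads)):
        print(cid, members)
```

In this toy version, reads that differ by a single substitution still share several k-mers with their source sequence, so a dictionary lookup is enough to place them without comparing every pair of reads. A real pipeline would also have to handle insertions, deletions, and much larger read sets.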