R&D: Encoding Scheme to Enlarge Practical DNA Storage Capacity by Reducing Primer-Payload Collisions
Investigation result shows collision between primers and DNA payload sequences is major factor limiting DNA storage capacity.
This is a Press Release edited by StorageNewsletter.com on July 11, 2024 at 7:57 pm
ACM Digital Library has published, in ASPLOS ’24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, an article written by Yixun Wei, University of Minnesota Twin Cities, Minneapolis, MN, USA, Bingzhe Li, University of Texas at Dallas, Richardson, USA, and David H. C. Du, University of Minnesota Twin Cities, Minneapolis, USA.
Abstract: “Deoxyribonucleic Acid (DNA), with its ultra-high storage density and long durability, is a promising long-term archival storage medium and is attracting much attention today. A DNA storage system encodes and stores digital data with synthetic DNA sequences and decodes DNA sequences back to digital data via sequencing. Many encoding schemes have been proposed to enlarge DNA storage capacity by increasing DNA encoding density. However, only increasing encoding density is insufficient because enhancing DNA storage capacity is a multifaceted problem.”
“This paper assumes that random accesses are necessary for practical DNA archival storage. We identify all factors affecting DNA storage capacity under current technologies and systematically investigate the practical DNA storage capacity with several popular encoding schemes. The investigation result shows the collision between primers and DNA payload sequences is a major factor limiting DNA storage capacity. Based on this discovery, we designed a new encoding scheme called Collision Aware Code (CAC) to trade some encoding density for the reduction of primer-payload collisions. Compared with the best result among the five existing encoding schemes, CAC can extricate 120% more primers from collisions and increase the DNA tube capacity from 211.96 GB to 295.11 GB. Besides, we also evaluate CAC’s recoverability from DNA storage errors. The result shows CAC is comparable to those of existing encoding schemes.”
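To make the idea of a primer-payload collision concrete, here is a minimal Python sketch. It is not the paper's method and it is not CAC. It assumes, purely for illustration, that a primer "collides" with a payload when some window of the payload lies within a small Hamming distance of the primer. The function names (`hamming`, `collides`, `usable_primers`), the `max_mismatches` threshold, and the toy sequences are all hypothetical. A real check would likely also consider reverse complements and other biochemical constraints.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def collides(primer: str, payload: str, max_mismatches: int = 2) -> bool:
    """Return True if any primer-length window of the payload differs
    from the primer in at most max_mismatches positions.

    Assumption: Hamming distance stands in for the real similarity
    criterion, which the abstract does not specify.
    """
    n = len(primer)
    return any(
        hamming(primer, payload[i:i + n]) <= max_mismatches
        for i in range(len(payload) - n + 1)
    )


def usable_primers(primers: List[str], payloads: List[str],
                   max_mismatches: int = 2) -> List[str]:
    """Keep only the primers that collide with none of the payloads."""
    return [
        p for p in primers
        if not any(collides(p, s, max_mismatches) for s in payloads)
    ]


# Toy data, made up for this example.
primers = ["ACGTACGT", "TTGGCCAA"]
payloads = ["GGACGTACCTGG", "CATCATCATCAT"]

print(usable_primers(primers, payloads))  # ['TTGGCCAA']
```

In this toy run the first primer is discarded because the payload window `ACGTACCT` differs from it in a single base. Under this simplified model, every primer lost in this way can no longer address a file in the tube, which is why an encoding that avoids primer-like payload sequences can raise usable capacity even at lower encoding density.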