R&D: Multiple Errors Correction for Position-Limited DNA Sequences with GC Balance and no Homopolymer for DNA-Based Storage
Simulation results indicate that the proposed coding scheme can offer outstanding error protection to DNA sequences.
This is a Press Release edited by StorageNewsletter.com on April 11, 2023 at 2:00 pm
Briefings in Bioinformatics has published an article written by Xiayang Li, Moxuan Chen, School of Mathematics, Tianjin University, Tianjin, 300372, China, and Huaming Wu, Center for Applied Mathematics, Tianjin University, Tianjin, 300372, China.
Abstract: “Deoxyribonucleic acid (DNA) is an attractive medium for long-term digital data storage due to its extremely high storage density, low maintenance cost and longevity. However, during the process of synthesis, amplification and sequencing of DNA sequences with homopolymers of large run-length, three different types of errors, namely, insertion, deletion and substitution errors frequently occur. Meanwhile, DNA sequences with large imbalances between GC and AT content exhibit high dropout rates and are prone to errors. These limitations severely hinder the widespread use of DNA-based data storage. In order to reduce and correct these errors in DNA storage, this paper proposes a novel coding schema called DNA-LC, which converts binary sequences into DNA base sequences that satisfy both the GC balance and run-length constraints. Furthermore, our coding mode is able to detect and correct multiple errors with a higher error correction capability than the other methods targeting single error correction within a single strand. The decoding algorithm has been implemented in practice. Simulation results indicate that our proposed coding scheme can offer outstanding error protection to DNA sequences.”
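
To make the two constraints named in the abstract concrete, here is a minimal Python sketch that checks whether a DNA strand is GC-balanced and free of homopolymer runs. It is not the DNA-LC encoder or decoder, which the abstract does not specify. The function names, the 40% to 60% GC window and the default run-length limit of 1 are illustrative assumptions, not values taken from the paper.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def satisfies_constraints(seq: str,
                          gc_low: float = 0.4,   # assumed lower GC bound
                          gc_high: float = 0.6,  # assumed upper GC bound
                          max_run: int = 1) -> bool:
    """Check GC balance and the run-length limit.

    max_run=1 means no base may repeat back to back ("no homopolymer").
    The thresholds are hypothetical defaults chosen for illustration.
    """
    balanced = gc_low <= gc_content(seq) <= gc_high
    return balanced and max_homopolymer_run(seq) <= max_run
```

For example, `satisfies_constraints("ACGTACGT")` passes both checks, while `"AACGTCGT"` fails on the repeated `A` and `"ATATATAT"` fails on GC content.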











