R&D: Capacity-Achieving Constrained Codes with GC-Content and Runlength Limits for DNA Storage
Extensive experiments have shown that if the GC-content is too high or too low, or if a homopolymer run exceeds 6 in a DNA sequence, insertion, deletion and substitution errors increase dramatically.
This is a Press Release edited by StorageNewsletter.com on January 11, 2023 at 2:00 pm
IEEE Xplore has published, in the 2022 IEEE International Symposium on Information Theory (ISIT) proceedings, an article written by Yajuan Liu, Xuan He, and Xiaohu Tang, School of Information Science and Technology, Southwest Jiaotong University, China.
Abstract: “GC-content and homopolymer run are two constraints of interest in DNA storage systems. Extensive experiments showed that if GC-content is too high (low), or homopolymer run exceeds six in a DNA sequence, there will give rise to dramatical increase of insertion, deletion and substitution errors. Committing to study the DNA sequences with both constraints, a recent work (Nguyen et al. 2020) proposed a class of (ϵ, ℓ)-constrained codes that can only asymptotically approach the capacity, but may have reasonable loss for finite code lengths. In this paper, we design the first (ϵ, ℓ)-constrained codes based on the enumeration coding technique which can always achieve capacity regardless of code lengths. In addition, motivated by the influence of local GC-content, we consider a nontrivial case that the prefixes of a DNA sequence also hold GC-content constraint for the first time, called (δ, ℓ)-prefix constrained codes.”
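To make the two constraints concrete, here is a minimal Python sketch of a checker for a single DNA sequence. It is not the authors' enumeration coding scheme, which the abstract does not detail. It assumes one common reading of the (ϵ, ℓ) constraint: the GC-content lies within ϵ of 0.5, and no homopolymer run is longer than ℓ. The function names and parameters (`gc_content`, `max_homopolymer_run`, `satisfies_constraints`, `eps`, `ell`) are hypothetical and chosen for illustration only.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in a non-empty sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max(len(list(run)) for _, run in groupby(seq))


def satisfies_constraints(seq: str, eps: float, ell: int) -> bool:
    """Check the assumed (eps, ell) constraint described in the lead-in.

    Assumption: 'balanced' means |GC-content - 0.5| <= eps, and the
    runlength limit means no homopolymer run exceeds ell.
    """
    balanced = abs(gc_content(seq) - 0.5) <= eps
    return balanced and max_homopolymer_run(seq) <= ell


if __name__ == "__main__":
    # Balanced sequence with no repeated neighbours.
    print(satisfies_constraints("ACGTACGT", eps=0.1, ell=6))
    # Balanced overall, but contains a run of seven A's.
    print(satisfies_constraints("AAAAAAAGGGCCCC", eps=0.1, ell=6))
```

A checker like this only tests whether a given sequence is admissible. The codes described in the abstract go further by mapping arbitrary data onto admissible sequences without loss of rate.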











