R&D: Optimal Codes Correcting Single Indel/Edit for DNA-Based Storage
Provide linear-time map that translates binary messages into GC-balanced codewords.
This is a Press Release edited by StorageNewsletter.com on October 24, 2019 at 2:08 pmarXiv.org has published an article written by Kui Cai, Singapore University of Technology and Design, Singapore 487372, Singapore, Yeow Meng Chee, Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore 119077, Singapore, Ryan Gabrys, Spawar Systems Center, San Diego, CA 92152, USA, Han Mao Kiah, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore, and Tuan Thanh Nguyen, Singapore University of Technology and Design, Singapore 487372, Singapore.
Abstract:“ An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this paper, we investigate codes that combat either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two linear-time encoders. One corrects a single edit with log n + O(log log n) redundancy bits, while the other corrects a single indel with log n + 2 redundant bits. These two encoders are order-optimal. The former encoder is the first known order-optimal encoder that corrects a single edit, while the latter encoder (that corrects a single indel) reduces the redundancy of the best known encoder of Tenengolts (1984) by at least four bits. Over the DNA alphabet, we impose an additional constraint: the GC-balanced constraint and require that exactly half of the symbols of any DNA codeword to be either C or G. In particular, via a modification of Knuth’s balancing technique, we provide a linear-time map that translates binary messages into GC-balanced codewords and the resulting codebook is able to correct a single indel or a single edit. These are the first known constructions of GC-balanced codes that correct a single indel or a single edit.“











