
R&D: Five Articles on DNA Data Storage

Published by Advanced Science, ACS Nano, arXiv, and iMeta

R&D: Highly Secure In Vivo DNA Data Storage Driven by Genomic Dynamics
By combining computational logic with the biological complexities of living systems, the ICBP offers a transformative strategy for secure DNA data storage

Advanced Science has published an article written by Jiaxin Xu, Department of Pulmonary and Critical Care Medicine, Post-Doctoral Scientific Research Station of Basic Medicine, Shenzhen Key Laboratory of Respiratory Disease, Shenzhen Clinical Research Center for Respiratory Disease, Shenzhen Institute of Respiratory Diseases, Shenzhen People’s Hospital (The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology), Shenzhen, Guangdong, China, and College of Pharmacy, Jinan University, Guangzhou, Guangdong, China, Yu Wang, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, China, Haibo Zhou, College of Pharmacy, Jinan University, Guangzhou, Guangdong, China, Mingen Li, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, China, Yang Wang, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, Lingwei Wang, Department of Pulmonary and Critical Care Medicine, Post-Doctoral Scientific Research Station of Basic Medicine, Shenzhen Key Laboratory of Respiratory Disease, Shenzhen Clinical Research Center for Respiratory Disease, Shenzhen Institute of Respiratory Diseases, Shenzhen People’s Hospital (The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology), Shenzhen, Guangdong, China, Hui Mei, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, China, Junbiao Dai, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, China, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China, Shanze Chen, Department of Pulmonary and Critical Care Medicine, Post-Doctoral Scientific Research Station of Basic Medicine, Shenzhen Key Laboratory of Respiratory Disease, Shenzhen Clinical Research Center for Respiratory Disease, Shenzhen Institute of Respiratory Diseases, Shenzhen People’s Hospital (The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology), Shenzhen, Guangdong, China, and Xiaoluo Huang, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, China.

Abstract: DNA is a promising medium for next-generation data storage because of ultrahigh information density and stability. DNA storage within living organisms presents further advantages, such as self-replication, compactness, and concealment. Early efforts primarily developed predetermined methods for encoding and decoding data using in vivo DNA sequences. However, these methods may pose a security risk while opening a clear channel for potential data access and breaches. To address these challenges, we propose a unified paradigm, integrated computational–biological programming (ICBP), by exploiting the intrinsic digital characteristics within computational and microbial systems. ICBP involves the construction of dynamic code tables from gene regulatory networks or complete genomes across diverse species, expanding the key space by more than 100 orders of magnitude compared with existing methods. The encryption algorithm in ICBP benefits from DNA encoding, computing, and computational operations, leading to superior encryption quality and resistance to brute force and statistical attacks. Furthermore, we demonstrated the practical utility of ICBP via the successful encryption, microbial storage, and decryption of digital files within living systems, achieving 100% data recovery after 100 generations of replication. By combining computational logic with the biological complexities of living systems, the ICBP offers a transformative strategy for secure DNA data storage.

 

R&D: High-Data-Density, High-Decoding-Speed, and High-Decoding-Accuracy DNA Data Ink for Digital Preservation
Technology establishes a foundation for practical DNA data storage solutions applicable to cultural heritage preservation, autonomous vehicle data management, and matrix-based machine learning applications

ACS Nano has published an article written by Taeseok Kang, Doyeon Lim, Department of Nano-Bioengineering, Incheon National University, 119 Academy-ro, Incheon 22012, Republic of Korea, Wonjin Lee, Department of Intelligent Semiconductor Engineering, Incheon National University, 119 Academy-ro, Incheon 22012, Republic of Korea, Jungwoo Kim, Department of Nano-Bioengineering, Incheon National University, 119 Academy-ro, Incheon 22012, Republic of Korea, Xiaohua Huang, Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States, Jinchul Kim, Ageing Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea, and Youngjun Song, Department of Nano-Bioengineering, Incheon National University, 119 Academy-ro, Incheon 22012, Republic of Korea, Department of Intelligent Semiconductor Engineering, Incheon National University, 119 Academy-ro, Incheon 22012, Republic of Korea, and Standard Bioelectronics, Co., 511 Michuhol Tower Hall Tower Gaetbeol-ro 12, Incheon 21999, Republic of Korea.

Abstract: DNA-based storage offers exceptional information density, durability, and energy efficiency compared to conventional digital media, yet practical implementation faces challenges including high synthesis costs, sequencing errors, and slow access speeds. Here, we present an integrated DNA storage system with optimized encoding and processing strategies to address practical implementation issues. Our approach achieves 9.78 bits/nt net information density with a flexible index allocation system handling data volume from 0.37 KB to 2.79 × 10^22 YB. The decoding process delivers 360× faster throughput than traditional methods, processing 4.57 million reads (1.63 GB) in 34.5 s and demonstrating perfect data retrieval from down-sampled (×5.33) sequencing in 2.47 s. Our error correction system combines inner Reed-Solomon with outer XOR code, ensuring reliable recovery with large reading sequences (92,626) and low copy number data (×0.52). The streamlined NGS preparation workflow reduces processing time from ∼4.5 to ∼2 h while decreasing per-sample costs from ∼$60 to ∼$0.50. The system demonstrates versatility through high-fidelity DNA data storage ink and implementations ranging from physical stamps to VR platforms. This technology establishes a foundation for practical DNA data storage solutions applicable to cultural heritage preservation, autonomous vehicle data management, and matrix-based machine learning applications.
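As an illustration of the outer XOR code mentioned in the abstract, the following minimal Python sketch adds one XOR parity block to a group of equal-length data blocks and rebuilds a single lost block. The grouping, the block sizes, and the function names (xor_blocks, add_parity, recover_missing) are assumptions made for this example; the abstract does not describe the authors' actual layout, and the inner Reed-Solomon layer is omitted here.

```python
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def add_parity(data_blocks: Sequence[bytes]) -> List[bytes]:
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover_missing(received: Sequence[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (marked None) and return the data blocks."""
    missing = [i for i, b in enumerate(received) if b is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity block can repair only one erasure")
    repaired = list(received)
    if missing:
        present = [b for b in received if b is not None]
        # XOR of all surviving blocks (data and parity) equals the lost block.
        repaired[missing[0]] = xor_blocks(present)
    return repaired[:-1]  # drop the trailing parity block


if __name__ == "__main__":
    data = [b"DNA1", b"DNA2", b"DNA3"]
    coded = add_parity(data)
    coded[1] = None  # simulate one strand group that was never sequenced
    print(recover_missing(coded) == data)
```

A single parity block handles one erasure per group; in a layered scheme of this kind, the inner code would typically deal with errors inside each block before the outer code fills in blocks that are missing entirely.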

 

R&D: On the Capacity of Noisy Frequency-based Channels
Authors investigate the capacity of noisy frequency-based channels, motivated by DNA data storage in the short-molecule regime, where information is encoded in the frequency of item types rather than their order

arXiv has published an article written by Yuval Gerzon, Technion, Israel, Ilan Shomorony, University of Illinois at Urbana-Champaign, USA, and Nir Weinberger, Technion, Israel.

Abstract: We investigate the capacity of noisy frequency-based channels, motivated by DNA data storage in the short-molecule regime, where information is encoded in the frequency of item types rather than their order. The channel output is a histogram formed by random sampling of items, followed by noisy item identification. While the capacity of the noiseless frequency-based channel has been previously addressed, the effect of identification noise has not been fully characterized. We present a converse bound on the channel capacity that follows from stochastic degradation and the data processing inequality. We then establish an achievable bound, which is based on a Poissonization of the multinomial sampling process, and an analysis of the resulting vector Poisson channel with inter-symbol interference. This analysis refines concentration inequalities for the information density used in the Feinstein bound, and explicitly characterizes an additive loss in the mutual information due to identification noise. We apply our results to a DNA storage channel in the short-molecule regime, and quantify the resulting loss in the scaling of the total number of reliably stored bits.
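The channel described in the abstract (random sampling of items followed by noisy identification, with a histogram as output) can be simulated with a short Python sketch. The noise model used here, in which a misidentified item is reported as a uniformly chosen other type, is an assumption for illustration only, as are the function name noisy_frequency_channel and its parameters; the paper's own noise model is not specified in the abstract.

```python
import random
from typing import List, Sequence


def noisy_frequency_channel(
    freqs: Sequence[float],
    num_samples: int,
    noise: float,
    rng: random.Random,
) -> List[int]:
    """Sample items according to `freqs`, misidentify each with probability
    `noise`, and return the histogram of observed item types."""
    k = len(freqs)
    types = list(range(k))
    histogram = [0] * k
    # Multinomial sampling: draw item types independently with the input frequencies.
    sampled = rng.choices(types, weights=freqs, k=num_samples)
    for true_type in sampled:
        if k > 1 and rng.random() < noise:
            # Assumed noise model: report a uniformly chosen wrong type.
            observed = rng.choice([t for t in types if t != true_type])
        else:
            observed = true_type
        histogram[observed] += 1
    return histogram


if __name__ == "__main__":
    rng = random.Random(0)
    # Information is carried by the frequencies, not by the order of items.
    print(noisy_frequency_channel([0.5, 0.3, 0.2], num_samples=1000, noise=0.05, rng=rng))
```

With noise set to 0 the sketch reduces to the noiseless frequency-based channel, so comparing histograms at different noise levels gives an informal picture of how identification noise blurs the encoded frequencies.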

 

R&D: Error-Correcting Codes for the Sum Channel
Authors introduce the sum channel, a new channel model motivated by applications in distributed storage and DNA data storage

arXiv has published an article written by Lyan Abboud and Eitan Yaakobi, Computer Science Department, Technion – Israel Institute of Technology, Haifa, Israel.

Abstract: We introduce the sum channel, a new channel model motivated by applications in distributed storage and DNA data storage. In the error-free case, it takes as input an $\ell$-row binary matrix and outputs an $(\ell+1)$-row matrix whose first $\ell$ rows equal the input and whose last row is their parity (sum) row. We construct a two-deletion-correcting code with redundancy $2\lceil\log_2\log_2 n\rceil + O(\ell^2)$ for $\ell$-row inputs. When $\ell=2$, we establish an upper bound of $\lceil\log_2\log_2 n\rceil + O(1)$, implying that our redundancy is optimal up to a factor of 2. We also present a code correcting a single substitution with $\lceil \log_2(\ell+1)\rceil$ redundant bits and prove that it is within one bit of optimality.
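The error-free sum channel defined in the abstract is simple enough to write down directly. The Python sketch below appends the column-wise parity row to an input binary matrix and, as a small side illustration, lists the columns whose overall parity is odd after a corruption. The function names sum_channel and odd_parity_columns are hypothetical, and the parity check is not the authors' substitution-correcting code: it only shows that the parity row by itself reveals the column of a single substitution, not the row.

```python
from typing import List, Sequence


def sum_channel(rows: Sequence[Sequence[int]]) -> List[List[int]]:
    """Error-free sum channel: return the input rows plus their parity (XOR) row."""
    if not rows:
        raise ValueError("need at least one input row")
    n = len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("all rows must have the same length")
    parity = [0] * n
    for row in rows:
        for j, bit in enumerate(row):
            if bit not in (0, 1):
                raise ValueError("rows must be binary")
            parity[j] ^= bit
    return [list(r) for r in rows] + [parity]


def odd_parity_columns(output: Sequence[Sequence[int]]) -> List[int]:
    """Columns of a channel output whose bits (parity row included) sum to an odd number."""
    n = len(output[0])
    return [j for j in range(n) if sum(row[j] for row in output) % 2 == 1]


if __name__ == "__main__":
    received = sum_channel([[1, 0, 1, 1], [0, 0, 1, 0]])
    print(received[-1])                  # parity row: [1, 0, 0, 1]
    received[0][2] ^= 1                  # simulate one substitution in row 0, column 2
    print(odd_parity_columns(received))  # [2]
```

Since the flagged column does not say which of the $\ell+1$ rows was hit, some extra information is still needed to pin down the error, which is in keeping with the $\lceil \log_2(\ell+1)\rceil$ redundant bits quoted in the abstract.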

 

R&D: Fast bootstrap and reliable readout using hidden references for DNA data storage
Authors propose a fast and reliable readout framework in a bootstrap manner tailored for data storage using watermarked large DNA fragments

iMeta has published an article written by Weigang Chen, School of Microelectronics, Tianjin University, Tianjin, China, State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, China, and Frontiers Science Center for Synthetic Biology (Ministry of Education), School of Synthetic Biology and Biomanufacturing, Tianjin University, Tianjin, China, Shuang Liu, Quan Guo, Rui Qin, Qi Ge, Tingting Qi, School of Microelectronics, Tianjin University, Tianjin, China, and Yingjin Yuan, State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, China, and Frontiers Science Center for Synthetic Biology (Ministry of Education), School of Synthetic Biology and Biomanufacturing, Tianjin University, Tianjin, China.

Abstract: Data storage using large DNA fragments enables low-cost in vivo replication and offers a promising strategy for distributed data applications. However, data readout from massive, unordered sequencing reads requires alignment based on overlapping regions and is complicated by diverse sequencing errors, especially insertion and deletion (indel) errors. Here, we propose a fast and reliable readout framework in a bootstrap manner tailored for data storage using watermarked large DNA fragments. Our scheme transforms the de novo readout into a resequencing-like workflow through multiple-fold hidden references, substantially reducing readout complexity. The framework is compatible with sequencing platforms exhibiting diverse error profiles. For technologies with low indel rates, we employ correlation-based identification and bit-wise consensus to enable rapid decoding. For indel-prone platforms, we incorporate progressive read alignment using multiple-fold hidden references and the forward-backward algorithm to ensure robust recovery. In vivo experiments on large DNA fragments with different coding efficiencies validated the proposed framework. Error-free recovery was achieved using Illumina reads (raw error rate of ~0.2%) at a coverage of 0.6–2.5× and using nanopore reads (raw error rate ~5%) at a coverage of 1.6× or 4.3×. These results demonstrate the practicality and scalability of large-fragment DNA storage for real-world applications.
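The bit-wise consensus step mentioned for low-indel platforms can be pictured as a per-position majority vote over reads that have already been aligned to the same coordinates. The Python sketch below assumes equal-length, substitution-only binary reads and breaks ties toward 0; both assumptions, and the function name bitwise_consensus, are introduced here for illustration and are not taken from the paper. The correlation-based identification and hidden-reference alignment are not modeled.

```python
from typing import List, Sequence


def bitwise_consensus(reads: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote at each bit position across aligned, equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    n = len(reads[0])
    if any(len(r) != n for r in reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for j in range(n):
        ones = sum(read[j] for read in reads)
        # Strict majority of ones gives 1; ties fall back to 0 (an assumption).
        consensus.append(1 if 2 * ones > len(reads) else 0)
    return consensus


if __name__ == "__main__":
    reads = [
        [1, 0, 1, 1],
        [1, 1, 1, 0],  # two substitution errors
        [1, 0, 0, 1],  # one substitution error
    ]
    print(bitwise_consensus(reads))  # [1, 0, 1, 1]
```

A vote like this only works when errors leave positions aligned, which is why the abstract describes a different route (progressive alignment and the forward-backward algorithm) for indel-prone reads.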
