R&D: Four Articles on DNA Data Storage Technologies
Evolved DNAzymes and stable activation chemistry enable high-efficiency DNA ligation; Quantitative assessment of randomized DNA base sequences using multi-model physical analysis for high-fidelity data storage; Approaching single-molecule assembly-free readout from medium-length encoded DNA; From molecules to megabytes, comprehensive review of DNA-based data storage systems
This is a Press Release edited by StorageNewsletter.com on December 24, 2025 at 2:00 pm
R&D: Evolved DNAzymes and Stable Activation Chemistry Enable High-Efficiency DNA Ligation
Advances improve the practicality of DNAzymes for ligation-based applications and broaden their potential use in DNA data storage.
ChemEurJ has published an article written by Connor Nurmi, Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada, and Biointerfaces Institute, McMaster University, Hamilton, Ontario, Canada, Gemma Mendonsa, Mengdi Bao, Seagate Research Group, Seagate Technology, Shakopee, USA, and Yingfu Li, Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada.
Abstract: “DNA ligation is a fundamental reaction used in a variety of applications, typically performed by T4 DNA ligase. For certain applications, such as DNA data storage, DNA-ligating DNAzymes offer a more stable and cost-effective alternative to protein enzymes. The E47 DNAzyme is among the most efficient DNA-ligating DNAzymes reported, but it still lacks sufficient activity for widespread adoption and requires a highly unstable phosphoimidazole-activated DNA substrate to function. In this study, we performed in vitro selection using a pre-structured library based on E47 and identified sequences with more than a twofold increase in ligation activity, representing the fastest DNA-ligating DNAzymes reported to date. We also screened alternative imidazolide compounds for substrate activation and found that phosphobenzimidazole-activated DNA substrates are significantly more stable, remaining intact for at least 24 h at room temperature. These advances improve the practicality of DNAzymes for ligation-based applications and broaden their potential use in DNA data storage.”
R&D: Quantitative Assessment of Randomized DNA Base Sequences Using Multi-Model Physical Analysis for High-Fidelity Data Storage
Multi-model assessment establishes a robust strategy for designing DNA sequences with superior stability, reliability, and scalability for future molecular data storage systems.
Advanced Science has published an article written by Seongjun Seo, Thi Hong Nhung Vu, Anshula Tandon, Suyoun Park, Thi Bich Ngoc Nguyen, Department of Physics, Institute of Basic Science, and Sungkyunkwan Advanced Institute of Nanotechnology (SAINT), Sungkyunkwan University, Suwon, 16419 Republic of Korea, Shinsuke Kawai, Faculty of Science, Yamagata University, Yamagata, 990-8560 Japan, and Sung Ha Park, Department of Physics, Institute of Basic Science, and Sungkyunkwan Advanced Institute of Nanotechnology (SAINT), Sungkyunkwan University, Suwon, 16419 Republic of Korea.
Abstract: “DNA is emerging as a promising medium for ultra-dense, long-term digital data storage, yet sequence design remains hindered by homopolymer formation and compositional bias, which compromise synthesis, sequencing, and decoding accuracy. Here, the study introduces a quantitative framework to evaluate and optimize randomized DNA base sequence design rules using three physics-inspired models: translational and rotational active particle trajectories, the inverse Ising model, and a 3-input 1-output logic algorithm system. Encoding schemes with varying homopolymer constraints are systematically applied to binary image data. Rigorous analysis reveals that stringent randomization rules markedly reduce homopolymer length, balance GC content, and enhance sequence randomness. Experimental validation via polymerase chain reaction (PCR) amplification and Sanger sequencing confirms high decoding fidelity (95–98%). This multi-model assessment establishes a robust strategy for designing DNA sequences with superior stability, reliability, and scalability for future molecular data storage systems.”
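For readers unfamiliar with the two sequence-quality measures named in the abstract, the following is a minimal illustrative sketch in Python of how GC content and the longest homopolymer run can be computed for a candidate sequence. It is not the authors' framework (which relies on active particle trajectories, the inverse Ising model, and a logic algorithm system); the function names and the example sequence are assumptions made here purely for illustration.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(seq.upper())), default=0)


if __name__ == "__main__":
    example = "ACGTTTTGCA"  # hypothetical sequence, not taken from the paper
    print("GC content:", gc_content(example))
    print("Longest homopolymer run:", max_homopolymer_run(example))
```

A design rule of the kind the abstract describes would typically reject or re-randomize sequences whose longest run exceeds a chosen limit or whose GC content strays far from 0.5; the specific thresholds used in the study are not stated in the abstract.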
R&D: Approaching Single-molecule Assembly-free Readout from Medium-length Encoded DNA
Method enables error-free recovery in near single-molecule scenarios, highlighting the potential of PNC-LDPC encoded medium-length DNA for data storage applications.
Nature Communications has published an article written by Weigang Chen, School of Microelectronics, Tianjin University, Tianjin, China, State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, China, and Frontiers Science Center for Synthetic Biology (Ministry of Education), School of Synthetic Biology and Biomanufacturing, Tianjin University, Tianjin, China, Rui Qin, Quan Guo, Jian Guo, Qi Ge, School of Microelectronics, Tianjin University, Tianjin, China, and Yingjin Yuan, State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, China, and Frontiers Science Center for Synthetic Biology (Ministry of Education), School of Synthetic Biology and Biomanufacturing, Tianjin University, Tianjin, China.
Abstract: “For DNA data storage, nanopore sequencing can facilitate rapid readout but suffers from severe insertion/deletion errors, which are quite computationally expensive to correct. Here, we propose a nearly single-molecule and assembly-free readout scheme for medium-length pseudo-noise piloting DNA fragments. Specifically, we devise medium-length DNA fragments using low-density parity-check codes companioned by pseudo-noise sequence (PNC-LDPC). A single cleavage on this encoded DNA by transposase generates DNA fragments of approximately full length. Using the readout-aware pseudo-noise sequences, noisy nanopore reads with arbitrary start points are directly located, and base insertions/deletions are corrected, enabling fast and reliable recovery even at very low coverages. Experimental results indicate that the data can be reliably recovered at a coverage of 1.24–3.15× with a typical nanopore sequencing error rate of 1.83%. This method enables error-free recovery in near single-molecule scenarios, highlighting the potential of PNC-LDPC encoded medium-length DNA for data storage applications.”
R&D: From Molecules to Megabytes, Comprehensive Review of DNA-based Data Storage Systems
Work discusses the potential transformative power of DNA as a feasible medium for data storage, considering technical and economic feasibility in the light of enabling a tectonic shift in the paradigm of data storage in response to the immense, growing needs of data.
AIP Conference Proceedings has published an article written by Suresh Kaswan, R. S. Krishna, Aman Raj, Department of Computer Science and Engineering, Chandigarh University, Mohali, India-140413, Rabina Bagga, Chandigarh Group of Colleges, Jhanjeri, Mohali-140307, Punjab, India, and Department of Computer Science & Engineering, Chandigarh College of Engineering, and Lisha Yugal, Department of Computer Science and Engineering, Sharda University, Greater Noida, India-201306.
Abstract: “The rapid production of data by the digital age has led to the development of new ways of storage; among these, DNA data storage has proven to be one of the most promising. This paper gives an advanced review of the methodologies and frameworks that are used in DNA data storage and the potentials it can offer in terms of storage density, durability, and stability. We review different encoding schemes, synthesis techniques, and sequencing methods that enable the translation of binary data from and into nucleotide sequences and the reverse process. Much attention is also paid to the challenges associated with error correction, retrieval speed, and cost-effectiveness. In this paper, we have considered all DNA-based data storage systems available so far and re-compare them in terms of progress in the automation and integration that can lead them towards practical large-scale realization. This work discusses the potential transformative power of DNA as a feasible medium for data storage, considering technical and economic feasibility in the light of enabling a tectonic shift in the paradigm of data storage in response to the immense, growing needs of data.”
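As a rough illustration of what translating binary data into nucleotide sequences and back can look like, the sketch below uses the simplest textbook mapping of two bits per base. This is an assumption chosen for illustration, not a scheme attributed to the review; the mapping table and function names are hypothetical, and practical schemes add constraints and error correction on top of such a mapping.

```python
# Hypothetical 2-bits-per-base mapping (illustrative only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(seq: str) -> bytes:
    """Convert a nucleotide string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # nucleotide representation of the two input bytes
    print(decode(strand))  # round trip back to the original bytes
```

A direct mapping like this offers no protection against long homopolymer runs or sequencing errors, which is why the error-correction and cost challenges mentioned in the abstract matter in practice.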












