From Courtney Chandler – UConn Health, University of Connecticut

University of Connecticut (UConn) researchers deploy CRISPR technology to make it easier to search for data stored in DNA – an emerging medium for storage.

DNA is emerging medium for storage
(Photo : Changchun Liu)

The digital age has led to the explosive growth of data of all kinds. Traditional methods for storing data – such as HDDs – are beginning to face challenges due to limited storage capacities. With the growing demand for storage on the rise, alternate mediums of data storage are becoming increasingly popular – and necessary.

DNA is one of the emerging solutions to store data due to its physical density, data longevity, and data encryption ability. Any information that can be stored in a HDD – such as texts, images, sounds, and movies – can also be converted into DNA sequences.

But while DNA is a promising solution to help meet the demand of data storage needs, performing a search within a strand of DNA can be cumbersome and difficult.

“Archiving information in synthetic DNA has emerged as an attractive solution to deal with the exploding growth of data in the modern world. However, quantitatively querying the data stored in DNA is still a challenge,” says Changchun Liu, professor, department of biomedical engineering, UConn Health.

In Nature Communications, Liu and a team of researchers discovered a way to simply and effectively search for data stored in DNA using a clustered regularly interspaced short palindromic repeats (CRISPR) powered quantitative search engine.

In the paper, Liu introduces Search Enabled by Enzymatic Keyword Recognition (SEEKER), which utilizes CRISPR-Cas12a to quantitatively identify the keyword in files stored in DNA.

“DNA is a promising medium for data storage because of its stability and high information density. Theoretically, one gram of DNA can store 215PB of data, the data size of about 100 million movies. Like a HDD which stores information in binary data bits, DNA stores information in sequences of four nucleobases – adenine (A), thymine (T), cytosine (C) and guanine (G). The developments in DNA synthesis technology and next-gen sequencing are turning DNA data storage into reality,” explains Jiongyu Zhang, graduate student, Liu’s lab and first author of the paper.

Liu utilized his expertise in CRISPR technology to help come up with a better solution to search within a strand of DNA.

CRISPR is an acquired immune mechanism that can identify a specific infectious DNA sequence in a cell overwhelmed with interfering genes, similar to a keyword search in a database.

SEEKER, utilizing CRISPR, rapidly generates visible fluorescence, or light, when a DNA target corresponding to the keyword of interest is present. It is able to successfully perform quantitative text searching since the growth rate of the fluorescence intensity is proportional to the keyword frequency.

In the paper, the researchers successfully identified keywords in 40 files with a background of approximately 8,000 irrelevant terms.

“Overall, the SEEKER provides a quantitative approach to conducting parallel searching-including metadata search – over the complete content stored in DNA with simple implementation and rapid result gen,” explains Liu.

Article: CRISPR-powered quantitative keyword search engine in DNA data storage

Nature Communications has published an article written by Jiongyu Zhang, Chengyu Hou, Department of Biomedical Engineering, University of Connecticut Health Center, Farmington, CT, 06030, USA, and Department of Biomedical Engineering, University of Connecticut, Storrs, CT, 06269, USA, and Changchun Liu, Department of Biomedical Engineering, University of Connecticut Health Center, Farmington, CT, 06030, USA.

Abstract: “Despite the growing interest of archiving information in synthetic DNA to confront data explosion, quantitatively querying the data stored in DNA is still a challenge. Herein, we present Search Enabled by Enzymatic Keyword Recognition (SEEKER), which utilizes CRISPR-Cas12a to rapidly generate visible fluorescence when a DNA target corresponding to the keyword of interest is present. SEEKER achieves quantitative text searching since the growth rate of fluorescence intensity is proportional to keyword frequency. Compatible with SEEKER, we develop non-collision grouping coding, which reduces the size of dictionary and enables lossless compression without disrupting the original order of texts. Using four queries, we correctly identify keywords in 40 files with a background of ~8000 irrelevant terms. Parallel searching with SEEKER can be performed on a 3D-printed microfluidic chip. Overall, SEEKER provides a quantitative approach to conducting parallel searching over the complete content stored in DNA with simple implementation and rapid result generation.“