Data storage in a drop of DNA

Wednesday, 08 March, 2017

US researchers have come up with a novel method of storing the world’s ever-increasing amount of data, turning to a storage technology that humans would quite literally not be able to live without — DNA.

The concept is not entirely new: in 2012–13, researchers at the European Bioinformatics Institute (EMBL-EBI) demonstrated the storage of 739 KB of data in DNA. And according to the authors of the current study, published in the journal Science, DNA has several characteristics that make it an ideal storage medium:

It is ultracompact — about one million times more so than regular digital media.
It comes in a liquid state, so it is not bound by the physical limitations of other storage media.
It can last for hundreds of thousands of years if kept in a cool, dry place, as demonstrated by the recent recovery of DNA from the bones of a 430,000-year-old human ancestor found in a cave in Spain.

“DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete — if it does, we have bigger problems,” said study co-author Yaniv Erlich, from Columbia University and the New York Genome Center (NYGC).

Erlich and his colleague Dina Zielinski, an associate scientist at NYGC, chose six files to encode into DNA: a full computer operating system, the 1895 French film Arrival of a Train at La Ciotat, a $50 Amazon gift card, a computer virus, a Pioneer plaque and a 1948 study by information theorist Claude Shannon. They compressed the files into a master file, and then split the data into short strings of binary code made up of ones and zeros.
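As a rough illustration of the compress-and-split step, the Python sketch below bundles several files into one compressed archive and cuts it into fixed-length segments. The length-prefixed archive format, zlib as the compressor, the 32-byte segment length and the function name `make_segments` are all assumptions made for illustration; the article does not say which tools the researchers used.

```python
import zlib

SEGMENT_BYTES = 32  # hypothetical segment length, chosen only for illustration


def make_segments(files: list[bytes], segment_bytes: int = SEGMENT_BYTES) -> list[bytes]:
    """Compress several files into one archive and cut it into equal-length segments."""
    # Naive "master file": prefix each file with its 4-byte length so it can be split apart later.
    archive = b"".join(len(f).to_bytes(4, "big") + f for f in files)
    packed = zlib.compress(archive)
    # Zero-pad so the compressed archive divides evenly into segments.
    remainder = len(packed) % segment_bytes
    if remainder:
        packed += bytes(segment_bytes - remainder)
    return [packed[i:i + segment_bytes] for i in range(0, len(packed), segment_bytes)]


segments = make_segments([b"hello", b"world" * 100])
```

Keeping every segment the same length matters for the next step, where segments are combined with XOR; on the way back, a `zlib.decompressobj()` sets the trailing zero padding aside in its `unused_data` attribute.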

Using their own customised version of an erasure-correcting algorithm called fountain codes, which was originally developed to help stream video on smartphones, the researchers randomly packaged the strings into so-called droplets and mapped the ones and zeros in each droplet to the four nucleotide bases in DNA: A, G, C and T. The algorithm screened out letter combinations known to create errors and added a barcode to each droplet to help reassemble the files later.
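The droplet idea can be sketched in Python as a generic Luby-transform-style fountain encoder. This is not the researchers' DNA Fountain code: each droplet XORs a pseudo-randomly chosen subset of segments, a random 4-byte seed is stored in front of it so a decoder can regenerate the same subset (here the seed stands in for the barcode), and the bytes are mapped two bits per base. A simple ideal soliton degree distribution is used for brevity, and the A/C/G/T bit mapping, the limit of three identical bases in a row, the 45 to 55% GC-content window and all function names are assumptions.

```python
import random

# Assumed 2-bit mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def ideal_soliton(k: int, rng: random.Random) -> int:
    """Sample a degree d in 1..k with P(1) = 1/k and P(d) = 1/(d(d-1)) otherwise."""
    u = rng.random()
    cumulative = 1.0 / k
    if u < cumulative:
        return 1
    for d in range(2, k + 1):
        cumulative += 1.0 / (d * (d - 1))
        if u < cumulative:
            return d
    return k  # guard against floating-point round-off


def to_bases(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def passes_screen(strand: str, max_run: int = 3, gc_low: float = 0.45, gc_high: float = 0.55) -> bool:
    """Reject strands with long single-base runs or unbalanced GC content (thresholds assumed)."""
    run = 1
    for prev, cur in zip(strand, strand[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return False
    gc = (strand.count("G") + strand.count("C")) / len(strand)
    return gc_low <= gc <= gc_high


def make_droplet(segments: list[bytes], seed: int) -> bytes:
    """XOR a seed-determined subset of segments and prefix the 4-byte seed as a barcode."""
    rng = random.Random(seed)
    degree = ideal_soliton(len(segments), rng)
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytearray(len(segments[0]))
    for idx in chosen:
        for i, b in enumerate(segments[idx]):
            payload[i] ^= b
    return seed.to_bytes(4, "big") + bytes(payload)


def encode(segments: list[bytes], n_strands: int, rng_seed: int = 0) -> list[str]:
    """Draw random 32-bit seeds until n_strands droplets pass the screen."""
    seed_source = random.Random(rng_seed)
    strands: list[str] = []
    while len(strands) < n_strands:
        strand = to_bases(make_droplet(segments, seed_source.getrandbits(32)))
        if passes_screen(strand):
            strands.append(strand)
    return strands
```

Discarding a droplet that fails the screen does not lose any data, because a fountain code can always generate another droplet from a fresh seed; that property is what makes screening out error-prone sequences cheap.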

The scientists generated a digital list of 72,000 DNA strands, each 200 bases long, and sent it in a text file to DNA synthesis start-up Twist Bioscience, which specialises in turning digital data into biological data. Two weeks later, they received a vial holding a speck of DNA molecules.

To retrieve their files, the researchers used sequencing technology to read the DNA strands, followed by software to translate the DNA sequence back into binary. They recovered their files with no errors. They also demonstrated that a virtually unlimited number of copies of the files could be created with their coding technique by amplifying their DNA sample using the polymerase chain reaction (PCR), and that those copies, and even copies of their copies, could be recovered error-free.
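The decoding side can be sketched as a "peeling" decoder, the standard way to decode Luby-transform-style fountain codes: any droplet built from a single unknown segment reveals that segment, which is then XORed out of every other droplet, and the process repeats. The sketch assumes droplets consisting of a 4-byte seed followed by an XOR payload, the A/C/G/T two-bit mapping and an ideal soliton degree distribution; the function names are hypothetical, and it omits the error detection a real pipeline needs to cope with sequencing errors.

```python
import random
from typing import Optional

BASES = "ACGT"  # assumed mapping, must match the encoder: A=00, C=01, G=10, T=11


def from_bases(strand: str) -> bytes:
    """Turn every four bases back into one byte."""
    vals = [BASES.index(b) for b in strand]
    return bytes(
        (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
        for i in range(0, len(vals), 4)
    )


def droplet_neighbors(seed: int, k: int) -> set[int]:
    """Replay the assumed encoder's PRNG: ideal-soliton degree, then a random sample of segments."""
    rng = random.Random(seed)
    u, cumulative, degree = rng.random(), 1.0 / k, 1
    while u >= cumulative and degree < k:
        degree += 1
        cumulative += 1.0 / (degree * (degree - 1))
    return set(rng.sample(range(k), degree))


def peel_decode(strands: list[str], k: int) -> Optional[list[bytes]]:
    """Recover k segments by repeatedly resolving droplets with one unknown segment."""
    pending = []
    for strand in strands:
        raw = from_bases(strand)
        seed, payload = int.from_bytes(raw[:4], "big"), bytearray(raw[4:])
        pending.append((droplet_neighbors(seed, k), payload))
    recovered: dict[int, bytes] = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for neighbors, payload in pending:
            # XOR out every segment that is already known.
            for idx in list(neighbors):
                if idx in recovered:
                    for i, b in enumerate(recovered[idx]):
                        payload[i] ^= b
                    neighbors.discard(idx)
            # A droplet with exactly one unknown segment now equals that segment.
            if len(neighbors) == 1:
                recovered[neighbors.pop()] = bytes(payload)
                progress = True
    if len(recovered) < k:
        return None  # not enough droplets survived to rebuild every segment
    return [recovered[i] for i in range(k)]
```

With high probability, any sufficiently large subset of droplets is enough, so strands lost during synthesis or sequencing need not prevent recovery.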

Because each nucleotide is one of four bases, DNA can in theory store at most two binary digits per nucleotide. In practice, the biological constraints of the material and the need to include redundant information for reassembly reduce this capacity to around 1.8 binary digits per nucleotide. By applying their version of fountain codes, called DNA Fountain, the researchers ensured the reading and writing process was as efficient as possible. They succeeded in packing an average of 1.6 bits into each nucleotide, at least 60% more data than previously published methods and close to the 1.8-bit limit.
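To see how a single biological constraint already pushes capacity below two bits, the Python sketch below counts DNA sequences that never repeat the same base more than a set number of times in a row and measures how fast that count grows; the growth rate converges to 2 raised to the capacity. The run-length limit of 3 and the name `homopolymer_capacity` are assumptions for illustration, and the sketch ignores the other constraints and the redundancy that bring the article's figure down to about 1.8 bits.

```python
import math


def homopolymer_capacity(max_run: int = 3, length: int = 2000) -> float:
    """Estimate bits per base for sequences with no run of identical bases longer than max_run."""
    # ending[r] = number of valid sequences whose final run has length r + 1.
    ending = [4] + [0] * (max_run - 1)
    totals = [4]
    for _ in range(length - 1):
        total = sum(ending)
        # Append one of the 3 other bases (new run of length 1) or extend the current run by one.
        ending = [3 * total] + ending[:-1]
        totals.append(sum(ending))
    # The ratio of successive counts converges to 2 ** capacity.
    return math.log2(totals[-1] / totals[-2])


print(homopolymer_capacity(3), homopolymer_capacity(1))
```

With the default limit of three the estimate is about 1.98 bits per base, and forbidding any repeated base (`max_run=1`) gives log2(3), roughly 1.585 bits, since only three choices remain after the first base.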

The downside of the study was that cost remained a barrier: the researchers spent $7000 to synthesise the DNA they used to archive their 2 MB of data and another $2000 to read it. The price of DNA synthesis may be reduced, however, if lower-quality molecules are produced and coding strategies like DNA Fountain are used to fix molecular errors.
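Put in per-unit terms, the costs above work out as in the short Python calculation below. It treats the archive as exactly 2 MB, the article's rounded figure, so the results are approximate; the helper name `usd_per_megabyte` is hypothetical.

```python
def usd_per_megabyte(cost_usd: float, data_mb: float) -> float:
    """Spread a one-off cost evenly over the amount of data stored."""
    return cost_usd / data_mb


DATA_MB = 2  # the article's rounded size of the archived data
write_cost = usd_per_megabyte(7000, DATA_MB)  # DNA synthesis
read_cost = usd_per_megabyte(2000, DATA_MB)  # sequencing
print(f"write ${write_cost:,.0f}/MB, read ${read_cost:,.0f}/MB, total ${write_cost + read_cost:,.0f}/MB")
```

This prints roughly $3,500 per megabyte to write and $1,000 per megabyte to read, or $4,500 per megabyte in total.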

Ultimately, the researchers showed that their coding strategy packs a whopping 215 PB of data onto a single gram of DNA, 100 times more than the method published by EMBL-EBI. According to Erlich, this makes it the highest-density data-storage device ever created.

Image courtesy of Caroline Davis2010 under CC BY 2.0