Copying Nature’s Blueprint for Data Storage

There is something fascinating about the very small and the very large, from the size of a proton to the number of stars in the universe. As a biologist, I was fascinated by my measurements of the tiny currents that flow across the walls of our nerve cells; as an IT professional, I'm amazed by the sheer volume of data we are generating. The Internet of Things (IoT) is going to create vast volumes of data, and Big Data suggests we need to keep all of it. The way we store data is going to have to change, because there will come a point when our storage requirements exceed our physical ability to hold it all. We need reliable ways of storing (really) large volumes of data in (really) tiny footprints.


One line of research might provide an answer, and as someone whose career has spanned both physiology and data storage, it is of particular interest to me. That research involves storing data on DNA, nature's own data storage device.


DNA is incredibly robust and amazingly space-efficient. It has been extracted and sequenced from woolly mammoths frozen in ice for tens of thousands of years, and a cubic millimetre of DNA can hold a staggering 5.5 petabits of information. Compare that with our current storage devices: 3.5" spinning disks with a reliable lifespan of between 1 and 10 years and a capacity of 8 terabytes. We are getting better, but we aren't in the same ballpark.
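To put those two figures side by side, here is a back-of-the-envelope calculation in Python. It takes the 5.5 petabit and 8 terabyte figures at face value and adds two assumptions of my own: decimal units (1 TB = 10^12 bytes, 1 petabit = 10^15 bits) and nominal 3.5" drive dimensions of roughly 146 × 101.6 × 26.1 mm. It is a rough sketch that ignores enclosures, overheads and everything else needed to actually read the data back.

```python
# Back-of-the-envelope density comparison: DNA versus an 8 TB 3.5" hard disk.
# Assumptions (not from the article): decimal units, and nominal 3.5" drive
# dimensions of 146 x 101.6 x 26.1 mm.

DNA_BITS_PER_MM3 = 5.5e15                   # 5.5 petabits per cubic millimetre
DISK_CAPACITY_BYTES = 8e12                  # an 8 TB drive
DISK_DIMENSIONS_MM = (146.0, 101.6, 26.1)   # assumed length, width, height


def disk_volume_mm3(dims):
    """Volume of a rectangular drive enclosure in cubic millimetres."""
    length, width, height = dims
    return length * width * height


# Terabytes held by one cubic millimetre of DNA (bits -> bytes -> TB).
dna_tb_per_mm3 = DNA_BITS_PER_MM3 / 8 / 1e12

# How many 8 TB drives' worth of data fit in that cubic millimetre.
disks_per_mm3 = DNA_BITS_PER_MM3 / (DISK_CAPACITY_BYTES * 8)

# Volumetric density of the drive, and how many times denser DNA is.
disk_bits_per_mm3 = DISK_CAPACITY_BYTES * 8 / disk_volume_mm3(DISK_DIMENSIONS_MM)
density_ratio = DNA_BITS_PER_MM3 / disk_bits_per_mm3

print(f"One mm^3 of DNA holds about {dna_tb_per_mm3:.1f} TB")
print(f"That is the capacity of about {disks_per_mm3:.0f} 8 TB drives")
print(f"By volume, DNA is roughly {density_ratio:.1e} times denser")
```

On these assumptions, a single cubic millimetre of DNA holds about 687.5 TB, the capacity of roughly 86 of those 8 TB drives, and by volume it is in the region of 30 million times denser than the drive itself.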


Early tests of storing data on DNA have been successful. In 2013, building on work by George Church at Harvard the previous year, scientists from the European Bioinformatics Institute in the UK encoded into DNA all 154 of Shakespeare's sonnets, 26 seconds of Martin Luther King's "I Have a Dream" speech, Watson and Crick's seminal paper on the structure of DNA, a photo of the Institute and a file describing how the data had been converted. They used an error-correcting coding scheme, and recovery tests showed that the data could be reliably reproduced with an accuracy of between 99% and 100%.
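How do you turn a file into DNA in the first place? As a rough illustration only, here is a minimal Python sketch of one simple approach: each byte becomes six base-3 digits, and each digit picks one of the three nucleotides that differ from the previous one, so the same base never appears twice in a row. Published descriptions of the 2013 work mention a rotating code of this general kind, alongside compression and redundant, overlapping fragments; this sketch leaves all of that out and is not the Institute's actual method. The function names (`encode`, `decode`) and the fixed starting base `"A"` are hypothetical choices made for this example.

```python
# Simplified, illustrative byte <-> DNA encoder (not the 2013 experiment's scheme).
# Each base-3 digit selects one of the three nucleotides that differ from the
# previous nucleotide, so no base is ever repeated back to back.

BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six base-3 digits fit any byte value


def bytes_to_trits(data: bytes) -> list[int]:
    """Convert each byte into six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Reassemble groups of six base-3 digits into bytes."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def encode(data: bytes, start: str = "A") -> str:
    """Encode bytes as a nucleotide string with no repeated adjacent bases."""
    prev = start
    strand = []
    for t in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]  # the three bases unlike prev
        prev = choices[t]
        strand.append(prev)
    return "".join(strand)


def decode(strand: str, start: str = "A") -> bytes:
    """Invert encode(): recover each digit from the base and its predecessor."""
    prev = start
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits_to_bytes(trits)


sonnet_line = b"Shall I compare thee to a summer's day?"
strand = encode(sonnet_line)
print(strand[:30], "...")
print(decode(strand) == sonnet_line)
```

Allowing all four bases at every position would carry 2 bits per nucleotide; restricting each position to three choices lowers that to about 1.58 bits (and to 8/6 ≈ 1.33 bits with the fixed six-digit packing used here). The trade is deliberate: runs of identical bases are harder to read back accurately, so the sketch gives up some density to avoid them.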

Storing data on DNA is clearly possible, and recent research by ETH Zurich has produced a method of preserving the integrity of DNA over long periods; in optimal conditions, data stored on DNA could last for over a million years. DNA data storage could, therefore, be the answer to many impending problems. There are, however, a couple of issues: reading and writing are expensive, far too expensive to be commercially viable, and performance is poor too. But these are common challenges in the development of any new technology.


As our need to store data grows, so will the commercial incentive to improve the speed and efficiency of reading and writing DNA. It is unlikely that DNA will ever be used for primary storage (it just won't ever be fast enough), but it holds incredible promise for 'apocalypse-proof' archives that hold huge volumes of data in tiny spaces.