After evaluating various technologies including Blu-ray and magnetic disk, Microsoft came to the conclusion that tape was still the way to go for Azure's archival storage tier, said Microsoft Azure CTO Mark Russinovich.
"We believe tape is the new tape", so the company builds automated tape libraries with up to 72 tape drives and 12,000 tape cartridges, plus one to two robotic arms to load the required cartridges.
The downside is the resulting high latency: the robotics have to load a cartridge into an idle drive, and then the drive has to seek to the right place on the tape before it can start reading the data, he explained.
The problem is that the power required by so many spinning drives would normally greatly exceed the budget for a single rack. Microsoft overcame that limitation by spinning up drives only when they are actually needed. Having just a subset of the drives online at any time keeps the rack within the power budget, Russinovich explained.
The price is higher latency. If the maximum number of drives is already active, there is an initial delay while one becomes idle and is spun down. Then the required disk must spin up before the data can be read.
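The trade-off described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Microsoft's design: the class name, the spin-up and spin-down times, and the choice to spin down the least recently used drive (treated here as already idle) are all hypothetical.

```python
from collections import OrderedDict


class PowerBudgetedRack:
    """Toy model of a rack where at most `max_active` drives may spin at once."""

    def __init__(self, max_active, spin_up_s=10.0, spin_down_s=5.0):
        self.max_active = max_active
        self.spin_up_s = spin_up_s      # hypothetical spin-up time in seconds
        self.spin_down_s = spin_down_s  # hypothetical spin-down time in seconds
        self.active = OrderedDict()     # spinning drives, least recently used first

    def access(self, drive_id):
        """Return the extra delay in seconds before `drive_id` can be read."""
        if drive_id in self.active:
            self.active.move_to_end(drive_id)  # mark as most recently used
            return 0.0
        delay = 0.0
        if len(self.active) >= self.max_active:
            # Power budget exhausted: spin down the least recently used drive
            # (assumed idle in this sketch) before starting another one.
            self.active.popitem(last=False)
            delay += self.spin_down_s
        self.active[drive_id] = None
        delay += self.spin_up_s
        return delay


rack = PowerBudgetedRack(max_active=2)
latencies = [rack.access(d) for d in ["a", "b", "a", "c", "b"]]
print(latencies)  # [10.0, 10.0, 0.0, 15.0, 15.0]
```

Reads that hit an already spinning drive add no delay, while reads that arrive when the budget is full pay for both a spin-down and a spin-up.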
"But what this gives is this intermediate price point between tapes and standard hard disk storage," he said.
Microsoft isn't resting on its laurels: it is working on two technologies that have the potential to provide archival storage at a lower price than tape.
Project Silica — a collaboration between Microsoft Research, Azure and the University of Southampton (UK) — aims to store data on glass.
The advantage of such a system is that once written, the data is truly permanent. Where SSDs, disks and tapes require data to be rewritten every few years or every decade to avoid bit rot, "if you can store data in glass there is no decay at all. It will literally last for the rest of the lifetime of the planet Earth", Russinovich said.
Furthermore, glass is extremely cheap as the main raw material is sand.
The stumbling block has been that it is very hard to etch data into glass without compromising the integrity of the medium. Using standard lasers to do the etching results in microscopic cracks that eventually make the data impossible to read.
Project Silica gets around this by using lasers that can produce pulses as short as a femtosecond – one quadrillionth of a second. It also encodes three bits of data into one voxel (a point in three-dimensional space), and writes the data in multiple layers inside one piece of glass.
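To make "three bits of data into one voxel" concrete, the sketch below regroups a byte stream into 3-bit symbols with values 0 to 7, one per voxel. It illustrates only the bit grouping; how Project Silica physically represents each symbol is not described here, and the function names and zero-padding scheme are assumptions made for the example.

```python
def bytes_to_voxels(data):
    """Split bytes into 3-bit symbols (0..7), zero-padding the final symbol."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % 3)  # pad so the length is a multiple of 3
    return [int(bits[i:i + 3], 2) for i in range(0, len(bits), 3)]


def voxels_to_bytes(voxels, n_bytes):
    """Rebuild the first `n_bytes` bytes from a list of 3-bit symbols."""
    bits = "".join(f"{v:03b}" for v in voxels)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


voxels = bytes_to_voxels(b"Hi")
print(voxels)                      # [2, 2, 0, 6, 4, 4]
print(voxels_to_bytes(voxels, 2))  # b'Hi'
```

Two bytes (16 bits) need six voxels at three bits each, with two padding bits in the last one.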
"Another really cool characteristic of glass is you can always create a reader for it," he said. "For reading glass all you need is a light. You read the reflections coming out of it."
"But that's not the only promising technology for storing data in an archival way very efficiently, very low cost," said Russinovich.
Project Palix — a collaboration between Microsoft Research, Azure and the University of Washington (US) — aims to encode data into DNA strands.
DNA lasts for around 2000 years if stored at around 10 degrees centigrade, and can be read using existing gene sequencing technology.
The real promise is in the remarkable density: as much as one zettabyte could be stored in one rack. A zettabyte is 1000 exabytes; an exabyte is 1000 petabytes; and a petabyte is 1000 terabytes. Russinovich explained it another way: "To give you an idea how big it is, people are estimating by the year 2020 there will be about 20 zettabytes of digital data on the entire planet. So [the proposition is] one-twentieth of the planet's data stored on a single rack.
"We believe we are close to making this commercially viable in the very near future," he said. "This has huge potential... [for] providing extremely low-cost archival storage."
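The unit arithmetic behind the one-rack claim is easy to check. The snippet below is a minimal sketch using decimal (power-of-1000) units as stated in the article; the variable names are illustrative, and the 20-zettabyte figure is the estimate Russinovich quoted, not an independent measurement.

```python
from fractions import Fraction

TB = 10**12      # bytes in a terabyte (decimal units)
PB = 1000 * TB   # petabyte
EB = 1000 * PB   # exabyte
ZB = 1000 * EB   # zettabyte

rack_capacity = 1 * ZB   # claimed upper bound for one rack of DNA storage
planet_data = 20 * ZB    # quoted estimate for all digital data by 2020

terabytes_per_rack = rack_capacity // TB
share_of_planet = Fraction(rack_capacity, planet_data)

print(terabytes_per_rack)  # 1000000000
print(share_of_planet)     # 1/20
```

In other words, a single rack at that density would hold a billion terabytes.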
Russinovich was in Australia for the opening of Azure's Australia Central 1 and Australia Central 2 (Canberra) regions.