Prof Yang pointed out that datasets can be huge. Astronomers may log as much as 1GB per second. The researchers produced six intermediate datasets from a particular astronomical dataset, and determined the costs of regenerating or storing them based on Amazon's published prices.
For one hour of observation data from the telescope, with intermediate data kept for 30 days, the minimum-cost strategy came to $200. Storing no intermediate data and regenerating it when needed cost $1000, while storing all intermediate data cost $390.
"We could delete the intermediate datasets that were large in size but with lower generation expenses, and save the ones that were costly to generate, even though small in size," Prof Yang said.
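The trade-off Prof Yang describes can be illustrated with a minimal sketch. This is not the researchers' model: it treats each dataset independently, and the storage price, dataset sizes, regeneration costs and usage rates below are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_gb: float         # size of the intermediate dataset (hypothetical)
    regen_cost: float      # compute cost in USD to regenerate it once (hypothetical)
    uses_per_month: float  # expected number of times it is needed per month (hypothetical)


def monthly_storage_cost(d: Dataset, price_per_gb_month: float) -> float:
    """Cost of keeping the dataset in storage for one month."""
    return d.size_gb * price_per_gb_month


def monthly_regen_cost(d: Dataset) -> float:
    """Cost of deleting the dataset and regenerating it each time it is needed."""
    return d.regen_cost * d.uses_per_month


def plan(datasets, price_per_gb_month):
    """Choose the cheaper option for each dataset, considered in isolation."""
    decisions = {}
    for d in datasets:
        store = monthly_storage_cost(d, price_per_gb_month)
        regen = monthly_regen_cost(d)
        decisions[d.name] = "store" if store <= regen else "delete"
    return decisions


# Hypothetical inputs: one large dataset that is cheap to regenerate,
# and one small dataset that is expensive to regenerate.
datasets = [
    Dataset("large_cheap", size_gb=500.0, regen_cost=2.0, uses_per_month=1.0),
    Dataset("small_costly", size_gb=5.0, regen_cost=40.0, uses_per_month=1.0),
]
decisions = plan(datasets, price_per_gb_month=0.10)  # assumed price, not Amazon's
print(decisions)
```

With these made-up numbers the large but cheaply regenerated dataset is deleted and the small but costly one is kept. The sketch ignores dependencies between datasets: if regenerating one dataset requires an earlier one that has also been deleted, the real regeneration cost is higher than the figure used here.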
The researchers are working on models that will allow these decisions to be made on the fly.
The research applies not only to public cloud services such as Amazon's. Such models could also be employed by users of internal IT services that are charged on a utility basis.