Until quite recently, the normal approach to storing large amounts of data was to keep the parts considered important and active on fast, immediately accessible storage, and to move the rest to the cheapest storage tier or discard it completely, said Kixmoeller. It wasn't quite that black and white, as there were usually more than two tiers of storage.
But "it's a time of change," he said.
Firstly, while data warehouses and, more recently, data lakes were constructed to support human-driven analytics, machine-based analytics require much greater performance and scalability.
Secondly, the cost of flash storage has fallen to the point where organisations can "collapse the tiers" and store everything in flash.
That's fortunate, because at least part of the rationale for applying machine learning and similar techniques is that human analysts can't determine which parts of the overall data set are valuable and which aren't. What's needed is to make one big pool of data available to the organisation's data scientists, who can then apply various algorithms to reveal what is useful.
Kixmoeller pointed to the example of the Memorial Sloan Kettering Cancer Center in New York, which has been keeping cell samples from cancer biopsies on glass slides for 30 years. Millions of these slides are being digitised so they can be used to train AI systems that will assist pathologists.
Other organisations probably have 20 to 30 years' worth of data about their customers and other types of business activities, and they now have an opportunity to use AI to analyse it and reveal new relationships.
Different industries are interested in different things, he said, but examples include insurance claim validation, sentiment analysis, and face recognition (eg, to automate check-in at airports).
"Everybody's got a lot of data, and everybody has a lot of ideas about how to use AI."
But that means storing the data in one place with fast access.
Hence Pure Storage's Data Hub, which is based on the company's FlashBlade hardware and is designed to deliver, share and unify data to unlock its value.
If you provide data scientists with better performance, they quickly find ways to take advantage of it, said Kixmoeller.
Data Hub is designed to provide high-throughput file and object storage regardless of the access pattern, and to handle massively parallel workloads, thanks in part to its scale-out design.
"More and more Australian organisations are exploring initiatives around AI and real-time data, but are at risk of not being able to realise their aspirations. Today’s data is largely stuck in silos and the storage industry as a whole is still too concerned with just that - storage. For organisations to get the most out of data, they need to focus less on storing it, and more on unifying and sharing it for real impact," said Pure Storage's APJ vice-president of technical services, Mark Jobbins.
"Data Hub is Pure Storage's vision to refocus the storage industry around data sharing, helping organisations in Australia and across the globe truly put their data to work and tap the potential of AI."
Most organisations that have implemented a data lake are now facing a refresh cycle, said Kixmoeller, which makes this "a perfect time to reconsider the storage solution".
As AI researchers are community-minded, much of the software needed is released under open source licences, and therefore the barrier to entry is "lower than most people think", said Kixmoeller.