Tuesday, 18 September 2018 15:44

Out with the data lake, in with the Data Hub, says Pure

Pure Storage FlashBlade

iTWire talks to Pure Storage vice-president of strategy Matt Kixmoeller about the company's recently introduced Data Hub.

Announced earlier this month, Data Hub is Pure's approach to providing storage for unstructured, data-intensive workloads.

Until quite recently, the normal approach to storing large amounts of data was to keep the parts that are considered important and active on fast, immediately accessible storage, and to move the rest to the cheapest storage tier or discard it completely, said Kixmoeller. It wasn't quite that black and white, as there were usually more than two tiers of storage.

But "it's a time of change," he said.

Firstly, the volume of data and its value have increased. A growing proportion of data is being generated by "things" rather than as a result of processes with direct human involvement, and machine learning and related technologies are increasingly being used to analyse all that data and extract value that humans could not.

Where data warehouses and more recently data lakes were constructed to support human-driven analytics, machine-based analytics require much greater performance and scalability.

Secondly, the cost of flash storage has fallen to the point where organisations can "collapse the tiers" and store everything in flash.

That's fortunate, because at least part of the rationale of applying machine learning and so on is that human analysts can't determine which parts of the overall data set are valuable and which aren't. What's needed is to make one big pool of data available to the organisation's data scientists, then they can apply various algorithms to reveal what is useful.

Kixmoeller pointed to the example of the Memorial Sloan Kettering Cancer Center in New York, which has been keeping cell samples from cancer biopsies on glass slides for 30 years. Millions of these slides are being digitised so they can be used to train AI systems that will assist pathologists.

Other organisations probably have 20 to 30 years' worth of data about their customers and other business activities, and they now have an opportunity to use AI to analyse it and reveal new relationships.

Different industries are interested in different things, he said, but examples include insurance claim validation, sentiment analysis, and face recognition (eg, to automate check-in at airports).

"Everybody's got a lot of data, and everybody has a lot of ideas about how to use AI."

But that means storing the data in one place with fast access.

Hence Pure Storage's Data Hub, which is based on the company's FlashBlade hardware and is designed to deliver, share and unify data to unlock its value.

If you provide data scientists with better performance, they quickly find ways to take advantage of it, said Kixmoeller.

Data Hub is designed to provide high-throughput file and object storage regardless of the access pattern, and to handle massively parallel workloads, thanks in part to its scale-out design.

"More and more Australian organisations are exploring initiatives around AI and real-time data, but are at risk of not being able to realise their aspirations. Today’s data is largely stuck in silos and the storage industry as a whole is still too concerned with just that - storage. For organisations to get the most out of data, they need to focus less on storing it, and more on unifying and sharing it for real impact," said Pure Storage's APJ vice-president of technical services, Mark Jobbins.

"Data Hub is Pure Storage's vision to refocus the storage industry around data sharing, helping organisations in Australia and across the globe truly put their data to work and tap the potential of AI."

Most organisations that have implemented a data lake are now facing a refresh cycle, said Kixmoeller, which makes this "a perfect time to reconsider the storage solution".

Furthermore, "it's relatively easy to get going in AI" thanks to Pure's AIRI (AI-Ready Infrastructure) products that incorporate Nvidia's DGX-1 integrated deep learning systems.

As AI researchers are community minded, much of the software needed is released under open source licences, and therefore the barrier to entry is "lower than most people think", said Kixmoeller.



Stephen Withers


Stephen Withers is one of Australia's most experienced IT journalists, having begun his career in the days of 8-bit 'microcomputers'. He covers the gamut from gadgets to enterprise systems. In previous lives he has been an academic, a systems programmer, an IT support manager, and an online services manager. Stephen holds an honours degree in Management Sciences and a PhD in Industrial and Business Studies.


