Each Pure Storage array generates between 600MB and 1GB of telemetry data per day, including behavioural data concerning workload characteristics, Pure Storage international chief technology officer Alex McMullan told iTWire.
Different types of data are directed to different streams: temperature alerts and information about network issues, for example, flow to the help desk for immediate attention. Some issues can be fixed remotely, often before an actual fault occurs; others are brought to the customer's attention.
This type of service has led to Pure achieving an NPS (net promoter score) in the mid-80s, he said. For comparison, Macquarie Telecom claims "Australia's best" customer experience based on an NPS of 76, and the average NPS of the Australian retail industry is 15, according to the Perceptive Group.
But back to Pure's telemetry. Applying machine learning to the data also allows the company to identify issues caused by hardware or software provided by other vendors. In addition, customers can use it to predict the effect of making changes to their arrays, such as upgrading a controller.
The company is very aware that there are significant differences between workloads. Pure Storage was originally used largely in conjunction with VMware and similar software, but now databases such as MongoDB and Cassandra are commonplace, and these workloads have very different characteristics in terms of storage use. So the models used to analyse the telemetry data keep changing — Pure's "data science team never stops", said McMullan.
To process all this data, Pure augments its on-premises infrastructure with AWS, which McMullan describes as "a great force multiplier."
Pure has more than 10PB of data stored on AWS, but "much more" is stored on premises. The company is moving even more data on premises in order to take advantage of its own FlashBlade hardware to improve analytics performance.
Looking at AI more generally, McMullan sees it as "an undisciplined, unregulated space." What regulations exist vary significantly across jurisdictions, and there is no agreement on how accurate a model needs to be (see, for example, recent concerns over the accuracy of face recognition used by police in the UK). Furthermore, the 'black box' nature of most models leaves people wondering whether any conscious or unconscious bias has gone into their development.
McMullan suggests that if the international community can agree on air traffic lanes, it should be able to come up with overarching guidelines for AI.
He's not suggesting that all applications should be regarded in the same way. But there will be a high level of reliance on some AIs (eg, autonomous vehicles), so lots of ongoing checks are reasonable, especially when a given set of inputs does not necessarily lead to the same output.
It's important to realise that the computer isn't always right, he suggested.
Another issue that needs attention is data ownership (do healthcare and vehicle data belong to the individual or the owner, or to the manufacturer or a third-party provider?), he said.
That raises some interesting issues. Should a hospital be allowed to train an AI using patients' data without their explicit consent? Is that consent meaningful if it was granted as part of 'take it or leave it' terms and conditions, eg, where no consent means no treatment? And should future patients only benefit from their predecessors' contribution to the development of AI-assisted diagnosis and treatment if they, in turn, allow their data to be used in that tool's ongoing development and training?