
SAS striving to automate distributed analytics

Certain workloads need to stay on premises for a variety of reasons such as regulatory restrictions on where the data is stored. In other situations, the cost advantages of public cloud may prevail. So most organisations are going to have a hybrid approach to IT. But how are they going to analyse data spread across multiple locations?

Replicating all the required data to one place is "probably not the best long-term [solution]", SAS director of product management for cloud and platform technologies Mike Frost told iTWire.

Yet the issue of data location cannot be ignored: a person conducting an analysis may not know where all the required data lives, and so cannot ensure the work is done efficiently and cost-effectively.

SAS's approach, Frost said, is to create a distributed compute infrastructure as a foundation for such analyses. This environment should know where data is and what analysis is to be performed, and use that information to assemble resources to do the work, taking into consideration existing jobs, the priority of the new analysis, and the cost of resources.

Another part of the problem is that analytics tasks aren't very predictable: it is hard to determine in advance the resources needed to perform them. In general they do need plenty of memory and fast network interconnects, so working in the cloud requires careful selection of appropriate VM instances. SAS is starting to work with cloud providers, he said, but while getting lots of memory per core is usually straightforward, carving out a separate network doesn't fit comfortably with the public cloud model.

That said, SAS has good relationships with Amazon, Google and Oracle, and is "in contact" with Microsoft Azure.

Oracle has "shown a real willingness to work closely with us", said Frost, noting that many SAS customers use the Oracle database.

One major cloud provider "has shown great reluctance to work on this problem", while another sees it as a potential way to differentiate its offerings from those of other cloud companies, he said.

Adding to the complexity of building and automating a distributed analytics infrastructure is the fact that some tasks can be distributed while others cannot.

For example, to calculate the mean of a set of data split across multiple locations, the "average of averages" can be used – calculate the mean of the subset at each location, and then calculate the mean of those results. That usually works, though there can be problems stemming from the precision of the various calculations, he warned.
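A minimal sketch of the idea in Python (the names are illustrative, not SAS code): weighting each local mean by its subset size makes the combined result exact, whereas a plain "average of averages" is only correct when the subsets are equally sized – one source of the precision problems Frost alludes to.

```python
import math

def combine_means(partials):
    """partials: list of (local_mean, count) pairs, one from each location."""
    total = sum(count for _, count in partials)
    # math.fsum reduces floating-point rounding error when summing
    return math.fsum(mean * count for mean, count in partials) / total

site_a = [1.0, 2.0, 3.0]   # data held at location A
site_b = [10.0, 20.0]      # data held at location B

partials = [(sum(site_a) / len(site_a), len(site_a)),
            (sum(site_b) / len(site_b), len(site_b))]

# Matches the mean of the pooled data (36 / 5 = 7.2) without moving it
print(combine_means(partials))  # 7.2
```

Only two numbers per location cross the network, which is the appeal of distributing such calculations.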

But if you need to sort a list, the whole list needs to be accessed from one place.
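The contrast can be sketched in Python (again illustrative, not SAS code): each location can pre-sort its own partition, but producing the final ordering still requires every element to flow through one merge point.

```python
import heapq

site_a = sorted([5, 1, 9])   # locally sorted at location A
site_b = sorted([4, 8, 2])   # locally sorted at location B

# heapq.merge lazily merges already-sorted inputs into one sorted stream,
# but it must still see every element from every location
merged = list(heapq.merge(site_a, site_b))
print(merged)  # [1, 2, 4, 5, 8, 9]
```

Unlike the mean, no small per-location summary suffices: the full data set has to be accessible in one place.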

What's needed is a system that can determine the best way of combining data, but that's "a humongous challenge", according to Frost. Part of the problem is that such an "environment has to be self-learning" to accommodate changes in business rules, regulations, and the questions being asked.

Data storage is an associated issue. Currently, the default position is to store everything in case it comes in useful. But growing volumes make that hard to sustain. Frost predicts that during the next business downturn people will start asking 'do we need to store all this data?' and practices will change.

One of the trends contributing significantly to data growth is the internet of things. While IoT devices generally have minimal compute capability, even a Raspberry Pi has sufficient grunt to detect impending problems by watching the data stream.
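A hypothetical sketch of the kind of lightweight stream check such a device could run – flagging readings that deviate sharply from a rolling window of recent history. The class and thresholds here are assumptions for illustration, not any particular SAS or device API.

```python
from collections import deque
import statistics

class StreamWatcher:
    def __init__(self, window=20, n_sigma=3.0):
        self.recent = deque(maxlen=window)  # rolling window of readings
        self.n_sigma = n_sigma

    def observe(self, reading):
        """Return True if the reading deviates sharply from recent history."""
        alert = False
        if len(self.recent) >= 5:  # need a little history before judging
            mean = statistics.fmean(self.recent)
            stdev = statistics.pstdev(self.recent)
            if stdev > 0 and abs(reading - mean) > self.n_sigma * stdev:
                alert = True
        self.recent.append(reading)
        return alert

watcher = StreamWatcher()
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0]  # last value is a spike
alerts = [watcher.observe(r) for r in readings]
print(alerts)  # only the final spike triggers an alert
```

Detection rules this simple run comfortably on a Raspberry Pi; the harder problem, as the article goes on to note, is training and distributing better rules at scale.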

Training models to determine the rules to apply is best done using data aggregated from all the similar devices – one SAS customer is pouring 25TB of data per day into a Hadoop cluster for analysis, he said.

The results of the training process will need to be communicated to the devices, whether to add new rules or remove those that are no longer relevant, and doing this at scale is non-trivial.

One problem with AI systems is that they tend to be black boxes, and people tend to expect at least a hypothesis about the nature of the mechanism that has been discovered. For example, the data might say that a slight change of colour on a Web page can usefully change the outcome, but simply accepting that at face value seems a very shallow form of knowledge – what actually happens to change the behaviour?

The lack of any explanation by an AI may make the results unacceptable in some situations, but "I think it's going to be a generational thing" and post-millennials may be more likely to accept these results, said Frost.

SAS's goal in applying AI to analytics is to make data scientists more productive, allow non-specialists to do more before calling in a data scientist, and generally automate the discovery of potentially valuable relationships, reducing "the time to insight", he said.




Stephen Withers


Stephen Withers is one of Australia's most experienced IT journalists, having begun his career in the days of 8-bit 'microcomputers'. He covers the gamut from gadgets to enterprise systems. In previous lives he has been an academic, a systems programmer, an IT support manager, and an online services manager. Stephen holds an honours degree in Management Sciences and a PhD in Industrial and Business Studies.