SAS striving to automate distributed analytics

Certain workloads need to stay on premises for a variety of reasons such as regulatory restrictions on where the data is stored. In other situations, the cost advantages of public cloud may prevail. So most organisations are going to have a hybrid approach to IT. But how are they going to analyse data spread across multiple locations?

Replicating all the required data to one place is "probably not the best long-term [solution]", SAS director of product management for cloud and platform technologies Mike Frost told iTWire.

Yet the issue of data location cannot be ignored, because a person conducting an analysis may not know where all the required data lives, and so will be unable to ensure the work is done efficiently and cost-effectively.

SAS's approach, Frost said, is to create a distributed compute infrastructure as a foundation for such analyses. This environment should know where data is and what analysis is to be performed, and use that information to assemble resources to do the work, taking into consideration existing jobs, the priority of the new analysis, and the cost of resources.

Another part of the problem is that analytics tasks aren't very predictable in the sense of being able to determine the resources needed to perform them. In general they do need plenty of memory and fast network interconnections, so working in the cloud requires the careful selection of appropriate VM instances. SAS is starting to work with cloud providers, he said, but while getting lots of memory per core is usually straightforward, carving out a separate network doesn't fit comfortably with the public cloud model.

That said, SAS has good relationships with Amazon, Google and Oracle, and is "in contact" with Microsoft Azure.

Oracle has "shown a real willingness to work closely with us", said Frost, noting that many SAS customers use the Oracle database.

One major cloud provider "has shown great reluctance to work on this problem", while another sees it as a potential way to differentiate its offerings from those of other cloud companies, he said.

Adding to the complexity of building and automating a distributed analytics infrastructure is the fact that some tasks can be distributed while others cannot.

For example, to calculate the mean of a set of data split across multiple locations, the "average of averages" can be used – calculate the mean of the subset at each location, and then calculate the mean of those results. That usually works, though there can be problems stemming from the precision of the various calculations, he warned.
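The idea can be sketched in a few lines of Python (an illustrative sketch only, not SAS code). Note that a naive average of the per-site averages is only exact when every site holds the same number of values; in general, each site should report its local sum and count so the combiner can weight correctly.

```python
# Illustrative sketch: compute a global mean from per-location
# partial results, without moving the raw data between sites.
# Each site reports only its local (sum, count) summary.

def local_summary(values):
    """Run at each location: return (sum, count) for the local subset."""
    return (sum(values), len(values))

def global_mean(summaries):
    """Combine per-site summaries into the overall mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count

site_a = [2.0, 4.0, 6.0]   # three values at location A (mean 4.0)
site_b = [10.0]            # one value at location B (mean 10.0)

summaries = [local_summary(site_a), local_summary(site_b)]
print(global_mean(summaries))   # 5.5 -- the correct global mean

# The unweighted "average of averages" treats each site equally,
# so it is only exact when subsets are the same size:
naive = (4.0 + 10.0) / 2        # 7.0, not 5.5
```

The precision problems Frost mentions arise because each partial sum is a floating-point result; summing many values of different magnitudes at different sites can accumulate rounding error that a single-pass calculation would not.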

But if you need to sort a list, the whole list needs to be accessed from one place.

What's needed is a system that can determine the best way of combining data, but that's "a humongous challenge", according to Frost. Part of the problem is that such an "environment has to be self-learning" to accommodate changes in business rules, regulations, and the questions being asked.

Data storage is an associated issue. Currently, the default position is to store everything in case it comes in useful. But growing volumes make that hard to sustain. Frost predicts that during the next business downturn people will start asking 'do we need to store all this data?' and practices will change.

One of the trends contributing significantly to data growth is the internet of things. While IoT devices generally have minimal compute capability, even a Raspberry Pi has sufficient grunt to detect impending problems by watching the data stream.

Training models to determine the rules to apply is best done using data aggregated from all the similar devices - one SAS customer is pouring 25TB of data per day into a Hadoop cluster for analysis, he said.

The results of the training process will need to be communicated to the devices, whether to add new rules or remove those that are no longer relevant, and doing this at scale is non-trivial.

One problem with AI systems is that they tend to be black boxes, and people tend to expect at least a hypothesis about the nature of the mechanism that has been discovered. For example, the data might say that a slight change of colour on a Web page can usefully change the outcome, but simply accepting that at face value seems a very shallow form of knowledge – what actually happens to change the behaviour?

The lack of any explanation by an AI may make the results unacceptable in some situations, but "I think it's going to be a generational thing" and post-millennials may be more likely to accept these results, said Frost.

SAS's goal in applying AI to analytics is to make data scientists more productive, allow non-specialists to do more before calling in a data scientist, and generally to automate the discovery of potentially valuable relationships and reduce "the time to insight", he said.


Stephen Withers


Stephen Withers is one of Australia's most experienced IT journalists, having begun his career in the days of 8-bit 'microcomputers'. He covers the gamut from gadgets to enterprise systems. In previous lives he has been an academic, a systems programmer, an IT support manager, and an online services manager. Stephen holds an honours degree in Management Sciences and a PhD in Industrial and Business Studies.
