Cloudera's concept of the enterprise data hub addresses the potential of big data by providing a home for any type of data along with access mechanisms to support different workloads, Cloudera co-founder and CTO Amr Awadallah told iTWire.
Unlike traditional databases, Hadoop can store unstructured data such as images, videos and PDF files alongside structured data, he explained.
Where SQL can only express a subset of tasks that people may want to perform on these large data sets, Cloudera supports Apache Spark and other frameworks to allow a wider range of workloads. This allows organisations to leverage all of their data and ask better questions, Mr Awadallah said.
Earlier approaches such as the enterprise data warehouse depended on a database that could only store structured data, but the data organisations now hold isn't all structured. For example, organisations may want to combine data such as transactions, clickstreams and voice recordings to estimate customer sentiment, determine the probability of losing a particular customer, and decide whether to make a retention offer.
An important aspect of the enterprise data hub is support for both 'schema on write' and 'schema on read' in order to handle routine and exploratory workloads.
Schema on write (as with traditional databases) provides good performance, because the data can be laid out efficiently, as well as good governance.
Schema on read allows users to store any data, as the system looks more like a file system than a database. It effectively performs ETL (extract, transform, load) on the fly at read time, generating the appropriate schema as part of the process. This means an additional column of data can be made available for analysis very quickly.
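The difference between the two approaches can be illustrated with a minimal Python sketch. It is not Cloudera's or Hadoop's implementation; the event records, field names and the helper functions `write_with_schema` and `read_with_schema` are all hypothetical, chosen only to show where the schema is applied in each case.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (one JSON object per line).
RAW_EVENTS = [
    '{"customer": "c1", "amount": "19.99", "channel": "web"}',
    '{"customer": "c2", "amount": "5.00", "channel": "store", "coupon": "SPRING"}',
]

# Illustrative schema: column name -> function that coerces the raw value.
WRITE_SCHEMA = {"customer": str, "amount": float}


def write_with_schema(lines, schema):
    """Schema on write: parse and coerce every record once, at load time."""
    table = []
    for line in lines:
        record = json.loads(line)
        # Only the columns named in the schema are kept; everything else is dropped.
        table.append(tuple(cast(record[name]) for name, cast in schema.items()))
    return table


def read_with_schema(lines, schema):
    """Schema on read: keep the raw lines and apply a schema when queried."""
    for line in lines:
        record = json.loads(line)
        # Fields missing from a record become None instead of causing a failure.
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Schema on write: a compact, fixed layout decided up front.
structured = write_with_schema(RAW_EVENTS, WRITE_SCHEMA)

# Schema on read: a new question needs the "coupon" column, so only the
# schema passed at read time changes; the stored data is untouched.
read_schema = {"customer": str, "amount": float, "coupon": str}
exploratory = list(read_with_schema(RAW_EVENTS, read_schema))
```

In this sketch the fixed table produced at write time no longer contains the coupon field, so adding it would mean reloading the data, whereas the read-time version exposes it simply by naming it in the schema.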
"You want both," he said, likening the two situations to two very different types of commercial kitchen. The kitchen at a McDonald's store is optimised to prepare the same limited range of items every day, whereas that at a high-end restaurant has a range of ingredients and equipment allowing the preparation of dozens of different dishes.
Cloudera was the first commercial Hadoop vendor, Mr Awadallah said, adding that it "is the world leader" ahead of Hortonworks, MapR, Pivotal and IBM.
The company has more experience than anyone else, he said, and the founders of the major Hadoop projects work for Cloudera.
"We use our own technology" to monitor the operation of customers' systems, so Cloudera can quickly correlate the relevant data if someone reports a problem.
Cloudera combines open source components with its proprietary technology for backup and recovery, security and auditing. Furthermore, the company certifies that each of its releases interoperates with a large ecosystem of applications such as SAS and Splunk, he said.
"None of the other vendors have this breadth and depth," Mr Awadallah said.