The data to be stored and analysed is coming from a variety of sources in both structured and semi-structured forms. Sources can be anything from sales, inventory, and financial systems to mobile platforms, social media, and the Internet of Things (IoT).
For a while, it appeared the answer to the challenge of taming the data deluge was to create a data lake. This large repository could be used to securely hold data until specific use cases could be found for it. Once those use cases had been identified, subsets of the data could be transferred into a data warehouse for processing and analysis.
During the past few years, cloud-based data warehouses and lakes have also increased in popularity. Organisations have been attracted by the ability they provide to scale without the need to invest in and manage additional on-premise infrastructure.
However, while cloud-based data warehouses and lakes offer significant benefits, they don’t help when it comes to reducing complexity. Other tools and capabilities must be bolted on to allow analytics tasks to be completed and data to be migrated from a lake to a warehouse for processing.
The rise of the platform
To overcome these challenges, organisations are now taking a different approach to data storage and management. Rather than deploying a hosted warehouse or lake, they are taking advantage of a cloud data platform.
The architecture of a cloud data platform allows it to offer some significant features. It can act as a single repository that supports a range of workloads, including data engineering, data science, data applications and data exchange.
The cloud data platform is secure and offers high performance and instant scalability. It is also maintenance-free and much more cost-effective than equivalent on-premise infrastructure.
A platform of this type can also incorporate a range of reporting and analytics toolsets that, previously, would have had to be bolted on to a data warehouse or lake. Having these already in place removes complexity and improves performance. Also, because all tools are looking at the same stored data, the need to create copies in different locations is removed.
There are five steps that need to be taken to successfully select and deploy a cloud data platform:
1. Scope out requirements: Take the time to evaluate the types, sources and volumes of data that currently exist within your organisation. Consider what types of tools are being used and what new ones might be required in the future.
2. Plan your migration: A new cloud data platform project will begin with an assessment of how much of the existing environment will be migrated to the new one. If starting from scratch is an option, this can be the best way to get maximum value and benefits from the move.
3. Confirm end goal: It’s important to be able to measure the success of a new project, so take time at the start to confirm what a successful outcome would be. This could be a measured improvement in performance or a reduction in operating costs.
4. Select a platform: With clear criteria established, carefully evaluate the cloud data platforms on the market to determine which will be the best match for your particular requirements. Ideally the platform should be able to natively integrate structured and semi-structured data, streamline data pipelines, and dedicate resources to each workload that is undertaken.
5. Measure your ROI: A final important step is to calculate the return on investment that shifting to a cloud data platform has achieved. Consider the impact that scalability of resources has on the final number.
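To make the fifth step concrete, here is a minimal sketch in Python of how an ROI figure might be calculated and how scalability of resources can change it. It assumes the common definition of ROI as net benefit divided by cost, and a simple pay-per-hour pricing model. The function names, the hourly rate, the usage figures and the benefit value are all hypothetical and used only for illustration; real pricing models and benefit estimates will differ.

```python
def roi_percent(total_benefit: float, total_cost: float) -> float:
    """Return ROI as a percentage: (benefit - cost) / cost * 100."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost * 100.0


def elastic_cost(hours_per_month: list[float], rate_per_hour: float,
                 migration_cost: float) -> float:
    """Cost when you pay only for the compute hours actually used."""
    return migration_cost + rate_per_hour * sum(hours_per_month)


def fixed_cost(hours_per_month: list[float], rate_per_hour: float,
               migration_cost: float) -> float:
    """Cost when capacity is sized for the busiest month all year round."""
    peak = max(hours_per_month)
    return migration_cost + rate_per_hour * peak * len(hours_per_month)


if __name__ == "__main__":
    # Hypothetical figures: four months of usage with one seasonal peak.
    usage = [100, 200, 400, 100]   # compute hours per month (assumed)
    rate = 5.0                     # cost per compute hour (assumed)
    migration = 1000.0             # one-off migration cost (assumed)
    benefit = 10000.0              # estimated business benefit (assumed)

    scaled = elastic_cost(usage, rate, migration)
    peak_sized = fixed_cost(usage, rate, migration)

    print(f"ROI with elastic scaling: {roi_percent(benefit, scaled):.1f}%")
    print(f"ROI with peak-sized capacity: {roi_percent(benefit, peak_sized):.1f}%")
```

With these assumed numbers, paying only for the hours used produces a noticeably higher ROI than provisioning for the peak month throughout, which is the effect the scalability point in step 5 refers to.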
By taking these steps, you can be confident of deploying a cloud data platform that meets user needs and delivers significant benefits to the business. You’ll have in place an infrastructure that supports today’s activity and can scale up seamlessly as that activity grows.