Without a strategy for data, the approach of getting a lot of storage space, gathering all the possible data and then setting up a team to analyse it can easily turn you into a data hoarder. Although it feels great to brag about how many TB of data we have stored, we tend to forget that every extra TB limits how fast and flexible we can be when the time comes to analyse it: quality over quantity still applies to data analysis.
Firstly, we need to go back to why we collect data. It is the first step in a process that also involves transforming that data into a format suitable for analysis, followed by a methodical examination of relationships, patterns and trends to find answers to questions that are usually not immediately obvious. Collecting data is, after all, just a means to an end.
Deciding what data to track can also be hard, as it’s not always obvious how certain variables could be used in the future or even what the business goals will be down the line. The fact is, we are constantly expanding our analysis and enriching our datasets, so we may easily overlook the relevance of new data as it becomes available. In my experience as a consultant and CDO, I have often been invited to join or execute a “dream project” involving “Big Data”, only to find out that the data collected was duplicated, redundant or simply not fit for purpose. After spending years and large sums collecting and storing this data, the organisations involved realised that doing so without first having a clear use case was an expensive mistake.
Other common mistakes that I find include collecting data that:
- Cannot be linked to the existing datasets by any universal identifier/key or used to enrich existing datasets;
- Is not needed to solve any underlying business problem or cannot be associated with any of the goals or strategic priorities;
- Has a large cost to collect, store and analyse but has no clear value proposition;
- Is collected erratically or generates datasets that are unbalanced in ways that cannot be tracked and corrected, generating biased analysis and leading to wrong conclusions;
- Has no associated plan for when and how it will be used, what results it is expected to achieve and how those results will be measured.
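The checklist above can be turned into a rough screening step that runs before a new dataset is approved for collection. The Python sketch below shows one possible shape for it. Everything in it is hypothetical and for illustration only: the `DatasetProposal` fields, the `screen` function, the cost and value figures and the example keys are assumptions, not part of any existing tool, and the rule that expected value must exceed yearly cost is a deliberately crude stand-in for a proper value assessment.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetProposal:
    """Hypothetical description of a dataset someone wants to start collecting."""
    name: str
    join_keys: set = field(default_factory=set)        # identifiers shared with existing data
    business_goals: list = field(default_factory=list)  # goals or priorities it supports
    yearly_cost: float = 0.0       # estimated cost to collect, store and analyse
    expected_value: float = 0.0    # estimated yearly value to the business
    collection_is_consistent: bool = False  # collected regularly, imbalances can be tracked
    usage_plan: str = ""           # when and how it will be used, how results are measured


def screen(proposal, existing_keys):
    """Return the checklist items the proposal fails (an empty list means it passes)."""
    issues = []
    if not proposal.join_keys & existing_keys:
        issues.append("cannot be linked to existing datasets")
    if not proposal.business_goals:
        issues.append("not tied to any business goal")
    if proposal.expected_value <= proposal.yearly_cost:
        issues.append("cost is not justified by a clear value proposition")
    if not proposal.collection_is_consistent:
        issues.append("collection is erratic or imbalances cannot be tracked")
    if not proposal.usage_plan.strip():
        issues.append("no plan for use or for measuring results")
    return issues


# Example with made-up values: a proposal that fails every check.
existing_keys = {"customer_id", "order_id"}
clickstream = DatasetProposal(
    name="clickstream",
    join_keys={"session_id"},
    yearly_cost=50000.0,
)
for issue in screen(clickstream, existing_keys):
    print(f"{clickstream.name}: {issue}")
```

A check like this does not replace judgement, but it forces the use case to be written down before the storage bill starts.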
The best way to avoid falling into the “Big Data trap” is to start by defining a strategy for data: set clear business goals and assess their value to the business, then work backwards to find out what kind of analysis and datasets we need to best achieve these goals. That, in turn, points us towards what we need to collect and analyse, both now and in the future.
After all, if the reason we need information is to make better decisions, then understanding which decisions are the most important and deliver the best return to our businesses will keep our projects relevant, make sure they contribute to overall success and ultimately improve the satisfaction of both our teams and stakeholders.