Centralised data platforms present various problems, says Colls. They can be bottlenecks, which limits innovation – a criticism that has previously been applied to other aspects of centralised IT.
And data scientists don't necessarily understand business requirements, while people working on the business side tend to lack data skills. This latter point helps explain why siloed projects tend to fail.
But data mesh recognises and meets the needs of data consumers by decentralising ownership to those close to the production or use of the specific data set, with a governance layer to take care of issues including lineage, discovery, access, change management, and the reuse of patterns and resources.
In this way, data mesh improves speed and quality, Colls says.
Importantly, adopting data mesh does not require a big bang approach. "It can be done iteratively in small steps," he says.
You can carve off an individual use case and use that to prove the idea is relevant and can deliver business value.
So an organisation might start with product and customer data, plus relevant telemetry and third-party data. Once they have been exposed in consistent formats, they can be built into downstream applications.
For example, a retailer might make POS data available more widely, including to live analytics. Or geospatial data could be exposed in real time and in various formats, giving users a choice between freshness and consistency.
Such projects are mostly about "encapsulating existing data with a consistent way of accessing it," and making it easy to access through "a series of mini data warehouses," although "the data mesh concept of a 'data product' provides different and additional capabilities over a data warehouse."
Traditional data warehouses tend to be modelled with a particular view in mind, but data mesh's decentralised approach means these mini data warehouses can support different requirements. For instance, security and marketing departments have very different views of web traffic data.
This approach is beginning to be reflected in conventional data stores from a variety of major vendors, where transaction processing and analytics applications are combined.
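The earlier point about security and marketing seeing web traffic differently can be illustrated with a minimal Python sketch. It is not taken from Colls or ThoughtWorks: the record layout, the sample visits and the function names `marketing_view` and `security_view` are all assumptions made up for illustration.

```python
from collections import Counter

# Made-up sample records; the field names are assumptions for this sketch.
visits = [
    {"ip": "203.0.113.5", "page": "/offers", "status": 200},
    {"ip": "203.0.113.9", "page": "/offers", "status": 200},
    {"ip": "198.51.100.7", "page": "/login", "status": 401},
    {"ip": "198.51.100.7", "page": "/login", "status": 401},
    {"ip": "203.0.113.5", "page": "/checkout", "status": 200},
]


def marketing_view(records):
    """Count successful page views per page (what marketing might care about)."""
    return Counter(r["page"] for r in records if r["status"] == 200)


def security_view(records):
    """Count rejected requests per source address (what security might care about)."""
    return Counter(r["ip"] for r in records if r["status"] in (401, 403))
```

Both functions read the same underlying records, so neither department's model has to be imposed on the other.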
Discoverability – "knowing what data you have" – is an important consideration.
According to Colls, there is lots of 'dark data' in many organisations, ie, data that is not discoverable or easy to access.
This can lead to duplicated effort or squandered resources, but cataloguing and maintaining data becomes an overwhelming task if done centrally.
The data mesh model accommodates a network of data stewards responsible for cataloguing and maintaining particular datasets and the associated metadata, and explaining what they can be used for.
This is as much an organisational problem as a technical problem, he explains, as the organisation and the individual stewards need to see the value of good custodianship.
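As a rough sketch of what steward-maintained cataloguing might look like in code, the Python below keeps one metadata entry per dataset in a shared, searchable index. The class names, fields and sample entries are hypothetical and are not drawn from any ThoughtWorks tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """Metadata a steward maintains for one dataset (fields are illustrative)."""
    name: str
    domain: str
    steward: str
    description: str
    tags: set[str] = field(default_factory=set)


class Catalogue:
    """A shared index that each domain's steward registers datasets into."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        # Re-registering a name overwrites the entry, so stewards can update metadata.
        self._entries[entry.name] = entry

    def search(self, tag: str) -> list[str]:
        # Names of datasets carrying the tag, sorted for stable output.
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalogue = Catalogue()
catalogue.register(CatalogueEntry(
    "pos_transactions", "retail", "store-ops team",
    "Point-of-sale transactions", {"sales", "customer"}))
catalogue.register(CatalogueEntry(
    "web_traffic", "marketing", "web team",
    "Site visit logs", {"customer", "security"}))
```

Here `catalogue.search("customer")` would return both dataset names. The point of the design is that each steward writes only their own entries, while consumers search a single index.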
There's a parallel with the DevOps mindset, Colls suggests, in that data mesh can be seen as cooperation to make data available more quickly and reliably.
Governance is necessary, but it is important to balance the freedom to work according to the specific circumstances with the need to ensure the community works as an ecosystem, says Colls.
So it's necessary to define the key interfaces in terms of both the data and the governance controls, and then do the implementation appropriately.
Service levels must be taken into consideration, as they need to reflect the needs of data consumers in other parts of the organisation. Architectural forums and fitness functions can play a part, but evidence-based decisions are essential. It is important that teams are evaluated not just in terms of the data functions they deliver but also the achievement of agreed service levels.
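One way to picture a fitness function for agreed service levels is a small automated check that compares observed measurements with the agreed targets. The Python sketch below assumes two illustrative measures, freshness and completeness. The thresholds, names and structure are assumptions for the example rather than anything Colls describes.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevel:
    """Targets agreed with data consumers (illustrative measures only)."""
    max_staleness_minutes: float
    min_completeness: float  # fraction of expected records present, 0.0 to 1.0


@dataclass
class Observation:
    """What was actually measured for the data product."""
    staleness_minutes: float
    completeness: float


def fitness_check(agreed: ServiceLevel, observed: Observation) -> list:
    """Return the names of breached service levels; an empty list means a pass."""
    breaches = []
    if observed.staleness_minutes > agreed.max_staleness_minutes:
        breaches.append("freshness")
    if observed.completeness < agreed.min_completeness:
        breaches.append("completeness")
    return breaches


agreed = ServiceLevel(max_staleness_minutes=15, min_completeness=0.99)
print(fitness_check(agreed, Observation(staleness_minutes=40, completeness=0.995)))
```

Run regularly, a check like this gives teams and architectural forums evidence of whether service levels are being met, rather than relying on impressions.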
ThoughtWorks has completed data mesh projects that were built on blob storage, relational databases and streaming services, both on-premises and in the cloud.
There can be challenges around resource limits, Colls warns, especially as billing models may limit the elasticity of the underlying services.
And while there are efficiencies in reusing data resources, it is important to leave each originating team with the flexibility they need when working within their own domain.
ThoughtWorks' engagements with clients have led to the development of a shared set of data mesh principles and architectures. The company now wants to engage with others to develop standards and tooling to help broader adoption.
This has the potential to increase speed when working with data, and improve the quality of outputs, he says.
"We're trying to avoid this being a proprietary thing."
Industries adopting or intending to adopt data mesh include retail, financial services, health, consumer products and media.
"As each industry becomes more digitised, there's more data to work with," says Colls.