In today's digital era, data is the lifeblood of any organization. However, traditional approaches to data management, such as data warehouses and data lakes, often struggle to meet the evolving needs of modern enterprises. Enter data mesh, an increasingly popular approach that promises to transform the way organizations collaborate on, manage, and use data.
But what exactly is data mesh, and what are its core principles?
According to Zhamak Dehghani, the creator of data mesh, the defining characteristic of this approach is collaboration through decentralization. It rests on four principles: domain-oriented ownership of data, data as a product, a self-serve data platform, and federated computational governance. With data mesh, multi-disciplinary teams can publish and consume data products at the same time, enabling a much higher degree of data collaboration than centralized models allow. Organizations can federate control over data down to the authorized individuals and groups closest to it, which improves both the use and the management of data and benefits many aspects of the operation, from direct business gains to compliance.
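To make the idea of publishing and consuming data products more concrete, here is a minimal sketch in Python. The names `DataProduct` and `MeshCatalog`, the domains, and the sample rows are all hypothetical and chosen for illustration; they are not part of any data mesh specification or product. The point is only that each product stays owned by its domain team while any other team can discover and use it.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A dataset published by a domain team, with a named owner and schema."""
    name: str
    owner_domain: str
    schema: dict
    rows: list = field(default_factory=list)


class MeshCatalog:
    """Hypothetical shared catalog: discovery is central, ownership is not."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        # Each product name is claimed by exactly one owning domain.
        if product.name in self._products:
            raise ValueError(f"{product.name!r} is already published")
        self._products[product.name] = product

    def consume(self, name):
        # Consumers get the product as published by its owner.
        return self._products[name]

    def owned_by(self, domain):
        return sorted(
            p.name for p in self._products.values() if p.owner_domain == domain
        )


catalog = MeshCatalog()

# The sales team publishes a product it owns.
catalog.publish(DataProduct(
    name="orders",
    owner_domain="sales",
    schema={"order_id": "int", "total": "float"},
    rows=[{"order_id": 1, "total": 40.0}, {"order_id": 2, "total": 25.0}],
))

# The logistics team publishes its own product independently.
catalog.publish(DataProduct(
    name="shipments",
    owner_domain="logistics",
    schema={"order_id": "int", "carrier": "str"},
    rows=[{"order_id": 1, "carrier": "ACME"}],
))

# A third team (say, finance) consumes a product it does not own.
orders = catalog.consume("orders")
revenue = sum(row["total"] for row in orders.rows)
print(revenue)
```

In a real deployment the catalog would sit on top of a self-serve platform and enforce governance policies; this sketch leaves both out to keep the ownership model visible.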
The benefits of data mesh extend beyond operational and compliance advantages. When an organization can transform itself so that data is central to its operation, it achieves the holy grail of the digital era: a data-centric enterprise that is truly agile. The data mesh concept is foundational to this imperative.
Now that we know what data mesh is and what principles underpin it, how does it differ from other data management approaches such as data warehouses, data lakes, and data fabric?
A data warehouse is more about storage, analysis, and queries than the collaborative use enabled by data mesh. It is typically pictured as a large, central structure that collates massive volumes of data from multiple sources. Its clear benefit is that it organizes that data under a defined structure so that it is ready for analysis and query.
A data lake, on the other hand, typically sits earlier in the process: it is a repository in which raw data is stored with no unified schema. Its benefits lie largely ahead. The data remains available for use as needed, and becomes most useful once it is converted into a given corporate schema.
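The contrast between the two can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how any particular warehouse or lake product works: `CORPORATE_SCHEMA`, `conform`, and the sample payloads are hypothetical. The "warehouse" applies the schema before storing, while the "lake" stores every raw payload and applies the schema only when the data is read.

```python
import json

# Hypothetical corporate schema: column name -> type to coerce to.
CORPORATE_SCHEMA = {"customer_id": int, "amount": float}


def conform(record):
    """Coerce a record to the corporate schema; raises if it does not fit."""
    return {col: cast(record[col]) for col, cast in CORPORATE_SCHEMA.items()}


raw_events = [
    '{"customer_id": "7", "amount": "19.90", "note": "gift"}',
    '{"customer_id": 8, "amount": 5}',
    '{"cust": 9}',  # does not match the schema at all
]

lake = []       # stores raw payloads exactly as received
warehouse = []  # stores only records that already fit the schema

for payload in raw_events:
    lake.append(payload)  # the lake accepts everything
    try:
        warehouse.append(conform(json.loads(payload)))  # schema applied on write
    except (KeyError, ValueError):
        pass  # the warehouse rejects records that do not fit


def read_lake(payloads):
    """Apply the schema only at read time, skipping records that do not fit."""
    for payload in payloads:
        try:
            yield conform(json.loads(payload))
        except (KeyError, ValueError):
            continue


lake_view = list(read_lake(lake))
print(len(lake), len(warehouse), len(lake_view))
```

The lake still holds the third payload in raw form, so a future schema that understands it can recover it; the warehouse never stored it.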
Data fabric seeks to establish a contemporary architecture that connects the data and metadata held inside silos such as warehouses and lakes. Along the way, it enables a permissions-based system that controls access to a single copy of the data, a crucial step toward ending the age-old practice of making endless copies of even sensitive data.
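As a rough illustration of permissions-based access to a single copy, consider the Python sketch below. The `GovernedDataset` class, the roles, and the column grants are hypothetical; a real data fabric would enforce this through its own policy engine rather than application code. What the sketch shows is that consumers never receive the full dataset to copy and keep: each read is checked against the caller's role and returns only the columns that role is allowed to see.

```python
class GovernedDataset:
    """Holds the one authoritative copy and checks permissions on every read."""

    def __init__(self, rows, grants):
        self._rows = rows      # the single stored copy
        self._grants = grants  # role -> set of columns that role may read

    def read(self, role):
        allowed = self._grants.get(role)
        if not allowed:
            raise PermissionError(f"role {role!r} has no access to this dataset")
        # Return a filtered projection; the stored rows are never handed out.
        return [
            {col: value for col, value in row.items() if col in allowed}
            for row in self._rows
        ]


customers = GovernedDataset(
    rows=[
        {"id": 1, "region": "EU", "email": "a@example.com"},
        {"id": 2, "region": "US", "email": "b@example.com"},
    ],
    grants={
        "analyst": {"id", "region"},  # no access to the sensitive email column
        "support": {"id", "email"},
    },
)

analyst_view = customers.read("analyst")
support_view = customers.read("support")
print(analyst_view)
```

Because access is decided at read time, changing a grant takes effect for every consumer at once, which is not possible once sensitive data has been copied into many separate stores.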
But should you choose just one of these technologies or combine them?
Despite the overlaps, it's wise to be wary of a one-size-fits-all approach. Each of these technologies has a particular origin and delivers particular benefits, from storage to compliance. Most organizations will have different needs at different times, and those needs will be met by different technologies. The operational need comes first; datasets and technologies should then be chosen and combined to meet it.
So how should you combine them?
A new generation of solutions draws on all of these disciplines to deliver a broader range of benefits. For example, a data collaboration platform offers a data architecture that makes it possible to put the data mesh concept into practice in a working operational environment. Such platforms focus heavily on control: their network-based architecture allows data to be readily organized into data products, while ensuring that the governance controls established for those data products are not eroded or compromised through unrestricted replication.