Data Mesh + Data Fabric
A perspective on the Next-Generation Enterprise Data Platform
Data Mesh is currently one of the hottest topics in the global community of data & analytics practitioners. Many enterprises have experienced the pitfalls of central data platforms (e.g. Data Warehouses, Data Lakes, Data Lakehouses), or more precisely the bottlenecks caused by the central teams managing these platforms. While a central team of experts is well suited to accelerating the development of selected data products in the early stages, it does not scale once the number of data products, data pipelines and requests from data consumers grows rapidly. Many corporate data & analytics leaders therefore now aim for a more decentralized operating model in which data domains take ownership of their data assets and responsibility for sharing re-usable data products with data consumers across the enterprise. This should increase the quality and re-usability of data products as well as the speed of delivery to data consumers, thus driving higher value generation from data.
The Data Mesh concept, initially proposed by Zhamak Dehghani, has received growing attention in recent years. Based on our recent survey of corporate IT managers in Germany, more than 80% are either planning to adopt the Data Mesh concept or have already implemented some elements of it. However, 3 out of 10 do not have a clear Data Mesh strategy, and more than half are still held back by gaps in data literacy, insufficient data quality or legacy technology – all of these are critical challenges for implementing a Data Mesh. So what does it take to succeed?
In our recently published study “Data Mesh – Just another buzzword or the next-generation enterprise data platform?”, we took a critical look at the Data Mesh concept and outlined our perspective on the essential building blocks of the next-generation data platform. From our perspective, the Data Mesh principles focused on decentralizing data management capabilities need to be complemented by the right central orchestration capabilities to avoid ending up in a data mess. This is where Data Fabric comes into play.
Data Fabric currently receives far less attention in the global community of data & analytics practitioners than Data Mesh, but we consider it an equally important part of a data platform architecture with decentrally managed data products and heterogeneous, distributed data repositories, in which intelligent metadata management becomes even more crucial. As Gartner describes it, “Data Fabric utilizes continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment and utilization of integrated and reusable data across all environments, including hybrid and multi-cloud platforms.” We currently see many enterprises investing in a Data Catalog to manage metadata centrally, but only a few are already applying AI to their collected metadata to continuously learn new relationships and automatically feed a knowledge graph that supports data engineers and data consumers in integrating and using data in their specific business context. Doing so could significantly boost the consumption and re-usability of data products in a Data Mesh.
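To make this concrete, below is a minimal sketch of how relationships between data products could be inferred from collected metadata and fed into a knowledge graph. The dataset names, the column-overlap heuristic and the use of the networkx library are illustrative assumptions on our part; a real Data Fabric would combine far richer metadata signals (lineage, usage statistics, semantics).

```python
# Minimal sketch: inferring dataset relationships from catalog metadata.
# Dataset names and the schema-overlap heuristic are illustrative only.
import networkx as nx

# Toy metadata as it might be exported from a data catalog (hypothetical).
catalog = {
    "sales.orders":     {"columns": {"order_id", "customer_id", "order_date", "amount"}},
    "crm.customers":    {"columns": {"customer_id", "segment", "country"}},
    "finance.invoices": {"columns": {"invoice_id", "order_id", "amount", "due_date"}},
}

graph = nx.Graph()
graph.add_nodes_from(catalog)

# Heuristic: shared column names hint at a joinable relationship.
names = list(catalog)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        shared = catalog[a]["columns"] & catalog[b]["columns"]
        if shared:
            graph.add_edge(a, b, join_keys=sorted(shared))

for a, b, attrs in graph.edges(data=True):
    print(f"{a} <-> {b} via {attrs['join_keys']}")
```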
In conclusion, the next-generation enterprise data platform is founded not on Data Mesh alone, but on Data Mesh + Data Fabric. Converging both concepts and enriching them with our own best practices, we suggest 8 key capabilities (4 decentralized and 4 centralized) for the next-generation data platform.
Decentralized capabilities:
1. Data is owned, processed, and used decentrally in domains
The company maintains data domains that are responsible for data objects in their areas. Each domain has a data architect who manages the logical data product portfolio.
2. Each domain creates and shares valuable data products across the enterprise
Data products are developed once and shared across the entire organization so they can be re-used by many people. They comply with data integration and quality standards and are easy to use.
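One way to make such standards checkable is to give every data product a machine-readable descriptor that is validated before publication. The following is a minimal sketch; the field names, SLA keys and rules are hypothetical rather than a fixed standard.

```python
# Minimal sketch of a data product descriptor with basic standards checks.
# Field names and rules are illustrative assumptions, not a fixed standard.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    owner: str                      # accountable data product owner
    output_port: str                # e.g. a table, view, or API endpoint
    schema: dict[str, str]          # column name -> data type
    quality_slas: dict[str, float] = field(default_factory=dict)

    def violations(self) -> list[str]:
        """Return standards violations that would block publication."""
        problems = []
        if not self.owner:
            problems.append("missing owner")
        if "completeness" not in self.quality_slas:
            problems.append("no completeness SLA declared")
        if not self.schema:
            problems.append("empty schema")
        return problems

product = DataProduct(
    name="customer_orders_v1",
    owner="sales-domain-team",
    output_port="warehouse.sales.customer_orders",
    schema={"order_id": "string", "amount": "decimal"},
    quality_slas={"completeness": 0.99},
)
assert not product.violations()  # product meets the standards and may be shared
```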
3. Data products are created and operated by autonomous, interdisciplinary teams
Behind each data product is an interdisciplinary product team in the data domain. It consists of a data product owner, business process experts, data engineers and, if required, data scientists.
4. Data product teams have professional DataOps practices
Data product teams use DataOps – a set of practices that ensure efficient data operations and high-quality data products. DataOps applies agile and DevOps principles to data engineering & analytics.
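A core DataOps practice is the automated quality gate that runs in CI on every pipeline change. Below is a minimal sketch of such a gate using pandas; the specific rules, thresholds and column names are illustrative assumptions.

```python
# Minimal DataOps sketch: an automated quality gate that a product team
# could run in CI on every pipeline change. Thresholds are illustrative.
import pandas as pd

def check_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of failed quality rules for the orders data product."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("order_id must be unique")
    if df["amount"].lt(0).any():
        failures.append("amount must be non-negative")
    if df["customer_id"].isna().mean() > 0.01:   # max 1% missing allowed
        failures.append("customer_id completeness below SLA")
    return failures

df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": ["a", "b", "c"],
    "amount": [10.0, 25.5, 7.9],
})
assert check_orders(df) == []  # gate passes; the pipeline may deploy
```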
Centralized capabilities:
5. All metadata about data and use cases is managed in a central data catalog
The enterprise maintains a central data catalog to create visibility of raw and processed data and its lineage. The metadata feeds a knowledge graph to learn relationships in the data and suggest relevant data products and connections to data consumers. The data catalog also forms the technical backbone for a data marketplace and a virtual data integration layer.
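At its simplest, a catalog entry records a dataset, its domain and its upstream sources, and lineage falls out of a recursive walk over those records. The sketch below uses hypothetical dataset names and a deliberately simplified structure; real catalogs track far more metadata.

```python
# Minimal sketch of catalog entries with lineage, the raw material for the
# knowledge graph described above. Entity names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    dataset: str
    domain: str
    upstream: tuple[str, ...]   # lineage: datasets this one is derived from

entries = [
    CatalogEntry("raw.crm_export", "crm", ()),
    CatalogEntry("crm.customers", "crm", ("raw.crm_export",)),
    CatalogEntry("sales.customer_orders", "sales", ("crm.customers", "raw.orders")),
]

def lineage(dataset: str, index: dict[str, CatalogEntry]) -> set[str]:
    """Recursively resolve all upstream ancestors of a dataset."""
    ancestors = set()
    for parent in index.get(dataset, CatalogEntry(dataset, "", ())).upstream:
        ancestors.add(parent)
        ancestors |= lineage(parent, index)
    return ancestors

index = {e.dataset: e for e in entries}
print(lineage("sales.customer_orders", index))
# ancestors: crm.customers, raw.crm_export, raw.orders
```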
6. Data products can easily be found and accessed through a data marketplace
A central data marketplace makes it easier for users to find and use data products. The marketplace works similarly to an online shop and, in addition to search & browse functionality, also includes an intelligent recommendation system enabled by the knowledge graph.
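For intuition, a simple recommender over the knowledge graph could rank products a consumer does not yet use by how strongly they connect to the products already in use. The graph content and scoring in this sketch are illustrative, not a production recommendation system.

```python
# Minimal sketch of a marketplace recommendation: suggest data products that
# sit close to what a consumer already uses in the knowledge graph.
import networkx as nx

kg = nx.Graph()
kg.add_edges_from([
    ("crm.customers", "sales.customer_orders"),
    ("sales.customer_orders", "finance.invoices"),
    ("crm.customers", "marketing.campaign_contacts"),
])

def recommend(used: set[str], k: int = 3) -> list[str]:
    """Rank unused products by how many used products link to them."""
    scores: dict[str, int] = {}
    for product in used:
        for neighbor in kg.neighbors(product):
            if neighbor not in used:
                scores[neighbor] = scores.get(neighbor, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend({"sales.customer_orders"}))
# ['crm.customers', 'finance.invoices'] (both equally connected here)
```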
7. Access to data products is mostly virtualized
To mask the technical complexity of distributed and interlinked data products from users, the central data platform has a virtual integration layer that complements traditional data integration patterns. It makes the locally managed data products easily accessible from the central data marketplace.
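Conceptually, the virtual layer resolves a logical product name to its physical location and dispatches the query there. The hand-written routing and backend names in this sketch merely illustrate the idea; real implementations rely on query federation or virtualization engines such as Trino or Denodo.

```python
# Minimal sketch of a virtual integration layer: consumers ask for a logical
# product name, and a router resolves where and how it is physically stored.
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductLocation:
    backend: str     # e.g. "warehouse" or "lake" (illustrative)
    address: str     # physical table, path, or endpoint

REGISTRY = {
    "crm.customers": ProductLocation("warehouse", "prod_db.crm.customers"),
    "iot.sensor_readings": ProductLocation("lake", "s3://lake/iot/readings/"),
}

def read(product: str) -> str:
    """Dispatch a logical read to the right physical system (stubs here)."""
    loc = REGISTRY[product]
    if loc.backend == "warehouse":
        return f"SELECT * FROM {loc.address}"    # would run via a SQL engine
    if loc.backend == "lake":
        return f"scan parquet at {loc.address}"  # would run via Spark/Trino
    raise ValueError(f"unsupported backend: {loc.backend}")

print(read("crm.customers"))
```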
8. Data governance rules are centrally enforced for all data products
The organization has a central data governance office which defines data guidelines and aligns them with the data domain architects and data product owners. These guidelines are then technically enforced for all data products when deployed to production.
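Technically, such enforcement is often realized as policy-as-code evaluated in the deployment pipeline: a deployment is blocked if any central rule fails. The policies and product metadata in this minimal sketch are made up for illustration; real-world setups typically use dedicated engines such as Open Policy Agent.

```python
# Minimal sketch of centrally enforced governance: policies defined once by
# the governance office and evaluated automatically at deployment time.

POLICIES = [
    ("owner_required", lambda p: bool(p.get("owner"))),
    ("pii_must_be_masked", lambda p: not p.get("contains_pii") or p.get("masking_applied")),
    ("retention_defined", lambda p: "retention_days" in p),
]

def enforce(product: dict) -> None:
    """Block deployment if any central governance policy fails."""
    failed = [name for name, rule in POLICIES if not rule(product)]
    if failed:
        raise PermissionError(f"deployment blocked, failed policies: {failed}")

enforce({
    "name": "crm.customers",
    "owner": "crm-domain-team",
    "contains_pii": True,
    "masking_applied": True,
    "retention_days": 365,
})  # passes: all policies satisfied, deployment proceeds
```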
We see many enterprises already developing these capabilities and applying them within a small data or use-case scope, but only a few have truly integrated them into an effective operating model and platform and scaled them across the entire organization. Mastering this transformation requires a strategic commitment from business and IT leaders, as well as comprehensive change management at all levels. In our study we outline a four-phase approach to implementing the next-generation enterprise data platform, infused by Data Mesh and Data Fabric. If you would like to learn more, download our full study and reach out to us.