A Tale Of Two Data Architectures: Data Mesh vs Data Fabric
Introduction
Data Fabric and Data Mesh. The names alone are easy to confuse, and both describe concepts that overlap in many ways. The two terms may sound interchangeable at first, but a closer look at their definitions and applications makes the distinction much easier to understand.
Data Fabric and Data Mesh are two approaches to reducing the complexity of data silos and making it easier to ensure consistency, trust, and governance across different platforms as well as within a single platform. Data Fabric is an architecture framework that offers solutions for monitoring, discovering, and accessing distributed data sources or repositories within an organization through a unified layer. Data Mesh, by contrast, decentralizes ownership: it organizes data around business domains, with each domain team publishing its data as a product that other teams can consume through self-service access.
Data Fabric and Data Mesh provide two different ways to solve the same problem. Both approaches aim to empower business users to collaborate by making it easy for them to work with data in both structured and unstructured formats, from any source. Both are part of an ever-growing category called the Integrated Information Platform (IIP).
As companies begin to turn their attention towards big data initiatives, the question quickly becomes how to develop a sustainable approach that can scale as more data is added. Data Fabric and Data Mesh are two models that are aimed at improving the sharing of internal data, but both have strengths and weaknesses — what works for one organization might not work for another.
Data Fabric and Data Mesh are not interchangeable.
In the beginning, when "data" meant "business transactions," data architecture was based on transactional databases, typically structured as rows and columns of data. Later, when businesses needed to analyze that data rather than just record it (for example, reporting on customer behaviour over time), the concept of Data Warehousing was developed. This involved a separate database that stored historical versions of those rows and columns of data, to provide a single source of truth for reporting.
Eventually, this led to further demands, such as the need for more rapid updates and more timely reporting. To accommodate this, companies started building Extract, Transform, Load (ETL) pipelines, or, increasingly today, ELT pipelines that load the raw data first and transform it later inside the destination system.
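To make the difference in ordering concrete, here is a minimal, hypothetical sketch in Python. It uses an in-memory SQLite database as a stand-in for a data warehouse, and the table and column names are invented for illustration; the only point is where the transformation happens relative to the load.

```python
# Minimal sketch contrasting ETL and ELT using an in-memory SQLite
# database as a stand-in for a data warehouse. Table and column names
# are invented for illustration.
import sqlite3

raw_orders = [
    {"id": 1, "amount": "19.99", "currency": "usd"},
    {"id": 2, "amount": "5.50", "currency": "USD"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, currency TEXT)")
conn.execute("CREATE TABLE orders_etl (id INTEGER, amount_usd REAL)")
conn.execute("CREATE TABLE orders_elt (id INTEGER, amount_usd REAL)")

# ETL: transform in the pipeline first, then load only the cleaned rows.
cleaned = [(o["id"], float(o["amount"]))
           for o in raw_orders if o["currency"].upper() == "USD"]
conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)

# ELT: load the raw rows as-is, then transform inside the warehouse with SQL.
conn.executemany(
    "INSERT INTO raw_orders VALUES (:id, :amount, :currency)", raw_orders
)
conn.execute("""
    INSERT INTO orders_elt (id, amount_usd)
    SELECT id, CAST(amount AS REAL)
    FROM raw_orders
    WHERE UPPER(currency) = 'USD'
""")

print(conn.execute("SELECT * FROM orders_etl").fetchall())
print(conn.execute("SELECT * FROM orders_elt").fetchall())
```

Both paths end with the same cleaned rows; what changes is whether the transformation logic lives in the pipeline or in the destination system.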
The evolution of data architecture has been a long journey from the traditional data warehouse to the modern-day data lake. Enterprises have gradually moved from being attached to their on-premises infrastructure to embracing cloud solutions. This is what has given rise to terms such as Data Fabric and Data Mesh, which are confusingly used interchangeably in the big data world. But these terms do not mean the same thing. In fact, they refer to different architectural concepts that build on each other.
So, what exactly is the difference between data fabric and data mesh?
A data fabric means that all of your organization's data is connected through a single, unified system, irrespective of where it resides (on-premises or in the cloud). This system serves as an abstraction layer between users and the underlying technologies, with no limitations on where you can move your data. Your teams can use this layer to access, analyze, and manage data in a wide variety of formats. The advantage of having a data fabric is that it can run complex queries efficiently and lets you integrate new technologies without disrupting the systems already in place.
A data mesh, on the other hand, centres around organizing your company's data into business domains rather than around specific applications. Unlike a data fabric, which spans every application's data in one layer, a data mesh is concerned with organizing what each domain produces into data products. These products are often built on a microservices architecture and expose their data through APIs that other applications can consume.
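As a rough illustration of that domain-oriented style, the sketch below models two data products as plain Python classes standing in for domain-owned microservices. All names (CustomerDomainProduct, OrdersDomainProduct, and so on) are hypothetical; the point is that each domain owns its data and other domains consume it only through the exposed interface.

```python
# Hypothetical sketch of domain-oriented data products in a mesh.
# Each class stands in for a microservice owned by a domain team and
# exposes its data through a small, well-defined interface (the "API").
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    segment: str


class CustomerDomainProduct:
    """Owned by the customer domain team; the only way in is its API."""

    def __init__(self):
        self._customers = {"c-1": Customer("c-1", "enterprise")}

    def get_customer(self, customer_id: str) -> Customer:
        return self._customers[customer_id]


class OrdersDomainProduct:
    """Owned by the orders domain team; consumes the customer product
    through its API rather than reaching into its storage."""

    def __init__(self, customers: CustomerDomainProduct):
        self._customers = customers
        self._orders = {"o-9": {"order_id": "o-9", "customer_id": "c-1", "total": 120.0}}

    def order_report(self, order_id: str) -> dict:
        order = self._orders[order_id]
        segment = self._customers.get_customer(order["customer_id"]).segment
        return {**order, "customer_segment": segment}


orders = OrdersDomainProduct(CustomerDomainProduct())
print(orders.order_report("o-9"))
```

In practice these would be separate services exposing HTTP or event-based APIs, each deployed and scaled by its own team.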
Data Fabric is an administration of the data layer within a business's enterprise software ecosystem.
At its core, a data fabric is an administration of the data layer within a business's enterprise software ecosystem. It weaves together disparate systems, allowing them to work as one and making it far simpler and faster to move data between them. By doing this, a data fabric enables businesses to support an increasingly diverse range of use cases and workflows, including data science, artificial intelligence (AI), machine learning (ML), business intelligence (BI), reporting, analytics, and visualization.
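A minimal sketch of that "weaving" idea, assuming nothing beyond the Python standard library: a small access layer hides whether a dataset lives in a CSV file or a database and returns rows in a common shape. The DataFabric class and dataset names are invented for illustration, not a real product's API.

```python
# Hypothetical sketch of the "weaving" idea behind a data fabric:
# one access layer hides whether a dataset lives in a file, a database,
# or a cloud store, and returns rows in a common shape.
import csv
import io
import sqlite3


class DataFabric:
    """Registry of named datasets plus the adapters that can read them."""

    def __init__(self):
        self._sources = {}

    def register(self, name, reader):
        self._sources[name] = reader

    def read(self, name):
        # Callers only know dataset names, never storage details.
        return list(self._sources[name]())


# Source 1: a CSV "file" (kept in memory here so the sketch is runnable).
csv_text = "id,region\n1,EMEA\n2,APAC\n"
def read_csv_source():
    return csv.DictReader(io.StringIO(csv_text))

# Source 2: a SQLite table standing in for an operational database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE revenue (id INTEGER, amount REAL)")
db.execute("INSERT INTO revenue VALUES (1, 10.0), (2, 20.0)")
def read_db_source():
    for row_id, amount in db.execute("SELECT id, amount FROM revenue"):
        yield {"id": str(row_id), "amount": amount}


fabric = DataFabric()
fabric.register("customers", read_csv_source)
fabric.register("revenue", read_db_source)

print(fabric.read("customers"))
print(fabric.read("revenue"))
```

In a real fabric, the adapters would point at warehouses, lakes, and SaaS systems, and the catalogue of sources would typically be driven by metadata rather than hard-coded registrations.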
A data fabric is a foundation for a modern approach to data management, one that enables businesses to take advantage of their physical and virtual assets without compromising on integrity or security. Data fabrics are usually built on three core components.
Data Mesh is a system of data products that evolve independently, informally, and mutually.
If you're looking for a way to improve the data capabilities at your company, you might be interested in investigating a new concept called "Data Mesh." Let's look at what this new approach is, how it works, and if it might be right for your company.
Data Mesh is a system of data products that evolve independently, informally, and mutually. It's built on top of a foundation of self-service data delivery systems, like data warehouses and data lakes.
Each product in the mesh is built autonomously by its own team (agile development), and each product can be composed of other products, much as APIs can be composed. The mesh uses an event-driven architecture to communicate between components.
The goal of the mesh is to enable self-service data access across the whole organization while maintaining quality and governance.
Data products are the building blocks of the Data Mesh. Each data product encapsulates a domain-specific application with its own data model and business logic. For example, a customer relationship management application for a financial institution would be one data product; another would be the risk prediction engine for the same institution.
Data products are developed by cross-functional teams and are independently deployed, managed, and scaled. They capture a specific business capability or process (such as "credit card fraud detection" or "customer onboarding") and can either perform an action ("fraud detected") or produce information ("customer income level").
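Picking up the fraud-detection example, the sketch below shows the event-driven style in miniature: one data product publishes a "fraud detected" event and another reacts to it, without either knowing about the other. The in-process EventBus and all names are hypothetical stand-ins for real messaging infrastructure.

```python
# Hypothetical sketch of the event-driven style described above: a
# fraud-detection data product publishes a "fraud detected" event, and
# other products subscribe without the two knowing about each other.
from collections import defaultdict


class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)


class FraudDetectionProduct:
    """Owned by the risk domain; performs an action and emits an event."""

    def __init__(self, bus: EventBus):
        self._bus = bus

    def score_transaction(self, txn: dict):
        if txn["amount"] > 10_000:  # toy rule, purely illustrative
            self._bus.publish("fraud_detected", txn)


class CustomerOnboardingProduct:
    """Owned by another domain; reacts to the events it subscribed to."""

    def __init__(self, bus: EventBus):
        bus.subscribe("fraud_detected", self.flag_customer)

    def flag_customer(self, txn: dict):
        print(f"Flagging customer {txn['customer_id']} for review")


bus = EventBus()
CustomerOnboardingProduct(bus)
FraudDetectionProduct(bus).score_transaction(
    {"customer_id": "c-42", "amount": 25_000}
)
```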
Data Mesh provides benefits that are difficult to obtain through other platforms.
A key element in both concepts is "abstraction."
To build modular, modern data architectures, you’ve got to know how to abstract your data.
What does that mean? Abstraction is about setting up a barrier between different parts of a system so that there is no direct connection between those elements. In other words, a part of one system doesn’t depend on the details of another part of the system.
Why is this important? Because it means the components can be replaced or improved without having any impact on the rest of the architecture. You can make changes in one place without having to worry about making related changes somewhere else.
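A small, hypothetical example of what that looks like in code: reporting logic depends only on an abstract MetricsStore interface, so the backing technology can be swapped without touching the callers. The class and function names are invented for illustration.

```python
# Minimal sketch of abstraction as a swappable boundary: callers depend
# on the MetricsStore interface, so the backing implementation can be
# replaced without touching the reporting code. All names are invented.
from abc import ABC, abstractmethod


class MetricsStore(ABC):
    @abstractmethod
    def daily_active_users(self, day: str) -> int: ...


class WarehouseMetricsStore(MetricsStore):
    def daily_active_users(self, day: str) -> int:
        return 1200  # would query the warehouse in a real system


class LakehouseMetricsStore(MetricsStore):
    def daily_active_users(self, day: str) -> int:
        return 1180  # different backend, same contract


def weekly_report(store: MetricsStore, day: str) -> str:
    # This function never changes when the storage technology does.
    return f"DAU on {day}: {store.daily_active_users(day)}"


print(weekly_report(WarehouseMetricsStore(), "2024-01-01"))
print(weekly_report(LakehouseMetricsStore(), "2024-01-01"))
```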
So what is an abstraction like in the data architecture world? Well, if we think about Data Fabric and Data Mesh, we can see a lot of abstraction involved.
Take Data Mesh, for example. Data Mesh provides a blueprint for building an enterprise-scale data architecture—but it does so in a way that allows each part to be updated and changed as needed. True to its name, Data Mesh is built up from small parts (or “microservices”), which are independent of the rest of the system (i.e., they are abstracted).
The idea is that by breaking systems into smaller pieces, and by allowing those pieces to operate independently of each other, development teams can be more agile in their work, making updates and modifications to their own units without having to worry about how those changes will affect the rest of the system.
Abstractions are becoming a key element of any data architecture, and they are particularly important here. Without them, you would have to understand the whole system to be able to use any part of it. As the size of the system grows, more and more knowledge is needed to perform any operation on it. By introducing layers of abstraction, we can hide this complexity and give users easy access to exactly what they need.
Conclusion
To summarize, Data Fabric is a term that describes the administration of the data layer within a business's enterprise software ecosystem. It's a set of tools, policies and practices used to manage and improve the quality, accessibility and consistency of data across an organization.
Data Mesh is a system of data products that evolve independently, informally, and mutually. It's a way to organize your initiatives around domain-specific teams that have full ownership and accountability for the systems they build; it doesn't rely on centralized governance or central modelling.
These two terms are not interchangeable. Organizations can use Data Fabric on their path toward building a Data Mesh architecture because both concepts offer strategies for handling large volumes of data at scale.