Distributed and Non-Disruptive Master Data Management.

If there is something that CluedIn has a reputation for, it is disrupting the data market. We are not traditional in any shape or form. Now we have our eyes set on the MDM (Master Data Management) market. Traditional MDM is typically a centralised place to curate business-critical data. Modern MDM systems can support different types of models, but the most common we see is the centralised approach. Makes sense. However, this misses one critical piece: these MDM systems are not writing back to the source systems, so the users of those source systems cannot also benefit from the golden record curation process. There are MDM architectures that support this notion, but we rarely run into them in the wild. Having those users play a role in the MDM and stewarding process could yield large benefits. Since version 1.3, CluedIn has had its Mesh API, allowing customers to write good-quality, integrated, clean data back to the source systems so everyone can benefit from the cleaning, governance and more.

What prompted this article was a recent conversation with a colleague in the data space, who asked me to find out from our current customers which MDM system they would recommend. It was not a pretty answer.

Most customers were either moving off a platform and building their own bespoke MDM system or were just not happy with their current solution but couldn’t do a lot about it.

Recalling the discussions I have had with our customers over the years, this is where the idea for a decentralised MDM system was born. Ask yourself this question: where should the single truth of a customer live? The MDM system or the CRM system? Where should the single truth of your products or transactions sit? The MDM system, or the PIM or ERP system?

Here is an analogy to put things into perspective. Imagine you and five friends all gave your data to a bank. That bank then curated it, validated it, corrected it, enriched it and stamped it as “accurate”. Then you and your friends ask for the data back, and the bank says, “no, but every time you need to know something, just call us”. Not great, right? I could argue that centralised management is great from a governance perspective, but it is not great at an operational level.

Take this analogy and then imagine the bank answering, “not a problem. Which system should I place this data in, in what format, and what extra data do you want?”. This is decentralised MDM, and this is what CluedIn is bringing to the market to disrupt the traditional MDM approach.

To answer those questions about where the single truth of a customer should live: the real answer is that it should be available everywhere you need customer data. Some systems would not facilitate this, but most systems today allow you to store custom properties and have APIs to support this functionality. MDM is so much more than just a source of data. It is a place for curation, stewarding, building a perfect model of data (questionable), forming golden records, building hierarchies of data and more.
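To make the write-back idea concrete, here is a minimal sketch of pushing a curated golden-record value into a source system as a custom property. Everything here is a hypothetical placeholder: the endpoint, field names and token are stand-ins for whatever your CRM’s real custom-property API exposes, and this is not CluedIn’s Mesh API.

```python
import requests

# Hypothetical CRM endpoint and credentials; substitute the real
# custom-property API of your source system.
CRM_BASE_URL = "https://crm.example.com/api/v1"
API_TOKEN = "..."  # injected from a secret store in practice


def push_golden_record(crm_contact_id: str, golden: dict) -> None:
    """Write curated golden-record values back onto the source CRM contact."""
    payload = {
        "customProperties": {
            "mdm_golden_name": golden["name"],
            "mdm_golden_email": golden["email"],
            "mdm_last_verified": golden["verified_at"],  # provenance for stewards
        }
    }
    resp = requests.patch(
        f"{CRM_BASE_URL}/contacts/{crm_contact_id}",
        json=payload,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()


push_golden_record(
    "12345",
    {"name": "Acme A/S", "email": "billing@acme.example", "verified_at": "2020-05-01"},
)
```

The point is simply that the mechanics are ordinary API calls; the hard part is deciding what to write back and when, which is where the orchestration below comes in.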

What differentiates CluedIn from traditional MDM systems is that CluedIn does not offer forms to edit and mutate data. Rather, we use the source data, combined with external data, to form a statistically confident view of the data. That view could still be wrong, but the full lineage of where the data came from, together with the quality metrics that explain why CluedIn has decided one piece of data is better than another, gives a Data Steward everything they need to be confident in the choices made. So why doesn’t CluedIn just add some forms, get it over and done with, and slap an MDM sticker on the box? Because we at CluedIn believe this simply will not work. It is the reason our original question to our customers was met with some resentment. It has not worked in the past, it does not work now, and it will not work in the future.

Data manipulation is not something that can be generalised into “one UI to rule them all” - especially not for an MDM system that people will enjoy using. As we always say, please get in touch if you have an MDM system that you thoroughly enjoy using and we are happy to stand corrected.

We believe that the next MDM era is decentralised stewarding, facilitated by a centralised orchestrator. This removes the need for traditional data manipulation in a common MDM system and instead pushes that responsibility back to the source systems, which were designed for editing data in purpose-built user interfaces. The role of the centralised orchestrator is to send signals to these systems to facilitate golden records in the source systems, while platforms like CluedIn act as your “read-only” view of the golden records, connected and unified in a centralised fashion. There is a good reason for the “read-only” stamp: having to align form validation across different systems would cause a big governance mess.
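As an illustration of what such a signal could contain (this is an assumed shape, not CluedIn’s actual wire format), think of it as a proposal that the owning source system is free to accept or reject:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class GoldenRecordSignal:
    """A hypothetical proposal from the orchestrator to a source system.

    The source system owns the final decision: it can apply the change,
    or reject it and record why, which then shows up in lineage.
    """
    entity_type: str        # e.g. "Customer"
    source_system: str      # e.g. "CRM"
    source_record_id: str   # the record as known by the source system
    proposed_changes: dict  # field -> curated golden value
    evidence: dict          # lineage / quality metrics backing the proposal
    issued_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


signal = GoldenRecordSignal(
    entity_type="Customer",
    source_system="CRM",
    source_record_id="12345",
    proposed_changes={"billing_country": "DK"},
    evidence={"sources": ["ERP", "business-register"], "confidence": 0.92},
)
```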

This idea also stems from another reality: the business rules that determine the state of data for the entire business are different from the rules in the source systems. Every new customer we work with confirms this. It raises the question, “do you need to copy all of the business rules from the different systems into a centralised orchestrator?”. Those of you who already use CluedIn will know that we often preach that your integrations should stay “dumb” and contain no business logic, so we don’t want business logic there either. Why? Because rules change, and a centralised rules engine will never scale to the kinds of problems CluedIn aims to solve, namely huge integration complexity. So we need a mechanism to register and monitor business rules. Think about all the places where business rules live today: in code, in configuration, in user interfaces, in people’s heads, on napkins. In typical left-to-right architectures this matters less, because you really only need to please downstream consumers, and the rules you need are mostly facades over existing data. This doesn’t mean we can’t think about business rules in source systems, but rather that doing so becomes that much more scalable.
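To make “register and monitor” a little more tangible, here is a minimal, purely illustrative sketch of a rule registry; the structure and field names are assumptions rather than CluedIn’s actual model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BusinessRule:
    """A registered business rule, whether it lives in code, config or a UI."""
    name: str
    owning_system: str                 # where the rule is actually enforced
    entity_type: str                   # e.g. "Customer", "Product"
    description: str                   # human-readable intent, for stewards
    predicate: Callable[[dict], bool]  # executable form, where one exists


class RuleRegistry:
    """Central catalogue of the rules that shape data across systems."""

    def __init__(self) -> None:
        self._rules: list[BusinessRule] = []

    def register(self, rule: BusinessRule) -> None:
        self._rules.append(rule)

    def rules_for(self, entity_type: str) -> list[BusinessRule]:
        return [r for r in self._rules if r.entity_type == entity_type]


registry = RuleRegistry()
registry.register(BusinessRule(
    name="active-license-required",
    owning_system="Invoicing",
    entity_type="Customer",
    description="A customer must hold a current, paid license",
    predicate=lambda rec: rec.get("has_current_license", False),
))
```

The value is not in the code itself but in having a single place where stewards can see which rule is owned by which system, so rule changes are visible rather than silent.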


Here are some practical examples:

1: A new rule has been introduced in the source system: if a customer has not renewed their license, their customer name is rewritten to “Customer Name XXXXXX”, with XXXXXX indicating that the customer meets a certain business rule. CluedIn then ingests this data, but without any real indication of the rule, CluedIn would continue to think they are a customer and deliver the data to all downstream consumers as a simple name update. Either the source team would need to broadcast that rule to CluedIn, CluedIn would need to ask the source team for regular business rule updates, or, the final option, CluedIn acts as the central orchestrator of business rules and the rules in the source system are registered in CluedIn.

2: The global business has decided that a “customer” is someone who has a current license, has paid in the invoicing system, sits in the CRM deal pipeline stage of “Customer”, and is not marked as bankrupt in the local business register. But the CRM team has an extra rule: a contact is not a customer unless they have purchased more than 3 products. Hence, the CluedIn engine might push back to the CRM team that a particular customer is still a customer, and the CRM team would need to reject that change. This means a true sync from golden records to consumers would not be complete. The good news is that the data lineage piece of CluedIn would show that the CRM team decided not to ingest that recommended change. (A rough sketch of these two competing definitions appears after the examples.)

3: We have a rule in the CluedIn system that says if a Product has passed its lifetime by 3 years, we no longer need it in the system, and CluedIn would instruct the PIM source system to remove the record. However, the PIM team needs to keep a historical archive of all products. The PIM team decides to update that record, and on the next ingestion CluedIn will pick it up again because its modified date changed. Of course, in CluedIn you can have rules that reject the ingestion of certain data, such as Products whose lifetime ended more than 3 years ago.
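As promised above, here is a small, illustrative sketch of example 2. The field names and the threshold of 3 products are made up; the point is only to show how the global and CRM-local definitions of “customer” can disagree, which is exactly the disagreement that surfaces as a rejected change in the lineage.

```python
def is_customer_global(record: dict) -> bool:
    """The business-wide definition of a customer (example 2)."""
    return (
        record["has_current_license"]
        and record["invoice_paid"]
        and record["crm_pipeline_stage"] == "Customer"
        and not record["bankrupt_in_business_register"]
    )


def is_customer_crm(record: dict) -> bool:
    """The CRM team's stricter, local definition."""
    return is_customer_global(record) and record["products_purchased"] > 3


acme = {
    "has_current_license": True,
    "invoice_paid": True,
    "crm_pipeline_stage": "Customer",
    "bankrupt_in_business_register": False,
    "products_purchased": 2,
}

# The orchestrator says "customer", the CRM says "not a customer":
# the CRM rejects the proposed change, and lineage needs to surface
# that rejection so the mismatch is visible rather than silent.
print(is_customer_global(acme))  # True
print(is_customer_crm(acme))     # False
```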

Now let’s cover the non-disruptive piece of it. Introducing a new system is exciting and daunting. Those who will be users of the system are hopefully excited, but those who are merely impacted by it will be less so. This new approach eliminates that extra concern. Introducing the decentralised MDM concept is as good as telling the different system owners that we have something in place that will simply deliver better data to their source systems. I don’t think many people would disagree with that.


