Part 4 – Data Governance and Quality—The Foundation of AI Success

Now that we’ve spent a few weeks talking about building out your infrastructure and exploring deployment to the cloud, let’s move on to the part where we at DAI Group really start to get excited! The subject of data governance and quality is near and dear to our hearts, as we see a lot of clients struggle in this area. Never fear - we can help!

Our approach is based on years of experience in this field. To make these projects successful, it is critical to focus on the business challenges and maintain a business- and customer-centric viewpoint. It is also vital to collaborate closely with the data and IT functions to define lean, clean and efficient processes.

Without proper data governance and data quality, your analytics and AI projects will:

  • use incorrect or irrelevant data, and
  • make predictions from low-quality inputs.

Just imagine having to navigate an unknown city with 200-year-old maps (wrong data) and no GPS signal (low-quality data) - you might never find your way! In the same vein, would you want to bet your business and your career on this kind of data?

In the following text, we’ll explore the critical role of data governance and quality in AI initiatives. We’ll discuss strategies to implement effective data governance frameworks and enhance data quality, ensuring your AI models deliver accurate and trustworthy results.

The Importance of Data Governance in AI

Let’s start with the official definition: data governance refers to the overall management of data availability, usability, integrity, and security within an organization. It encompasses the policies, procedures, and standards that ensure data is managed effectively throughout its lifecycle.

On a practical level, data governance simply tells you which data you can use for which purpose. For example, suppose you are a bank and you store notes from the communication between a relationship manager and a client; you can use those notes to conduct business on a regular basis. However, you are usually not allowed to run sentiment analysis on this information (to mathematically analyze the mood and intentions of the client) without an additional opt-in from the banking client. Regulations make things really complicated for a data scientist: the right data is there, the quality is high, and still it is legally off-limits. Or you could buy some data… but are you allowed to use it? The temptation is great, and enterprises must have the right processes in place to prevent the use of data for AI and analytical purposes where the law forbids it. This is an extremely important topic in Europe, very much in the forefront in the DACH region, and an absolutely central topic in Switzerland!
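
To make this tangible, here is a minimal sketch of what such a purpose check could look like in code. Everything in it is hypothetical - the Purpose values, the dataset name and the check_usage helper are our own illustration, not a real policy engine:

```python
from enum import Enum

class Purpose(Enum):
    REGULAR_BUSINESS = "regular_business"      # servicing the client relationship
    SENTIMENT_ANALYSIS = "sentiment_analysis"  # analytics beyond the original purpose

# Hypothetical policy table: which purposes a dataset may be used for.
# In the bank example, relationship-manager notes are usable for regular
# business only, unless the client has explicitly opted in to analytics.
DATA_POLICIES = {
    "rm_client_notes": {Purpose.REGULAR_BUSINESS},
}

def check_usage(dataset: str, purpose: Purpose, client_opted_in: bool = False) -> bool:
    """Return True only if governance policy allows this dataset for this purpose."""
    if purpose in DATA_POLICIES.get(dataset, set()):
        return True
    # An explicit opt-in can extend the allowed purposes (consent-style logic).
    return client_opted_in and purpose is Purpose.SENTIMENT_ANALYSIS

# Sentiment analysis on the notes is blocked without the client's opt-in:
assert not check_usage("rm_client_notes", Purpose.SENTIMENT_ANALYSIS)
assert check_usage("rm_client_notes", Purpose.SENTIMENT_ANALYSIS, client_opted_in=True)
```

The point is not the code itself but the principle: the check happens before the query is ever issued, not after the audit.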

In order to adhere to all these regulations, picture the process: the data scientist must navigate something like 300 checkpoints just to be able to issue that one little SELECT… statement - well, you know what it can be like in some very big and complex organizations! Our aim at DAI Group is to enable the data and IT functions of our clients to impress their business sponsors by making these processes as lightweight as possible. The more success the data and IT organizations deliver, the bigger our satisfaction as consultants!

In Summary: Why Data Governance Matters for AI

  • Regulatory Compliance: Proper governance ensures adherence to laws like GDPR, CCPA, and industry-specific regulations. Special care must be taken when managing PII (personally identifiable information); medical data is subject to even stricter rules!
  • Risk Management: Reduces the risk of data breaches and misuse.
  • Operational Efficiency: Streamlines data processes, reducing redundancy and inefficiencies.
  • Strategic Decision-Making: Provides a solid foundation for data-driven decisions based on the right data.

Key Components of Data Governance

0. Common Business Data Model

  • Business-oriented, high-level data model: Over the course of the many projects we have delivered, we developed our own methodology for data governance. Whenever possible we focus on creating a common language for the business by building a data model. This is usually closer to a mind map than to an ER diagram, and its purpose is to capture and contextualize business concepts properly. It helps to separate overlapping concepts and find the right owners. There is much more to this topic; it has been crucial in helping us resolve conflicts and move forward in super-challenging situations in our projects.
  • Data Catalog: the concepts, objects and their connections are documented in a business-understandable way.

It is critical that, although the above deliverables are inspired by IT projects, they remain business-focused - as should the whole data governance and data quality exercise!

1. Data Policies and Standards

  • Data Policies: High-level directives that outline the organization’s approach to data management. Policies can be of a general nature or can focus on certain elements of the data model.
  • Data Standards: Specific rules and definitions that ensure data is consistent and usable. Again, this must be business-focused: business requirements are translated into IT/data needs. The data must make sense to the business, and the rules that determine when the data actually makes sense must be understandable to people and actionable by computers (a minimal sketch follows below).
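
What can “understandable to people and actionable by computers” look like? Here is a minimal sketch, assuming a hypothetical standard for Swiss phone numbers (“stored in E.164 format: +41 followed by nine digits”) - a rule the business can read and a machine can enforce:

```python
import re

# Hypothetical data standard: "A Swiss phone number is stored in E.164
# format: +41 followed by exactly nine digits."
SWISS_PHONE_PATTERN = re.compile(r"^\+41\d{9}$")

def conforms_to_phone_standard(value: str) -> bool:
    """Check a single value against the agreed data standard."""
    return bool(SWISS_PHONE_PATTERN.match(value))

print(conforms_to_phone_standard("+41791234567"))   # True
print(conforms_to_phone_standard("079 123 45 67"))  # False: fine for a human, not for the standard
```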

2. Roles and Responsibilities around data

  • Four roles to rule them all: in our experience, all roles around data boil down to one of four archetypes: Data Owner, Data User, Data Custodian and Data Steward. Different methodologies apply different definitions; still, a proper separation between owning, using, and storing/analyzing the data must be ensured. Think of the roles as similar to those around a house: they can overlap, but they don’t have to. The owner, the tenant, the plumber and the janitor can be the very same person - but it is not a must.
  • Data Owner: This is simply the most important role. The data owner is the person with an interest in keeping the data clean, up-to-date and protected. That does not mean they do all of that themselves - they are the person who can make things happen, and they usually play the critical role of defining the requirements. If nothing else is defined, you should aim to define the data owner as an absolute minimum!
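
As a playful illustration of the four archetypes, here is a minimal sketch; the entitlement names are entirely hypothetical (real setups live in IAM tooling, not in a Python dict):

```python
# The four archetypes as an explicit mapping; entitlement names are hypothetical.
ROLE_ENTITLEMENTS = {
    "data_owner":     {"define_requirements", "approve_access", "accept_quality"},
    "data_steward":   {"maintain_definitions", "monitor_quality", "escalate_issues"},
    "data_custodian": {"store_data", "back_up_data", "apply_access_controls"},
    "data_user":      {"read_data", "analyze_data"},
}

def can(roles: set[str], action: str) -> bool:
    """True if any of the held roles is entitled to perform the action."""
    return any(action in ROLE_ENTITLEMENTS.get(role, set()) for role in roles)

# One person may hold several roles (the owner/tenant/plumber/janitor case):
my_roles = {"data_owner", "data_user"}
assert can(my_roles, "approve_access")
assert not can(my_roles, "apply_access_controls")
```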

3. Data Quality Management

  • The classical dimensions of data quality are accuracy, consistency, completeness and timeliness. In practice it is super hard to know which data is “right” - in the end, nothing decides between right and wrong except whether the business can actually use the data.
  • The relation to the data model: it takes a lot of work and communication to create awareness of overlapping concepts that the business uses in ambiguous ways. Typical examples are addresses, phone numbers, contracts and, in multilingual countries like Switzerland, the language of contact. Here the data model helps us put things into the right context.
  • DAI Group’s multi-layered model: in practice we apply our own 4-layer model to probe and increase data quality (a generic sketch of measuring the classical dimensions follows below).
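
To show what probing the classical dimensions can look like, here is a minimal generic sketch using pandas. Note this illustrates the textbook dimensions, not our 4-layer model, and the column names, rules and reference date are hypothetical:

```python
import pandas as pd

# Hypothetical client records; column names and values are illustrative only.
df = pd.DataFrame({
    "client_id":  [1, 2, 3, 4],
    "phone":      ["+41791234567", None, "+41799876543", "invalid"],
    "updated_at": pd.to_datetime(["2024-11-01", "2023-01-15", "2024-12-01", "2022-06-30"]),
})

# Completeness: share of non-null values per column.
completeness = df.notna().mean()

# Accuracy (as a proxy): share of rows whose phone matches the agreed standard.
accuracy = df["phone"].str.match(r"^\+41\d{9}$", na=False).mean()

# Timeliness: share of records updated within the last year (fixed reference
# date so the example stays reproducible).
cutoff = pd.Timestamp("2025-01-01") - pd.Timedelta(days=365)
timeliness = (df["updated_at"] >= cutoff).mean()

print("completeness per column:\n", completeness)
print(f"accuracy (phone): {accuracy:.0%}")   # 50%
print(f"timeliness: {timeliness:.0%}")       # 50%
```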

4. Data Security and Privacy

  • Access Controls: Defining who can access or modify data - usually different for different subsets of the data. This also covers the infrastructure and architecture through which data can be reached (VPNs, network security layers, jump hosts, etc.).
  • Encryption: Protecting data at rest and in transit.
  • Anonymization: Removing personally identifiable information when necessary.
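
As a sketch of what the anonymization bullet can mean in code - assuming a pandas DataFrame with hypothetical column names - direct identifiers are dropped and quasi-identifiers are replaced with a salted hash (strictly speaking this is pseudonymization, which is often the realistic first step):

```python
import hashlib
import pandas as pd

def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a salted, non-reversible hash."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

def strip_pii(df: pd.DataFrame, drop_cols: list[str], hash_cols: list[str], salt: str) -> pd.DataFrame:
    """Drop direct identifiers, hash quasi-identifiers, keep everything else."""
    out = df.drop(columns=drop_cols)
    for col in hash_cols:
        out[col] = out[col].map(lambda v: pseudonymize(str(v), salt))
    return out

clients = pd.DataFrame({
    "name":      ["Anna Muster", "Beat Beispiel"],  # direct identifier: drop
    "client_id": ["C-1001", "C-1002"],              # quasi-identifier: hash
    "balance":   [12_500, 98_000],                  # analytical payload: keep
})
print(strip_pii(clients, drop_cols=["name"], hash_cols=["client_id"], salt="rotate-me"))
```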

5. Compliance and Legal Considerations

  • Regulatory Requirements: Adhering to laws and directives governing data usage and protection. In our practice, client-side directives usually tend to be even more restrictive than the actual legal environment.
  • Audit Trails, recorded sessions, etc.: Maintaining records of data usage and changes for accountability and audit purposes (a minimal sketch follows below).
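
And a minimal sketch of the audit-trail idea - hypothetical function and field names, using nothing but Python’s standard logging:

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_log = logging.getLogger("data_audit")

def audited_read(user: str, dataset: str, purpose: str) -> None:
    """Record who read what, when, and why - the raw material of an audit trail."""
    audit_log.info(
        "%s | user=%s | dataset=%s | purpose=%s",
        datetime.now(timezone.utc).isoformat(), user, dataset, purpose,
    )
    # ... the actual data access would happen here ...

audited_read("d.scientist", "rm_client_notes", "regular_business")
```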

Implementing an Effective Data Governance Framework

An effective data governance framework builds on multiple key components such as processes, teams, standards and data quality KPIs. We believe the framework needs to be customized to the client’s needs - to the extreme!

In our experience, most organizations prefer a process-oriented approach. In these cases DAI Group usually starts with (a data model 🙂 and) a standard process map operated by a standard set of roles - and then we go into heavy customization.

One of the central questions we usually face is where the data ownership function should sit: within IT or within Operations? In our view, data ownership is usually a business function, as is data stewardship.

Enhancing Data Quality for Reliable AI Outputs

High-quality data is essential for AI models to function correctly. Most data quality (DQ) projects focus on repairing an existing problem instead of preempting the creation of problems in the first place. With DAI Group’s Cloud Based Data Hub approach, we enable organizations to take exactly that preemptive approach!

In our experience, you can measure and correct data problems on an ongoing basis, but if you don’t address the root cause, it will remain an uphill battle. Multi-master data situations (multiple systems managing, to some extent, the same data) play a central role in bad data quality. With the data hub we introduce a centralized GUI that enables the organization to manage its data centrally, fully customized to its needs, and to push changes down to all downstream systems - whenever possible in a bi-directional, transaction-oriented manner. This opens up the way to centrally measure and manage data quality, but also (maybe even more importantly) to integrate your systems through a central component in a hub-and-spoke architecture, instead of a fully connected matrix built on point-to-point connections. This allows your IT architects to think about replacing an application in your landscape without a major impact on all surrounding systems.
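
To illustrate the architectural point - this is our generic illustration of hub-and-spoke, not the actual Data Hub product - consider that n point-to-point-integrated systems need up to n * (n - 1) directed connections, while a hub needs only one per system:

```python
from typing import Callable

class Hub:
    """Generic hub-and-spoke sketch: each system registers once with the hub,
    and the hub propagates every accepted change to all downstream systems."""

    def __init__(self) -> None:
        self.downstream: dict[str, Callable[[dict], None]] = {}

    def register(self, name: str, apply_change: Callable[[dict], None]) -> None:
        self.downstream[name] = apply_change

    def publish(self, change: dict) -> None:
        # n systems -> n hub connections, instead of up to n * (n - 1)
        # point-to-point links. Swapping out one system touches one connection.
        for apply_change in self.downstream.values():
            apply_change(change)

hub = Hub()
hub.register("crm",     lambda c: print("CRM applied:", c))
hub.register("billing", lambda c: print("Billing applied:", c))
hub.publish({"client_id": "C-1001", "field": "address", "value": "Bahnhofstrasse 1"})
```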

We will dedicate a separate article to the Data Hub early next year 🙂!


Case Study: Optimizing AI Outcomes Through Data Governance

Background:

Our client inherited a huge number of new systems and data management organizations due to inorganic growth (acquisitions!).

Challenges:

  • Central vs. decentralized: Our client needed to strike a fine balance between data managed centrally at the group level and data managed at the sister-company/business-unit level. After the first discussions it was obvious that centralizing everything would overload the data function, while decentralizing everything would be a nightmare from a compliance point of view.
  • PII: The client operates in a regulated market managing huge amounts of personal data - this makes the whole situation even more complicated.
  • Data Silos: the new, composite IT landscape contained systems with overlapping functionality and overlapping content. Customers who worked with several previously independent companies needed to be consolidated - on the contractual as well as the IT level.
  • Incompatible technologies: inorganic growth introduced practically every possible technological answer to the same business question. Google Cloud, Azure, AWS - our client had and used them all.
  • Inconsistent Data Formats: The above technology landscape was practically “designed” to serve as a greenhouse for inconsistent data: data came from various sources with different structures, concepts had different meanings in different business units, etc.

  • Regulatory Pressure: after the series of acquisitions our client faced increased regulatory rigor and a series of audits concentrating primarily on the management of PII data.

Actions Taken:

Recycle, recycle, recycle

  • As a first step we interviewed a minimal set of key decision makers to identify existing data management practices with the potential to be rolled out to the entire organization. We dedicated two weeks to the resource gathering and one more week to consolidating the results.
  • With this step we not only saved a lot of money and resources for our client, but also nominated the champions of the new, overarching data culture.
  • We explicitly documented the existing practices that the interviewees did not want to migrate over to the “new world”.
  • We established a centralized data model for the organization (with the level of detail depending on the maturity of each sister company), together with documentation of the contents, and stored it in a central location accessible to all forum members (granting access to the details was delegated to the respective data owner).

Draft, communicate, refine, agree

  • Based on DAI Group’s own framework and the inputs collected in the first step, we designed a draft data governance framework and summarized the main principles in a five-slide presentation.
  • We reported back to the key stakeholders on how the overarching (composite) data governance framework would work, and indicated our readiness to customize the draft for as long as inputs kept coming.
  • Buy-in from the C-level helped cut back on the necessary iterations, and we could arrive at a high-level framework that was feasible to execute in ca. 3 weeks. We used this support to speed up the nomination process for the data owners.
  • Five weeks into the project we had a data governance framework that was agreed upon and supported by all major stakeholders of the organization, a high-level data model, and nominated data owners.

Set up decision forums on the expert level

  • We believe in setting up small, very hands-on forums that resolve data issues at the expert level. With this pilot, we helped define the data people who are empowered to resolve data issues themselves, and set the boundaries for when issues should be escalated.
  • In the initial phase, DAI Group’s experts moderated these forums and ensured that escalation took place when needed. Typically, our guidance is no longer needed after 2-3 months of oversight.
  • As the decision levels and forums started to work, the data model became well documented and data quality management methods started to crystallize. We established a central code repository to store common data quality remedies (in the absence of a centralized location for storing cleansed data).

Establish mid-level forums and escalation paths - and repeat

  • The next iteration of forums was established - these operate at the division/sister-company level.
  • Escalation paths from the expert forums were also established.

Processes, roles and responsibilities - hand in hand with the C-level forums

In this step multiple actions were carried out in parallel:

1. The processes for escalating issues were documented and put into the context of the overall data management framework.

2. De-escalation processes were also designed (e.g. when a regulator is in contact with the group-level functions, the information needs to trickle down to the data experts).

3. Roles and responsibilities crystallized as problems were solved - and as they were inherited from the original processes and frameworks. We then documented the roles and responsibilities for handover to the client.

Results:

  • Quick results: an established data management framework within 3 months, for an organization with ca. 10,000 employees worldwide.
  • Reduced insecurity: management support made it possible to settle the new data organization within 3 months. This reduced the insecurity of the team, who could concentrate on their jobs rather than worrying about whether they would still have one.
  • Regulatory compliance: regulators welcomed the clear and straightforward data governance framework.
  • Operational efficiency: management and regulatory reports were delivered in bi-monthly iterations, which was warmly welcomed by the business stakeholders. Delivery time decreased by a factor of 3-4, while the quality of the business reports increased dramatically.
  • AI enablement: the increased data quality and clear decision-making processes enabled the data science function, originally part of one of the sister companies, to perform group-level AI and ML projects.

Conclusion

Data governance and quality are foundational to the success of any AI initiative. By ensuring data is accurate, consistent, and compliant, organizations can trust the outputs of their AI models and make informed decisions. Implementing a robust data governance framework requires commitment and collaboration across the organization, but it yields significant benefits in performance, compliance, and competitive advantage.

As AI continues to evolve, so too must our approaches to managing the data that fuels it. Embracing data governance and prioritizing data quality positions organizations to harness the full power of AI responsibly and effectively.

Coming Up Next: Part 5 – Selecting the Right AI Tools and Platforms

In the next article, we’ll explore how to choose AI tools and platforms that align with your enterprise needs. We’ll discuss considerations for integrating solutions with existing systems and maximizing your return on investment.

Stay tuned for more insights! Follow DAI Group on LinkedIn or visit our website to keep up with the series. Your comments and questions are always welcome - let’s continue the conversation below.

