Order! Order! - Data Governance!

Data Governance is an essential part of your data ecosystem. Even so, I have thought and re-thought how to write this post. For the last couple of decades, we have been trying to govern, standardize and secure data properly while maintaining data quality. We have been trying to create a "Single Source of Truth", so to speak.

To this end, several sub-disciplines such as metadata management, master data management and data cataloging have evolved over the years. The challenge has always been enforcing governance principles across the organization and instilling that discipline in the teams who act as data producers, data stewards and so on.

Many of you may have seen these diagrams, or variations of them:

Figure 1. Sample Governance Framework


Figure 2. Sample ETL Process

The challenge is that these diagrams haven't changed much in the last decade or so. With the advent of Artificial Intelligence and Machine Learning, I think a more dynamic and pragmatic approach to Data Governance is needed. Enforcing Data Governance principles across the organization also remains a challenge to this day.

Let's take Figure 2, for example. Implementing governance at the data source level is nearly impossible, because the number of data sources and formats will continue to grow and morph with business needs. The story is the same at the ETL layer, where the requirements keep changing. That leaves the warehouse/data lake level as the logical place to implement data governance. The idea here is to create a "Single Source of Truth" for the entire organization.

Today, tools and technologies have evolved to the point where several governance principles can be automated. So, let's talk about three principles of Data Governance that I think can be applied effectively, and enforced with minimal human intervention, to create our Single Source of Truth.

    1. Standardization

        "Pi, Pie and Py, not so elementary, my dear Watson", Sherlock Holmes would have to say, if he were to standardize data across the organization. Standardizing data is not simple. For example, if an organization has Customer_ID, Customer_No and Customer#, all referring to the unique id that represents a customer (a very common scenario), the onus is on the Data Practitioner to ensure that this terminology is standardized across the organization. This used to be done in consultation with the Data Governance Council, which usually met once a month. I am not saying that this process should change drastically, but it can be done much faster with the tools and technologies available in the market today. For example, today's tools can semantically analyze data (standardize nomenclature), collect metadata, create a data catalog and perform most of the governance functions automatically.
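To make the Customer_ID example concrete, here is a minimal sketch in Python of how name variants could be mapped to one standard term. The alias table, the canonical name `customer_id` and the function names are my own assumptions for illustration; a real catalog tool would typically derive such mappings from metadata and semantic analysis rather than a hand-written table.

```python
import re

# Hypothetical alias table (an assumption for this sketch):
# canonical name -> normalized variants seen in source systems.
CANONICAL_ALIASES = {
    "customer_id": {"customer_id", "customer_no", "customer"},
}


def normalize(name: str) -> str:
    """Lowercase the name, replace runs of non-alphanumerics with '_',
    and trim leading/trailing underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def standardize(name: str) -> str:
    """Return the canonical name if the variant is known,
    otherwise the normalized name unchanged."""
    key = normalize(name)
    for canonical, aliases in CANONICAL_ALIASES.items():
        if key in aliases:
            return canonical
    return key


for column in ["Customer_ID", "Customer_No", "Customer#"]:
    print(column, "->", standardize(column))  # each maps to customer_id
```

Unknown names pass through in normalized form, so they can be queued for review by the governance council instead of being silently renamed.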

    2. Security

        Security is another important aspect of data governance that the Data Practitioner has to go through with a fine-tooth comb. As I mentioned in one of my earlier posts, almost 83% of data breaches occur due to collusion from internal actors. Data lineage, traceability and auditability are important functions that we have to pay attention to. Again, many of today's tools and technologies have these capabilities built in.
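As one illustration of auditability, the sketch below keeps an append-only access log in which every entry carries a hash of the previous one, so that a later change to any entry can be detected. Hash chaining is a technique I am choosing for the example, not something the tools mentioned above are confirmed to use, and the field names (`actor`, `action`, `dataset`) are assumptions.

```python
import hashlib
import json
from typing import Dict, List


def _digest(body: Dict) -> str:
    """SHA-256 over a stable JSON rendering of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log: List[Dict], actor: str, action: str, dataset: str) -> Dict:
    """Append an audit entry that is chained to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else ""
    body = {"actor": actor, "action": action, "dataset": dataset, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: List[Dict]) -> bool:
    """Return True only if every entry is intact and correctly chained."""
    prev_hash = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[Dict] = []
append_event(audit_log, "analyst_1", "read", "claims")
append_event(audit_log, "etl_job", "write", "claims")
print(verify(audit_log))
```

If someone later edits an entry, for example to change who read a dataset, `verify` returns False because the stored hash no longer matches the entry's content.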

    3. Quality

        Data Quality, as I mentioned earlier, requires business context. For example, if you are dealing with Healthcare data and your customer says, "I need to get my ICD coding right" or "I need my CPT mapping done", pay attention. These are Healthcare-specific taxonomies, but they have financial implications for the customer.

To give you a real-life example, during the COVID-19 pandemic a new ICD (International Classification of Diseases) code, U07.1, was introduced to denote that a patient has been diagnosed with COVID. Now, let's think about this for a second. If a patient has been diagnosed with COVID, then quarantine, a ventilator and so on would have been the norm in treating that patient. The following charges would be typical for that patient:

1. Room charges for isolation (Quarantine)

2. Ventilator charges (depending on severity)

3. PPE kits for the staff

4. Medication charges

5. Lab charges etc. (this is not an exhaustive list, but you get the idea)

Now, suppose the patient had presented with bilateral pneumonia (which can be a symptom of COVID), but the RT-PCR test was negative. The charges would then include only the consultation and any medication given in the hospital. The patient would have been discharged at that point, with no further charges incurred, and neither the patient nor the insurance company would have been billed as for a COVID case. There is only one catch: the ICD code for bilateral pneumonia is J18.9 and not U07.1! So, getting the code wrong has major financial implications for the patient as well as the payer (insurer).
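A quality rule of this kind can be expressed in a few lines. The Python sketch below flags encounters whose ICD code and RT-PCR result disagree, following the scenario above. The `Encounter` record, its field names and the rule itself are assumptions made for illustration; actual coding guidelines are more nuanced, so treat this as a data-quality check to route records for review, not as coding guidance.

```python
from dataclasses import dataclass
from typing import List, Optional

COVID_CODE = "U07.1"      # COVID-19 diagnosis, as described above
PNEUMONIA_CODE = "J18.9"  # pneumonia code used in the example above


@dataclass
class Encounter:
    """Hypothetical encounter record; field names are illustrative."""
    encounter_id: str
    icd10_code: str
    rt_pcr_positive: Optional[bool]  # None means no test result on file


def check_covid_coding(enc: Encounter) -> List[str]:
    """Return a list of issues where the code and the test result disagree."""
    issues = []
    if enc.icd10_code == COVID_CODE and enc.rt_pcr_positive is not True:
        issues.append(f"{enc.encounter_id}: coded {COVID_CODE} without a positive RT-PCR result")
    if enc.icd10_code == PNEUMONIA_CODE and enc.rt_pcr_positive is True:
        issues.append(f"{enc.encounter_id}: positive RT-PCR but coded {PNEUMONIA_CODE}")
    return issues


encounters = [
    Encounter("E1", "U07.1", True),   # consistent
    Encounter("E2", "U07.1", False),  # pneumonia patient miscoded as COVID
    Encounter("E3", "J18.9", False),  # consistent
]
for enc in encounters:
    for issue in check_covid_coding(enc):
        print(issue)
```

Only E2 is flagged here, which is exactly the miscoding that would have billed the pneumonia patient like a COVID patient.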

So, when you are governing data or putting principles in place to ensure data quality, keep the business in mind. If we get the above three principles right, we can be confident that we are governing data effectively for the organization. There is a lot more to Data Governance, such as compliance and governing laws, that we are not covering here, but getting these basics right will help us with all of that.

Today's tools and technologies have evolved to the point where terminology can be standardized, security can be monitored and quality can be ensured automatically, with business context. Next, we will talk about Delivering Data.

More articles by Kishore Nair

  • Your Order is Here! - Delivering Data

    So, we have so far explored data acquisition, data management, data security, data governance and data quality good…

  • Data Quality - AI also needs diet and exercise !!!

    I often jest with my colleagues: "..

  • How? - Considerations for Organizing Data

    Now that we have a fairly good idea of how to think about data, let's talk about the How. As we discussed previously…

  • Confucius said - "When?"

    Unlike Confucius, who was very clear and precise in his messages, data can be very confusing. Data Management, Data…

  • Thinking about Data - The Why?

    Folks have asked me over the years, how to get into data, what tools and technologies to focus on, how to become a data…
