Data quality, completeness, and their impact on analytical insights
There is no doubt that we are in the age of digitization.
Data is the most important commodity, and wherever you look, be it industry, education, entertainment, or security, everyone is looking to organize, productize, and monetize data.
Visual data stories enable easy, person-independent reporting. They give the CXO suite the ability to get a bird's-eye view of the overall business and to take strategic decisions or corrective actions as the situation demands.
Drill-down reporting can help locate problem areas. Trends can help formulate the next steps that need to be taken, or even facilitate course correction as required.
All this is common knowledge. The fact I want to emphasize is that all of this is a direct result of the input data. Data granularity, completeness, authenticity, and integrity, in short, data quality, play a big part in the effectiveness and validity of the end result.
What is data quality?
In any function, data is collected from various sources. These sources can generate data individually as well as correlated / co-dependent data.
The question, then, is: what attributes define good-quality data?
Data that is to be used for analysis or insights needs to be dependable.
Ensuring Data Quality
The big question is: “How do we ensure the quality of the data that will be used for analysis and reporting?”
There are multiple ways to check source data to ensure that the foundation on which the analysis will be built is sound. We can apply business rules based on the following (a short code sketch after the list shows how a few of these checks might look):
Relational values: Many times, there exist data dependencies or relations which need to be fulfilled for completeness or integrity. For example, any credit entry must be matched with a debit entry.
Completeness of values: Incomplete information can lead to erroneous conclusions; only complete information gives the true picture. For example, incomplete timesheets not only impact weekly and monthly billing but also salaries, and nobody benefits in the end.
Limits: Some data points are only valid if the value lies within acceptable, defined limits, especially in the case of machine data. For example, a production quantity cannot be negative.
Authenticity: Only authentic information should be considered. Any spurious or incorrect entries can wreak havoc by skewing the analysis.
Process compliance of data: A product or service is an amalgamation of multiple processes continuously interacting with each other. These processes leverage multiple data systems, and broken links between systems can cause entire products to fail. For example, raw material in an inventory system that is not connected to the financial system can have a huge impact on the billability of a product.
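To make the first three rules concrete, here is a minimal Python sketch using pandas. The table, column names (transaction_id, entry_type, amount), and limit values are hypothetical, chosen only to illustrate the idea; a real validator would use your own schema and thresholds.

```python
import pandas as pd

# Hypothetical ledger: each transaction_id should carry both a
# credit and a debit entry (the "relational values" rule).
ledger = pd.DataFrame({
    "transaction_id": [1, 1, 2, 3, 3],
    "entry_type":     ["credit", "debit", "credit", "credit", "debit"],
    "amount":         [100.0, 100.0, 50.0, -20.0, 20.0],
})

# Relational rule: every transaction needs both entry types.
sides = ledger.groupby("transaction_id")["entry_type"].nunique()
unmatched = sides[sides < 2]
print("Transactions missing a matching entry:", list(unmatched.index))

# Completeness rule: no missing amounts allowed.
missing = ledger["amount"].isna().sum()
print("Rows with missing amounts:", missing)

# Limit rule: values must lie within acceptable, defined limits,
# e.g. a quantity or amount cannot be negative.
out_of_range = ledger[~ledger["amount"].between(0, 1_000_000)]
print("Rows outside acceptable limits:")
print(out_of_range)
```

Each check simply flags offending rows; in practice, the flagged records would be routed back to the source system for correction rather than silently dropped.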
Thus, before beginning analysis or reporting, we should ensure the sanctity of the source data.
So how do we go about it?
Define / identify business rules, limits, and relationships. A data-template-led, configuration-driven approach makes these rules easy to maintain; a minimal sketch of this idea appears at the end of this article.
Create a mechanism to parse source data. This mechanism can be a workflow which evaluates various parameters and rules. Additionally, dashboards and charts can be used to identify inconsistencies.
This can be done manually or by using tools to accelerate and simplify the task. How to easily create workflows, and how such dashboards can be structured, will be discussed in the following articles.
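As a starting point for the configuration-driven approach mentioned above, here is one possible sketch in plain Python: the rules are declared as data (the “template”), and a small generic evaluator applies them to each record. The rule schema, field names, and sample records are all assumptions made for illustration, not a prescribed format.

```python
# A minimal sketch of a configuration-driven rule evaluator.
# The rule "template" is plain data, so new checks can be added
# without changing the evaluation code below.
RULES = [
    {"field": "hours",  "check": "range",    "min": 0, "max": 24},
    {"field": "amount", "check": "required"},
]

def evaluate(record: dict, rules: list) -> list:
    """Return a list of human-readable violations for one record."""
    violations = []
    for rule in rules:
        value = record.get(rule["field"])
        if rule["check"] == "required" and value is None:
            violations.append(f"{rule['field']} is missing")
        elif rule["check"] == "range" and value is not None:
            if not (rule["min"] <= value <= rule["max"]):
                violations.append(
                    f"{rule['field']}={value} outside "
                    f"[{rule['min']}, {rule['max']}]"
                )
    return violations

# Example: parse source records and collect inconsistencies,
# which could then feed a workflow or a dashboard.
records = [{"hours": 25, "amount": 500.0}, {"hours": 8, "amount": None}]
for i, rec in enumerate(records):
    for problem in evaluate(rec, RULES):
        print(f"record {i}: {problem}")
```

Because the rules live in configuration rather than code, the same evaluator can serve different business functions simply by swapping in a different rule template.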