Data quality, completeness, and their impact on analytical insights

There is no doubt that we are in the age of digitization.

Data is the most important commodity, and wherever you look, be it industry, education, entertainment, or security, everyone is looking to organize, productize, and monetize data.

Visual data stories enable easy, person-independent reporting. They give the CXO suite the ability to get a bird's-eye view of the overall business and to take strategic decisions or corrective actions as the situation demands.

Drill-down reporting can help locate problem areas. Trends can help formulate the next steps that need to be taken, or even facilitate course correction or a change of direction as required.

All this is common knowledge. The fact I want to emphasize is that all of this is a direct result of the input data. Granularity, completeness, authenticity, and integrity of the data (in short, data quality) play a big part in the effectiveness and validity of the end result.

What is data quality?

In any function, data is collected from various sources. The sources of data can be:

  • Existing enterprise systems
  • Applications
  • Manual entry
  • Machines / Devices / Sensors

All these sources can generate data independently as well as correlated or co-dependent data.

The question then is: what are the attributes that define good quality data?

Data that is to be used for analysis or insights needs to be dependable.

Ensuring Data Quality

The big question is: how do we ensure the quality of the data that will be used for analysis and reporting?

There are multiple ways to check source data to ensure that the foundation on which the analysis will be built is sound. We can apply business rules based on the following:


Relational values: Many times there exist data dependencies or relations which must be satisfied for completeness or integrity. For example, any credit entry must be matched with a corresponding debit entry.
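The credit/debit example above can be sketched as a simple matching check. This is a minimal illustration, not a production reconciliation routine; the record layout (`ref`, `type`, `amount`) is a hypothetical example.

```python
# Hypothetical ledger: each entry has a reference, a type, and an amount.
ledger = [
    {"ref": "T1", "type": "credit", "amount": 500},
    {"ref": "T1", "type": "debit",  "amount": 500},
    {"ref": "T2", "type": "credit", "amount": 120},   # no matching debit
]

def unmatched_credits(entries):
    """Return credit entries with no debit of the same reference and amount."""
    debits = {(e["ref"], e["amount"]) for e in entries if e["type"] == "debit"}
    return [e for e in entries
            if e["type"] == "credit" and (e["ref"], e["amount"]) not in debits]
```

Here `unmatched_credits(ledger)` flags the `T2` credit, which has no debit to balance it.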

Completeness of values: Incomplete information can lead to erroneous conclusions; only complete information gives the true picture. For example, incomplete timesheets not only impact weekly and monthly billing, they also impact salaries, and nobody benefits in the end.
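A completeness rule can be expressed as a check for missing or empty required fields. This is a minimal sketch; the timesheet field names are assumed for illustration.

```python
def incomplete_rows(rows, required):
    """Return rows where any required field is missing or empty."""
    return [r for r in rows if any(r.get(f) in (None, "") for f in required)]

# Hypothetical timesheet records.
timesheets = [
    {"employee": "A", "project": "P1", "hours": 8},
    {"employee": "B", "project": "",   "hours": 8},   # project left blank
]
```

Running `incomplete_rows(timesheets, ["employee", "project", "hours"])` returns only the second record, so it can be sent back for correction before billing.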

Limits: Some data points are only valid if their values fall within acceptable, defined limits; this is especially true of machine data. For example, a production quantity cannot be negative.



Authenticity: Only authentic information should be considered. Spurious or incorrect entries can cause havoc by skewing the analysis.


Process compliance of data: A product or service is an amalgamation of multiple processes continuously interacting with each other. These processes leverage multiple data systems, and broken links between systems can cause entire products to fail. For example, raw materials tracked in an inventory system that is not connected to the financial system have a huge impact on the billability of a product.

Thus, before beginning analysis or reporting, we should ensure the sanctity of the source.


So how do we go about it?

Define or identify the business rules, limits, and relationships, using a configuration-driven, data-template-led approach.

Create a mechanism to parse the source data. This mechanism can be a workflow that evaluates the various parameters and rules. Additionally, dashboards and charts can be used to surface inconsistencies.
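The two steps above can be sketched as a small, configuration-driven rule engine: rules are (name, predicate) pairs, so new checks can be added without changing the parsing mechanism. The rule names and record layout below are hypothetical examples.

```python
# Hypothetical rule configuration: each rule is a (name, predicate) pair.
RULES = [
    ("hours within limits", lambda r: 0 <= r.get("hours", -1) <= 24),
    ("project is present",  lambda r: bool(r.get("project"))),
]

def validate(rows, rules):
    """Return (row_index, rule_name) for every rule a row violates."""
    violations = []
    for i, row in enumerate(rows):
        for name, check in rules:
            if not check(row):
                violations.append((i, name))
    return violations
```

The resulting violation list can then feed a dashboard or an exception report, so inconsistencies are visible before the data reaches analysis.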

This can be done manually or by using tools to accelerate and simplify the task. How to easily create workflows and what can be the structure of dashboards, we will discuss in following articles.
