What is Data Quality?

Data quality is a measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability and timeliness. Measuring data quality levels can help organizations identify data errors that need to be resolved and assess whether the data in their IT systems is fit to serve its intended purpose.

The emphasis on data quality in enterprise systems has increased as data processing has become more intricately linked with business operations and organizations increasingly use data analytics to help drive business decisions. Data quality management is a core component of the overall data management process, and data quality improvement efforts are often closely tied to data governance programs that aim to ensure data is formatted and used consistently throughout an organization.

Why data quality is important

Bad data can have significant business consequences for companies. Poor-quality data is often pegged as the source of operational snafus, inaccurate analytics and ill-conceived business strategies. Examples of the economic damage data quality problems can cause include added expenses when products are shipped to the wrong customer addresses, lost sales opportunities because of erroneous or incomplete customer records, and fines for improper financial or regulatory compliance reporting.

Consulting firm Gartner said in 2021 that bad data quality costs organizations an average of $12.9 million per year. Another figure that's still often cited is a calculation by IBM that the annual cost of data quality issues in the U.S. amounted to $3.1 trillion in 2016. And in an article he wrote for the MIT Sloan Management Review in 2017, data quality consultant Thomas Redman estimated that correcting data errors and dealing with the business problems caused by bad data costs companies 15% to 25% of their annual revenue on average.

In addition, a lack of trust in data on the part of corporate executives and business managers is commonly cited among the chief impediments to using business intelligence (BI) and analytics tools to improve decision-making in organizations. All of that makes an effective data quality management strategy a must.

What is good data quality?

Data accuracy is a key attribute of high-quality data. To avoid transaction processing problems in operational systems and faulty results in analytics applications, the data that's used must be correct. Inaccurate data needs to be identified, documented and fixed to ensure that business executives, data analysts and other end users are working with good information.

Other aspects, or dimensions, that are important elements of good data quality include the following:

  • completeness, with data sets containing all of the data elements they should;
  • consistency, where there are no conflicts between the same data values in different systems or data sets;
  • uniqueness, indicating a lack of duplicate data records in databases and data warehouses;
  • timeliness or currency, meaning that data has been updated to keep it current and is available to use when it's needed;
  • validity, confirming that data contains the values it should and is structured properly; and
  • conformity to the standard data formats created by an organization.

Data sets that satisfy all of these criteria are reliable and trustworthy. A long list of additional dimensions of data quality can also be applied; examples include appropriateness, credibility, relevance, reliability and usability.
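
Several of the dimensions listed above can be expressed as simple ratios over a data set. The following is a minimal Python sketch, not a standard or a product's method: the sample records, the field names (id, email, postal_code, updated), the five-digit postal code rule and the 365-day freshness window are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical customer records; fields and values are illustrative only.
RECORDS = [
    {"id": 1, "email": "ana@example.com", "postal_code": "10001", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "postal_code": "94105", "updated": date(2024, 4, 20)},
    {"id": 2, "email": "bo@example.com", "postal_code": "9410", "updated": date(2022, 1, 15)},
    {"id": 4, "email": "cy@example.com", "postal_code": "60601", "updated": date(2024, 5, 3)},
]


def completeness(records, field):
    """Share of records in which `field` is present and not None."""
    return sum(r.get(field) is not None for r in records) / len(records)


def uniqueness(records, key):
    """Ratio of distinct `key` values to records (1.0 means no duplicates)."""
    return len({r[key] for r in records}) / len(records)


def validity(records, field, is_valid):
    """Share of records whose `field` value passes the `is_valid` rule."""
    return sum(bool(is_valid(r.get(field))) for r in records) / len(records)


def timeliness(records, field, as_of, max_age_days):
    """Share of records updated no more than `max_age_days` before `as_of`."""
    return sum((as_of - r[field]).days <= max_age_days for r in records) / len(records)


def is_five_digit_code(value):
    # Assumed format rule for this sketch: exactly five digits.
    return isinstance(value, str) and len(value) == 5 and value.isdigit()


if __name__ == "__main__":
    report = {
        "completeness(email)": completeness(RECORDS, "email"),
        "uniqueness(id)": uniqueness(RECORDS, "id"),
        "validity(postal_code)": validity(RECORDS, "postal_code", is_five_digit_code),
        "timeliness(updated)": timeliness(RECORDS, "updated", date(2024, 5, 10), 365),
    }
    for name, score in report.items():
        print(f"{name}: {score:.2f}")
```

Each function returns a score between 0 and 1, so results can be tracked over time or compared against a threshold that the organization sets. Accuracy and consistency are not covered here because they generally require a trusted reference source or a second system to compare against.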
