Data quality plan definition

What is data quality?

Data quality measures how well a dataset meets criteria for accuracy, completeness, validity, consistency, uniqueness, timeliness and fitness for purpose, and it is critical to all data governance initiatives within an organization.

Data quality standards ensure that companies are making data-driven decisions to meet their business goals. If data issues, such as duplicate data, missing values, and outliers, aren’t properly addressed, businesses increase their risk of negative business outcomes. According to a Gartner report, poor data quality costs organizations an average of USD 12.9 million each year [1]. As a result, data quality tools have emerged to mitigate the negative impact associated with poor data quality.
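To make the three issues named above concrete, here is a minimal sketch of how duplicates, missing values, and outliers might be flagged in a small dataset. The records, field names, and the 1.5 × IQR outlier rule are illustrative assumptions, not a description of any particular data quality tool.

```python
from statistics import quantiles

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": 22.0},
    {"id": 2, "amount": 22.0},   # exact duplicate of the previous row
    {"id": 3, "amount": None},   # missing value
    {"id": 4, "amount": 21.0},
    {"id": 5, "amount": 19.0},
    {"id": 6, "amount": 500.0},  # far outside the usual range
]


def find_duplicates(rows):
    """Return indices of rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes


def find_missing(rows, field):
    """Return indices of rows where `field` is absent or None."""
    return [i for i, row in enumerate(rows) if row.get(field) is None]


def find_outliers(rows, field, k=1.5):
    """Return indices of rows whose value lies beyond k * IQR of the quartiles."""
    values = [r[field] for r in rows if r.get(field) is not None]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [
        i for i, r in enumerate(rows)
        if r.get(field) is not None and not low <= r[field] <= high
    ]


print(find_duplicates(records))          # [2]
print(find_missing(records, "amount"))   # [3]
print(find_outliers(records, "amount"))  # [6]
```

Whether a flagged row is actually an error depends on the intended use of the data; the sketch only surfaces candidates for review.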

When data quality meets the standard for its intended use, data consumers can trust the data and leverage it to improve decision-making, leading to the development of new business strategies or optimization of existing ones. However, when a standard isn’t met, data quality tools provide value by helping businesses to diagnose underlying data issues. A root cause analysis enables teams to remedy data quality issues quickly and effectively.

Data quality isn’t only a priority for day-to-day business operations; as companies integrate artificial intelligence (AI) and automation technologies into their workflows, high-quality data will be crucial for the effective adoption of these tools. As the old saying goes, “garbage in, garbage out”, and this holds true for machine learning algorithms as well. If the algorithm is learning to predict or classify on bad data, we can expect that it will yield inaccurate results.

Data quality vs. data integrity vs. data profiling

Data quality, data integrity and data profiling are closely interrelated. Data quality is the broadest of the three: it is the set of criteria that organizations use to evaluate their data for accuracy, completeness, validity, consistency, uniqueness, timeliness, and fitness for purpose. Data integrity focuses on only a subset of these attributes, specifically accuracy, consistency, and completeness. It also approaches them through the lens of data security, implementing safeguards to prevent data corruption by malicious actors.

Data profiling, on the other hand, focuses on the process of reviewing and cleansing data to maintain data quality standards within an organization. It can also encompass the technologies that support these processes.
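As a rough illustration of the "reviewing" half of data profiling, the sketch below summarizes a single column: how many values it holds, how many are null, how many are distinct, and which types appear. The function name `profile_column` and the sample columns are hypothetical, and real profiling tools report far more than this.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: size, nulls, distinct values, and observed types."""
    present = [v for v in values if v is not None]
    types = Counter(type(v).__name__ for v in present)
    profile = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(types),
    }
    # min/max are only reported when every present value shares one type.
    if present and len(types) == 1:
        profile["min"], profile["max"] = min(present), max(present)
    return profile


# Hypothetical columns: one with mixed types, one clean numeric column.
country = ["DE", "FR", None, "DE", 42]
age = [34, 29, None, 41]

print(profile_column(country))
print(profile_column(age))
```

A mixed-type result such as the `country` column here is the kind of finding that would then feed a cleansing step.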

Dimensions of data quality

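Data quality is evaluated based on a number of dimensions, which can differ based on the source of information. These dimensions, such as the accuracy, completeness, validity, consistency, uniqueness, timeliness and fitness for purpose described above, are used to categorize data quality metrics.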

These metrics help teams conduct data quality assessments across their organizations to evaluate how informative and useful data is for a given purpose.
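One way such metrics might be expressed is as a ratio between 0 and 1 per dimension. The sketch below does this for four of the dimensions; the sample rows, the deliberately loose email pattern, and the 30-day freshness window are all assumptions chosen for illustration, and each organization would define its own rules and thresholds.

```python
import re
from datetime import date, timedelta

# Deliberately loose, illustrative rule; not a full email validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _present(rows, field):
    """Values of `field` that are not missing."""
    return [r[field] for r in rows if r.get(field) is not None]


def _ratio(hits, total):
    """Share of hits, or 0.0 when there is nothing to measure."""
    return hits / total if total else 0.0


def completeness(rows, field):
    """Share of rows where `field` is populated."""
    return _ratio(len(_present(rows, field)), len(rows))


def uniqueness(rows, field):
    """Share of populated values that are distinct."""
    values = _present(rows, field)
    return _ratio(len(set(values)), len(values))


def validity(rows, field, pattern):
    """Share of populated values that match the given pattern."""
    values = _present(rows, field)
    return _ratio(sum(bool(pattern.match(v)) for v in values), len(values))


def timeliness(rows, field, as_of, max_age_days):
    """Share of populated dates no older than `max_age_days` before `as_of`."""
    cutoff = as_of - timedelta(days=max_age_days)
    values = _present(rows, field)
    return _ratio(sum(v >= cutoff for v in values), len(values))


# Hypothetical customer rows.
rows = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"id": 2, "email": "not-an-email", "updated": date(2023, 6, 1)},
    {"id": 2, "email": None, "updated": date(2024, 1, 5)},
    {"id": 3, "email": "c@example.com", "updated": date(2024, 1, 12)},
]

print(completeness(rows, "email"))                        # 0.75
print(uniqueness(rows, "id"))                             # 0.75
print(validity(rows, "email", EMAIL_RE))
print(timeliness(rows, "updated", date(2024, 1, 15), 30)) # 0.75
```

Accuracy and fitness for purpose are left out on purpose: both require an external reference (a trusted source or a stated use case) that cannot be derived from the dataset alone.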

Why is data quality important?

Over the last decade, developments within hybrid cloud, artificial intelligence, the Internet of Things (IoT), and edge computing have led to the exponential growth of big data. As a result, the practice of master data management (MDM) has become more complex, requiring more data stewards and rigorous safeguards to ensure good data quality.

Businesses rely on data quality management to support their data analytics initiatives, such as business intelligence dashboards. Without it, the consequences can be devastating, and in some industries, such as healthcare, they can even be ethical in nature. Data quality solutions exist to help companies maximize the use of their data, and they have driven key benefits for the businesses that adopt them.
