Data Quality (DQ) in an Agile World
To paraphrase Ford Motor Company, quality is job one in agile. This includes data quality. Having said that, few agile methods directly address data quality, and in fact, few of them address data issues at all. The Agile Data (AD) method is an exception.
This article explores several issues:
- What is data quality?
- What are potential data quality issues to look for?
- How does data quality fit into agile ways of working (WoW)?
- What is your agile data quality learning journey?
Defining Data Quality
Here is Agile Data’s definition for quality data:
Quality data is data that meets or exceeds the quality criteria of the consumers of that data.
Potential Data Quality Issues
The following table describes potential data quality (DQ) issues and the challenges involved in detecting each of them.
| Data Quality Issue | Detection Challenges |
| --- | --- |
| Accessibility/privacy. Do people have access to the data that they should, and no more? | Potentially difficult as it relies on well-defined and up-to-date access control. Requires integration into an operational access control strategy. |
| Accuracy. Can you confirm that the data value represents the concept that it is meant to? | Difficult to do outside of the source system where the data is originally generated. |
| Completeness. Are all the necessary data fields present? | Straightforward count of the null/zero/blank values. |
| Conformity. Is the format or type of the values of a data field consistent? | Requires definition of expected values. |
| Consistency. How well does the data align with other representations of it? | Requires mapping of related fields and computation of the content profile within each field. |
| Integrity. Are the relationships between this data and other data correct? | Requires ontology metadata. |
| Precision. How close is the data value to the real-world value? Has it been rounded, for example? | Requires metadata describing the expected precision. |
| Reasonability. Do the data values make sense given the context? | Requires metadata describing expected values for a given field and data source. Expected values may vary across data sources for a given field. |
| Relevance. Is the data what you need, and no more? | Requires understanding of current requirements for the data. |
| Reliability. Can you trust the data source? | Requires analysis, including profiling, of the data source as well as an understanding of quality requirements for that data. |
| Timeliness. Is the data sufficiently current? | Requires an understanding of stakeholder needs. It could vary at the field level within a given data source. |
| Understandability. Does the data make sense? Is the format/representation clear? | Requires metadata describing what is expected of the data. |
| Uniqueness. Is the data recorded once and once only? | DV2’s change data capture (CDC) strategy makes it easy to detect differences between incoming records for a given concept. |
| Validity. How close is the data value to what it is expected to be? | Requires an understanding of both the expected range and the distribution of data values across that range. |
Two further aspects of data quality do not fit neatly into the detection-oriented table above:
- Enforcement. Are mechanisms in place to ensure the continuing quality of the data?
- Value. Is the benefit of the data greater than the cost of providing it?
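Several of the detection challenges above, notably completeness, conformity, and validity, lend themselves to simple automation. The following is a minimal sketch in Python; the field names, email pattern, and age range are illustrative assumptions, not values from this article:

```python
# Minimal data quality checks for completeness, conformity, and validity.
# Field names, patterns, and ranges below are illustrative assumptions.
import re

records = [
    {"customer_id": "C001", "email": "a@example.com", "age": 34},
    {"customer_id": "C002", "email": "not-an-email", "age": 31},
    {"customer_id": "C003", "email": None, "age": 208},
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def check_completeness(rows, field):
    """Completeness: count rows where the field is missing or blank."""
    return sum(1 for r in rows if not r.get(field))

def check_conformity(rows, field, pattern):
    """Conformity: count non-null values that do not match the expected format."""
    return sum(1 for r in rows if r.get(field) and not pattern.match(r[field]))

def check_validity(rows, field, low, high):
    """Validity: count values outside the expected range."""
    return sum(
        1 for r in rows
        if r.get(field) is not None and not (low <= r[field] <= high)
    )

print("missing emails:", check_completeness(records, "email"))            # 1
print("malformed emails:", check_conformity(records, "email", EMAIL_RE))  # 1
print("invalid ages:", check_validity(records, "age", 0, 130))            # 1
```

In practice you would source the expected patterns and ranges from the metadata called out in the table, rather than hard-coding them.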
How Does Data Quality Fit Into Agile?
In theory, the need for clean data is orthogonal to the paradigm that you’re following. In practice, the greater focus on quality within the agile community, at least amongst disciplined practitioners, provides greater opportunity for improved data quality.
But this requires effort. It requires people skilled in techniques that either avoid poor-quality data from the start or address data quality problems when they are discovered. This is why this site describes a collection of agile techniques focused on data quality. Agile data quality techniques tend to be more effective in practice than traditional techniques, largely due to shorter feedback cycles and greater automation support.
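The shorter feedback cycles mentioned above typically come from running DQ checks as automated tests on every build. Here is a hedged sketch using Python's built-in `sqlite3`; the table and column names are assumptions made for illustration:

```python
# Sketch: data quality checks run as automated tests against a database,
# so regressions surface on every build. Table/column names are assumptions.
import sqlite3

def missing_emails(conn):
    """Completeness check: customers with no email recorded."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE email IS NULL OR email = ''"
    ).fetchone()
    return n

def duplicate_ids(conn):
    """Uniqueness check: customer ids recorded more than once."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT id FROM customer GROUP BY id HAVING COUNT(*) > 1)"
    ).fetchone()
    return n

# In a real pipeline this would be a test copy of the production schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("C001", "a@example.com"), ("C002", "b@example.com")],
)

assert missing_emails(conn) == 0
assert duplicate_ids(conn) == 0
print("data quality checks passed")
```

Checks like these are one way to combine the database testing and continuous database integration techniques referenced later in this article.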
Figure 1 maps the agile data quality techniques described at this site to the agile software development lifecycle, and Figure 2 does the same for a DataOps rendering of the lifecycle. As you can see, there are many potential techniques for you to adopt. And of course, you may also decide to adopt some traditional data quality techniques, hopefully in as agile a manner as possible.
Figure 1. Agile data quality techniques throughout the agile software development lifecycle (click to enlarge).
Figure 2. Agile data quality techniques through the DataOps lifecycle (click to enlarge).
Your Agile Data Quality (DQ) Learning Journey
Given the growing importance of data in modern organizations, it is clear that software professionals need to gain greater knowledge and skills in this space. Here is my advice for learning more about how data quality is addressed in an “agile world”:
- Adopt an agile mindset towards data quality. If you haven’t already done so, first read about the Agile Data Way of Thinking (WoT). Then consider the data quality metaphor that data is the new water.
- Recognize the impact of data debt.
- Learn about the range of data quality techniques available to you. The article Data Quality Techniques provides a great overview and links to detailed overviews.
- Learn how to identify the DQ techniques that are right for your context. In Assessing Data Quality Techniques, I describe a framework to understand the advantages and disadvantages of individual DQ techniques. In Comparing Data Quality Techniques, I show how over 20 DQ techniques compare on the five factors described in the previous article.
Related Reading
- The Agile Database Techniques Stack
- Clean Data Architecture
- Clean Database Design
- Continuous Database Integration (CDI)
- Data Cleansing: Applying The “5 Whys” To Get To The Root Cause
- Data Debt: Understanding Enterprise Data Quality Problems
- Data Quality Techniques
- Data Repair
- Database Refactoring
- Database Testing
- Impact of Poor Data Quality
- Metaphor for Data Quality: Data is the New Water
- Test-Driven Development (TDD)