Data Quality (DQ) in an Agile World
To paraphrase Ford Motor Company, quality is job one in agile. This includes data quality. Having said that, few agile methods directly address data quality and in fact few of them address data issues at all. The Agile Data (AD) method is of course an exception. This article explores two issues: What is data quality and how does data quality fit into agile ways of working (WoW)?
Defining Data Quality
Here is Agile Data’s definition for quality data:
Quality data is data that meets or exceeds the quality criteria of the user(s) of that data.
Potential data quality criteria include:
- Accessibility/privacy. Do people have access to the data that they should, and no more?
- Accuracy. Can you confirm that the data value represents the concept that it is meant to?
- Completeness. Are all the necessary data fields present?
- Conformity. Is the format or type of the values of a data field consistent?
- Consistency. How well does the data align with other representations of it?
- Integrity. Are the relationships this data has with other data correct?
- Precision. How close is the data value to the real-world value? Has it been rounded, for example?
- Reasonability. Do the data values make sense given the context?
- Relevance. Is the data what you need and no more?
- Reliability. Can you trust the data source?
- Timeliness. Is the data sufficiently current?
- Understandability. Does the data make sense? Is the format/representation clear?
- Uniqueness. Is the data recorded once and once only?
- Validity. How close is the data value to what it is expected to be?
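Several of these criteria can be checked mechanically. The following is a minimal Python sketch, not part of the Agile Data method itself, of checks for completeness, conformity, uniqueness, and timeliness. The record layout, field names, email pattern, and 30-day freshness threshold are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta

# Hypothetical customer records; the field names are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": "not-an-email", "updated": date(2024, 5, 2)},
    {"id": 2, "email": "bob@example.com", "updated": date(2020, 1, 1)},
    {"id": 3, "email": None, "updated": date(2024, 4, 30)},
]

# Deliberately loose pattern (an assumption); real email validation is harder.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Fraction of rows in which the field is present and not None."""
    if not rows:
        return 1.0
    return sum(r.get(field) is not None for r in rows) / len(rows)


def conformity(rows, field, pattern):
    """Fraction of non-missing values that match the expected format."""
    values = [r[field] for r in rows if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(bool(pattern.match(v)) for v in values) / len(values)


def duplicate_keys(rows, key):
    """Key values that appear more than once (a uniqueness check)."""
    seen, dupes = set(), set()
    for r in rows:
        if r[key] in seen:
            dupes.add(r[key])
        seen.add(r[key])
    return dupes


def stale_rows(rows, field, as_of, max_age):
    """Rows whose date field is older than max_age (a timeliness check)."""
    return [r for r in rows if as_of - r[field] > max_age]


as_of = date(2024, 5, 3)  # fixed reference date so the sketch is repeatable
print("completeness(email):", completeness(records, "email"))
print("conformity(email):  ", conformity(records, "email", EMAIL_PATTERN))
print("duplicate ids:      ", duplicate_keys(records, "id"))
print("stale rows:         ", stale_rows(records, "updated", as_of, timedelta(days=30)))
```

Criteria such as accuracy, relevance, and reliability usually need human judgment or comparison against a trusted source, so they are left out of this sketch.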
The challenge is that the importance, and extent, of each of these data quality criteria will potentially vary by data field and by usage context. For example, consider a person’s birth date, weight, and blood pressure. Consider which of the above criteria are important when this information is gathered for a bank loan application versus in a hospital emergency room. Same data, but very different quality criteria apply. One approach doesn’t fit all.
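One way to picture this context dependence is to keep the quality rules separate from the data and select them by usage context. The Python sketch below is an illustration only: the context names, the `FieldRule` type, and every threshold in it are invented assumptions, not clinical or banking guidance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical per-field quality rule for one usage context."""
    required: bool                 # completeness: must the field be present?
    max_age: Optional[timedelta]   # timeliness: None means age is not checked


# Assumed rules: the same three fields, judged differently in each context.
CONTEXT_RULES = {
    "bank_loan": {
        "birth_date": FieldRule(required=True, max_age=None),
        "weight_kg": FieldRule(required=False, max_age=None),
        "blood_pressure": FieldRule(required=False, max_age=None),
    },
    "emergency_room": {
        "birth_date": FieldRule(required=False, max_age=None),
        "weight_kg": FieldRule(required=True, max_age=timedelta(days=30)),
        "blood_pressure": FieldRule(required=True, max_age=timedelta(minutes=15)),
    },
}


def violations(record, context, now):
    """List rule violations for a record under the given context.

    `record` maps a field name to a (value, recorded_at) tuple.
    """
    problems = []
    for field, rule in CONTEXT_RULES[context].items():
        entry = record.get(field)
        if entry is None:
            if rule.required:
                problems.append(f"{field}: missing")
            continue
        _value, recorded_at = entry
        if rule.max_age is not None and now - recorded_at > rule.max_age:
            problems.append(f"{field}: stale")
    return problems


now = datetime(2024, 5, 3, 12, 0)
person = {
    "birth_date": ("1980-02-14", datetime(2019, 1, 10, 9, 0)),
    "blood_pressure": ("120/80", now - timedelta(hours=2)),
    # weight_kg was never recorded
}

print(violations(person, "bank_loan", now))       # []
print(violations(person, "emergency_room", now))  # ['weight_kg: missing', 'blood_pressure: stale']
```

The same record passes under the loan rules and fails under the emergency room rules, which is the sense in which one approach doesn’t fit all.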
Additional issues to consider include:
- Enforcement. Are mechanisms in place to ensure the continuing quality of the data?
- Value. Is the benefit of the data greater than the cost of providing it?
How Does Data Quality Fit Into Agile?
In theory, the need for clean data is orthogonal to the paradigm that you’re following. In practice, the greater focus on quality within the agile community, at least amongst the disciplined practitioners, provides greater opportunity for improved data quality.
But this requires effort. It requires people skilled in techniques that either avoid poor quality data to begin with or address data quality problems when they are discovered. This is why this site describes a collection of agile techniques focused on data quality. Agile data quality techniques tend to be more effective in practice than traditional techniques, mostly due to shorter feedback cycles and automation support.
Figure 1 maps the agile data quality techniques described at this site to the agile software development lifecycle. As you can see, there are many potential techniques for you to adopt. And of course you may also decide to adopt some traditional data quality techniques, hopefully in as agile a manner as possible.
Figure 1. Agile data quality techniques throughout the agile software development lifecycle.
Your Agile Data Quality (DQ) Learning Journey
Given the growing importance of data in modern organizations it is clear that software professionals need to gain greater knowledge and skills in this space. Here is my advice for learning more about how data quality is addressed in an “agile world”:
- Adopt an agile mindset towards data quality. If you haven’t already done so, first read about the Agile Data Way of Thinking (WoT). Then consider the data quality metaphor that data is the new water.
- Recognize the impact of data technical debt.
- Learn about the range of data quality techniques available to you. The article Data Quality Techniques provides a great overview and links to detailed overviews.
- Learn how to identify the DQ techniques that are right for your context. In Assessing Data Quality Techniques I describe a framework to understand the advantages and disadvantages of individual DQ techniques. In Comparing Data Quality Techniques I show how over 20 DQ techniques compare on the five factors described in the previous article.
Related Reading
- The Agile Database Techniques Stack
- Clean Data Architecture
- Clean Database Design
- Continuous Database Integration (CDI)
- Data Cleansing: Applying The “5 Whys” To Get To The Root Cause
- Data Quality Techniques
- Data Repair
- Database Refactoring
- Data Technical Debt
- Database Testing
- Impact of Poor Data Quality
- Metaphor for Data Quality: Data is the New Water
- Test-Driven Development (TDD)