The Agile Data (AD) Method

Data Quality (DQ) in an Agile World

To paraphrase Ford Motor Company, quality is job one in agile. This includes data quality. Having said that, few agile methods directly address data quality, and in fact, few of them address data issues at all. The Agile Data (AD) method is an exception.

This article explores several issues:

  1. What is data quality?
  2. What are potential data quality issues to look for?
  3. How does data quality fit into agile ways of working (WoW)?
  4. What is your agile data quality learning journey?

 

Defining Data Quality

Here is Agile Data’s definition for quality data:

Quality data is data that meets or exceeds the quality criteria of the consumers of that data.

This definition is interesting because it isn’t exact. Quality, including data quality, is in the eye of the beholder. This is a colloquial way of saying that context counts. The needs of the data consumer determine the quality of that data. 
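
One way to make this definition concrete is to treat each consumer's quality criteria as a list of checks that are run against the same record. The following is a minimal sketch, not part of the Agile Data method itself: the consumer names, field names (`birth_date`, `bp_taken_on`, `as_of`), and the rules are all assumptions chosen for illustration.

```python
from datetime import date
from typing import Callable, Dict, List

Record = Dict[str, object]
Criterion = Callable[[Record], bool]


def has_birth_date(record: Record) -> bool:
    """Completeness: a birth date is present."""
    return isinstance(record.get("birth_date"), date)


def blood_pressure_is_fresh(record: Record) -> bool:
    """Timeliness: blood pressure was taken on the day the record is used."""
    return record.get("bp_taken_on") == record.get("as_of")


# Hypothetical consumers and their criteria (assumptions for illustration).
CRITERIA_BY_CONSUMER: Dict[str, List[Criterion]] = {
    "loan_application": [has_birth_date],
    "emergency_room": [has_birth_date, blood_pressure_is_fresh],
}


def meets_quality_criteria(record: Record, consumer: str) -> bool:
    """Data is of sufficient quality if it passes every check of this consumer."""
    return all(check(record) for check in CRITERIA_BY_CONSUMER[consumer])


record: Record = {
    "birth_date": date(1980, 5, 17),
    "bp_taken_on": date(2024, 1, 2),
    "as_of": date(2024, 1, 15),
}

print(meets_quality_criteria(record, "loan_application"))  # True
print(meets_quality_criteria(record, "emergency_room"))    # False
```

The same record passes for one consumer and fails for the other, which is the point of the definition: the data did not change, only the criteria did.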

 

Potential Data Quality Issues

The following table describes potential data quality issues.

| Data Quality Issue | Challenges to Detect |
|---|---|
| Accessibility/privacy. Do people have access to the data that they should, and no more? | Potentially difficult as it relies on well-defined and up-to-date access control. Requires integration into an operational access control strategy. |
| Accuracy. Can you confirm that the data value represents the concept that it is meant to? | Difficult to do outside of the source system where the data is originally generated. |
| Completeness. Are all the necessary data fields present? | Straightforward count of the null/zero/blank values. |
| Conformity. Is the format or type of the values of a data field consistent? | Requires definition of expected values. |
| Consistency. How well does the data align with other representations of it? | Requires mapping of related fields and computation of the content profile within each field. |
| Integrity. Are the relationships between this data and other data correct? | Requires ontology metadata. |
| Precision. How close is the data value to the real-world value? Has it been rounded, for example? | Requires metadata describing expected precision. |
| Reasonability. Do the data values make sense given the context? | Requires metadata describing expected values for a given field and data source. Expected values may vary across data sources for a given field. |
| Relevance. Is the data what you need, and no more? | Requires understanding of current requirements for the data. |
| Reliability. Can you trust the data source? | Requires analysis, including profiling, of the data source as well as an understanding of quality requirements for that data. |
| Timeliness. Is the data sufficiently current? | Requires an understanding of stakeholder needs. It could vary at the field level within a given data source. |
| Understandability. Does the data make sense? Is the format/representation clear? | Requires metadata describing what is expected of the data. |
| Uniqueness. Is the data recorded once and once only? | DV2’s change data capture (CDC) strategy makes it easy to detect differences between incoming records for a given concept. |
| Validity. How close is the data value to what it is expected to be? | Requires an understanding of both the expected range and the distribution of data values across that range. |

The challenge is that the importance and extent of each of these data quality criteria may vary across data fields and usage contexts. For example, consider a person’s birth date, weight, and blood pressure. Think about which of the above criteria matter when this information is gathered for a bank loan application, and which matter when it is gathered in a hospital emergency room. It is the same data, but very different quality criteria apply. One approach doesn’t fit all.

Additional issues to consider include:

  • Enforcement. Are mechanisms in place to ensure the continuing quality of the data?
  • Value. Is the benefit of the data greater than the cost of providing it?
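
Several of the issues in the table above lend themselves to simple automated checks. The sketch below shows one possible way to profile a batch of rows for completeness, conformity, uniqueness, and validity using only the Python standard library. The sample rows, field names, ISO date format, and weight range are assumptions made for illustration, and the validity check covers only the expected range, not the distribution of values across it.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical sample rows; field names and rules are assumptions.
rows: List[Row] = [
    {"id": "1", "birth_date": "1980-05-17", "weight_kg": "72.5"},
    {"id": "2", "birth_date": "17/05/1980", "weight_kg": ""},
    {"id": "2", "birth_date": None, "weight_kg": "720"},
]

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_blank(value: Optional[str]) -> bool:
    """Treat None, empty, and whitespace-only values as missing."""
    return value is None or value.strip() == ""


def completeness(rows: List[Row], field: str) -> float:
    """Completeness: fraction of rows with a non-blank value."""
    filled = sum(1 for row in rows if not is_blank(row.get(field)))
    return filled / len(rows)


def nonconforming(rows: List[Row], field: str, pattern: re.Pattern) -> List[str]:
    """Conformity: non-blank values that do not match the expected format."""
    values = [row[field] for row in rows if not is_blank(row.get(field))]
    return [v for v in values if not pattern.fullmatch(v)]


def duplicates(rows: List[Row], field: str) -> List[str]:
    """Uniqueness: values that occur more than once."""
    counts = Counter(row[field] for row in rows if not is_blank(row.get(field)))
    return [value for value, n in counts.items() if n > 1]


def out_of_range(rows: List[Row], field: str, low: float, high: float) -> List[float]:
    """Validity (range only): numeric values outside the expected range."""
    values = [float(row[field]) for row in rows if not is_blank(row.get(field))]
    return [v for v in values if not low <= v <= high]


print(completeness(rows, "birth_date"))              # 2 of 3 rows are filled
print(nonconforming(rows, "birth_date", ISO_DATE))   # ['17/05/1980']
print(duplicates(rows, "id"))                        # ['2']
print(out_of_range(rows, "weight_kg", 2.0, 300.0))   # [720.0]
```

Note that every threshold here is supplied by the caller. Consistent with the definition of quality data, what counts as an acceptable result depends on who consumes the data.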

 

How Does Data Quality Fit Into Agile?

In theory, the need for clean data is orthogonal to the paradigm that you’re following. In practice, the greater focus on quality within the agile community, at least amongst disciplined practitioners, provides greater opportunity for improved data quality.

But this requires effort. It requires people skilled in techniques that either avoid poor-quality data from the start or address data quality problems when they are discovered. This is why this site describes a collection of agile techniques focused on data quality. Agile data quality techniques tend to be more effective in practice than traditional techniques, largely due to shorter feedback cycles and greater automation support.

Figure 1 maps the agile data quality techniques described on this site to the agile software development lifecycle, and Figure 2 does the same for a DataOps rendering of the lifecycle. As you can see, there are many potential techniques for you to adopt. You may also decide to adopt some traditional data quality techniques, hopefully in as agile a manner as possible.

Figure 1. Agile data quality techniques throughout the agile software development lifecycle.

 

Figure 2. Agile data quality techniques through the DataOps lifecycle.

 

Your Agile Data Quality (DQ) Learning Journey

Given the growing importance of data in modern organizations, it is clear that software professionals need to gain greater knowledge and skills in this space. Here is my advice for learning more about how data quality is addressed in an “agile world”:

  1. Adopt an agile mindset towards data quality. If you haven’t already done so, first read about the Agile Data Way of Thinking (WoT). Then consider the data quality metaphor that data is the new water.
  2. Recognize the impact of data debt.
  3. Learn about the range of data quality techniques available to you. The article Data Quality Techniques provides a great overview and links to detailed overviews.
  4. Learn how to identify the DQ techniques that are right for your context. In Assessing Data Quality Techniques, I describe a framework to understand the advantages and disadvantages of individual DQ techniques. In Comparing Data Quality Techniques, I show how over 20 DQ techniques compare on the five factors described in the previous article.

 

Related Reading