The Agile Data (AD) Method

A Metaphor for Data Quality: Data is the New Water

Many people like to say that data is the new oil because of its inherent value. But others, including myself, believe that a better metaphor is that data is the new water. Water, like oil, is valuable. More importantly, because we require water to survive, we want it to be of high quality. Because our organizations require data to thrive, thinking of data as the new water provides an excellent metaphor for the importance of data quality.

In Non-Invasive Data Governance Strikes Again, Rob Seiner introduces us to this metaphor. Although he makes a good start, I think it can be taken further.

We try to drink a glass of water from a river, but the water is dirty, so we filter it to make it clean and fit for drinking. This is the equivalent of cleansing data at the point of use, where we bring in data from the source and then clean it to bring it up to the data quality level that we require. Just as forcing everyone to filter their own water to make it drinkable is an expensive strategy overall, forcing everyone to cleanse their own data to make it usable is incredibly wasteful.

To be less wasteful, we go upstream to find the source of the pollution. We discover that a factory is dumping filth into the river. To address this problem we could put a filter on the outflow pipe to prevent the pollution from escaping. In a way this is the equivalent of the transformation part of an extract-transform-load (ETL) strategy, or an ELT strategy depending on your architecture. This is where Rob stopped with the metaphor in his book. This is certainly a better solution than forcing everyone to filter their water, or in our case to cleanse their own data. Unfortunately, with this approach we haven't really addressed the data quality problem; we have only put a bandage on it. We can do better.
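To make the "filter on the outflow pipe" concrete, here is a minimal sketch of a transformation step, assuming a made-up phone field and a simple ten-digit rule. The names clean_phone and transform, and the sample rows, are hypothetical illustrations rather than part of any particular ETL tool.

```python
def clean_phone(raw):
    """Normalize a hypothetical 10-digit phone field; return None if unusable."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits if len(digits) == 10 else None


def transform(rows):
    """The 'filter on the outflow pipe': clean each extracted row before loading."""
    cleaned, rejected = [], []
    for row in rows:
        phone = clean_phone(row.get("phone") or "")
        if phone is None:
            rejected.append(row)  # set aside for follow-up; the source is still dirty
        else:
            cleaned.append({**row, "phone": phone})
    return cleaned, rejected


# Illustrative rows standing in for the "extract" step.
extracted = [
    {"id": 1, "phone": "(416) 555-0100"},
    {"id": 2, "phone": "n/a"},
]
cleaned, rejected = transform(extracted)
```

Note that the source rows are untouched: the same cleanup runs again on every load, which is why this is a bandage and not a cure.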

A better solution is to investigate the actual cause of the pollution within the factory so that we can fix the problem at the source. In this case we discover that a machine is broken and leaking fluids. These fluids are getting into the factory's output and are eventually dumped into the river. The data equivalent of fixing the machine is to fix the data source itself, likely via techniques such as database refactoring and data cleansing at the source. Fixing the source data is clearly critical from the point of view of organizational data quality, but we still need to go one step further.
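As a rough sketch of what fixing the source can look like, the example below cleanses existing rows and then refactors the schema so the bad values cannot come back. It uses an in-memory SQLite database; the customer table, the country column, and the two-letter-code rule are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO customer (id, country) VALUES (?, ?)",
    [(1, "CA"), (2, "ca "), (3, "Canada")],  # made-up dirty source data
)

# Step 1: cleanse the existing data in the source itself.
conn.execute("UPDATE customer SET country = upper(trim(country))")
conn.execute("UPDATE customer SET country = 'CA' WHERE country = 'CANADA'")

# Step 2: refactor the schema so bad values are rejected from now on.
# SQLite cannot add a CHECK constraint to an existing table, so we rebuild it.
conn.execute(
    """CREATE TABLE customer_new (
           id INTEGER PRIMARY KEY,
           country TEXT NOT NULL
               CHECK (length(country) = 2 AND country = upper(country))
       )"""
)
conn.execute("INSERT INTO customer_new SELECT id, country FROM customer")
conn.execute("DROP TABLE customer")
conn.execute("ALTER TABLE customer_new RENAME TO customer")
conn.commit()


def try_insert(country):
    """Return True if the source accepts the value, False if the constraint rejects it."""
    try:
        conn.execute("INSERT INTO customer (country) VALUES (?)", (country,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With the constraint in place, try_insert("canada") is rejected by the database while try_insert("US") succeeds, so downstream consumers no longer need their own filter for this field.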

Regarding the machine that produces the filth getting into the water, we should investigate how the machine got that way. Our goal is to identify the root cause of the problem and then address it so as to avoid more machines breaking in the future. Similarly, we should investigate how the data quality problems were injected into the data source to begin with. Perhaps the developers of those systems didn't understand the existing semantics of the data source they were working with. If so, the agile solution would be to put automated database regression tests in place to ensure the semantics are followed. Perhaps the problem is in the user interface of the system, in that it's missing basic formatting and consistency checks for the data fields.
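An automated database regression test of this kind might look something like the following sketch, which uses Python's unittest module against an in-memory SQLite database. The open_database function, the customer schema, and the two semantic rules are hypothetical stand-ins; a real suite would connect to a test instance of the actual data source.

```python
import sqlite3
import unittest


def open_database():
    """Stand-in for connecting to the real data source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, country TEXT, signup_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "CA", "2021-03-04"), (2, "US", "2022-11-30")],
    )
    return conn


class CustomerSemanticsTest(unittest.TestCase):
    """Each test counts rows that violate one agreed-upon rule; zero means pass."""

    def setUp(self):
        self.conn = open_database()

    def tearDown(self):
        self.conn.close()

    def count(self, where):
        query = f"SELECT COUNT(*) FROM customer WHERE {where}"
        return self.conn.execute(query).fetchone()[0]

    def test_country_is_two_letter_uppercase_code(self):
        violations = self.count(
            "country IS NULL OR length(country) <> 2 OR country <> upper(country)"
        )
        self.assertEqual(0, violations)

    def test_signup_date_is_iso_8601(self):
        # SQLite's date() returns NULL for text it cannot parse as a date.
        violations = self.count(
            "signup_date IS NULL OR date(signup_date) IS NULL "
            "OR date(signup_date) <> signup_date"
        )
        self.assertEqual(0, violations)
```

Run as part of every build (for example with python -m unittest), tests like these tell developers quickly when a change breaks the semantics that other teams depend on.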

 

Towards True Data Quality

It is clear to me that data is the new water. Just as we need clean water to live, our organizations need clean data to thrive. An effective data quality strategy focuses on data through its entire lifecycle, from data source to data usage. Mature data-oriented organizations recognize that when data quality problems are detected, they must be traced to their true source and addressed there.

 


Recommended Reading

Choose Your WoW! 2nd Edition
This book, Choose Your WoW! A Disciplined Agile Approach to Optimizing Your Way of Working (WoW) – Second Edition, is an indispensable guide for agile coaches and practitioners. It overviews key aspects of the Disciplined Agile® (DA™) tool kit. Hundreds of organizations around the world have already benefited from DA, which is the only comprehensive tool kit available for guidance on building high-performance agile teams and optimizing your WoW. As a hybrid of the leading agile, lean, and traditional approaches, DA provides hundreds of strategies to help you make better decisions within your agile teams, balancing self-organization with the realities and constraints of your unique enterprise context.

 

I also maintain an agile database books page which overviews many books you will find interesting.