Modern software development processes, including
Disciplined Agile Delivery (DAD), Extreme Programming (XP), and
Scrum, are all evolutionary
in nature. The implication is that if data professionals are going to be
effective members of such teams, then they need to adopt tools and
techniques which enable them to do so. There is nothing special
about the data aspects of your IT ecosystem: they can be developed in an
evolutionary manner just like non-data aspects. Even
data warehouse and business intelligence solutions
can be developed in an evolutionary manner (and quite frankly this is
the preferred approach).
This article overviews what we call the Agile Database Techniques Stack,
a collection of strategies required by modern data development professionals.
It is organized into the following topics:
- Agile database techniques stack
- Why is this a stack?
- Why adopt the agile database techniques stack?
- What is the best way to adopt the agile database techniques stack?
- Can you adopt individual techniques?
1. The Agile Database Techniques Stack
The techniques stack is overviewed in Figure 1 below.
Figure 1. The Agile Database Techniques Stack.
The strategies of the agile database techniques stack are:
- Vertical slicing.
A fundamental agile development technique is to slice functionality vertically into small,
consumable pieces that may be potentially deployed into production quickly.
These vertical slices are completely implemented - the analysis, design,
programming, and testing are complete - and offer real business value to stakeholders.
Vertical slicing is completely applicable, and highly desirable, in data development.
- Clean data architecture and clean data design. A
clean data architecture
strategy enables you to develop and evolve your data assets at a pace which safely and effectively
supports your organization - in short, to be agile. Similarly, a
clean database design
enables you to evolve specific data assets in an agile manner.
- Agile data modeling.
With an evolutionary approach to data modeling you model the data
aspects of a system iteratively and incrementally. With an
agile data modeling
approach you do so in a highly collaborative and streamlined manner.
- Database refactoring. A database refactoring
is a small change to your database schema which improves its
design without changing its semantics (i.e. you neither add anything nor
break anything). The process of database refactoring is the
evolutionary improvement of your database schema so as to improve your
ability to support the new needs of your customers.
- Automated database regression testing. You should ensure that
your database schema actually meets the requirements for it, and the best way to
do that is via testing. With a
test driven development
(TDD) approach you write a unit test before you write production
database schema code, the end result being that you have an
automated regression test for your database schema,
which in turn helps to ensure data quality.
- Continuous database integration (CDI).
Continuous integration (CI) is the automatic invocation of the build process of a system.
As the name implies,
continuous database integration (CDI)
is the database version of CI.
- Configuration management.
Your data models, database tests, test data, and so on are important
artifacts which should be put under
configuration management just like any other artifact.
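To make the database refactoring and test-first ideas above a little more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. Everything specific in it is an assumption made for illustration only: the customer table, the fname and first_name columns, the trigger names, and the helper functions are hypothetical, and a real team would likely rely on dedicated refactoring and testing tooling. The checks are written first and fail against the old schema, then pass once a "rename column" style refactoring is applied. The triggers are temporary scaffolding that keeps legacy writes working, and only the old-to-new direction is synchronized in this sketch.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    """Build the original schema from scratch, with one row of sample data."""
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT)")
    conn.execute("INSERT INTO customer (id, fname) VALUES (1, 'Ada')")


def run_checks(conn: sqlite3.Connection) -> list[str]:
    """Regression checks for the target schema; an empty list means they pass."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(customer)")}
    if "first_name" not in columns:
        return ["customer.first_name is missing"]
    failures = []
    # Existing data must survive the refactoring.
    row = conn.execute("SELECT first_name FROM customer WHERE id = 1").fetchone()
    if row is None or row[0] != "Ada":
        failures.append("existing data was not migrated")
    # A legacy client that still writes fname must keep working.
    conn.execute("INSERT INTO customer (id, fname) VALUES (2, 'Grace')")
    row = conn.execute("SELECT first_name FROM customer WHERE id = 2").fetchone()
    if row is None or row[0] != "Grace":
        failures.append("legacy writes are not synchronized")
    conn.execute("DELETE FROM customer WHERE id = 2")  # clean up the test row
    return failures


def rename_fname_to_first_name(conn: sqlite3.Connection) -> None:
    """Refactoring: introduce first_name while keeping fname as scaffolding."""
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("UPDATE customer SET first_name = fname")
    # Scaffolding: copy legacy writes of fname into first_name until every
    # client has been updated, after which fname and the triggers are dropped.
    conn.execute(
        """
        CREATE TRIGGER sync_first_name_insert AFTER INSERT ON customer
        WHEN NEW.first_name IS NULL
        BEGIN
            UPDATE customer SET first_name = NEW.fname WHERE id = NEW.id;
        END
        """
    )
    conn.execute(
        """
        CREATE TRIGGER sync_first_name_update AFTER UPDATE OF fname ON customer
        BEGIN
            UPDATE customer SET first_name = NEW.fname WHERE id = NEW.id;
        END
        """
    )


def demo() -> tuple[list[str], list[str]]:
    """Run the checks before and after the refactoring on a fresh database."""
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    before = run_checks(conn)  # written first, so it fails on the old schema
    rename_fname_to_first_name(conn)
    after = run_checks(conn)  # passes once the refactoring is applied
    conn.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

Because the script rebuilds the database from scratch and then runs its checks, keeping it under configuration management and invoking it automatically on each change would be one simple way to approximate continuous database integration.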
2. Why is This a Stack?
We call it a stack because each technique relies
on you being able to perform the ones below it.
For example, as you see in Figure 2,
continuous database integration requires you to have a
configuration management strategy in place.
Figure 2. How the techniques rely on each other.
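The "relies on the ones below it" idea can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: it treats the stack as strictly linear, in the order the techniques are listed in section 1, whereas Figure 2 may show finer-grained dependencies. The names STACK and prerequisites are hypothetical.

```python
# The techniques from top to bottom, in the order listed in section 1.
STACK = [
    "Vertical slicing",
    "Clean data architecture and clean data design",
    "Agile data modeling",
    "Database refactoring",
    "Automated database regression testing",
    "Continuous database integration (CDI)",
    "Configuration management",
]


def prerequisites(technique: str) -> list[str]:
    """Return the techniques below `technique` in the stack, nearest first."""
    index = STACK.index(technique)  # raises ValueError for unknown names
    return STACK[index + 1:]


if __name__ == "__main__":
    # Mirrors the example in the text: CDI relies on configuration management.
    print(prerequisites("Continuous database integration (CDI)"))
```

Under this linear reading, the technique at the bottom has no prerequisites and the one at the top relies on all six others.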
3. Why Adopt The Agile Database Techniques Stack?
Any given technique has advantages and disadvantages, including the ones overviewed in this article.
Every practice works well in some situations, and may even be the "best"
that you can do in those situations, but doesn't work well in others.
So practices are contextual in nature and should be presented as such.
To present something as a
"best practice" is deceptive in my opinion.
To prove my point, the advantages and disadvantages of each strategy of the agile database techniques stack are
summarized in the following table.
Vertical Slicing
Advantages:
- Delivers high-value functionality sooner.
- Reduces risk.
- Enables opportunity for feedback.
Disadvantages:
- Requires clean architecture, design, and implementation.
- Requires "full stack" data capability within a team to develop and evolve the solution.

Clean Data Architecture
Advantages:
- Easier to understand.
- Easier to evolve, thereby enabling agility.
- Easier to validate.
Disadvantages:
- Requires investment to keep clean, including in architectural modeling and architectural refactoring.
- Existing legacy architectures often have significant technical debt that needs to be addressed before your architecture is sufficiently clean.

Clean Database Design
Advantages:
- Easier to understand and evolve.
- Easier to test.
- Increases ability to evolve data hosting strategy.
Disadvantages:
- Requires investment to keep clean, including in agile design modeling and database refactoring.
- Existing legacy designs often have significant technical debt that needs to be addressed before your design is sufficiently clean.

Agile Data Modeling
Advantages:
- Enables evolutionary exploration of both your problem and solution spaces.
- Enables decisions at the most responsible moment.
- Integrates into other agile ways of working.
Disadvantages:
- Traditional data modelers struggle with agile data modeling strategies at first, particularly when they don't yet have full stack agile database skills.
- Requires ability to safely refactor whatever is being modeled.

Database Refactoring
Advantages:
- Enables safe evolution of a data source.
- Supports removal of data technical debt.
- Enables evolutionary database development.
Disadvantages:
- Although possible to implement by hand, you really want to invest in database refactoring tooling.
- Not possible for data sources, such as CSV files, that do not support executable functionality such as stored procedures.
- Requires capability to schedule, and then execute upon, removal of refactoring scaffolding once systems that access the data source are updated.

Automated Database Regression Testing
Advantages:
- Enables inclusion of data sources in automated testing strategy.
- Enables automatic enforcement of data standards and conventions.
- Supports executable specification strategy.
Disadvantages:
- Requires people with test thinking and testing skills.
- Requires an understanding that data sources are enterprise assets and must be treated as such.
- Traditional data groups are often unwilling at first to include testing as one of their responsibilities.
- Lack of automated tests for existing legacy data sources requires investment to develop them.

Continuous Database Integration (CDI)
Advantages:
- Automates much of the drudgery around the overall build process.
- Increases consistency and predictability of database evolution work.
- Brings database evolution up to common software engineering practice.
- Enables greater visibility into the work and work products of database evolution via automated logging of results.
Disadvantages:
- Requires investment in automation of the database development infrastructure, potentially including new tooling.
- Requires investment in the creation of automated tests.
- May require training and coaching of data engineers in agile development techniques.

Configuration Management
Advantages:
- Enables you to manage versions of databases, and the assets that comprise them, across environments.
- Enables you to roll back portions of a database to previous versions.
- Enables you to modify work products in parallel and then merge the work.
- The change history maintained about assets makes it easier to identify the source of injected defects.
Disadvantages:
- Requires database developers to be trained in CM-related skills.
- May require new tooling, in particular the CM system itself.
4. What is the Best Way to Adopt the Agile Database Techniques Stack?
There isn't a single, "best" way. It depends on the context of the situation that
you face. Here are some potential strategies to consider:
- Small, incremental improvements.
Treat the techniques stack like an improvement target, making small changes to work towards it.
You're very likely doing data architecture, database design, data modeling, some database testing,
configuration management of some assets, and may even be improving the implementation of
some of your legacy data sources. So start evolving your current way of working (WoW) to
leverage ideas captured in the various techniques of the stack.
I highly suggest that anyone interested in how to improve effectively via small steps
take a look at PMI's
Guided Continuous Improvement (GCI).
- Take a top-down approach.
When you adopt a technique, you'll find that you need to adopt at least some
aspects of the technique that it immediately relies on. This continues recursively
until you reach the bottom of the stack. In some ways you'll be adopting
"vertical slices" of the overall techniques stack. What I mean by this is that you'll adopt
just enough of each technique to get some value, then adopt some more of it, and so on.
- Take a bottom-up approach.
Although this makes sense from a technical point of view, and is certainly easier, it
tends to be difficult from a management point of view. The techniques towards
the top of the stack tend to have the best short-term payback whereas
the techniques towards the bottom have a longer-term payback. Starting bottom up,
you're effectively starting with the hardest strategies to justify (at least
in organizations struggling to operate with a value-driven mindset rather than
a cost-driven one).
- Take a middle-out approach.
Some people choose to start with the more technically interesting techniques,
particularly database refactoring and automated database regression testing,
often because those are most likely to be new to them and to their organization.
Other people will focus on improving what they're already doing, such as data modeling and
database design, by adopting more effective agile approaches.
Either way, you will still need to swiftly adopt techniques lower on the stack.
- Adopt everything at once.
This can be chaotic because in effect it's a large change for your data group.
As pointed out above, the top-down strategy tends to quickly evolve into this one
anyway, although perhaps via "vertical slices" of the overall technique stack.
The list above is ordered by my personal preference, which is driven by what I have found to
work in practice. But it really does depend on your situation; one approach does not
work in all situations. There are
no best practices.
5. Can You Adopt Individual Techniques?
Each technique offers value on its own. However, because they build on each other,
as you saw in Figure 2 and in the previous section,
they are more effective when adopted together.
6. Related Resources