Modern software development processes, including PMI's
Disciplined Agile Delivery (DAD), Extreme Programming (XP), and
Scrum, are all evolutionary, if not agile,
in nature. The implication is that if data professionals are going to be
effective members of such teams, then they need to adopt tools and
techniques which enable them to do so. There is nothing special
about the data aspects of an IT system: they can be developed in an
evolutionary manner just like non-data aspects. Even
data warehouse and business intelligence initiatives
can be developed in an evolutionary manner (and quite frankly this is
the preferred approach). This article overviews a
collection of core practices for agile/evolutionary database
development.
These core practices for evolutionary/agile database
development are:
- Vertical slicing.
A fundamental agile development technique is to slice functionality vertically into small,
consumable pieces that may be potentially deployed into production quickly.
These vertical slices are completely implemented - the analysis, design,
programming, and testing are complete - and offer real business value to stakeholders.
Vertical slicing
is completely applicable, and highly desirable, in data development.
- Clean data architecture and clean data design. A
clean data architecture
strategy enables you to develop and evolve your data assets at a pace which safely and effectively
supports your organization - in short, to be agile. Similarly, a
clean data design
enables you to evolve specific data assets in an agile manner.
- Agile data modeling.
With an evolutionary approach to data modeling you model the data
aspects of a system iteratively and incrementally. With an
agile data modeling
approach you do so in a highly collaborative and streamlined manner.
- Database refactoring. A
database refactoring
is a small change to your database schema which improves its
design without changing its semantics (i.e. you neither add anything nor
break anything). The process of database refactoring is the
evolutionary improvement of your database schema so as to improve your
ability to support the new needs of your customers.
- Automated database regression testing. You should ensure that
your database schema actually meets its requirements, and the best way to
do that is via testing. With a
test driven development
(TDD) approach you write a unit test before you write production
database schema code, the end result being that you have an
automated regression test suite for your database schema that also helps
to ensure data quality.
- Continuous database integration (CDI).
Continuous integration (CI) is the automatic invocation of the build process of a system.
As the name implies,
continuous database integration (CDI)
is the database version of CI.
- Configuration management.
Your data models, database tests, test data, and so on are important
artifacts which should be put under
configuration management
just like any other artifact.
- Developer sandboxes. Developers need their own
working environments, called
sandboxes, where they can modify the portion of the system which
they are building and get it working before they integrate their work with
that of their teammates.
- Data normalization.
Data
normalization is a process in which data attributes within a
data model are organized to increase the cohesion of entity types.
In other words, the goal of data normalization is to reduce and even
eliminate data redundancy, an important consideration for application developers because
it is incredibly difficult to store objects in a relational database that
maintains the same information in several places.
- Set a realistic primary key strategy. The
fact is that sometimes it makes sense to use
natural keys and
sometimes surrogate keys. As a professional you need to understand
when to apply each strategy, and to be prepared to
refactor
if you discover that you've made the wrong choice.
- Database encapsulation. A
database encapsulation layer hides the
implementation details of your database(s), including their physical schemas,
from your business code. In effect
it provides your business objects with persistence services - the ability to
read data from, write data to, and delete data from - data sources.
Ideally your business objects should know nothing about how they are
persisted; it just happens. Database
encapsulation layers aren't magic and they aren't academic theories;
they are common practice in both large and small
applications, whether simple or complex.
Database encapsulation layers are an important technique that every agile
software developer should be aware of and be prepared to use.
- Train developers in basic data skills.
This enables developers to both improve their data-oriented work and to
interact with data professionals more effectively. Fundamental skills
include:
relational database fundamentals,
data modeling,
mapping objects to RDBs (O/R mapping),
working with legacy data,
XML,
referential integrity and shared business logic,
how to retrieve objects from an RDB,
how to implement reports,
security access control, and
transactions and concurrency control.
- Train data engineers in basic development skills.
Similarly, data engineers need to gain an understanding of application development so
that they can play active roles on the team(s) which they support.
- Common development guidelines. Having a
common, usable set of development standards which are easy to understand and
to comply with can greatly improve the quality of the systems that you
develop. These guidelines may include, but are not limited to,
programming guidelines, modeling style guidelines,
data naming conventions, and user interface conventions
(including report design conventions).
- Lean data governance. The goal of
data governance is to ensure the quality, availability, integrity, security,
and usability of data within an organization, and the goal of
agile/lean
data governance is to enable development teams to do these things
effectively within your overall IT ecology.
Many traditional approaches to data governance seem to
struggle in practice, I suspect in part because of the
cultural impedance mismatch but also in part because traditional IT
governance struggles in general. The command and control approach typical of
traditional governance strategies is a lot like herding cats: you do a lot
of work but nothing much gets accomplished in the long run.
Lean governance, on the other hand, is focused on enabling people and
motivating them to do the right things. A lean data governance
approach promotes a healthy, collaborative
relationship between data professionals and the teams that they're
supporting.
- Agile master data management (MDM). If
you're going to adopt an
MDM
strategy within your organization, it should at least be an
agile one. Many organizations struggle when it comes to MDM, typically
because they adopt a traditional, command-and-control strategy. Your
MDM efforts can in fact be very agile and streamlined if you choose to.
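To make the database refactoring and automated database regression testing practices listed above a little more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything named in it is an assumption made for illustration: the customer table, the fname and first_name columns, and the helper functions are hypothetical, and the "add the new column, copy the data, keep the old column for a transition period" approach is just one way such a rename might be staged. The same regression test runs before and after the change to check that nothing existing was broken.

```python
import sqlite3


def create_schema(conn):
    """Create the original (pre-refactoring) schema with a little sample data."""
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer (id, fname) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )


def rename_fname_to_first_name(conn):
    """Hypothetical refactoring: introduce first_name and copy the data across.

    The old fname column is kept so that existing code keeps working
    during a transition period.
    """
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("UPDATE customer SET first_name = fname")


def regression_test(conn):
    """Existing queries against fname must still return the same data."""
    rows = conn.execute("SELECT id, fname FROM customer ORDER BY id").fetchall()
    assert rows == [(1, "Ada"), (2, "Grace")], rows


def new_column_test(conn):
    """The better-named column must carry the same values."""
    rows = conn.execute(
        "SELECT id, first_name FROM customer ORDER BY id"
    ).fetchall()
    assert rows == [(1, "Ada"), (2, "Grace")], rows


def run_demo():
    """Run the tests around the refactoring and return the final column names."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    regression_test(conn)            # passes before the refactoring
    rename_fname_to_first_name(conn)
    regression_test(conn)            # still passes: nothing was broken
    new_column_test(conn)
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
    conn.close()
    return columns


if __name__ == "__main__":
    print(run_demo())
```

In a TDD style you would write new_column_test first, watch it fail, and only then apply the schema change. Wiring run_demo into an automated build is a small step toward continuous database integration.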
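The database encapsulation practice listed above can be sketched in a few lines as well. This is an illustrative example, not a prescribed design: the Customer business object, the CustomerStore class, and the cust table with its cust_id and cust_nm columns are all hypothetical names. The point it tries to show is that only the encapsulation layer knows the physical schema, while the business object is persisted without knowing how.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Business object: knows nothing about tables, columns, or SQL."""
    customer_id: int
    name: str


class CustomerStore:
    """Hypothetical encapsulation layer: the only code that knows the schema."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cust ("
            "cust_id INTEGER PRIMARY KEY, cust_nm TEXT NOT NULL)"
        )

    def save(self, customer: Customer) -> None:
        # Insert the row, or overwrite it if the id already exists.
        self._conn.execute(
            "INSERT OR REPLACE INTO cust (cust_id, cust_nm) VALUES (?, ?)",
            (customer.customer_id, customer.name),
        )

    def find(self, customer_id: int) -> Optional[Customer]:
        # Map the physical row back to a business object, or None if absent.
        row = self._conn.execute(
            "SELECT cust_id, cust_nm FROM cust WHERE cust_id = ?",
            (customer_id,),
        ).fetchone()
        return Customer(*row) if row else None

    def delete(self, customer_id: int) -> None:
        self._conn.execute("DELETE FROM cust WHERE cust_id = ?", (customer_id,))


def demo():
    """Write, read, and delete a customer purely through the store."""
    store = CustomerStore(sqlite3.connect(":memory:"))
    store.save(Customer(1, "Ada"))
    found = store.find(1)
    store.delete(1)
    return found, store.find(1)
```

Because callers only see save, find, and delete, a later refactoring of the cust table (renaming cust_nm, say) would be confined to CustomerStore rather than rippling through the business code.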
You may have noticed that I haven't used the term
"best practice." That's because I
don't believe in the concept. Yes, "best practice" is a wonderful marketing term. But my experience,
after looking for "best practices" for over 30 years now, is that there aren't any. Any given practice
has advantages and disadvantages. Every given practice works well in some situations, and may even be
the "best" that you can do in those situations, but doesn't work well in others. So practices are
contextual in nature and should be presented as such. To present something as a "best practice" is
deceptive in my opinion.