This article overviews a Disciplined Agile
approach to data warehouse (DW)
solution development. The focus of this article is on the process itself, as opposed to
specific architecture and design techniques (for those I highly suggest
Data Vault 2). Furthermore,
this topic is clearly worthy of a book containing detailed descriptions of the
techniques and artifacts described below (however, I have included numerous links to
such details if you're willing to explore further on your own).
This article addresses:
Disciplined Agile Data Warehousing
Artifact Creation by Disciplined Agile DW Teams
- Introduction to Disciplined Agile
Many organizations start their agile journey by adopting Scrum. Scrum
describes a good strategy for leading agile software teams but
is only part of what is required to deliver sophisticated solutions to your
stakeholders. Invariably teams need to look to other methods to fill in the
process gaps that Scrum purposely ignores. When looking at other methods
there is considerable overlap and conflicting terminology that can be confusing
to practitioners as well as outside stakeholders. Worse yet people don’t always
know where to look for advice or even know what issues they need to consider.
To address these challenges the
Disciplined Agile (DA),
formerly known as Disciplined Agile Delivery (DAD), process decision framework
provides a more cohesive approach to agile solution delivery. To be more exact,
here’s a definition: “The Disciplined Agile (DA) process decision framework
is a people-first, learning-oriented hybrid agile approach to IT solution delivery.
It has a risk-value delivery lifecycle, is goal-driven, is enterprise aware,
and is scalable.”
You will soon see that the Disciplined Agile framework offers the following benefits
to DW teams:
- DA offers all the benefits of agile. Organizations around the
world have found that agile strategies are more effective in practice than
traditional approaches. Agile teams, on average, enjoy higher levels of quality,
higher stakeholder satisfaction, quicker time to delivery, and higher
levels of productivity.
- The "heavy lifting" around process has been done for you. DA
extends Scrum with proven strategies from Agile Modeling (AM),
Extreme Programming (XP), Unified Process (UP), Scaled Agile Framework (SAFe),
Kanban, Lean Software Development, Outside In Development (OID) and
several other methods. More importantly it shows how to address key considerations
such as architecture, analysis, design, testing and governance in a coherent, lightweight manner.
- DA addresses the full delivery lifecycle. DA extends the construction-focused lifecycle of Scrum to address the full,
beginning-to-end delivery lifecycle
from project initiation all the way to
delivering the solution to its end users. It also supports lean and continuous
delivery versions of the lifecycle: unlike other agile methods, DA doesn’t prescribe
a single lifecycle because it recognizes that one process size does not fit all.
- DA shows how it all fits together in a flexible, context-sensitive manner.
DA includes advice about the technical practices such as those from
Extreme Programming (XP) as well as the modeling, documentation, and
governance strategies missing from both Scrum and XP. But, instead of the
prescriptive approach seen in other agile methods, including Scrum, the DAD
framework takes a goal-driven approach.
In doing so DA provides contextual
advice regarding viable alternatives and their trade-offs, enabling you to
tailor DAD to effectively address the situation in which you find yourself.
By describing what works, what doesn’t work, and more importantly why, DA
helps you to increase your chance of adopting strategies that will work for you.
- DA considers the bigger picture. Your solutions must
work within your organizational eco-system and more importantly leverage your
existing infrastructure. They must be architected to stand the test of time.
The Disciplined Agile (DA) framework suggests a robust set of
roles for agile solution delivery.
These roles are organized into two categories:
- Primary roles. These roles - Team Lead, Product Owner,
Architecture Owner, Team Member and Stakeholder - are commonly found on delivery
teams regardless of the level of scale faced by the team.
- Secondary roles. These roles - Domain Expert, Technical Expert,
Specialist, Independent Tester, and Integrator -
are filled, often on a temporary basis, to address scaling issues.
For more details about disciplined agile roles, please read
Roles on Disciplined Agile Teams and
Disciplined Agile Roles at Scale.
There are several important observations about these roles:
- These roles are different from traditional roles. Moving
to an agile, and better yet disciplined agile, way of working requires a
paradigm shift. Part of that paradigm shift is an improved set of roles
and responsibilities held by people on agile teams. As a professional
you need to be prepared to work in an agile manner to fit into an agile team.
- The era of the specialist is over. Key tenets of agile
development are that people work together collaboratively to produce a working
solution in an evolutionary (iterative and incremental) manner. An implication
of that is that we're no longer able to work in a manner where people with
narrow specialties - such as logical data modeler, physical data modeler, data analyst,
and data architect - each do their small part of the work and then hand it off to the
next person. That traditional approach is too slow, expensive, and error prone.
Instead we build teams of "T-skilled" generalizing specialists
who have one or more specialties (such as the ones listed above) PLUS a broader
set of skills and knowledge that allow them to take on a wider range of tasks
and work more effectively with their team mates.
- There is room for you if you're willing to learn. Although
these roles are different than what you may be used to it is still possible, and
highly desirable, for you to transition into one of these roles.
- The need for specialists still exists. There is a very
small range of situations where specialists are still needed. Having said that,
it is highly unlikely that you're in one of those situations. If you are new to
agile, your best strategy is to assume that you are not in one of those situations
and that you need to start working on becoming a generalizing specialist (not a
generalist!) just like the vast majority of people.
Enterprise awareness is one of the key aspects of the Disciplined Agile (DA) framework. The observation is that DA teams work within your organization’s enterprise ecosystem, as do all other teams. There are often existing systems currently in production and, at a minimum, your solution shouldn’t negatively impact them. Better yet your solution will hopefully leverage existing functionality and data available in production. You will often have other teams working in parallel to your team, and you may wish to take advantage of a portion of what they’re doing and vice versa. Your organization may be working towards business or technical visions which your team should contribute to. A governance strategy exists which hopefully enhances what your team is doing.
It is important that data warehousing teams work in an enterprise aware manner for several reasons:
- You should adopt your organization's development and data conventions.
The implication is that your delivery team will need to work closely with the teams,
such as data management, who are typically responsible for such conventions.
- Your solution should leverage existing infrastructure wherever possible.
You will want to work with your organization's enterprise architecture team to understand
the technical direction of your company and with your reuse engineering team (if you have one)
to identify and leverage existing assets.
- You should fix existing legacy systems and data sources whenever possible.
This is called paying down technical debt
in the agile community.
- You should share learnings whenever possible.
- You should be governed appropriately. Like it or not, your
team is being governed. Effective IT organizations recognize that agile teams
need to be governed in an agile, not a traditional, manner. The Disciplined Agile
framework has agile governance baked right into it.
One of the key aspects of the Disciplined Agile (DA) process decision framework
is that it promotes a full, beginning-to-end, solution delivery lifecycle.
The framework supports several lifecycles, two of which I summarize here,
where you incrementally build a consumable solution over time.
Figure 1 depicts a Scrum-based version of the delivery
lifecycle. This lifecycle deviates from the common Scrum lifecycle in several ways.
First, it depicts a three-phase delivery lifecycle, not just a single-phase construction
lifecycle. Second, it replaces the Scrum marketing language with more instructive
language. Third, it shows external inputs coming into the team from other areas of
the organization.
Figure 1. A Scrum-Based Delivery Lifecycle.
Teams that are working on the first release of a data warehouse are likely to follow
the Scrum-based lifecycle. Because they are working on the first release they will need to
invest time in basic initiation efforts discussed earlier. Furthermore, because they
are likely new to agile they will find the Scrum-based lifecycle easier to adopt
as it prescribes the timing of common practices (such as planning, demos, and
retrospectives) and forces the team into delivering on a regular basis (there should
be more working functionality that could potentially be deployed to stakeholders at the
end of each iteration/sprint). A detailed description of the type of activities that
occur during each phase appears later in this article.
The Scrum-based lifecycle above explicitly calls out three phases:
- Inception. During this phase project/team initiation
activities occur. This includes initial scoping, initial architectural modeling,
high-level release planning, putting the team together, starting into your risk management
approach, setting up your work environment, and securing funding for the rest
of the release. Although “phase” tends to be a swear word within the agile
community, the reality is that the vast majority of teams do some up front work
at the beginning of a project. While some people will mistakenly refer to this
effort as Sprint/Iteration 0 it is easy to observe that on average this effort
takes longer than a single iteration. During the Inception phase we do some
very lightweight envisioning activities to properly frame the project. It
takes discipline to keep Inception short.
- Construction. During this phase a disciplined agile team will
produce a potentially consumable solution on an incremental basis.
They may do so via a set of iterations (Sprints in Scrum parlance) or do so
via a lean, continuous flow approach (see Figure 2).
The team applies a hybrid of practices from Scrum, XP, Agile Modeling,
Agile Data, and other methods to deliver the solution. More on this later.
- Transition. The DA framework recognizes that for
sophisticated enterprise agile teams deploying the solution to their
stakeholders is often a complex exercise. DA teams, as well as the enterprise
overall, will streamline their deployment processes so that over time
this phase becomes shorter and ideally disappears as the result of adopting
continuous deployment strategies. It
takes discipline to evolve Transition from a phase to an activity.
Figure 2 depicts Disciplined Agile's Lean Continuous Delivery
lifecycle. This lifecycle is often followed by disciplined agile teams who are
evolving an existing data warehouse that is already running in production. The requirements
for such solutions often come in continuously from stakeholders instead of in a large
batch. These requirements are often small in nature, typically adding a new data field,
updating an existing report or data download, or adding a new report/download. Furthermore
these requirements are self-contained and can be added easily to a well-architected
DW/BI solution. Many of these requirements can be implemented in a few hours or days,
so it often makes sense to do the work right away and release the new functionality
as soon as you can. The temporal overhead of iterations/sprints, let alone regularly
scheduled releases, of the Scrum-based lifecycle often doesn't make sense for evolving
an existing production DW, hence the need for a different approach.
Figure 2. A Lean Continuous Delivery Lifecycle.
For more details about both of these lifecycles, and others supported by the
DA framework, please read
Full Delivery Lifecycles.
The Lean Continuous Delivery lifecycle of Figure 2
varies from the Scrum-based lifecycle of Figure 1
in several important ways:
- It supports a continuous flow of delivery. In this lifecycle
the solution is deployed as often, and whenever, it makes sense to do so.
Work is pulled into the team when there is capacity to do it, not on the
regular heartbeat of an iteration. This enables your disciplined agile DW team
to be more responsive to stakeholders.
- Practices are on their own cadences. With iterations/sprints
many practices (detailed planning, retrospectives, demos, detailed modeling, and so on)
are effectively put on the same cadence, that of the iteration. With a lean approach
the observation is that you should do something when it makes sense to do it,
not when the calendar indicates that you’re scheduled to do it. This enables you
to streamline your activities more, BUT requires greater discipline.
- It has a work item pool. All work items are not created equal.
Although you may choose to prioritize some work in the “standard” manner,
either a value-driven approach as Scrum suggests or a risk-value driven approach
as DA suggests, other work may not fit this strategy. Some work, particularly
that resulting from legislation, is date driven. Some work must be expedited,
such as fixing a severity one production problem. So, a work item pool and not a
prioritized stack (as in the Scrum-based lifecycle) makes a bit more sense when
you recognize these realities.
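The work item pool idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the DA framework: the `WorkItem` fields, the 1-10 scoring scale, the `lead_days` threshold, and the `pull_next` function are all hypothetical names and assumptions introduced here. Expedited work is pulled first, date-driven work is pulled once its due date is near, and everything else is ordered by combined risk and value.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkItem:
    title: str
    value: int = 0               # hypothetical 1-10 business value score
    risk: int = 0                # hypothetical 1-10 risk score
    due: Optional[date] = None   # set for date-driven work (e.g. legislation)
    expedite: bool = False       # set for severity-one production problems


def pull_next(pool: list[WorkItem], today: date, lead_days: int = 14) -> WorkItem:
    """Remove and return the item the team should pull next (assumed policy)."""
    if not pool:
        raise ValueError("work item pool is empty")

    def rank(item: WorkItem) -> tuple:
        if item.expedite:
            return (0, 0)                        # expedited work jumps the queue
        if item.due is not None and (item.due - today).days <= lead_days:
            return (1, item.due.toordinal())     # nearest deadline first
        return (2, -(item.risk + item.value))    # highest risk + value first

    chosen = min(pool, key=rank)
    pool.remove(chosen)
    return chosen
```

The team calls `pull_next` whenever it has capacity, rather than filling an iteration up front, which is the practical difference between a pool and a single prioritized stack.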
The Disciplined Agile (DA) framework takes a
goal driven approach.
The purpose of this goal-driven approach is that it guides people through
the process-related decisions that they need to make to tailor and scale
agile strategies to address the context of the situation that they face.
This is important because every team faces a unique situation. Teams vary in
size, they vary in the way that they are geographically or organizationally
distributed, they vary by the domain and technical complexity that they face,
and they vary by the compliancy issues that are relevant to them. Furthermore,
teams are made up of unique individuals, each of whom has a set of unique
skills and experiences. In short, because each agile team finds itself
in a unique situation the team must find a way to effectively tailor the
way that they work to best face that situation. DA’s goal driven strategy
is a light-weight approach to providing advice for such process tailoring.
Figure 3 summarizes the delivery-oriented process
goals of the Disciplined Agile process decision framework via a mind map.
There are twenty-two goals in total, each of which is described by a
Process Goal Diagram (effectively a decision tree). A disciplined agile team
will consider how to address each goal in a manner that reflects the situation
that they face. Sometimes a goal will be very easy to address, for example
an established development team may find that they need to do nothing else
to fulfill the Form Initial Team goal. A team that is building a solution
via an architecture they are very familiar with will have very little work
to do to fulfill the Prove Architecture Early goal whereas a team using
technologies that are new to them may have a fair bit of work to do.
Different situations require different approaches.
Figure 3. A Mind Map Summarizing the Delivery Goals.
Each of the three delivery phases (Inception, Construction, and Transition)
is described by specific goals. Some goals, such as Grow Team Members and
Address Risk, are applicable throughout the entire lifecycle.
For more details, including all 22 process goal diagrams, please read
Disciplined Agile Process Goals.
If you think that software development is hard, running the entire IT department
is even harder. IT departments are complex adaptive organizations. What we mean
by that is that the actions of one team will affect the actions of another team,
and so on and so on. For example, the way that your agile delivery team works
will have an effect on, and be affected by, any other team that you interact with.
If you’re working with your operations teams, perhaps as part of your overall
DevOps strategy, then each of those teams will need to adapt the way they work to
collaborate effectively with one another. Each team will hopefully learn from the
other and improve the way that they work. These improvements will ripple out to
other teams. The challenge is that every area within IT has one or more bodies of
knowledge, and in some cases published “books of knowledge”, that provide guidance
for people working in those areas. For example management has the PMBoK and
PRINCE2, enterprise architects have TOGAF and the Zachman Framework,
business analysts have the IIBA BoK, data managers have the DAMA BoK, and so on.
These industry groups and their corresponding bodies of knowledge contradict one
another, they are at different points on the agile/lean learning curve, and
sometimes they promote very non-agile/lean strategies. At the IT level this can be
very confusing, resulting in dysfunction. The DA framework shows how this all
fits together in a flexible manner that supports the realities faced in complex
adaptive organizations.
Figure 4 depicts the high-level workflow of a Disciplined Agile IT
department. Your disciplined agile DW team will be affected, hopefully in a
positive manner, by other teams running in parallel to you. Your team will work closely
with the Enterprise Architects, your Architecture Owner may even be an EA, to understand
their long-term vision. You will work with the Data Management group to understand and
access existing legacy data sources. You'll work with your Release Management and Operations
teams to release your solution into production. Your Support/Help Desk team is likely
providing enhancement requests and bug reports to your team. You may have dependencies
on other delivery teams.
Figure 4. The Workflow of a Disciplined Agile IT Department.
Not only must we work in an enterprise aware manner, these
other teams need to be prepared to work in a more agile manner when interacting with
our DW team. It will be very difficult for our DW team to work in an agile manner if they
rely on other teams who aren't prepared, or worse yet aren't sufficiently skilled, to work
in an agile manner too. Having said that, the entire EA or DM team doesn't need to be
agile, but enough people on those teams must be so that they can support the agile
delivery teams appropriately.
For more details about this IT-centric view, please read
The Disciplined Agile Framework.
Let's explore how a disciplined agile DW team works in practice.
Figure 5 summarizes the primary and secondary development
activities that are potentially performed by the team. Primary activities are ones
that add direct value to the development effort. Secondary activities tend to
focus on long-term documentation that may add value in the future, but the
value proposition tends to be dubious in practice so you want to be very careful
in how much effort you invest in them. I tend to think of these as sideline activities
that I would only do if the team has time to spare from primary activities.
Figure 5. Activities on a Disciplined Agile DW Team.
For now we'll assume that your team is working on the first release of
a DW solution. As a result you will need to take a three-phase approach. For teams working
with an existing DW solution that is already running in production, you may find that you do
not need to work through Inception (or at least you only need an abbreviated version of it).
During the Inception phase the team strives to perform just enough work to get
going in the right direction. Disciplined teams will spend a few days or perhaps
a week or so to do so, not several weeks or months. The key is that they work in
a very light-weight manner. The process goals that a disciplined agile team will
fulfill during Inception include:
- Form Initial Team
- Develop Common Vision
- Align with Enterprise Direction
- Explore Initial Scope
- Identify Initial Technical Strategy
- Develop Initial Release Plan
- Form Work Environment
The purpose of this article is not to address every single aspect of the goals listed above,
that's why Mark Lines and I wrote the book
Disciplined Agile Delivery,
but instead it is to focus on critical activities that support a disciplined agile approach
to DW/BI. Potential primary activities on a disciplined agile DW/BI team include:
- Initial usage modeling. Disciplined agile DW teams take a usage-driven,
not a data-driven, approach to modeling. Understanding the data is still important,
don't get me wrong, it is just that it isn't anywhere near as important as
understanding how the data will be used. The most common strategy used by agile
teams to explore usage is to write user stories
and epics (which are large user stories). Examples of user stories that would support
the development of a DW/BI solution for a retail bank include "As a Branch Manager
I need to analyze the portfolio of a customer so that we can target services to them",
"As a Branch Manager I need to explore the transactions occurring in my branch so that I
better understand my customer needs", or "As a Mortgage Officer I need to explore the risk
profile of a potential mortgage holder so that I can decide how much we can loan that person."
Notice how all of these requirements focus on usage, not data details. The details can
be explored later, during Construction.
- Initial conceptual modeling. An important, supporting model to
the usage model is a
high-level conceptual model,
sometimes called a domain model. This diagram should indicate the main entity
types within the domain and the relationships between them. It does not need
to indicate potential data elements, nor does it need to be perfect, it just needs
to be reasonably close at this time. The goal is to gain a reasonable understanding
of domain terms at this time, the details will emerge later during construction. For
the banking application the domain model may indicate entity types such as Customer,
Account, Mortgage, Loan, Branch, Portfolio, and perhaps another twenty or so entity types.
- Identification of potential data sources. Early in a DW/BI project
the team needs to identify the main (potential) sources of data. This information is often
captured on a network diagram
or something similar. This diagram will overview the flow of information within
your technical architecture, indicating the potential data sources, how information is
obtained from those sources (see next activity), and how the data flows through your firewalls,
staging areas (if any), your data warehouse(s), and any data mart(s).
- High-level data source analysis. During Inception you will want to
obtain basic information about potential data sources such as who the primary contacts are,
what type of data it contains, how that data is accessed (e.g. via file transfers, via
SQL queries, via web services, and so on), and sizing information (e.g. volume of data and rate
of change). The goal right now is to gain sufficient understanding of the data sources so
you can make intelligent architecture decisions about them.
- Initial architectural modeling. Early in a disciplined agile DW/BI project
you want to identify a viable architectural strategy. Part of that strategy will be identifying
potential data sources; part will be identifying how data will flow from the data sources to
the target data warehouse(s) or data marts; and part will be making that flow work through combinations
of data extraction, data transformation, and data loading capabilities. The layout of
the technical architecture is often captured using a network diagram, discussed
earlier. Architectural notes, in particular important technical decisions as well as
good things to know (such as the data source information described earlier), are often
captured in an Architectural Handbook. This handbook is often implemented as a collection
of wiki pages so that anyone who is interested may have access to the information.
- Initial release planning. Disciplined agile teams, including those
focused on a DW/BI solution, will perform a bit of high-level release planning. Teams are
often required to guesstimate the potential cost of the release they are working on as well as the
potential delivery date. These guesstimates, or estimates if you prefer, are best presented
as ranges so as to reflect the uncertainty of the information the guesstimate is based
upon. For further information about initial release planning, please see the DA process goal
Develop Initial Release Plan.
- Adopt common guidelines. Disciplined agile teams are enterprise aware,
an aspect of which is that they strive to adopt and then follow common
guidelines. These guidelines include data guidelines, security standards, coding
guidelines, user interface guidelines, and many others.
- Initiate a data testing strategy. Testing is so important on
disciplined agile teams that we do it all the way through the lifecycle, not just
during some phase at the end of the lifecycle. This includes the testing of
all functionality, including any functionality pertaining to data-oriented issues, as well as
the testing of the data itself. There are many things that can be tested pertaining
to databases, see
Database Testing: How to Regression Test a Relational Database for some
thoughts on the subject. Your data testing strategy should address issues such as
how to test extract-transform-load (ETL) logic, how to validate data sources, how to ensure
the quality of the data in the data warehouse(s) and data mart(s), what tools will be used,
and identification of who has the skills to do the work. This may be the greatest challenge
faced by traditional data professionals as they transition to disciplined agile ways of
working - not only do few traditional data professionals have data testing skills, the vast
majority don't even realize how critical those skills actually are.
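To make the data testing strategy above more concrete, here is a minimal sketch of automated data checks using Python's built-in `sqlite3` module. The table names (`stg_transaction`, `fact_transaction`, `dim_branch`), their columns, and the three checks are assumptions chosen for illustration; a real strategy would target your own schemas and tools. It shows one check of ETL logic (row counts reconcile), one data quality check (no missing amounts), and one referential check (no fact rows pointing at unknown branches).

```python
import sqlite3


def run_data_checks(conn: sqlite3.Connection) -> dict[str, bool]:
    """Run a few regression checks against hypothetical staging/DW tables."""

    def scalar(sql: str) -> int:
        return conn.execute(sql).fetchone()[0]

    return {
        # ETL check: every staged transaction should arrive in the fact table.
        "row_counts_match": scalar("SELECT COUNT(*) FROM stg_transaction")
        == scalar("SELECT COUNT(*) FROM fact_transaction"),
        # Data quality check: amounts must not be missing.
        "no_null_amounts": scalar(
            "SELECT COUNT(*) FROM fact_transaction WHERE amount IS NULL"
        ) == 0,
        # Referential check: every fact row points at a known branch.
        "no_orphan_branches": scalar(
            "SELECT COUNT(*) FROM fact_transaction f "
            "LEFT JOIN dim_branch b ON b.branch_id = f.branch_id "
            "WHERE b.branch_id IS NULL"
        ) == 0,
    }


def build_sample_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with one deliberate data problem."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stg_transaction (txn_id INTEGER, branch_id INTEGER, amount REAL);
        CREATE TABLE dim_branch (branch_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_transaction (txn_id INTEGER, branch_id INTEGER, amount REAL);
        INSERT INTO stg_transaction VALUES (1, 10, 25.0), (2, 10, 40.0), (3, 99, 12.5);
        INSERT INTO dim_branch VALUES (10, 'Main Street');
        INSERT INTO fact_transaction SELECT * FROM stg_transaction;
        """
    )
    return conn
```

Running `run_data_checks(build_sample_db())` flags the orphaned branch (id 99) while the other two checks pass. Checks like these can be run as part of every build so that data problems surface throughout the lifecycle rather than at the end.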
Potential secondary activities include:
- Detailed data modeling (partial). You may need to start doing some data modeling, both
logical data model (LDM) and physical data model (PDM) development, during Inception.
Your LDM, if you create it at all, is typically used for detailed data analysis,
an activity which occurs during Construction. Similarly your PDM(s) are used to design
the database schema of your DW and data mart(s), also work that is typically done
during Construction. Having said that, during Inception you may choose to do detailed look-ahead modeling
of high-priority requirements, and the design to support them, that you intend to implement in the first iteration or
two of Construction. As a result you MIGHT do some data modeling work, see below for a more detailed
discussion of what that might entail.
- Source-to-target data mapping (partial). Part of your look-ahead modeling
effort, sometimes called "backlog refinement" by Scrum practitioners, will be to do
just enough data mapping to implement the first few stories in your backlog. You need
to know where the data is coming from to implement just these stories. Yes, any given
data source may have hundreds of data elements that your team may potentially be interested
in at some point, but for now you just need to map the handful of data elements required
to implement the first few stories. That's it. Future data mapping, if any, will
be performed in an evolutionary manner throughout construction.
- Detailed data source analysis (partial). Similarly, you will do just enough
analysis of your data sources to understand just the data elements required for the first few
stories.
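The "just enough" source-to-target mapping described above can be sketched as a small data structure. This is an illustrative assumption, not a prescribed format: the source systems (`crm`, `loans`), field names, and transforms are hypothetical. The point is that the mapping covers only the handful of data elements the first stories need, even though each source holds many more.

```python
from typing import Any, Callable

# Hypothetical mapping: target column -> (source system, source field, transform).
Mapping = dict[str, tuple[str, str, Callable[[Any], Any]]]

CUSTOMER_MAPPING: Mapping = {
    "customer_id": ("crm", "CUST_NO", int),
    "full_name": ("crm", "NAME", lambda s: s.strip().title()),
    "mortgage_balance": ("loans", "BAL_CENTS", lambda c: c / 100),
}


def map_row(sources: dict[str, dict[str, Any]], mapping: Mapping) -> dict[str, Any]:
    """Build one target row from per-source records using the mapping.

    Source fields that are not named in the mapping are simply ignored.
    """
    return {
        target: transform(sources[system][field])
        for target, (system, field, transform) in mapping.items()
    }
```

As new stories are pulled in, the mapping grows by a few entries at a time, which keeps it in step with the functionality actually delivered.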
During the Construction phase the team strives to fulfill several process goals:
All of the above process goals are applicable to disciplined agile DW teams. To
address these process goals, you are likely to adopt a collection of activities as
we saw in Figure 5 above. Potential primary activities include:
- Development of vertical, fully functional slices. Each iteration
the DW/BI team will produce a solution that is consumable, something that could be
potentially shipped into production that people want to actually use. This means that
you will analyse, design, implement, and test that functionality during the iteration (and most
disciplined agile teams work in iterations that are two weeks in length or less). This is why
it is so important to take a usage-driven approach and not a data-driven approach - your
team needs to be always working on some new functionality that adds real value to your
stakeholders. In a given iteration you do the work to completely implement
one or more reports, or perhaps a portion of a report or an enhancement to an existing report.
You will do the work to extract the data from the data source(s),
transform/clean it, and load it into the DW. This can be tough initially because you will not
have the infrastructure in place yet during the first few Construction iterations. For example,
the first time you extract data from a data source you'll need to do a lot of the work
to access that data source.
- Prove the architecture with working code. There are always technical
risks on DW projects. Maybe the technologies that you're working with are new to your
organization. Maybe several data sources are difficult to work with, either because the
owners of the data sources are difficult, because there are quality issues with
the data, because there are architectural differences between the data sources (e.g. some
are real time and some are batch systems), or perhaps because there are volume challenges
(i.e. "big data"). Disciplined agile teams remove these sorts of
risks by implementing functional requirements that touch on the risks right away. Worried
about accessing data from a batch system? Start by writing a report that needs data from it.
Worried about whether your ETL tool is going to work well? Implement one or more
requirements that require the key features of that tool. Worried about whether you'll
be able to handle the big data load? Implement a requirement that needs that data. Many times
traditional teams, and even not-so-disciplined agile teams, will put off hard aspects of
their architecture to the end of the lifecycle, thereby increasing the potential costs of
fixing any problems that they do run into. Disciplined agile teams prefer to address their
risks as early as they can, when they have the most time and resources to respond to them.
- Detailed data source analysis. Data source analysis occurs on a
just-in-time (JIT), or near-JIT basis. Your team will do the analysis required to implement the
current requirements (let's assume they're captured as user stories). So, if a story requires
data from three data sources then you will do the analysis of those data elements from those
data sources. Of course, you're likely doing the analysis for several stories at a time,
perhaps five or six stories. Furthermore, disciplined teams will be doing look-ahead modeling,
described earlier, where you are doing the analysis for stories that are coming up
in the next iteration or two. The basic strategy is to have just enough data source
analysis done before you go to implement the actual functionality. You are likely to do
more data source analysis in earlier Construction iterations as opposed to later iterations -
as you populate your DW, the data required for new reports or queries is more likely to be
there over time. Traditional teams have a tendency to do comprehensive data source analysis
one data source at a time, followed by the implementation work needed to obtain and then load
the data into the DW. This appears efficient from the point of view of the person(s) doing
the work, but proves to be rather inefficient in practice from the point of view of your
stakeholders for two reasons. First, it takes much longer to get to the point where you have
sufficient data in place to implement the reports, or to support answering their questions, that
they actually want. In other words, you have a very high cost of delay (also referred to as
opportunity cost). Second, you end up analyzing (and then implementing) data elements that
aren't actually needed.
- Implementation of source-to-target data mapping. The implementation work
to extract the data from source, transform the data as required, and then load it into your
target database(s) will be done in a JIT, evolutionary manner. Each iteration your team will
do the work to implement one or more stories from end-to-end (e.g. vertical slices through your
solution) and part of this work is the implementation of the source-to-target data mappings.
- Database refactoring. A refactoring is a simple change to your design that
improves its quality without changing its semantics in a practical manner. A database refactoring
is a simple change to a database schema that improves the quality of its design OR improves the
quality of the data that it contains. Database refactoring enables you to safely and easily
evolve database schemas, including production database schemas, over time. This technique, in
combination with database regression testing and continuous integration, allows us to develop
data warehouses, or any solution involving a database for that matter, in an agile manner.
- Physical data modeling. The physical data models (PDMs) describing your
databases, including both source and target databases, will evolve throughout Construction.
Please see the article
Agile/Evolutionary Data Modeling
for a detailed description of how to go about agile data modeling.
- Regression testing. Quality is paramount for disciplined agile teams. Disciplined
agile teams will develop, in an evolutionary manner of course, a regression test suite that validates
their work. They will run this test suite many times a day so as to detect any problems as early
as possible so that they can address them as cheaply as possible (remember,
the average cost of fixing a defect rises exponentially
the longer it takes you to find it). In fact, very disciplined teams will take a
test-driven development (TDD) approach
where they write tests before they do the work to implement the functionality that the tests
validate. As a result the tests do double duty - they validate and they specify (which is one
of many reasons why disciplined agile teams require far less documentation than traditional teams:
their specifications are in effect executable as opposed to static). See
Database Testing: How to Regression Test a Relational Database for a more detailed
description of this strategy.
- Continuous integration (CI). CI is a technique where you automatically
build and test your system every time someone checks in a code change. Disciplined agile developers
will typically update a few lines of code, or make a small change to a configuration file, or make a small change
to a PDM and then check their work into their configuration management tool. The CI tool monitors
this, and when it detects a check in it automatically kicks off the build and regression test suite
in the background. This provides very quick feedback to team members, enabling them to detect issues early.
- Continuous deployment (CD). When an integration build is successful (it compiles and
passes all tests) your CD tool will automatically deploy to the next appropriate environment(s).
For example, if the build runs successfully on a developer's workstation their changes are propagated
automatically into the team integration environment (which automatically invokes CI in that space). When
the build is successful in your team integration environment perhaps it's promoted into an integration
testing environment, and so on.
- Continuous documentation. Disciplined agile is solution focused, not just
software focused. Because documentation is part of the overall solution that you deliver, you
should develop key documentation (system overview documentation, help guidelines, and so on)
as you develop the software. For more information, see the Agile Modeling practice
- Detailed planning. Planning is so important on disciplined
agile teams that we do it all the way through Construction. Detailed planning occurs
at the beginning of each iteration (for teams following the Scrum-based lifecycle) or
in an as-needed, just in time (JIT) manner (for teams following the lean/continuous delivery
lifecycle). Team members may also choose to engage in look-ahead planning to begin
thinking through the next iteration or two in complex situations.
- Coordination meetings. The team needs to coordinate their work both
internally within the team and externally with other teams. The Disciplined Agile framework
includes a process goal that captures a range of options for doing so.
- Demos. Disciplined agile teams demonstrate their work on a regular basis,
typically at the end of each iteration. This demo, typically run by your product owner, shows
off the work that your team has accomplished since the last demo. Because you are taking a usage-driven
approach, each iteration you should have added more functionality that provides real value to
your stakeholders. For example, your demo might walk through a new report that your team built
that iteration, show how a calculation was updated on an existing report, and show how there are
seven new data columns available to people doing ad-hoc reporting.
- Retrospectives. One of the principles behind the Agile Manifesto, and the
Disciplined Agile Manifesto
which extends the original manifesto to address enterprise-class issues, is that teams should
regularly reflect on what they're doing and strive to learn and improve their approach.
Retrospectives are a simple technique for doing exactly that.
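To make the database refactoring and regression testing activities above a little more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table dim_customer, its columns, and the function names are all hypothetical, invented purely for illustration; your team would use its own database and test tooling. The sketch performs the first steps of a rename-column refactoring (add the new column, copy the data, keep the old column in place for a transition period) and then checks that the data reads the same before and after the change.

```python
import sqlite3


def create_original_schema(conn):
    """Starting point: a customer dimension with a poorly named column."""
    conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, fname TEXT)")
    conn.executemany(
        "INSERT INTO dim_customer (id, fname) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )


def refactor_add_renamed_column(conn):
    """Small, behavior-preserving change: introduce first_name alongside fname.

    The old column is deliberately left in place so that existing
    consumers keep working during a transition period.
    """
    conn.execute("ALTER TABLE dim_customer ADD COLUMN first_name TEXT")
    conn.execute("UPDATE dim_customer SET first_name = fname")


def customer_names(conn, column):
    """Read customer names from the given column, ordered by id.

    `column` is a trusted identifier chosen by the caller, never user input.
    """
    rows = conn.execute(
        f"SELECT {column} FROM dim_customer ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def run_regression_check():
    """Return True if the refactoring left the observable data unchanged."""
    conn = sqlite3.connect(":memory:")
    create_original_schema(conn)
    before = customer_names(conn, "fname")
    refactor_add_renamed_column(conn)
    after = customer_names(conn, "first_name")
    conn.close()
    return before == after
```

A check like run_regression_check() would be one of many in the regression test suite that your CI tool runs on every check in. If it fails, you either roll back the refactoring or fix what you broke.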
Potential secondary activities include:
- Logical data modeling. In practice, logical data modeling tends to add
very little value to the overall development effort (other than the "value" of keeping logical
data modelers employed of course). If you do decide to invest time in an LDM, keep it as
light-weight as possible and DO NOT allow this effort to slow down development. If you think
that your LDM can offer actual value to your organization, and in the traditional world that's
possible, then ask yourself how you can add the same value using tests. I'll write up a more
detailed article on this at some point in the future. For now my best advice is to be very
leery about LDMing.
- Documentation of source-to-target data mapping. You will likely find that you want
to document your mappings. Once again, you will find it more effective to capture these mappings
in the form of tests (assuming you have the skills to do so) rather than static documentation. If
you find that you need to resort to documentation, remain as agile as possible and keep the documentation
light-weight. Please see
Agile/Lean Documentation Strategies
for more ideas about how to keep your documentation light-weight and sufficient.
- Meta-data documentation. Once again, follow agile documentation strategies for
capturing any meta-data information.
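As a rough illustration of capturing a source-to-target mapping as a test rather than as static documentation, consider the following Python sketch using the standard library's sqlite3 module. Everything named here (the src_customer and dw_customer tables, their columns, and the MAPPING dictionary) is an assumption made up for the example, not something prescribed by the Disciplined Agile framework. The mapping is written once as data, the load is generated from it, and the expected result is stated as an assertion that can be rerun at any time.

```python
import sqlite3

# Hypothetical mapping: target column -> SQL expression over the source table.
MAPPING = {
    "customer_id": "cust_no",
    "full_name": "first_name || ' ' || last_name",
}


def load_target(conn):
    """Build the target table by applying MAPPING to the source table."""
    select_list = ", ".join(f"{expr} AS {col}" for col, expr in MAPPING.items())
    conn.execute(
        f"CREATE TABLE dw_customer AS SELECT {select_list} FROM src_customer"
    )


def mapped_row():
    """Load one known source row and return what lands in the target."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE src_customer (cust_no INTEGER, first_name TEXT, last_name TEXT)"
    )
    conn.execute("INSERT INTO src_customer VALUES (7, 'Ada', 'Lovelace')")
    load_target(conn)
    row = conn.execute(
        "SELECT customer_id, full_name FROM dw_customer"
    ).fetchone()
    conn.close()
    return row


def test_mapping():
    """The executable form of the mapping document."""
    assert mapped_row() == (7, "Ada Lovelace")
```

Because the test fails as soon as the mapping and the load drift apart, it tends to stay accurate in a way that a separately maintained mapping document often does not.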
During the Transition phase the team strives to ensure that the solution is consumable
and, when it is, to deploy the solution.
To address these process goals, you are likely to adopt a collection of activities as
we saw in Figure 5 above. Potential primary activities include:
- End-of-lifecycle testing. Some testing may slip into the Transition
phase. Ideally all testing should occur during Construction, other than one last run of your
regression test suite to ensure you're ready to ship. But it isn't always an ideal world. See
for more details.
- Last-minute fixes. If you perform end-of-lifecycle testing there is always
the risk that the testing effort finds some bugs. Your Product Owner may decide that these bugs
need to be fixed before you're ready to ship.
- Finalize deliverable documentation. Some teams will let documentation slip,
or at least some documentation slip, to the end of the lifecycle. This is a practice called
although I prefer a continuous documentation approach described earlier.
- Deploy database schema changes. Part of your overall deployment efforts
will be to deploy database schema changes. If you've been taking a database refactoring approach
then this will be very straightforward as your change scripts will already be running and fully
tested. Before making the schema changes you should consider creating a backup of the database.
You may find
The Process of Database Refactoring: Install Into Production
to be an interesting read.
- Migrate production data to new schema. TBD.
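The schema deployment step described above (back up first, then run your already-tested change scripts) might look something like the following Python sketch. It uses the standard library's sqlite3 module only so that the example is self-contained. The CHANGE_SCRIPTS list, the schema_version table, and the fact_sales table are hypothetical names chosen for illustration, and a production deployment would of course target your real database with your real tooling.

```python
import sqlite3

# Hypothetical, ordered change scripts produced during Construction.
CHANGE_SCRIPTS = [
    (1, "CREATE TABLE fact_sales (id INTEGER PRIMARY KEY, amount REAL)"),
    (2, "ALTER TABLE fact_sales ADD COLUMN currency TEXT DEFAULT 'USD'"),
]


def deploy(conn, backup_conn):
    """Back up the database, then apply any change scripts not yet applied.

    Returns the list of version numbers applied by this call.
    """
    # Snapshot the current state before changing anything.
    conn.backup(backup_conn)

    # Track which scripts have already run so that deploys are repeatable.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]

    applied = []
    for version, sql in CHANGE_SCRIPTS:
        if version > current:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
            applied.append(version)
    conn.commit()
    return applied
```

Running deploy a second time against the same database applies nothing, which is what lets the same scripts be promoted unchanged from one environment to the next.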
The only potential secondary activity is to finalize any secondary documentation,
such as your logical data model (LDM) or your meta-data documentation, that you believe
will add real value in the future. The risks of investing too much effort on these sorts of
activities have been discussed earlier.
Figure 6 compares the typical level of expended effort creating
artifacts on traditional and agile teams. There are several interesting differences
between the approaches. Disciplined agile DW teams will:
- Create a high-level conceptual model early. A high-level conceptual
model, or more accurately diagram at this point, identifies the critical business entity
types and the relationships between them. This provides vital insight into the domain
while helping to capture key domain terminology, thus helping to drive consistency of
wording in other artifacts (such as user stories and epics). Traditional teams will often make
the mistake of over-documenting the conceptual model early in the lifecycle, injecting
delay into the team (with the corresponding opportunity cost of doing so) as well as the
risk of making important decisions when you and your stakeholders have the least knowledge
of the actual end goal.
- Evolve a light-weight logical data model (LDM) over time. If your agile DW team
does this at all they will keep their LDM very light weight. Traditional teams will often
invest heavily in their LDMs as they believe it is a mechanism to ensure quality and
consistency through specifying it in. This often proves to be wishful thinking in practice.
Disciplined agile teams instead invest their efforts in creating an executable specification
in the form of regression tests (more on this below).
- Evolve a detailed physical data model (PDM) over time. Disciplined agile
teams realize that a PDM, when created via a data modeling tool with full round-trip
engineering (it generates schemas as well as imports existing schemas), effectively becomes
the source code for the database. As the requirements evolve the team will evolve the PDM
to reflect these new needs, generating schema changes as needed. They can work this way because
they are able to easily refactor and regression test their database. This is different from
the traditional approach where they often perform detailed modeling up front. This is
motivated by the mistaken belief that production database schemas are difficult to evolve,
something that agilists know not to be true.
- Develop a comprehensive regression test suite over time.
These tests address several important issues. First, they validate the work of the team,
showing that their work to date fulfills the requirements as they've been described to
the team. Second, a regression test suite enables the team to safely evolve their work.
Agile developers can make a small change, rerun their regression tests, and see whether they
broke something (if so, then they either roll back their change or they fix what they broke).
Third, when a test is written before the corresponding database schema or database code is
developed, the test effectively becomes a detailed executable specification. Sophisticated
agile DW teams will capture the kinds of information that were previously captured in LDMs
in executable tests, and are thus much more likely to have
consistent schemas than teams that still rely on static LDMs.
- Capture critical meta-data over time. Because the rest of your organization
may not be completely agile there is often a need to continue to capture key meta-data about
data sources. This meta-data should be kept as light as possible. If there isn't a definite
need for it then don't capture it. If someone says "but we might need it someday" then wait
until someday and invest in capturing the information at that point. Furthermore, instead
of capturing meta-data in a static manner (i.e. as documentation) try to identify ways to
capture it as tests, or to generate it automatically from other information sources. Any
documentation that you write today needs to be maintained over time, slowing you down.
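One way to generate meta-data automatically from another information source, as suggested in the last point above, is to read it straight from the database catalog instead of writing it down by hand. The following Python sketch does this for SQLite using the standard library's sqlite3 module. The function name describe_schema is hypothetical, and other database products expose their catalogs differently, so treat this as a sketch of the idea rather than a recipe.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: [(column, declared_type), ...]} read from the live schema."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    ]
    catalog = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog
```

Because the description is regenerated from the schema whenever it is needed, there is no separate document to fall out of date as the PDM evolves.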
Figure 6. Comparing Artifact Creation by DW Teams.
For a better understanding of why traditional DW teams are likely to write too much
documentation far too early in the lifecycle, you should read the article
The Cultural Impedance Mismatch.