Agile Data

Disciplined Agile Data Warehousing


This article overviews a Disciplined Agile approach to data warehouse (DW) solution development. The focus of this article is on the process itself, as opposed to specific architecture and design techniques (for those I highly suggest Data Vault 2). Furthermore, this topic is clearly worthy of a book containing detailed descriptions of the techniques and artifacts described below (however, I have included numerous links to such details if you're willing to explore further on your own). This article addresses:
  1. Introduction to Disciplined Agile
  2. Disciplined Agile Data Warehousing
  3. Artifact Creation by Disciplined Agile DW Teams
  4. Related Resources

1. Introduction to Disciplined Agile

Many organizations start their agile journey by adopting Scrum. Scrum describes a good strategy for leading agile software teams but is only part of what is required to deliver sophisticated solutions to your stakeholders. Invariably teams need to look to other methods to fill in the process gaps that Scrum purposely ignores. When looking at other methods there is considerable overlap and conflicting terminology that can be confusing to practitioners as well as outside stakeholders. Worse yet people don’t always know where to look for advice or even know what issues they need to consider.

To address these challenges the Disciplined Agile (DA), formerly known as Disciplined Agile Delivery (DAD), process decision framework provides a more cohesive approach to agile solution delivery. To be more exact, here’s a definition: “The Disciplined Agile (DA) process decision framework is a people-first, learning-oriented hybrid agile approach to IT solution delivery. It has a risk-value delivery lifecycle, is goal-driven, is enterprise aware, and is scalable.”

1.1 Why Take a Disciplined Agile Approach?

You will soon see that the Disciplined Agile framework offers the following benefits to DW teams:

  1. DA offers all the benefits of agile. Organizations around the world have found that agile strategies are more effective in practice than traditional approaches. Agile teams, on average, enjoy higher levels of quality, higher stakeholder satisfaction, quicker time to delivery, and higher levels of productivity.
  2. The "heavy lifting" around process has been done for you. DA is a hybrid approach that extends Scrum with proven strategies from Agile Modeling (AM), Extreme Programming (XP), Unified Process (UP), Scaled Agile Framework (SAFe), Kanban, Lean Software Development, Outside In Development (OID) and several other methods. More importantly it shows how to address key considerations such as architecture, analysis, design, testing and governance in a coherent, lightweight manner.
  3. DA addresses the full delivery lifecycle. DA extends the construction-focused lifecycle of Scrum to address the full, beginning-to-end delivery lifecycle from project initiation all the way to delivering the solution to its end users. It also supports lean and continuous delivery versions of the lifecycle: unlike other agile methods, DA doesn’t prescribe a single lifecycle because it recognizes that one process size does not fit all.
  4. DA shows how it all fits together in a flexible, context-sensitive manner. DA includes advice about the technical practices such as those from Extreme Programming (XP) as well as the modeling, documentation, and governance strategies missing from both Scrum and XP. But, instead of the prescriptive approach seen in other agile methods, including Scrum, the DA framework takes a goal-driven approach. In doing so DA provides contextual advice regarding viable alternatives and their trade-offs, enabling you to tailor DA to effectively address the situation in which you find yourself. By describing what works, what doesn’t work, and more importantly why, DA helps you to increase your chance of adopting strategies that will work for you.
  5. DA considers the bigger picture. Your solutions must work within your organizational eco-system and more importantly leverage your existing infrastructure. They must be architected to stand the test of time.

1.2 Disciplined Agile Roles

The Disciplined Agile (DA) framework suggests a robust set of roles for agile solution delivery. These roles are organized into two categories:

  • Primary roles. These roles - Team Lead, Product Owner, Architecture Owner, Team Member and Stakeholder - are commonly found on delivery teams regardless of the level of scale faced by the team.
  • Secondary roles. These roles - Domain Expert, Technical Expert, Specialist, Independent Tester, and Integrator - are filled, often on a temporary basis, to address scaling issues.

There are several important observations about these roles:

  1. These roles are different from traditional roles. Moving to an agile, and better yet disciplined agile, way of working requires a paradigm shift. Part of that paradigm shift is an improved set of roles and responsibilities held by people on agile teams. As a professional you need to be prepared to work in an agile manner to fit into an agile team.
  2. The era of the specialist is over. Key tenets of agile development are that people work together collaboratively to produce a working solution in an evolutionary (iterative and incremental) manner. An implication of that is that we're no longer able to work in a manner where people with narrow specialties - such as logical data modeler, physical data modeler, data analyst, and data architect - each do their small part of the work and then hand it off to the next person. That traditional approach is too slow, expensive, and error prone. Instead we build teams of "T-skilled" generalizing specialists who have one or more specialties (such as the ones listed above) PLUS a broader set of skills and knowledge that allow them to take on a wider range of tasks and work more effectively with their team mates.
  3. There is room for you if you're willing to learn. Although these roles are different than what you may be used to it is still possible, and highly desirable, for you to transition into one of these roles.
  4. The need for specialists still exists. There is a very small range of situations where specialists are still needed. Having said that, it is highly unlikely that you're in one of those situations. If you are new to agile, your best strategy is to assume that you are not in one of those situations and that you need to start working on becoming a generalizing specialist (not a generalist!) just like the vast majority of people.
For more details about disciplined agile roles, please read Roles on Disciplined Agile Teams and Disciplined Agile Roles at Scale.

1.3 Enterprise Awareness

Enterprise awareness is one of the key aspects of the Disciplined Agile (DA) framework. The observation is that DA teams work within your organization’s enterprise ecosystem, as do all other teams. There are often existing systems currently in production and minimally your solution shouldn’t impact them. Better yet your solution will hopefully leverage existing functionality and data available in production. You will often have other teams working in parallel to your team, and you may wish to take advantage of a portion of what they’re doing and vice versa. Your organization may be working towards business or technical visions which your team should contribute to. A governance strategy exists which hopefully enhances what your team is doing.

It is important that data warehousing teams work in an enterprise aware manner for several reasons:

  1. You should adopt your organization's development and data conventions. The implication is that your delivery team will need to work closely with your organization's enterprise architecture and data management teams who are typically responsible for such conventions.
  2. Your solution should leverage existing infrastructure wherever possible. You will want to work with your organization's enterprise architecture team to understand the technical direction of your company and with your reuse engineering team (if you have one) to identify and leverage existing assets.
  3. You should fix existing legacy systems and data sources whenever possible. This is called paying down technical debt in the agile community.
  4. You should share learnings whenever possible.
  5. You should be governed appropriately. Like it or not, your team is being governed. Effective IT organizations recognize that agile teams need to be governed in an agile, not a traditional, manner. The Disciplined Agile framework has governance strategies baked right into it.
For more details, please read Enterprise Awareness.

1.4 Full Delivery Lifecycles

One of the key aspects of the Disciplined Agile (DA) process decision framework is that it promotes a full, beginning-to-end, solution delivery lifecycle. The framework supports several lifecycles, two of which I summarize here, where you incrementally build a consumable solution over time.

Figure 1 depicts a Scrum-based version of the delivery lifecycle. This lifecycle deviates from the common Scrum lifecycle in several ways. First, it depicts a three-phase delivery lifecycle, not just a single-phase construction lifecycle. Second, it replaces the Scrum marketing language with more instructive language. Third, it shows external inputs coming into the team from other areas of your organization.

Figure 1. A Scrum-Based Delivery Lifecycle.

The Scrum-based lifecycle above explicitly calls out three phases:

  1. Inception. During this phase project/team initiation activities occur. This includes initial scoping, initial architectural modeling, high-level release planning, putting the team together, starting into your risk management approach, setting up your work environment, and securing funding for the rest of the release. Although “phase” tends to be a swear word within the agile community, the reality is that the vast majority of teams do some up front work at the beginning of a project. While some people will mistakenly refer to this effort as Sprint/Iteration 0 it is easy to observe that on average this effort takes longer than a single iteration. During the Inception phase we do some very lightweight envisioning activities to properly frame the project. It takes discipline to keep Inception short.
  2. Construction. During this phase a disciplined agile team will produce a potentially consumable solution on an incremental basis. They may do so via a set of iterations (Sprints in Scrum parlance) or do so via a lean, continuous flow approach (see Figure 2). The team applies a hybrid of practices from Scrum, XP, Agile Modeling, Agile Data, and other methods to deliver the solution. More on this later.
  3. Transition. The DA framework recognizes that for sophisticated enterprise agile teams deploying the solution to their stakeholders is often a complex exercise. DA teams, as well as the enterprise overall, will streamline their deployment processes so that over time this phase becomes shorter and ideally disappears as the result of adopting continuous deployment strategies. It takes discipline to evolve Transition from a phase to an activity.

Teams that are working on the first release of a data warehouse are likely to follow the Scrum-based lifecycle. Because they are working on the first release they will need to invest time in basic initiation efforts discussed earlier. Furthermore, because they are likely new to agile they will find the Scrum lifecycle to be easier to adopt as it prescribes the timing of common practices (such as planning, demos, and retrospectives) and forces the team into delivering on a regular basis (there should be more working stuff that could potentially be deployed to stakeholders at the end of each iteration/sprint). A detailed description of the type of activities that occur during each phase appears later in this article.

Figure 2 depicts Disciplined Agile's Lean Continuous Delivery lifecycle. This lifecycle is often followed by disciplined agile teams who are evolving an existing data warehouse that is already running in production. The requirements for such solutions often come in continuously from stakeholders instead of in a large batch. These requirements are often small in nature, typically adding a new data field, updating an existing report or data download, or adding a new report/download. Furthermore these requirements are self-contained and can be added easily to a well-architected DW/BI solution. Many of these requirements can be implemented in a few hours or days, so it often makes sense to do the work right away and release the new functionality as soon as you can. The temporal overhead of iterations/sprints, let alone regularly scheduled releases, of the Scrum-based lifecycle often doesn't make sense for evolving an existing production DW, hence the need for a different approach.

Figure 2. A Lean Continuous Delivery Lifecycle.

The Lean Continuous Delivery lifecycle of Figure 2 varies from the Scrum-based lifecycle of Figure 1 in several important ways:

  1. It supports a continuous flow of delivery. In this lifecycle the solution is deployed as often, and whenever, it makes sense to do so. Work is pulled into the team when there is capacity to do it, not on the regular heartbeat of an iteration. This enables your disciplined agile DW team to be more responsive to stakeholders.
  2. Practices are on their own cadences. With iterations/sprints many practices (detailed planning, retrospectives, demos, detailed modeling, and so on) are effectively put on the same cadence, that of the iteration. With a lean approach the observation is that you should do something when it makes sense to do it, not when the calendar indicates that you’re scheduled to do it. This enables you to streamline your activities more, BUT requires greater discipline.
  3. It has a work item pool. All work items are not created equal. Although you may choose to prioritize some work in the “standard” manner, either a value-driven approach as Scrum suggests or a risk-value driven approach as DA suggests, other work may not fit this strategy. Some work, particularly that resulting from legislation, is date driven. Some work must be expedited, such as fixing a severity one production problem. So, a work item pool and not a prioritized stack (as in the Scrum-based lifecycle) makes a bit more sense when you recognize these realities.
For more details about both of these lifecycles, and others supported by the DA framework, please read Full Delivery Lifecycles.
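
To make the work item pool idea concrete, here is a minimal Python sketch of how a team might decide what to pull next when it has capacity. Everything in it is an assumption made for illustration and is not part of the DA framework itself: the WorkItem fields, the scoring scale, the 14-day lead window for date-driven work, and the rule that risk and value are simply added together.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class WorkItem:
    """Hypothetical work item; field names and scales are illustrative only."""
    title: str
    value: int = 0               # assumed business-value score (higher is better)
    risk: int = 0                # assumed risk score (higher means riskier)
    due: Optional[date] = None   # set for date-driven work, e.g. legislation
    expedite: bool = False       # set for work such as a severity-one problem


def pull_next(pool: List[WorkItem], today: date, lead_days: int = 14) -> Optional[WorkItem]:
    """Remove and return the item to pull next, or None if the pool is empty.

    Assumed ordering: expedited work first, then date-driven work whose due
    date falls within lead_days, then everything else by risk plus value.
    """
    if not pool:
        return None

    def rank(item: WorkItem):
        if item.expedite:
            return (0, 0)
        if item.due is not None and (item.due - today).days <= lead_days:
            return (1, item.due.toordinal())
        return (2, -(item.risk + item.value))

    chosen = min(pool, key=rank)
    pool.remove(chosen)
    return chosen


pool = [
    WorkItem("Add field to sales report", value=5, risk=1),
    WorkItem("Regulatory extract", value=3, due=date(2024, 3, 1)),
    WorkItem("Fix failed nightly load", expedite=True),
    WorkItem("New churn dashboard", value=8, risk=4),
]
today = date(2024, 2, 20)
while pool:
    print(pull_next(pool, today).title)
```

The point of the sketch is that a pool needs a selection rule that handles more than one class of work, whereas a prioritized stack only ever needs "take the top item". A real team would tune or replace the ranking rule to fit its own context.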

1.5 Beyond Prescription: Goal Driven

The Disciplined Agile (DA) framework takes a goal driven approach. The purpose of this goal-driven approach is that it guides people through the process-related decisions that they need to make to tailor and scale agile strategies to address the context of the situation that they face. This is important because every team faces a unique situation. Teams vary in size, they vary in the way that they are geographically or organizationally distributed, they vary by the domain and technical complexity that they face, and they vary by the compliancy issues that are relevant to them. Furthermore, teams are made up of unique individuals, each of whom has a set of unique skills and experiences. In short, because each agile team finds itself in a unique situation the team must find a way to effectively tailor the way that they work to best face that situation. DA’s goal driven strategy is a light-weight approach to providing advice for such process tailoring.

Figure 3 summarizes the delivery-oriented process goals of the Disciplined Agile process decision framework via a mind map. There are twenty-two goals in total, each of which is described by a Process Goal Diagram (effectively a decision tree). A disciplined agile team will consider how to address each goal in a manner that reflects the situation that they face. Sometimes a goal will be very easy to address, for example an established development team may find that they need to do nothing else to fulfill the Form Initial Team goal. A team that is building a solution via an architecture they are very familiar with will have very little work to do to fulfill the Prove Architecture Early goal whereas a team using technologies that are new to them may have a fair bit of work to do. Different situations require different approaches.

Figure 3. A Mind Map Summarizing the Delivery Goals.

Each of the three delivery phases (Inception, Construction, and Transition) is described by specific goals. Some goals, such as Grow Team Members and Address Risk, are applicable throughout the entire lifecycle.

For more details, including all 22 process goal diagrams, please read Disciplined Agile Process Goals.

1.6 Explicit Support for IT-Level Activities

If you think that software development is hard, running the entire IT department is even harder. IT departments are complex adaptive organizations. What we mean by that is that the actions of one team will affect the actions of another team, and so on and so on. For example, the way that your agile delivery team works will have an effect on, and be affected by, any other team that you interact with. If you’re working with your operations teams, perhaps as part of your overall DevOps strategy, then each of those teams will need to adapt the way they work to collaborate effectively with one another. Each team will hopefully learn from the other and improve the way that they work. These improvements will ripple out to other teams. The challenge is that every area within IT has one or more bodies of knowledge, and in some cases published “books of knowledge”, that provide guidance for people working in those areas. For example management has the PMIBoK and Prince 2, enterprise architects have TOGAF and the Zachman Framework, business analysts have the IIBA BoK, data managers have the DAMA BoK, and so on. These industry groups and their corresponding bodies of knowledge contradict one another, they are at different points on the agile/lean learning curve, and sometimes they promote very non-agile/lean strategies. At the IT level this can be very confusing, resulting in dysfunction. The DA framework shows how this all fits together in a flexible manner that supports the realities faced in complex adaptive systems.

Figure 4 depicts the high-level workflow of a Disciplined Agile IT department. Your disciplined agile DW team will be affected, hopefully in a positive manner, by other teams running in parallel to you. Your team will work closely with the Enterprise Architects, your Architecture Owner may even be an EA, to understand their long-term vision. You will work with the Data Management group to understand and access existing legacy data sources. You'll work with your Release Management and Operations teams to release your solution into production. Your Support/Help Desk team is likely providing enhancement requests and bug reports to your team. You may have dependencies on other delivery teams.

Figure 4. The Workflow of a Disciplined Agile IT Department.

Not only must we work in an enterprise aware manner, these other teams need to be prepared to work in a more agile manner when interacting with our DW team. It will be very difficult for our DW team to work in an agile manner if they rely on other teams who aren't prepared, or worse yet aren't sufficiently skilled, to work in an agile manner too. Having said that, the entire EA or DM team doesn't need to be agile, but enough people on those teams must be so that they can support the agile delivery teams appropriately.

For more details about this IT-centric view, please read The Disciplined Agile Framework.

2. Disciplined Agile Data Warehousing

Let's explore how a disciplined agile DW team works in practice. Figure 5 summarizes the primary and secondary development activities that are potentially performed by the team. Primary activities are ones that add direct value to the development effort. Secondary activities tend to focus on long-term documentation that may add value in the future, but the value proposition tends to be dubious in practice so you want to be very careful in how much effort you invest in them. I tend to think of these as sideline activities that I would only do if the team has time to spare from primary activities.

Figure 5. Activities on a Disciplined Agile DW Team.

For now we'll assume that your team is working on the first release of a DW solution. As a result you will need to take a three-phase approach. For teams working with an existing DW solution that is already running in production, you may find that you do not need to work through Inception (or at least you only need an abbreviated version of it).

2.1 The Inception Phase

During the Inception phase the team strives to perform just enough work to get going in the right direction. Disciplined teams will spend a few days or perhaps a week or so to do so, not several weeks or months. The key is that they work in a very light-weight manner. The process goals that a disciplined agile team will fulfill during Inception include:

  • Form Initial Team
  • Develop Common Vision
  • Align with Enterprise Direction
  • Explore Initial Scope
  • Identify Initial Technical Strategy
  • Develop Initial Release Plan
  • Secure Funding
  • Form Work Environment
  • Identify Risks

The purpose of this article is not to address every single aspect of the goals listed above (that's why Mark Lines and I wrote the book Disciplined Agile Delivery); instead it is to focus on critical activities that support a disciplined agile approach to DW/BI. Potential primary activities on a disciplined agile DW/BI team include:

    1. Initial usage modeling. Disciplined agile DW teams take a usage-driven, not a data-driven, approach to modeling. Understanding the data is still important, don't get me wrong, it is just that it isn't anywhere near as important as understanding how the data will be used. The most common strategy used by agile teams to explore usage is to write user stories and epics (which are large user stories). Examples of user stories that would support the development of a DW/BI solution for a retail bank include "As a Branch Manager I need to analyze the portfolio of a customer so that we can target services to them", "As a Branch Manager I need to explore the transactions occurring in my branch so that I better understand my customer needs", or "As a Mortgage Officer I need to explore the risk profile of a potential mortgage holder so that I can decide how much we can loan that person." Notice how all of these requirements focus on usage, not data details. The details can come later.
    2. Initial conceptual modeling. An important, supporting model to the usage model is a high-level conceptual model, sometimes called a domain model. This diagram should indicate the main entity types within the domain and the relationships between them. It does not need to indicate potential data elements, nor does it need to be perfect, it just needs to be reasonably close at this time. The goal is to gain a reasonable understanding of domain terms at this time, the details will emerge later during construction. For the banking application the domain model may indicate entity types such as Customer, Account, Mortgage, Loan, Branch, Portfolio, and perhaps another twenty or so entity types.
    3. Identification of potential data sources. Early in a DW/BI project the team needs to identify the main (potential) sources of data. This information is often captured on a network diagram, or something similar. This diagram will overview the flow of information within your technical architecture, indicating the potential data sources, how information is obtained from those sources (see next activity), and how the data flows through your firewalls, staging areas (if any), your data warehouse(s), and any data mart(s).
    4. High-level data source analysis. During Inception you will want to obtain basic information about potential data sources, such as who the primary contacts are, what type of data each source contains, how that data is accessed (e.g. via file transfers, via SQL queries, via web services, and so on), and sizing information (e.g. volume of data and rate of change). The goal right now is to gain sufficient understanding of the data sources so you can make intelligent architecture decisions about them.
    5. Initial architectural modeling. Early in a disciplined agile DW/BI project you want to identify a viable architectural strategy. Part of that strategy will be identifying potential data sources; part will be identifying how data will flow from the data sources to the target data warehouse(s) or data marts; and part will be making that flow work through combinations of data extraction, data transformation, and data loading capabilities. The layout of the technical architecture is often captured using a network diagram, discussed earlier. Architectural notes, in particular important technical decisions as well as good things to know (such as the data source information described earlier), are often captured in an Architectural Handbook. This handbook is often implemented as a collection of wiki pages so that anyone who is interested may have access to the information.
    6. Initial release planning. Disciplined agile teams, including those focused on a DW/BI solution, will perform a bit of high-level release planning. Teams are often required to guesstimate the potential cost of the release they are working on as well as the potential delivery date. These guesstimates, or estimates if you prefer, are best presented as ranges so as to reflect the uncertainty of the information the guesstimate is based upon. For further information about initial release planning, please see the DA process goal Develop Initial Release Plan.
    7. Adopt common guidelines. Disciplined agile teams are enterprise aware, an aspect of which means that they strive to adopt and then follow common guidelines. These guidelines include data guidelines, security standards, coding guidelines, user interface guidelines, and many others.
    8. Initiate a data testing strategy. Testing is so important on disciplined agile teams that we do it all the way through the lifecycle, not just during some phase at the end of the lifecycle. This includes the testing of all functionality, including any functionality pertaining to data-oriented issues, as well as the testing of the data itself. There are many things that can be tested pertaining to databases; see Database Testing: How to Regression Test a Relational Database for some thoughts on the subject. Your data testing strategy should address issues such as how to test extract-transform-load (ETL) logic, how to validate data sources, how to ensure the quality of the data in the data warehouse(s) and data mart(s), what tools will be used, and identification of who has the skills to do the work. This may be the greatest challenge faced by traditional data professionals as they transition to disciplined agile ways of working - not only do few traditional data professionals have data testing skills, the vast majority don't even realize how critical those skills actually are.
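
As a rough illustration of what automated data testing can look like, the following Python sketch tests a small piece of transform logic and then runs a few quality checks against the loaded data, using an in-memory SQLite database from the standard library. The table name dim_customer, the column names, the cleaning rules, and the specific checks are all hypothetical; a real strategy would cover your actual ETL logic, sources, and tools.

```python
import sqlite3


def clean_customer(row: dict) -> dict:
    """Hypothetical transform step: trim names and normalize country codes."""
    return {
        "customer_id": int(row["customer_id"]),
        "name": row["name"].strip(),
        "country": row["country"].strip().upper(),
    }


def load_customers(conn: sqlite3.Connection, source_rows: list) -> None:
    """Load cleaned rows into an illustrative warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (:customer_id, :name, :country)",
        [clean_customer(r) for r in source_rows],
    )


def run_data_checks(conn: sqlite3.Connection, expected_rows: int) -> dict:
    """Return a mapping of check name to True (passed) or False (failed)."""
    count = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
    blank_names = conn.execute(
        "SELECT COUNT(*) FROM dim_customer WHERE name = ''"
    ).fetchone()[0]
    bad_country = conn.execute(
        "SELECT COUNT(*) FROM dim_customer "
        "WHERE length(country) <> 2 OR country <> upper(country)"
    ).fetchone()[0]
    return {
        "row_count_matches_source": count == expected_rows,
        "no_blank_names": blank_names == 0,
        "country_codes_normalized": bad_country == 0,
    }


# Made-up source rows standing in for an extract from a source system.
source = [
    {"customer_id": "1", "name": " Ada Lovelace ", "country": "gb"},
    {"customer_id": "2", "name": "Grace Hopper", "country": " us"},
]
conn = sqlite3.connect(":memory:")
load_customers(conn, source)
print(run_data_checks(conn, expected_rows=len(source)))
```

Checks like these are cheap to run on every build, which is what makes testing throughout the lifecycle practical rather than something saved for the end.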

Potential secondary activities include:

    1. Detailed data modeling (partial). You may need to start doing some data modeling, both logical data model (LDM) and physical data model (PDM) development, during Inception. Your LDM, if you create it at all, is typically used for detailed data analysis, an activity which occurs during Construction. Similarly your PDM(s) are used to design the database schema of your DW and data mart(s), also work that is typically done during Construction. Having said that, during Inception you may choose to do detailed look-ahead modeling of high-priority requirements, and the design to support them, that you intend to implement in the first iteration or two of Construction. As a result you MIGHT do some data modeling work, see below for a more detailed discussion of what that might entail.
    2. Source-to-target data mapping (partial). Part of your look-ahead modeling effort, sometimes called "backlog refinement" by Scrum practitioners, will be to do just enough data mapping to implement the first few stories in your backlog. You need to know where the data is coming from to implement just these stories. Yes, any given data source may have hundreds of data elements that your team may potentially be interested in at some point, but for now you just need to map the handful of data elements required to implement the first few stories. That's it. Future data mapping, if at all, will be performed in an evolutionary manner throughout construction.
    3. Detailed data source analysis (partial). Similarly, you will do just enough analysis of your data sources to understand just the data elements required for the first few requirements.
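
The "just enough" source-to-target mapping described above can be sketched as a small table of mappings that covers only the data elements the first stories need. In this Python sketch the source system name core_banking, the table and column names, the transforms, and the story title are all invented for illustration; your own mappings would come from your data source analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mapping:
    """One hypothetical source-to-target mapping for a single data element."""
    source: str                       # "system.table.column" in the source
    target: str                       # "table.column" in the warehouse
    transform: Callable[[str], object]  # conversion applied to the source value
    story: str                        # the user story that needs this element


# Only the handful of elements required by the first story are mapped.
MAPPINGS = [
    Mapping("core_banking.cust.cust_nm", "dim_customer.name",
            str.strip, "Analyze customer portfolio"),
    Mapping("core_banking.acct.bal_amt", "fact_account.balance",
            float, "Analyze customer portfolio"),
    Mapping("core_banking.acct.open_dt", "fact_account.opened_on",
            str, "Analyze customer portfolio"),
]


def apply_mappings(source_record: Dict[str, str],
                   mappings: List[Mapping]) -> Dict[str, object]:
    """Translate one source record into target values, ignoring unmapped elements."""
    return {
        m.target: m.transform(source_record[m.source])
        for m in mappings
        if m.source in source_record
    }


record = {
    "core_banking.cust.cust_nm": " Ada Lovelace ",
    "core_banking.acct.bal_amt": "1250.50",
    "core_banking.cust.shoe_size": "7",   # present in the source, not needed yet
}
print(apply_mappings(record, MAPPINGS))
```

Recording the story alongside each mapping keeps it obvious why an element was mapped, and unmapped elements are simply ignored until a later story needs them.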

2.2 The Construction Phase

During the Construction phase the team strives to fulfill several process goals:

All of the above process goals are applicable to disciplined agile DW teams. To address these process goals, you are likely to adopt a collection of activities as we saw in Figure 5 above. Potential primary activities include:

    1. Development of vertical, fully functional slices. Each iteration the DW/BI team will produce a solution that is consumable, something that could be potentially shipped into production that people want to actually use. This means that you will analyse, design, implement, and test that functionality during the iteration (and most disciplined agile teams work in iterations that are two weeks in length or less). This is why it is so important to take a usage-driven approach and not a data-driven approach - your team needs to be always working on some new functionality that adds real value to your stakeholders. In a given iteration you do the work to completely implement one or more reports, or perhaps a portion of a report or an enhancement to an existing report. You will do the work to extract the data from the data source(s), transform/clean it, and load it into the DW. This can be tough initially because you will not have the infrastructure in place yet during the first few Construction iterations. For example, the first time you extract data from a data source you'll need to do a lot of the work to access that data source.
    2. Prove the architecture with working code. There are always technical risks on DW projects. Maybe the technologies that you're working with are new to your organization. Maybe several data sources are difficult to work with, either because the owners of the data sources are difficult, because there are quality issues with the data, because there are architectural differences between the data sources (e.g. some are real time and some are batch systems), or perhaps because there are volume challenges (i.e. "big data"). Disciplined agile teams remove these sorts of risks by implementing functional requirements that touch on the risks right away. Worried about accessing data from a batch system? Start by writing a report that needs data from it. Worried about whether your ETL tool is going to work well? Implement one or more requirements that require the key features of that tool. Worried about whether you'll be able to handle the big data load? Implement a requirement that needs that data. Many times traditional teams, and even not-so-disciplined agile teams, will put off hard aspects of their architecture to the end of the lifecycle, thereby increasing the potential costs of fixing any problems that they do run into. Disciplined agile teams prefer to address their risks as early as they can when they have the most time and resources to respond to the problems.
    3. Detailed data source analysis. Data source analysis occurs on a just-in-time (JIT), or near-JIT basis. Your team will do the analysis required to implement the current requirements (let's assume they're captured as user stories). So, if a story requires data from three data sources then you will do the analysis of those data elements from those data sources. Of course, you're likely doing the analysis for several stories at a time, perhaps five or six stories. Furthermore, disciplined teams will be doing look-ahead modeling, described earlier, where you are doing the analysis for stories that are coming up in the next iteration or two. The basic strategy is to have just enough data source analysis done before you go to implement the actual functionality. You are likely to do more data source analysis in earlier Construction iterations as opposed to later iterations - as you populate your DW, the data required for new reports or queries is more likely to be there over time. Traditional teams have a tendency to do comprehensive data source analysis one data source at a time, followed by the implementation work needed to obtain and then load the data into the DW. This appears efficient from the point of view of the person(s) doing the work, but proves to be rather inefficient in practice from the point of view of your stakeholders for two reasons. First, it takes much longer to get to the point where you have sufficient data in place to implement the reports that they actually want, or to support answering their questions. In other words, you have a very high cost of delay (also referred to as opportunity cost). Second, you end up analyzing (and then implementing) data elements that aren't actually needed.
    4. Implementation of source-to-target data mapping. The implementation work to extract the data from source, transform the data as required, and then load it into your target database(s) will be done in a JIT, evolutionary manner. Each iteration your team will do the work to implement one or more stories from end-to-end (i.e. vertical slices through your solution) and part of this work is the implementation of the source-to-target data mappings.
    5. Database refactoring. A refactoring is a simple change to your design that improves its quality without changing its semantics in a practical manner. A database refactoring is a simple change to a database schema that improves the quality of its design OR improves the quality of the data that it contains. Database refactoring enables you to safely and easily evolve database schemas, including production database schemas, over time. This technique, in combination with database regression testing and continuous integration, allows us to develop data warehouses, or any solution involving a database for that matter, in an agile manner.
    6. Physical data modeling. The physical data models (PDMs) describing your databases, including both source and target databases, will evolve throughout Construction. Please see the article Agile/Evolutionary Data Modeling for a detailed description of how to go about agile data modeling.
    7. Regression testing. Quality is paramount for disciplined agile teams. Disciplined agile teams will develop, in an evolutionary manner of course, a regression test suite that validates their work. They will run this test suite many times a day so as to detect any problems as early as possible so that they can address them as cheaply as possible (remember, the average cost of fixing a defect rises exponentially the longer it takes you to find it). In fact, very disciplined teams will take a test-driven development (TDD) approach where they write tests before they do the work to implement the functionality that the tests validate. As a result the tests do double duty - they validate and they specify (which is one of many reasons why disciplined agile teams require far less documentation than traditional teams: their specifications are in effect executable as opposed to static). Please see Database Testing: How to Regression Test a Relational Database for a more detailed description of this strategy.
    8. Continuous integration (CI). CI is a technique where you automatically build and test your system every time someone checks in a code change. Disciplined agile developers will typically update a few lines of code, or make a small change to a configuration file, or make a small change to a PDM and then check their work into their configuration management tool. The CI tool monitors this, and when it detects a check in it automatically kicks off the build and regression test suite in the background. This provides very quick feedback to team members, enabling them to detect issues early.
    9. Continuous deployment (CD). When an integration build is successful (it compiles and passes all tests) your CD tool will automatically deploy to the next appropriate environment(s). For example, if the build runs successfully on a developer's workstation their changes are propagated automatically into the team integration environment (which automatically invokes CI in that space). When the build is successful in your team integration environment perhaps it's promoted into an integration testing environment, and so on.
    10. Continuous documentation. Disciplined agile is solution focused, not just software focused. Because documentation is part of the overall solution that you deliver, you should develop key documentation (system overview documentation, help guidelines, and so on) as you develop the software. For more information, see the Agile Modeling practice Document Continuously.
    11. Detailed planning. Planning is so important on disciplined agile teams that we do it all the way through Construction. Detailed planning occurs at the beginning of each iteration (for teams following the Scrum-based lifecycle) or in an as-needed, just in time (JIT) manner (for teams following the lean/continuous delivery lifecycle). Team members may also choose to engage in look-ahead planning to begin thinking through the next iteration or two in complex situations.
    12. Coordination meetings. The team needs to coordinate their work both internally within the team and externally with other teams. The Disciplined Agile framework includes the Coordinate Activities process goal that captures a range of options for doing so.
    13. Demos. Disciplined agile teams demonstrate their work on a regular basis, typically at the end of each iteration. This demo, typically run by your product owner, shows off the work that your team has accomplished since the last demo. Because you are taking a usage-driven approach, each iteration you should have added more functionality that provides real value to your stakeholders. For example, your demo might walk through a new report that your team built that iteration, show how a calculation was updated on an existing report, and show how there are seven new data columns available to people doing ad-hoc reporting.
    14. Retrospectives. One of the principles behind the Agile Manifesto, and the Disciplined Agile Manifesto which extends the original manifesto to address enterprise-class issues, is that teams should regularly reflect on what they're doing and strive to learn and improve their approach. Retrospectives are a simple technique for doing exactly that.
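
    The database refactoring and regression testing activities above can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database and a hypothetical customer table whose column fname is being renamed to first_name; the table, column, and function names are all invented for this example and are not taken from any specific project or tool.

```python
import sqlite3


def create_original_schema(conn):
    """Hypothetical starting schema with a poorly named column (fname)."""
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT)")
    conn.executemany(
        "INSERT INTO customer (id, fname) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )


def refactor_rename_column(conn):
    """Rename-column refactoring done as small steps: add the new column,
    then copy the data across. The old column stays in place for a
    transition period so that existing queries keep working."""
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("UPDATE customer SET first_name = fname")


def regression_checks(conn):
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    (row_count,) = conn.execute("SELECT COUNT(*) FROM customer").fetchone()
    if row_count != 2:
        failures.append(f"expected 2 rows, found {row_count}")
    # IS NOT is SQLite's NULL-safe inequality, so NULLs are compared correctly.
    (mismatches,) = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE first_name IS NOT fname"
    ).fetchone()
    if mismatches:
        failures.append(f"{mismatches} rows differ between old and new column")
    return failures


conn = sqlite3.connect(":memory:")
create_original_schema(conn)
refactor_rename_column(conn)
failures = regression_checks(conn)
print(failures)
```

    A CI tool could run checks like these on every check-in. In a TDD style you would write regression_checks first, watch it fail against the original schema, and then implement the refactoring until it passes.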

    Potential secondary activities include:

    1. Logical data modeling. In practice, logical data modeling tends to add very little value to the overall development effort (other than the "value" of keeping logical data modelers employed of course). If you do decide to invest time in an LDM, keep it as light-weight as possible and DO NOT allow this effort to slow down development. If you think that your LDM can offer actual value to your organization, and in the traditional world that's possible, then ask yourself how you can add the same value using tests. I'll write up a more detailed article on this at some point in the future. For now my best advice is to be very leery about LDMing.
    2. Documentation of source-to-target data mapping. You will likely find that you want to document your mappings. Once again, you will find it more effective to capture these mappings in the form of tests (assuming you have the skills to do so) rather than static documentation. If you find that you need to resort to documentation, remain as agile as possible and keep the documentation light-weight. Please see Agile/Lean Documentation Strategies for more ideas about how to keep your documentation light-weight and sufficient.
    3. Meta-data documentation. Once again, follow agile documentation strategies for capturing any meta-data information.
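
    To make the idea of capturing mappings as tests more concrete, here is a small sketch. It assumes a hypothetical source record with the columns CUST_NM, CNTRY, and ORD_AMT (an amount held in cents as text); the mapping table, the column names, and the transforms are all invented for illustration. The test states the expected target row for a known source row, so it documents the mapping and verifies it at the same time.

```python
# Hypothetical source-to-target mapping:
# target column -> (source column, transform applied to the source value)
MAPPING = {
    "customer_name": ("CUST_NM", str.strip),
    "country_code": ("CNTRY", str.upper),
    "order_total": ("ORD_AMT", lambda cents: int(cents) / 100),
}


def transform_row(source_row):
    """Apply every mapping rule to one source row and return the target row."""
    return {
        target: transform(source_row[source])
        for target, (source, transform) in MAPPING.items()
    }


def test_mapping():
    """Executable specification of the mapping for one known source row."""
    source = {"CUST_NM": "  Ada Lovelace ", "CNTRY": "gb", "ORD_AMT": "1250"}
    expected = {
        "customer_name": "Ada Lovelace",
        "country_code": "GB",
        "order_total": 12.5,
    }
    assert transform_row(source) == expected


test_mapping()
```

    If a mapping rule changes, the test fails until the expectation is updated, which is what keeps this form of documentation from going stale.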

    2.3 The Transition Phase

    During the Transition phase the team strives to ensure that the solution is consumable and, when it is, deploys the solution. To address these process goals, you are likely to adopt a collection of activities as we saw in Figure 5 above. Potential primary activities include:

    1. End-of-lifecycle testing. Some testing may slip into the Transition phase. Ideally all testing should occur during Construction, other than one last run of your regression test suite to ensure you're ready to ship. But it isn't always an ideal world. See end-of-lifecycle testing for more details.
    2. Last-minute fixes. If you perform end-of-lifecycle testing there is always the risk that the testing effort finds some bugs. Your Product Owner may decide that these bugs need to be fixed before you're ready to ship.
    3. Finalize deliverable documentation. Some teams will let documentation slip, or at least some documentation slip, to the end of the lifecycle. This is a practice called Document Late, although I prefer a continuous documentation approach described earlier.
    4. Deploy database schema changes. Part of your overall deployment efforts will be to deploy database schema changes. If you've been taking a database refactoring approach then this will be very straightforward as your change scripts will already be running and fully tested. Before making the schema changes you should consider creating a backup of the database. You may find The Process of Database Refactoring: Install Into Production to be an interesting read.
    5. Migrate production data to new schema. TBD.
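
    The schema deployment activity above (back up first, then apply change scripts that have already been tested) might look something like the following sketch. It assumes SQLite purely for illustration, with a hypothetical sales table and a single hypothetical change script; a real DW platform would use its own backup and deployment tooling.

```python
import sqlite3


def deploy_schema_changes(prod, change_scripts):
    """Back up the database, then apply all change scripts in one transaction.

    Returns the backup connection so the caller can restore from it if needed.
    Assumes prod was opened with isolation_level=None, so that transactions
    are controlled explicitly with BEGIN, COMMIT, and ROLLBACK.
    """
    backup = sqlite3.connect(":memory:")
    prod.backup(backup)  # copy the current state before touching the schema
    prod.execute("BEGIN")
    try:
        for script in change_scripts:
            prod.execute(script)
        prod.execute("COMMIT")
    except sqlite3.Error:
        prod.execute("ROLLBACK")  # leave the schema exactly as it was
        raise
    return backup


# Hypothetical "production" database, in memory for this sketch.
prod = sqlite3.connect(":memory:", isolation_level=None)
prod.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
prod.execute("INSERT INTO sales (amount) VALUES (19.99)")

backup = deploy_schema_changes(
    prod,
    ["ALTER TABLE sales ADD COLUMN currency TEXT DEFAULT 'USD'"],
)

columns = [row[1] for row in prod.execute("PRAGMA table_info(sales)")]
backup_columns = [row[1] for row in backup.execute("PRAGMA table_info(sales)")]
print(columns, backup_columns)
```

    Applying the scripts inside a single transaction means a failing script leaves the schema untouched, while the backup covers the case where a problem is only discovered after the commit.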

    The only potential secondary activity is to finalize any secondary documentation, such as your logical data model (LDM) or your meta-data documentation, that you believe will add real value in the future. The risks of investing too much effort on these sorts of activities have been discussed earlier.


    3. Artifact Creation by DW Teams: Traditional vs. Agile

    Figure 6 compares the typical level of effort expended creating artifacts on traditional and agile teams. There are several interesting differences between the approaches. Disciplined agile DW teams will:

    1. Create a high-level conceptual model early. A high-level conceptual model, or more accurately diagram at this point, identifies the critical business entity types and the relationships between them. This provides vital insight into the domain while helping to capture key domain terminology, thus helping to drive consistency of wording in other artifacts (such as user stories and epics). Traditional teams will often make the mistake of over-documenting the conceptual model early in the lifecycle, injecting delay into the team (with the corresponding opportunity cost of doing so) as well as the risk of making important decisions when you and your stakeholders have the least knowledge of the actual end goal.
    2. Evolve a light-weight logical data model (LDM) over time. If your agile DW team does this at all, they will keep their LDM very light-weight. Traditional teams will often invest heavily in their LDMs as they believe it is a mechanism to ensure quality and consistency by specifying them in up front. This often proves to be wishful thinking in practice. Disciplined agile teams instead invest their efforts in creating an executable specification in the form of regression tests (more on this below).
    3. Evolve a detailed physical data model (PDM) over time. Disciplined agile teams realize that a PDM, when created via a data modeling tool with full round-trip engineering (it generates schemas as well as imports existing schemas), effectively becomes the source code for the database. As the requirements evolve the team will evolve the PDM to reflect these new needs, generating schema changes as needed. They can work this way because they are able to easily refactor and regression test their database. This is different from the traditional approach where they often perform detailed modeling up front. This is motivated by the mistaken belief that production database schemas are difficult to evolve, something that agilists know not to be true.
    4. Develop a comprehensive regression test suite over time. These tests address several important issues. First, they validate the work of the team, showing that their work to date fulfills the requirements as they've been described to the team. Second, a regression test suite enables the team to safely evolve their work. Agile developers can make a small change, rerun their regression tests, and see whether they broke something (if so, then they either roll back their change or they fix what they broke). Third, when a test is written before the corresponding database schema or database code is developed, the test effectively becomes a detailed executable specification. Sophisticated agile DW teams will capture the kinds of information that were previously captured in LDMs in executable tests, and are thus much more likely to have consistent schemas than teams that still rely on static LDMs.
    5. Capture critical meta-data over time. Because the rest of your organization may not be completely agile there is often a need to continue to capture key meta-data about data sources. This meta-data should be kept as light as possible. If there isn't a definite need for it then don't capture it. If someone says "but we might need it someday" then wait until someday and invest in capturing the information at that point. Furthermore, instead of capturing meta-data in a static manner (i.e. as documentation) try to identify ways to capture it as tests, or to generate it automatically from other information sources. Any documentation that you write today needs to be maintained over time, slowing you down.
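
    As one illustration of generating meta-data automatically rather than writing it by hand, the sketch below reads column names and types straight from the database catalog. It assumes SQLite and a hypothetical dim_customer table; the function name describe_schema is invented for this example, and other database engines expose the same information through their own catalog views.

```python
import sqlite3


def describe_schema(conn):
    """Build a table -> column description dictionary from the SQLite catalog."""
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    meta = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        meta[table] = [
            {"column": col[1], "type": col[2], "not_null": bool(col[3])}
            for col in conn.execute(f"PRAGMA table_info({table})")
        ]
    return meta


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
meta = describe_schema(conn)
print(meta)
```

    Because the description is derived from the schema itself, it cannot drift out of date the way a hand-maintained document can.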

    Figure 6. Comparing Artifact Creation by DW Teams.

    For a better understanding of why traditional DW teams are likely to write too much documentation far too early in the lifecycle, you should read the article The Cultural Impedance Mismatch.

    4. Related Resources