
Look-Ahead Data Analysis


When it takes several days, and sometimes weeks, to perform data analysis, how do you remain agile? The answer is to perform look-ahead data analysis.


What is Look-Ahead Data Analysis?

As the name implies, this is effectively the application of Agile Modeling's look-ahead modeling practice to data analysis. Scrum teams refer to look-ahead modeling as backlog refinement or even backlog grooming.

The basic idea is that you do just enough analysis work to explore and understand the data source(s) so that a data requirement, likely captured in the form of a question story, can be implemented. When you have existing, high-quality reporting data sources (data warehouses (DWs), data marts, data lakes, ...) that contain the data you need, the data analysis work is fairly straightforward. When this is not the case, for example when the data resides in legacy OLTP (online transaction processing) databases or in external data sources (think big data), you may require significant effort to explore, understand, and document the incoming source data. It is this effort that look-ahead data analysis focuses on.

Data analysis activities may include:


Why Look-Ahead Data Analysis?

There are several reasons why you need to perform look-ahead data analysis:

  1. Data source documentation isn't trustworthy. Very often the artifacts describing existing data sources, if there are any at all, are neither complete nor up to date. As a result you need to analyze the legacy data source(s) to determine what is actually there and what is potentially usable.
  2. Data engineers need to understand the source data. The data engineers, or developers in some cases, need to understand what data they will use and how they need to manipulate it, to implement whatever is required to address a given question story.
  3. You want the best data available. You very often have many data sources available to you capable of providing the data you need. One goal of data analysis is to help you to identify the best available source for the data that you require, not just a source. Be quality infected.
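
To make the third point concrete, here is a minimal Python sketch of one way to compare candidate sources. It assumes each source has already been sampled into a list of rows, and it uses completeness of the required columns as a stand-in for "best". The source names (`crm_export`, `billing_db`), the column names, and both functions are hypothetical, and a real comparison would likely also weigh accuracy, timeliness, and ease of access.

```python
from typing import Dict, List, Optional

# A sampled row: column name -> value (None or "" means the value is missing).
Row = Dict[str, Optional[str]]


def completeness(rows: List[Row], required_columns: List[str]) -> float:
    """Return the fraction of required cells that are present and non-empty."""
    total = len(rows) * len(required_columns)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for row in rows
        for col in required_columns
        if row.get(col) not in (None, "")
    )
    return filled / total


def best_source(sources: Dict[str, List[Row]], required_columns: List[str]) -> str:
    """Return the name of the source with the highest completeness score."""
    return max(
        sources,
        key=lambda name: completeness(sources[name], required_columns),
    )


# Hypothetical samples from two candidate sources for the same data need.
SOURCES: Dict[str, List[Row]] = {
    "crm_export": [
        {"customer_id": "1", "email": "a@example.com"},
        {"customer_id": "2", "email": ""},  # email is missing here
    ],
    "billing_db": [
        {"customer_id": "1", "email": "a@example.com"},
        {"customer_id": "2", "email": "b@example.com"},
    ],
}
REQUIRED = ["customer_id", "email"]

if __name__ == "__main__":
    for name, rows in SOURCES.items():
        print(name, completeness(rows, REQUIRED))
    print("best:", best_source(SOURCES, REQUIRED))
```

Scoring on a single dimension keeps the sketch short. In practice the ranking is a judgment call that the data analyst makes together with the data engineers.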

Look-Ahead Data Analysis on Agile Teams

Many teams choose to follow an agile lifecycle, typically based on Scrum. Figure 1 depicts the look-ahead data analysis work required for three question stories that are to be implemented during sprint #9 of an agile DW/BI initiative. Notice how each question story requires a different amount of data analysis effort because every question has unique data needs.

Figure 1. Look-ahead data analysis on an agile team.


There are several interesting implications for look-ahead data analysis on agile/Scrum teams:


Look-Ahead Data Analysis on Lean Continuous Delivery Teams

Advanced teams choose to follow a lean continuous delivery lifecycle, based on Kanban and DevOps strategies.

Figure 2 depicts the look-ahead data analysis work for the same three question stories from Figure 1, the difference being that the work is done on a just-in-time (JIT) basis rather than scheduled into fixed-length sprints. Note that the same amount of data analysis is still required for each question story as in Figure 1, but that the implementation time is no longer tied to a two-week sprint.

Figure 2. Look-ahead data analysis on a continuous delivery team.


There are several interesting implications for look-ahead data analysis on continuous delivery teams:


Factors That Affect The Timing of Look-Ahead Data Analysis

There are several factors that will determine how far ahead you need to perform look-ahead data analysis:

  1. The existing data in your data warehouse (DW). The more data you have in your reporting data sources (DWs, data marts, data lakes, ...), and the higher its quality, the less analysis of legacy data sources you will need to perform.
  2. The complexity of the data source(s). The more complex the data source, either in structure or in contents, the longer it will take to analyze.
  3. Your ability to gain access to the data source(s). You may not have easy access to a data source that you need to analyze, and it can take time to gain that access.
  4. The quality of the existing documentation. The higher the quality of the documentation describing the data source(s), if any exists at all, the easier it will be to understand the data source.
  5. The difficulty of the question being asked. The more difficult, or complex, the question to be answered, the longer it will likely take to analyze the data source(s) required to answer it.
  6. The skill, experience, and knowledge of the data analyst(s). Highly capable data analysts are generally more effective than novices, and thus work faster and generally produce better results. In short, it really does depend on who is doing the work as to how long it will take.
  7. The availability of the data analyst(s). You will need to staff accordingly. The more people with data analysis skills who are available, the shorter the lead/wait time will be for the work to be done.
  8. Your data profiling tools. The more effective your tools, the easier it will be to explore existing data source(s).
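
As a rough illustration of what a data profiling tool reports, the following Python sketch computes a basic profile for one column of sampled rows. It is an assumption-laden toy rather than a description of any particular tool: the function name `profile_column`, the `status` column, and the sample rows are all hypothetical, and missing values are assumed to appear as `None` or an empty string.

```python
from typing import Any, Dict, List


def profile_column(rows: List[Dict[str, Any]], column: str) -> Dict[str, Any]:
    """Summarize one column: row count, missing count, distinct count, min, max."""
    values = [row.get(column) for row in rows]
    # Treat None and the empty string as missing values (an assumption).
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Hypothetical sample pulled from a legacy source.
SAMPLE_ROWS = [
    {"order_id": 1, "status": "open"},
    {"order_id": 2, "status": "closed"},
    {"order_id": 3, "status": ""},  # status is missing here
    {"order_id": 4, "status": "open"},
]

if __name__ == "__main__":
    print(profile_column(SAMPLE_ROWS, "status"))
```

Even a profile this small can reveal whether the documentation matches reality, for example a column described as mandatory that turns out to have missing values.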

Staffing Look-Ahead Data Analysis

There are several challenges that you are likely to face when staffing for look-ahead data analysis. These challenges, and potential solutions, are summarized in Table 1.

Table 1. Staffing challenges for look-ahead data analysis.

Challenge Potential Solution(s)
Lack of people with agile analytics skills - This is the primary limiting factor.
  • Hire agile data analysts. Unfortunately it is currently difficult to hire agile data professionals as the demand outstrips the supply.
  • Motivate people to become generalizing specialists. A generalizing specialist has one or more specialties, such as data analysis, and a general knowledge of the domain that they are working in. If you can nudge people to pick up data analysis skills as one of the specialties that they are working on, or at least to pick up basic skills, then you will soon have a growing number of people capable of look-ahead data analysis.
  • Adopt collaborative strategies. When people work closely together they pick up skills from one another, particularly when they follow non-solo work techniques such as pairing or mobbing. If traditional data analysts purposefully pair/mob with agilists then they will pick up agile ways of thinking (WoT) and ways of working (WoW), and the agilists will pick up valuable data analysis skills.
  • Train and coach existing data analysts in agile. This is possible, although it can be difficult to find agile data coaches.
  • Train and coach existing agile developers in data analysis. This is possible, although it will likely be very difficult because few agile developers have a background in data. As a result, it is likely to require significant investment.
  • Adopt better tooling. There are great data profiling, data modeling, and data extraction tools available to you. Are you using them?
Sprint scheduling conflicts - You can only do so much look-ahead data analysis at any given time.
  • Create smaller question stories. Although this depends on the needs of your stakeholders, if you are able to simplify question stories then the data analysis required to support their implementation will likely decrease. The less analysis there is to do, the less likely you'll have overlap between sprints.
  • Adopt shorter sprints. By shortening (yes, shorten, not lengthen) your sprint length, usually from two weeks to one week, you will motivate three important behaviors: You will find ways to improve your way of working (WoW), reduce the size of question stories, and reduce the number of work items you address each sprint. All of these behaviors will help reduce the chance of schedule conflicts.
  • Adopt a lean, continuous delivery approach. You significantly reduce your overall coordination/scheduling overhead by moving away from sprints/timeboxes, as you learned above.
  • Increase the number of people with agile data analysis skills. The more people who are able to perform agile data analysis, the less likely sprint scheduling conflicts will occur.
Centralized data teams - They become a bottleneck when multiple teams need help simultaneously.
  • Build whole teams. A team is whole when it has sufficient people, with the right skills, to achieve the outcome(s) they have taken on. The implication is that if a team needs to perform agile data analysis then they should have the people on the team with those skills and not rely on another team to do the work for them.
  • Create an agile data community of practice (CoP). A CoP is a group of like-minded people who choose to learn together. One of the reasons why centralized teams exist is the belief that it's the only way to develop a group of people with those skills. The fact is that CoPs are another way to accomplish that goal.
  • Increase the number of people with agile data analysis skills. The more people you have with agile data analysis skills, the easier it will be to rework the centralized data team.

Look-Ahead Data Analysis in Context

The following table summarizes the trade-offs associated with look-ahead data analysis and provides advice for when to adopt it.

Table 2. Look-ahead data analysis in context.

Advantages
  • Provides data analysts with the opportunity to perform sufficient data analysis to explore the requisite data sources before implementation begins.
  • Increases the chance that the data engineers will work with the best data in the most appropriate way.
  • Ensures that the data engineers have the information they require to implement a question story.
  • Supports a definition of ready (DoR) strategy popular with teams following a Scrum-based approach.
Disadvantages
  • Requires people with sufficient data analysis skills.
  • The required effort can be hard to predict due to accessibility of, and quality issues with, source data. This analysis work may range from several hours to several weeks in effort.
  • Data analysts new to agile will often fall into the trap of over-analyzing the source systems rather than focusing on just the data required to address the question story they are currently performing analysis for.
When to Adopt This Practice
  • Whenever you require data from legacy data sources that has not yet been brought into your existing data warehouse/lake/... environment.

Related Resources