- What is look-ahead data analysis?
- Why look-ahead data analysis?
- Look-ahead data analysis on agile teams
- Look-ahead data analysis on lean continuous delivery teams
- Factors that affect the timing of look-ahead data analysis
- Staffing look-ahead data analysis
- Look-ahead data analysis in context
1. What is Look-Ahead Data Analysis?
As the name implies, this is effectively the application of Agile Modeling’s look-ahead modeling practice to data analysis. Scrum teams refer to look-ahead modeling as backlog refinement or even backlog grooming.
The basic idea is that you do just enough analysis work to explore and understand the data source(s) so that a data requirement, likely captured in the form of a question story, can be implemented. When you have existing, high-quality reporting data sources (data warehouses (DWs), data marts, data lakes, …) that contain the data you need, the data analysis work is fairly straightforward. When this is not the case, when the data resides in legacy OLTP (online transaction processing) databases or in external data sources (think big data), then you may require significant effort to explore, understand, and document the incoming source data. It is this effort that look-ahead data analysis focuses on.
Data analysis activities may include:
- Identification of potential data sources
- Profiling/exploring data sources
- Identification of multiple sources of specific data
- Assessment of data quality
- Selection of the best source(s) of specific data
- Formulation of data cleansing rules
- Mapping of source data to required data elements
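Several of the activities above, profiling a source, assessing its quality, and spotting the need for cleansing rules, can be sketched in a few lines of code. The following is a minimal illustration using plain Python over an in-memory sample; the table, column names, and null-value conventions are illustrative assumptions, not taken from any particular tool:

```python
# A minimal profiling sketch: compute basic quality statistics for one
# column of a candidate source. The sample rows below stand in for a
# hypothetical extract from a legacy OLTP customer table.
from collections import Counter

def profile_column(rows, column):
    """Return basic profile stats for one column of a row-dict data set."""
    values = [r.get(column) for r in rows]
    # Treat these markers as "missing"; real rules vary by source.
    non_null = [v for v in values if v not in (None, "", "N/A")]
    return {
        "rows": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

source_rows = [
    {"cust_id": 1, "country": "CA"},
    {"cust_id": 2, "country": ""},
    {"cust_id": 3, "country": "ca"},
    {"cust_id": 4, "country": "CA"},
]

profile = profile_column(source_rows, "country")
```

Even this small profile surfaces candidate cleansing rules: a 25% null rate suggests a default or secondary source is needed, and the mixed-case country codes ("CA" vs "ca") suggest a standardization rule.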
2. Why Look-Ahead Data Analysis?
There are several reasons why you need to perform look-ahead data analysis:
- Data source documentation isn’t trustworthy. Very often the artifacts describing existing data sources, if there are any at all, aren’t complete nor up-to-date. As a result you need to analyze the legacy data source(s) to determine what is actually there and what is potentially usable.
- Data engineers need to understand the source data. The data engineers, or developers in some cases, need to understand what data they will use, and how they need to manipulate it, in order to implement whatever is required to address a given question story.
- You want the best data available. You very often have many data sources available to you capable of providing the data you need. One goal of data analysis is to help you to identify the best available source for the data that you require, not just a source. Be quality infected.
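The "best available source, not just a source" idea above can be sketched as a simple comparison of candidate sources against a quality score. The source names, sample rows, and the completeness metric below are all illustrative assumptions; in practice you would score several quality dimensions, not just completeness:

```python
# Score each candidate source by completeness of the columns a question
# story requires, then pick the best-scoring one.
def completeness(rows, required_columns):
    """Fraction of cells in the required columns that are populated."""
    total = len(rows) * len(required_columns)
    filled = sum(
        1 for r in rows for c in required_columns if r.get(c) not in (None, "")
    )
    return filled / total if total else 0.0

# Hypothetical extracts from two candidate sources of customer contact data.
candidates = {
    "crm_oltp": [
        {"email": "a@x.com", "phone": ""},
        {"email": "", "phone": ""},
    ],
    "billing_mart": [
        {"email": "a@x.com", "phone": "555-0100"},
        {"email": "b@x.com", "phone": ""},
    ],
}

scores = {name: completeness(rows, ["email", "phone"])
          for name, rows in candidates.items()}
best = max(scores, key=scores.get)
```

Here the billing mart wins on completeness, but a real selection would also weigh timeliness, accuracy, and access constraints.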
3. Look-Ahead Data Analysis on Agile Teams
Many teams choose to follow an agile lifecycle, typically based on Scrum. Figure 1 depicts the look-ahead data analysis work required for three question stories that are to be implemented during sprint #9 of an agile DW/BI initiative. Notice how each question story requires a different amount of data analysis effort because each question has unique data needs.
There are several interesting implications for look-ahead data analysis on agile/Scrum teams:
- You need to get good at guesstimation. To schedule the look-ahead data analysis properly, the person(s) doing the work will need to guesstimate the amount of effort required for each question story. This is required because the people with agile analytics skills have a limited amount of capacity.
- The data analysis of several sprints will overlap. Figure 1 depicts the look-ahead data analysis efforts for the three question stories being implemented in sprint #9 only. Previous sprints would have also required similar efforts.
- You will be limited by availability of agile analytics skills. See the discussion of staffing below.
- The first few sprints will be rough. The data analysis work needs to get ahead of the implementation work. The implication is that it may be a few sprints before the team delivers sufficient functionality to address a single question story.
- Your definition of ready (DoR) must address data issues. Many agile/Scrum teams have a DoR that defines the minimum level of quality of work that needs to be put into a story before the team is willing to work on it. See DoRs for question stories for a detailed discussion.
4. Look-Ahead Data Analysis on Lean Continuous Delivery Teams
Advanced teams choose to follow a lean continuous delivery lifecycle, based on Kanban and DevOps strategies. Figure 2 depicts the look-ahead data analysis work for the same three question stories from Figure 1, the difference being that the work is done on a just-in-time (JIT) basis rather than scheduled into fixed-length sprints. Note that the same amount of data analysis is still required for each question story as in Figure 1, but the implementation time is no longer tied to a two-week sprint.
There are several interesting implications for look-ahead data analysis on continuous delivery teams:
- There are no scheduling issues. The work is done on a JIT basis, removing the need for the guesstimation and scheduling overhead. The team takes on the work for a new question story once they have the capacity to do so.
- It’s really just part of the implementation work. Because the work is done on a JIT basis, there’s less need to distinguish between analysis work and any other work required to implement a question story.
- It will be easier to collaborate and learn together. While there is still a challenge around having sufficient people with agile analytics skills, the people performing data analysis are more likely to be working closely with the data engineers and developers and thus are more likely to share their skills with them. As a result you are likely to grow more people, and do so faster, with agile data analysis skills.
5. Factors That Affect The Timing of Look-Ahead Data Analysis
There are several factors that will determine how far ahead you need to perform look-ahead data analysis:
- The existing data in your data warehouse (DW). The more data that you have in your reporting data sources (DWs, data marts, data lakes, …), and the higher its quality, the less data analysis you will need to perform on legacy data sources.
- The complexity of the data source(s). The more complex the data source, either in structure or in contents, the longer it will take to analyze.
- Your ability to gain access to the data source(s). You may not have easy access to a data source that you need to analyze, and it can take time to gain that access.
- The quality of the existing documentation. The higher the quality of the documentation describing the data source(s), if any exists at all, the easier it will be to understand the data source.
- The difficulty of the question being asked. The more difficult or complex the question to be answered, the longer it will likely take to analyze the data source(s) required to answer it.
- The skill, experience, and knowledge of the data analyst(s). Highly capable data analysts are generally more effective than novices, and thus work faster and generally produce better results. In short, it really does depend on who is doing the work as to how long it will take.
- The availability of the data analyst(s). The more people with data analysis skills that are available, the shorter the lead/wait time will be for the work to be done. You will need to staff accordingly.
- Your data profiling tools. The more effective your tools, the easier it will be to explore existing data source(s).
6. Staffing Look-Ahead Data Analysis
There are several challenges that you are likely to face when staffing for look-ahead data analysis. These challenges, and potential solutions, are summarized in Table 1.
Table 1. Staffing challenges for look-ahead data analysis.
| Challenge | Potential Solution(s) |
| --- | --- |
| Lack of people with agile analytics skills – This is the primary limiting factor. | |
| Sprint scheduling conflicts – You can only do so much look-ahead data analysis at any given time. | |
| Centralized data teams – They become a bottleneck when multiple teams need help simultaneously. | |
7. Look-Ahead Data Analysis in Context
The following table summarizes the trade-offs associated with look-ahead data analysis and provides advice for when to adopt it.
Table 2. Look-ahead data analysis in context.
| When to Adopt This Practice | Whenever you require data from legacy data sources that has not yet been brought into your existing data warehouse/lake/… environment. |
8. Related Resources