Agile MDM – Practical Master Data Management
The primary goals of Master Data Management (MDM) are to promote a shared foundation of common data definitions within your organization, to reduce data inconsistency within your organization, and to improve overall return on your IT investment. MDM, when it is done effectively, is an important supporting activity for service-oriented architecture (SOA) at the enterprise level, for enterprise architecture in general, for data warehouse (DW)/business intelligence (BI) efforts, and for software development initiatives in general. Traditional approaches to data management (DM), particularly those based on extensive modeling and a serial approach to performing the work, have a poor track record in practice. MDM is likely to struggle if you do not move away from traditional DM strategies. In this article I describe how agile MDM works, applying strategies based on evolutionary development, collaborative approaches to working, and a focus on providing concrete value to the business.
Agile software development (ASD) is an evolutionary approach which is collaborative and self-organizing in nature, producing high-quality systems that meet the changing needs of stakeholders in a cost-effective and timely manner. MDM and ASD are different things, but they are compatible.
This article addresses:
- Fundamental MDM activities
- Agile MDM is collaborative
- Agile MDM is embedded within the development process
- Agile MDM is an enterprise activity
- Agile MDM is evolutionary
- Agile MDM is usage-driven
- Produce measurable results
- Deliver quality through testing
- Adopt a lean governance approach
- Agile MDM requires a cultural shift
The main differences between “Agile MDM” and “traditional MDM” are centered on the approach to doing the work, not the fundamental work itself. In other words, when you do the work, how you do it, and who you do it with are the critical issues. An agile approach to MDM achieves the goals of MDM (promoting common data definitions, reducing data inconsistency, and improving IT ROI) by embedding MDM activities into the overall software process in a manner which reflects the environment of modern IT departments. The following basic MDM activities are still performed (if and when they make sense) with an agile approach, but as you’ll see they’re accomplished in a more effective and efficient manner:
- Classify data elements (data classification)
- Consider data access (data security)
- Identify pertinent master data elements (MDEs) such as entity types, data elements, associations, and so on.
- Define and manage metadata pertaining to MDEs, including:
- Primary source(s) of record for MDEs
- How systems access MDEs (identifying producers and consumers)
- Volatility of MDEs
- Lifecycles of MDEs
- Value to your organization of individual MDEs
- Owners and/or data stewards of MDEs
- Adopt tools, including modeling tools and repositories, to manage MDM metadata
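To make the metadata list above concrete, here is a minimal sketch of how a single MDE record might be captured in code. The class name, field names, classification levels, and the example systems (`crm_db`, `billing_service`, and so on) are all illustrative assumptions, not part of any particular MDM tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    # Hypothetical classification levels; use your organization's own scheme.
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


@dataclass
class MasterDataElement:
    """Metadata for one master data element (MDE), mirroring the list above."""
    name: str
    classification: Classification
    sources_of_record: list[str]
    producers: list[str] = field(default_factory=list)
    consumers: list[str] = field(default_factory=list)
    volatility: str = "unknown"       # e.g. "low", "high"
    lifecycle_stage: str = "active"   # e.g. "proposed", "active", "retired"
    business_value: str = "unknown"   # e.g. "low", "high"
    stewards: list[str] = field(default_factory=list)

    def missing_metadata(self) -> list[str]:
        """List the metadata fields that have not been filled in yet."""
        gaps = []
        if not self.sources_of_record:
            gaps.append("sources_of_record")
        if not self.stewards:
            gaps.append("stewards")
        if self.volatility == "unknown":
            gaps.append("volatility")
        if self.business_value == "unknown":
            gaps.append("business_value")
        return gaps


# Illustrative record; every system and steward name here is made up.
customer_email = MasterDataElement(
    name="Customer.email",
    classification=Classification.RESTRICTED,
    sources_of_record=["crm_db"],
    producers=["crm_app"],
    consumers=["billing_service", "marketing_dw"],
    stewards=["customer-data-steward"],
)
print(customer_email.missing_metadata())
```

A record like this can start out incomplete and be filled in as teams actually need the element, which fits the evolutionary approach described later in this article.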
Any agilist reading the above list is likely reeling from the potential for out-of-control bureaucracy surrounding MDM. Considering the past track record of most data management efforts (more on this in a bit), this is a significant concern. As you’ll learn in the rest of this article, it is in fact possible to streamline MDM efforts so that the value is achieved without the pain of needless bureaucracy, although as you would imagine this will require significant culture shifts in some organizations.
The best way to deliver value is to work closely with development teams and their stakeholders to ensure that the MDM effort is focused on supporting the creation of business functionality that stakeholders actually need now, not at some undefined point in the future. Traditional, documentation heavy, command-and-control approaches to MDM are often doomed to failure because the MDM program is too tedious for teams to follow. With a collaborative approach to MDM:
- The data engineers and enterprise architects are actively involved with working with the teams to support and enhance the MDM efforts.
- They make it as easy as possible for the development teams to do the right thing by collaborating with them to do so.
- They do a lot of the “MDM grunt work” which the teams would have otherwise avoided.
- You work together in face-to-face collaborative working sessions. These prove to be far more effective than traditional approaches such as formalized meetings, reviews, or functionally distributed teams (where the data specialists work on their own in parallel to the development teams).
It is very easy to claim that you intend to take a collaborative approach to MDM, but a lot harder to actually do so. Traditional data management has a poor track record of working together closely and effectively with development teams. The point is that if your development teams are currently frustrated with the level of service provided by your organization’s data group then it will be that much more difficult for the data group to make inroads into the teams to support any sort of MDM effort.
If the MDM activities, particularly the ones involving work to identify and capture metadata, are separate from day-to-day development activities then there is very little chance of your MDM program succeeding. The easiest way to embed MDM activities into your development process is to educate team members on the importance of MDM and to ensure that one or more people have the appropriate skills to collaborate with the data engineers and enterprise architect(s) responsible for MDM efforts. If your team has one or more Agile data engineers then MDM activities should be part of their daily jobs, and ideally they will have tools which automate as much of this work as possible.
The challenge is that development teams in general, and in particular agile teams with their focus on high-value activities, will be reluctant to do this sort of data-oriented work if they perceive it as extraneous. Worse yet, few development methods explicitly include these sorts of activities, in part because the people behind the methods often lack experience in such activities but mostly because the data community struggles to make its techniques relevant to modern-day development.
MDM by definition must have an organization/enterprise-level view, and an agile approach to MDM is no exception. However, that doesn’t mean that MDM has to be an onerous, command-and-control activity which does little more than justify the existence of your data management group for the year or two that they’re able to milk MDM before it fails due to not producing measurable value. Instead, with a collaborative and lean approach your data engineers and enterprise architect(s) can achieve the stated goals of MDM in a sustainable way. Agile MDM is both a team-level and an enterprise-level activity, and the needs of these two levels will need to be balanced in a fit-for-purpose manner which reflects your unique situation.
The evidence that evolutionary (iterative and incremental) approaches to software development are superior to serial approaches has been mounting for years. This is true of data-oriented activities too, as this site clearly shows. Technically it is quite easy to take an evolutionary approach to IT activities, including data activities, but the true challenges often prove to be cultural.
Not only is it possible to analyze legacy data sources, to collect metadata, and then support development teams in an evolutionary manner, you really have no choice in the matter. This is obvious for several reasons:
- For all but the smallest organizations you simply can’t do all of the requisite legacy analysis and metadata collection all up front without it changing underneath you before you can make it available.
- The business environment is going to change anyway so you’re going to have to evolve your data definitions over time, like it or not.
- You’re only human and as a result you’re going to make mistakes. You have to assume that your understanding of various data elements will change over time regardless of how much time you actually put into the initial definition efforts.
- The needs and priorities of development teams will change throughout the lifetime of a release of a system, let alone the lifetime of the system itself. This will affect how you prioritize your MDM activities.
- If your organization chooses to grow through acquisition or partnership then the new firms that you acquire and/or work with will likely have different viewpoints which will motivate you to evolve your existing perceptions.
With an evolutionary approach to MDM you want to work in priority order. This order should be set by the business, not by the IT department: a common Agile strategy is to have the stakeholders, not the IT professionals, prioritize the work to be done. This strategy is depicted in Figure 1 and described in detail in Agile Requirements Change Management. This enables you to maximize return on investment (ROI) because you’re always working on the most important functionality required by your stakeholders. Yes, your enterprise architecture and enterprise business modeling efforts will still guide your work, but this guidance will be reflected in the overall prioritization of the work.
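One way to picture working in priority order is to derive the MDM work queue directly from the stakeholder-prioritized backlog. The sketch below assumes each work item records which master data elements it touches; the `WorkItem` type, the `mdm_work_order` function, and the backlog entries are hypothetical illustrations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkItem:
    title: str
    stakeholder_priority: int      # 1 = most important, set by stakeholders
    mdes_touched: tuple[str, ...]  # master data elements this item needs


def mdm_work_order(backlog: list[WorkItem]) -> list[str]:
    """Return MDEs in the order the prioritized backlog first needs them."""
    ordered: list[str] = []
    for item in sorted(backlog, key=lambda w: w.stakeholder_priority):
        for mde in item.mdes_touched:
            if mde not in ordered:
                ordered.append(mde)
    return ordered


# Illustrative backlog; titles and priorities are invented for this example.
backlog = [
    WorkItem("Monthly revenue report", 2, ("Customer", "Invoice")),
    WorkItem("Update shipping address", 1, ("Customer", "Address")),
    WorkItem("Supplier onboarding", 3, ("Supplier", "Address")),
]
print(mdm_work_order(backlog))
```

When stakeholders reprioritize the backlog, rerunning the function reorders the MDM work with it, so the data effort never drifts ahead of what the business is asking for.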
This is probably the most radical advice which I present in this article: data is a secondary concern for MDM, not a primary one. Success factors for CRM were business-oriented and cultural in nature, not technical. Considering that MDM is arguably CRM applied to all major business concepts and not just customers, we should really take heed of these findings. In other words, you must focus on usage, not on data.
With a usage-driven approach your major requirements artifacts explain how people will work with, or interact with, the system. Examples of such artifacts include use cases, user stories, and usage scenarios which are primary artifacts of OpenUP, XP, and MSF for Agile respectively. Business process models could also arguably be used here, but none of the major agile development methodologies use them as a primary artifact although Agile Modeling includes them as potential models which you should apply where appropriate. When these artifacts are created rigorously they often refer to other types of requirements, such as business rules and report specifications. However, these sorts of details are often explored on a just-in-time (JIT) model storming basis during the initiative so many agile teams won’t invest in rigorously documenting them because the useful lifetime of such documentation is very short.
The value in usage models, in particular use cases and usage scenarios, is that they focus on the business objectives which end users are trying to accomplish by using your system(s). If your stakeholders are able to prioritize the various usages, then development teams are in a position not only to deliver something of concrete value (the implementation of those usages) but also, by implementing them in priority order, to maximize stakeholders’ return on investment (ROI) in IT.
A common mistake which often leads to failure is to let technology decisions drive your prioritization strategies. For example, a favorite IT strategy is to work on one legacy system at a time, analyzing and then cataloging the metadata for the entire system. This sort of initial, detailed cataloging effort can take years to accomplish and will more than likely run out of steam long before any concrete results are produced. Another ill-fated strategy is to focus on specific data entities one at a time. Although this approach has more merit than the previous one, you may find that you need to do this for a large number of entities before you can start providing real business value from your efforts. The fundamental problem is that technical prioritization strategies do not reflect the priorities of the business which you are trying to support, putting any IT effort, including MDM efforts, at risk because your stakeholders aren’t receiving concrete value in a timely manner. When stakeholders don’t perceive the value that they’re getting for their IT investment they quickly start to rethink such investment.
Worse yet, some MDM efforts run aground on the “one truth” shoals: they strive to develop one definition for each data entity within an organization. In theory this is a laudable goal but in practice it’s virtually impossible because few organizations can actually come to an agreement on the definitions of major concepts. Furthermore, it’s often a competitive advantage for your organization to treat various concepts differently at times based on the given context. A wonderful example of this is HSBC’s series of billboard and airport advertisements around the world showing two different pictures with captions, then showing the same two pictures with the captions swapped. Figure 2 is a picture that I took in a hallway in London’s Heathrow airport. In short, efforts to try to identify the “one truth” are likely misguided and unlikely to actually produce value. My advice is to worry less about gathering perfect metadata and instead focus on delivering valuable business functionality.
Many traditional IT efforts find themselves in trouble when they take a document-based approach to reporting progress. For example, in earned value management (EVM) you claim progress against your plan when you achieve various milestones called out in those plans. On traditional software development initiatives these milestones are typically based on delivery of key documentation such as requirements specifications, design specifications, test plans, and eventually the working system. Traditional MDM efforts may choose to measure earned value in terms of the metadata collected, such as the number of entity types or entity attributes defined. The challenge with a document-based approach to measuring earned value is that there is a tenuous relationship between documentation and actual delivery of working functionality which provides real value to business stakeholders. When you think about it, you’re doing little more than justifying bureaucracy with document-based EVM.
An agile team’s “earned value” is in the form of a working solution, which for a software development team is the delivery of working software and for a DW/BI team the delivery of analytic data and supporting reports. Therefore, with an agile approach to MDM your focus shouldn’t be on collecting metadata (although you will still do that) but instead should be on:
- Supporting teams to deliver high-quality working software which meets the changing needs of their stakeholders
- Supporting business stakeholders to access and manipulate data, typically via a DW/BI solution
In other words, don’t do MDM for the sake of doing MDM; instead, do it to streamline stakeholder-facing data-oriented activities. The only valid measure of your MDM efforts is not the number of data elements collected but the number of “data conformant” reports, data conformant web services, or data conformant components delivered by teams.
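As a rough illustration of that measure, the sketch below counts conformant deliverables rather than collected elements. It assumes, purely for the sake of the example, that a deliverable is “data conformant” when every field it exposes matches the name and type in the master definitions; the field names, type labels, and deliverables are all hypothetical.

```python
# Hypothetical master definitions: field name -> agreed type label.
master_definitions = {
    "customer_id": "int",
    "email": "str",
    "signup_date": "date",
}

# Hypothetical deliverables and the fields each one exposes.
deliverables = {
    "churn_report": {"customer_id": "int", "signup_date": "date"},
    "customer_api": {"customer_id": "str", "email": "str"},  # type mismatch
    "welcome_mailer": {"email": "str"},
}


def is_conformant(fields: dict, master: dict) -> bool:
    """True when every exposed field matches the master name and type."""
    return all(master.get(name) == type_ for name, type_ in fields.items())


conformant = [
    name
    for name, fields in deliverables.items()
    if is_conformant(fields, master_definitions)
]
print(f"{len(conformant)} of {len(deliverables)} deliverables conform: {conformant}")
```

The number worth reporting is the count of conformant deliverables in stakeholders’ hands, not the size of `master_definitions`.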
Agile software development teams work in priority order, as you saw in Figure 1, and thereby they maximize stakeholder return on investment (ROI) by focusing on delivering the highest value functionality at any given time. If all of your development teams work in this manner, and because agile MDM work is embedded in the development process, you similarly will maximize the ROI on your MDM efforts.
This differs from traditional MDM efforts which try to capture the required metadata in a “big modeling up front (BMUF)” style effort. This is often in the form of a multi-month if not multi-year effort run by a DM team in parallel to actual software development initiatives. There are several problems with the traditional approach to MDM:
- It can be months, if not years, before tangible results are produced. Although many organizations believe that they can succeed at long-term efforts such as this, few actually can in practice. Larissa T. Moss points out in Critical Success Factors for MDM that in the past the data community had a very poor track record with similar metadata schemes which had long-term paybacks.
- Immediate efficiencies are forgone. Although the MDM effort may eventually produce a comprehensive repository of metadata, it misses immediate opportunities to provide actual value to the business. If the MDM effort does eventually achieve a positive ROI it will be lower as a result.
- Needless work will occur. People are not good at judging up front what they want. We’ve found that when you define detailed requirements specifications early in the development lifecycle, nearly half of the identified functionality is never used by end users. A traditional approach to MDM, where you try to comprehensively define the required metadata up front, is therefore likely to result in similar wastage.
Agile software developers typically take a test-first approach to development, also called test-driven development (TDD) or behavior-driven development (BDD). This is not only possible for data professionals, it is highly desirable. With a test-driven approach you write a single test before doing the work to fulfill that test, in effect creating a detailed specification for that functionality before implementing it. Better still, you can run the tests on a regular basis and thereby validate your work in progress. A test-first approach, in combination with other agile testing activities, greatly increases the quality of the work delivered. This shouldn’t come as a surprise: testing as early as you possibly can, fixing the defects that you find, and doing so more often leads to improved quality.
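A minimal sketch of what test-first work might look like for a data professional, using Python’s standard `unittest` and `sqlite3` modules. The `customer` table and its rules (every customer has an email, and emails are unique) are assumptions made up for this example. In a test-first flow the two tests would be written, and seen to fail, before the `NOT NULL` and `UNIQUE` constraints were added to the schema.

```python
import sqlite3
import unittest

# Hypothetical schema under test. The constraints exist to satisfy the tests.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
"""


def create_database() -> sqlite3.Connection:
    """Build a fresh in-memory database so each test starts clean."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


class CustomerEmailTest(unittest.TestCase):
    def setUp(self):
        self.conn = create_database()

    def tearDown(self):
        self.conn.close()

    def test_duplicate_email_is_rejected(self):
        insert = "INSERT INTO customer (email) VALUES (?)"
        self.conn.execute(insert, ("a@example.com",))
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(insert, ("a@example.com",))

    def test_missing_email_is_rejected(self):
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute("INSERT INTO customer (email) VALUES (NULL)")


# Run the suite programmatically so it can be part of a regular build.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CustomerEmailTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the tests are cheap to rerun, they double as a regression suite every time the schema evolves.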
Traditional teams often take a review-based approach to development, particularly early in the lifecycle when they have no software to work with. Although better than doing nothing at all, reviews prove ineffective in practice when compared with regression testing when it comes to quality. Reviews have a very long feedback cycle, often weeks if not months, and as a result the costs of addressing defects are much higher than with techniques (such as TDD) that have shorter feedback cycles. If someone can offer actual value in a review, why not have them involved with the actual work to begin with? In short, reviews often seem to be a stop-gap measure which compensates for poor collaboration or lack of quality focus earlier in the lifecycle. It is far better to address the real problem, hopefully with Agile strategies, than to simply put a band-aid over it and hope for the best. And the numbers clearly show that traditional approaches to data quality are failing in practice: The Data Warehousing Institute (TDWI) reports that data quality problems result in a loss of over $600 billion annually in the United States.
Traditional governance often focuses on command-and-control strategies which strive to manage and direct development teams in an explicit manner. This approach is akin to herding cats because you’ll put a lot of work into the governance effort but achieve very little in practice. Agile/lean data governance focuses on collaborative strategies that strive to enable and motivate team members implicitly. This is akin to leading cats: if you grab a piece of raw fish, cats will follow you wherever you want to go.
An important component of data management is governance of the MDM metadata and of the source data which it represents. My experience is that a traditional, command-and-control approach where the DM group “owns” the data assets within your organization and has a “death-lock” on your databases proves dysfunctional in practice. At best it results in the DM group becoming a bottleneck within your IT department and at worst it results in the development teams going around the DM group in order to get their work done, effectively negating your data governance efforts (some alarming statistics on this in a minute). A better approach is to:
- Include data professionals as active participants on development teams. When your DM group is external to development teams it can foster a them vs. us mentality within your IT organization if you’re not very careful. You don’t need to have an external group to run your data governance activities; instead, individual data professionals can do so as part of their responsibilities on development teams in a collaborative and timely manner. This is one of the fundamental concepts of the Agile Data method.
- Streamline data standards and supporting activities. When data standards, including master data definitions, are sensible, easy to understand, and easy to access then there is a significantly greater chance that people will actually follow the standards in practice. When you force people to conform to standards, and when you make it onerous for them to do so, you reduce the chance that they will actually do so. Your data administration efforts need to be based on collaboration and enablement, not command-and-control.
- Educate developers. Developers need to understand why your MDM efforts are important, what the benefits are, and how to work together with your DM team. When they know why something needs to be done, and how to do it effectively, chances are much better that they’ll actually do it.
The real challenges with MDM have nothing to do with technology but instead with people. In many organizations there is a significant cultural impedance mismatch that you need to overcome between the data management group and the development teams. This will take time. This mismatch was revealed in the results of the data management survey performed by Dr. Dobb’s Journal in the Fall of 2006. The survey found that 66% of respondents indicated the need to go around their data groups at times, and that of those people 75% indicated that they did so because the data groups were too slow to respond to their requests, provided too little real value to the development teams, or were simply too difficult to work with.
The data community must recognize that we can do better than the traditional strategy for MDM, and for data management in general. Although many data professionals prefer traditional, documentation-heavy approaches they must recognize that the rest of the IT community has moved on and have adopted more effective ways of working. An Agile approach to MDM is more effective than a traditional approach, for several reasons:
- The traditional data management (DM) track record is poor. If you apply traditional DM strategies to MDM then it is fair to assume that you will experience the same levels of success achieved with Customer Relationship Management (CRM) and metadata repositories in the past. To be fair, perhaps organizations’ expectations weren’t realistic, but if your DM group is making promises about MDM similar to those that were made about CRM a few years ago then you have cause for concern. Furthermore, as Larissa T. Moss points out in Critical Success Factors for MDM, the data community has clearly struggled in the past with similar metadata schemes. We need to get off the traditional treadmill and start adopting strategies which have a chance of succeeding in practice.
- The Agile track record is better. Agile enjoys a higher success rate due to its greater focus on return on investment (ROI), its increased ability to meet the actual needs of business stakeholders, and its greater focus on quality.
- The Agile community leads in DM thinking. The agile data community represents the leading edge of data-oriented techniques. This community has led the way in evolutionary/agile data modeling, database refactoring, database testing, database integration, and even agile administration techniques. We’ve addressed many of the issues which have thwarted the traditional community for years, particularly when it comes to data quality.
Master Data Management (MDM), when implemented correctly, can provide significant value to your organization. Unfortunately, our track record with similar efforts in the past, in particular Customer Relationship Management (CRM) and metadata repositories before that, was less than ideal. I believe that you will greatly increase your chance of success by applying agile techniques such as working in an evolutionary manner, taking a usage-driven approach, focusing on measurable results, working collaboratively, delivering quality through testing, and adopting a lean approach to data governance.