Why Data Models Shouldn't Drive Object Models (And Vice Versa)

A common problem that I run into again and again is the idea that a data model should drive the development of your objects. This idea comes in two flavors: that your physical data schema should drive the development of your objects, and that a conceptual/logical data model should be (almost) completely developed up front before you begin to design your objects. Both of these views are inappropriate for non-agile teams and clearly wrong for agile teams. Let's explore this issue in more depth.

Why do people want to base their object models on existing data schemas? First, there is very likely a desire to reuse the existing thinking that went into the current schema. I'm a firm believer in reusing things, but I prefer to reuse the right things. There is an impedance mismatch between the object and relational paradigms, and this mismatch leads object and data practitioners to different designs. You also saw in Object Orientation 101 that object developers apply different design techniques and concepts than those described in Data Modeling 101 that data modelers apply. Second, the database owner may seek to maintain or even enhance their political standing within your organization by forcing you to base your application on their existing design. Third, the people asking you to take this approach may not understand the implications of this decision, or that there are better ways to proceed.

Why is basing your object model on an existing data schema a bad idea? First, your legacy database design likely has some significant problems. In practice, I look at existing physical data models to get an idea of what is currently going on, and to get a feel for the technical constraints that I'll have to work with, but I won't unnaturally constrain my application with a bad data design. Second, even if the existing database design is very good, there can be significant differences in the way that you map objects to relational databases. Consider Figure 1, which depicts three object schemas, all of which can be correctly mapped to the data schema on the right. Now pretend you have the data schema as your starting point. Which of the three object schemas would you generate from it? Likely the top one, which may in fact be correct for your situation, but then again one of the other two schemas might have been a better choice. Yes, all of the models in Figure 1 could be improved, but I needed a simple example that shows how different object schemas can map to the same data schema.

Figure 1. Several class structures that correctly map to the same table.
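
To make the point of Figure 1 concrete, here is a minimal Python sketch. It does not reproduce the figure: the CUSTOMER row, its column names, and all of the class names below are hypothetical, chosen only for illustration. It shows three different class structures that each load from, and save back to, exactly the same row.

```python
from dataclasses import dataclass

# Hypothetical row from a single CUSTOMER table (column names are assumptions).
ROW = {"customer_id": 1, "name": "Ann Lee", "street": "1 Main St",
       "city": "Toronto", "postal_code": "M5V 1A1"}


# Shape A: one class that mirrors the table column for column.
@dataclass
class FlatCustomer:
    customer_id: int
    name: str
    street: str
    city: str
    postal_code: str

    @classmethod
    def from_row(cls, row):
        return cls(**row)

    def to_row(self):
        return dict(vars(self))


# Shape B: the address columns are pulled out into their own class.
@dataclass
class Address:
    street: str
    city: str
    postal_code: str


@dataclass
class CustomerWithAddress:
    customer_id: int
    name: str
    address: Address

    @classmethod
    def from_row(cls, row):
        address = Address(row["street"], row["city"], row["postal_code"])
        return cls(row["customer_id"], row["name"], address)

    def to_row(self):
        return {"customer_id": self.customer_id, "name": self.name,
                "street": self.address.street, "city": self.address.city,
                "postal_code": self.address.postal_code}


# Shape C: the postal code becomes a class of its own as well.
@dataclass
class PostalCode:
    value: str


@dataclass
class RichAddress:
    street: str
    city: str
    postal_code: PostalCode


@dataclass
class CustomerWithRichAddress:
    customer_id: int
    name: str
    address: RichAddress

    @classmethod
    def from_row(cls, row):
        address = RichAddress(row["street"], row["city"],
                              PostalCode(row["postal_code"]))
        return cls(row["customer_id"], row["name"], address)

    def to_row(self):
        return {"customer_id": self.customer_id, "name": self.name,
                "street": self.address.street, "city": self.address.city,
                "postal_code": self.address.postal_code.value}


# All three shapes round-trip through the same row unchanged.
SHAPES = [FlatCustomer, CustomerWithAddress, CustomerWithRichAddress]
ROUND_TRIPS = [shape.from_row(ROW).to_row() for shape in SHAPES]
```

A generator working only from the table would most plausibly produce Shape A. Nothing in the table tells it whether Shape B or Shape C would serve the application better.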

Why do people want to create (nearly) complete data models early in the initiative? There are several reasons:

  1. Existing culture. This is the way it's always been done, this is the way that they like, therefore this is the way that they're going to continue to work.

  2. Over specialization. Data modeling might be the only thing they know, or at least it's what they prefer to specialize in. When all you have is a hammer, not only does every problem look like a nail but nails are clearly the most important problem that needs to be addressed right now.

  3. Serial mindset. Many IT professionals have little or no experience taking an iterative and incremental approach to development, let alone taking it one step further to take an evolutionary/emergent approach.

  4. People assume that the cost of change is high. This is completely true when you're following a non-agile approach, but with modern techniques such as database refactoring and Agile Modeling the cost of change becomes much lower because these techniques support change.

  5. Lack of teamwork. Existing processes dictate that the data group will go off and develop the database while the application programmers go off and build the application. This may have worked for COBOL teams but it doesn't work for agile software development teams - there is one team that works together, not several teams that work in isolation.

  6. They don't understand the true costs. Many people are unaware that a serial approach to development results in significant wastage by the time the application is finally delivered.

Why is basing your object model on a conceptual or logical data model a bad idea? Actually, it's not such a bad idea as long as you're taking an iterative and incremental approach; the real problem is the big design up front (BDUF) approach that many data professionals seem to prefer. It is possible to take an evolutionary approach when conceptual modeling, but you have to choose to work this way. However, there are much better options. Although the object role modeling (Halpin 2001) notation is very good, I have found Class Responsibility Collaborator (CRC) cards to be a very useful technique for domain modeling with my stakeholders. Similarly, although logical data models can be quite useful, I personally find UML class models much more expressive due to their ability to depict behavior as well as data. Although some argue that you should not use UML class diagrams for domain or analysis modeling, my experience is that you can do so quite easily if you choose to (David Hay also holds this view in his excellent book Requirements Analysis, although he leans towards data models whereas I lean towards UML-based models). However, I have to concede his point that many object modelers struggle with analysis, but in the end that's a separate issue.

So, should you blindly base your data schema on your object schema? No! You need a much more robust approach. Figure 2 shows the three data schemas that would result from applying each of the three inheritance mapping strategies. As you can see, mapping multiple inheritance is fairly straightforward; there aren't any surprises in Figure 2. The point is that it is possible for a single object schema to correctly map to several data schemas.

Figure 2. Mapping multiple inheritance.
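
The same idea can be sketched in code. The example below is a simplified illustration, not a reproduction of Figure 2: it uses a small single-inheritance hierarchy (Person, Customer, Employee) with made-up attributes, and the strategy labels, table names, and columns are all assumptions. It builds each candidate schema in a throwaway in-memory SQLite database to show that one object schema can legitimately map to three different data schemas.

```python
import sqlite3
from dataclasses import dataclass


# One hypothetical object schema.
@dataclass
class Person:
    person_id: int
    name: str


@dataclass
class Customer(Person):
    preferences: str


@dataclass
class Employee(Person):
    salary: float


# Three data schemas that this one hierarchy could be mapped to.
SCHEMAS = {
    "one table per hierarchy": """
        CREATE TABLE person (
            person_id   INTEGER PRIMARY KEY,
            person_type TEXT NOT NULL,   -- discriminator: 'customer' or 'employee'
            name        TEXT NOT NULL,
            preferences TEXT,            -- only used by customers
            salary      REAL             -- only used by employees
        );
    """,
    "one table per concrete class": """
        CREATE TABLE customer (
            person_id INTEGER PRIMARY KEY, name TEXT NOT NULL, preferences TEXT);
        CREATE TABLE employee (
            person_id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL);
    """,
    "one table per class": """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE customer (
            person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
            preferences TEXT);
        CREATE TABLE employee (
            person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
            salary REAL);
    """,
}


def table_names(ddl):
    """Create the schema in an in-memory database and list its tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


TABLES = {strategy: table_names(ddl) for strategy, ddl in SCHEMAS.items()}

for strategy, tables in TABLES.items():
    print(f"{strategy}: {tables}")
```

Each schema stores the same objects, yet they differ in table count, in the need for a discriminator column, and in whether reading a single object requires a join. Those trade-offs are why the choice can't be automated away.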

You saw in Figure 1 that it is possible for several object schemas to map to a single data schema, and in Figure 2 for a single object schema to map to several data schemas. There is a skill to successfully mapping objects to relational databases: you can't simply create one model, press the "magic CASE tool button", and come up with the right answer every time.

My advice is to:

The real question isn't "what model should drive the effort?" but "how can we work together effectively?" It is time to end the "religious battles" once and for all, which would be a very good first step in overcoming the cultural impedance mismatch within the IT industry.