Overcoming The Object-Relational Impedance Mismatch
Object-oriented technology supports building applications out of objects that have both data and behavior. Relational technology supports storing data in tables and manipulating that data via data manipulation language (DML), either internally within the database via stored procedures or externally via SQL calls. Some relational databases go further and now support objects internally as well, a trend that will only grow stronger over time. Object and relational technologies are in common use in most organizations, both are here to stay, and both are being used together to build complex software-based systems. It is equally clear that the fit between the two technologies isn’t perfect: there is an “impedance mismatch” between them.
In the early 1990s the differences between the two approaches were labeled the “object-relational impedance mismatch”, or simply “impedance mismatch” for short. These labels are still in common use today. Much of the conversation about the impedance mismatch focuses on the technical differences between object and relational technologies. Confusingly, there are deceptive similarities between the two as well as subtle yet important differences. Luckily, there are strategies for overcoming the O/R impedance mismatch.
1. The O/R Impedance Mismatch
Why does this impedance mismatch exist? The object-oriented paradigm is based on proven software engineering principles. The relational paradigm, however, is based on proven mathematical principles. Because the underlying paradigms are different, the two technologies do not work together seamlessly. The impedance mismatch becomes apparent when you look at the preferred approach to access. With the object paradigm you traverse objects via their relationships, whereas with the relational paradigm you join the data rows of tables. This fundamental difference results in a non-ideal combination of object and relational technologies.
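To make the difference in access style concrete, here is a minimal sketch in Python, with the standard-library sqlite3 module standing in for the relational database. The Address and State names come from the figures discussed later; the column names, the helper function names, and the sample data are assumptions made purely for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class State:
    code: str
    name: str


@dataclass
class Address:
    street: str
    state: State  # a direct object reference; no key is needed to navigate it


def state_name_by_traversal(address: Address) -> str:
    # Object paradigm: follow the reference from one object to the next.
    return address.state.name


def state_name_by_join(conn: sqlite3.Connection, address_id: int) -> str:
    # Relational paradigm: match rows of two tables on a shared key value.
    row = conn.execute(
        "SELECT s.Name FROM Address a "
        "JOIN State s ON s.StateCode = a.StateCode "
        "WHERE a.AddressID = ?",
        (address_id,),
    ).fetchone()
    return row[0]


# Hypothetical schema and data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE State (StateCode TEXT PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Address (
        AddressID INTEGER PRIMARY KEY,
        Street    TEXT NOT NULL,
        StateCode TEXT NOT NULL REFERENCES State (StateCode)
    );
    INSERT INTO State VALUES ('NY', 'New York');
    INSERT INTO Address VALUES (1, '1 Main St', 'NY');
    """
)

home = Address("1 Main St", State("NY", "New York"))
```

Both functions answer the same question. The object version needs only the reference it already holds, while the relational version needs a key value and a join condition to reconnect the two rows.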
To succeed using objects and relational databases together you need to understand both paradigms, and their differences, and then make intelligent tradeoffs based on that knowledge. Relational Databases 101 overviews relational databases and Data Modeling 101 describes the basics of data modeling, providing you with sufficient background to understand the relational paradigm. Similarly, Object-Orientation 101 overviews object-orientation and the UML, explaining the basics of the object-oriented paradigm. Until you understand both paradigms, and gain real-world experience working in both technologies, it will be very difficult to see past the deceptive similarities between the two.
2. Deceptive Similarities, Subtle Differences
The easiest similarity/difference to observe is in the data types offered by object languages and by relational databases. On the subtle side, Java has a String and an int whereas Oracle has a varchar and a smallint. Luckily, it’s fairly straightforward to convert back and forth. However, on the not-so-subtle side Java has collections whereas Oracle has tables, clearly not the same concepts. Oracle has blobs whereas Java has objects, once again clearly not the same concepts.
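The following sketch illustrates both sides of that observation, using Python and the standard-library sqlite3 module in place of Java and Oracle. The VisitCount column, the CustomerPhone table, and all of the data are hypothetical. The point is only that scalar values map onto columns directly, whereas a collection has no single-column equivalent and has to be spread across rows of another table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       VARCHAR(40) NOT NULL,
        VisitCount SMALLINT NOT NULL
    );
    -- A collection cannot live in one column, so it gets a table of its own.
    CREATE TABLE CustomerPhone (
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
        Position   INTEGER NOT NULL,
        Phone      VARCHAR(20) NOT NULL,
        PRIMARY KEY (CustomerID, Position)
    );
    """
)

# In-memory values: two scalars and one collection.
name, visit_count = "Sally Jones", 3
phones = ["555-0100", "555-0199"]

# Scalars convert to column values directly.
conn.execute("INSERT INTO Customer VALUES (?, ?, ?)", (1, name, visit_count))

# The list becomes one row per element, with its order stored explicitly.
conn.executemany(
    "INSERT INTO CustomerPhone VALUES (?, ?, ?)",
    [(1, position, phone) for position, phone in enumerate(phones)],
)

# Reading back: scalars arrive as-is, the list must be reassembled from rows.
loaded_name, loaded_count = conn.execute(
    "SELECT Name, VisitCount FROM Customer WHERE CustomerID = 1"
).fetchone()
loaded_phones = [
    row[0]
    for row in conn.execute(
        "SELECT Phone FROM CustomerPhone WHERE CustomerID = 1 ORDER BY Position"
    )
]
```

Note that SQLite accepts the VARCHAR and SMALLINT declarations but treats them more loosely than Oracle would, so this is a sketch of the shape of the problem rather than of any particular database's type rules.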
Figure 1 depicts a physical data model (PDM) using UML data modeling notation. Figure 2 depicts a UML class diagram. On the surface they look like very similar diagrams, and on the surface they in fact are. It’s how you arrive at the two diagrams that can be very different.
Figure 1. A physical data model (UML notation).
Let’s consider the deceptive similarities between the two diagrams. Both diagrams:
- Depict structure: the PDM shows four database tables and the relationships between them, whereas the UML class diagram shows four classes and their corresponding relationships.
- Depict data: the PDM shows the columns within the tables and the class model shows the attributes of the classes.
- Indicate behavior: the Customer table of Figure 1 includes a delete trigger and the Customer class of Figure 2 includes two operations.
- Use similar notations, something that I did on purpose, although the UML data modeling notation is little different than other industry notations.
Differences in your modeling approaches will result in subtle differences between your object schema and your data schema:
- By considering both data and behavior in the class diagram the modeler created a different structure than in the data model that only considered data
- Data normalization in data modeling versus class normalization in class modeling
- The application of data analysis patterns (Hay 1996) versus object-oriented analysis patterns (Fowler 1997; Ambler 1997) and design patterns (Gamma et al. 1995)
There are differences in the types of relationships that each model supports, with class diagrams being slightly more robust than physical data models for relational databases. This is because of the inherent nature of the technologies. For example, you see that there is a many-to-many relationship between Customer and Address in Figure 2. This relationship is resolved in Figure 1 via the CustomerAddress associative table. Object technology natively supports this type of relationship but relational databases do not, hence the associative table.
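A rough sketch of the same many-to-many relationship on both sides follows, again in Python with sqlite3 as a stand-in database. The Customer, Address, and CustomerAddress names come from the figures; the column names, the link helper, and the sample data are assumptions for illustration only.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity, since the classes refer to each other
class Customer:
    name: str
    addresses: list = field(default_factory=list)


@dataclass(eq=False)
class Address:
    street: str
    customers: list = field(default_factory=list)


def link(customer: Customer, address: Address) -> None:
    # Objects hold the many-to-many relationship natively, one collection per side.
    customer.addresses.append(address)
    address.customers.append(customer)


sally, john = Customer("Sally"), Customer("John")
home, office = Address("1 Main St"), Address("9 Office Park")
link(sally, home)
link(john, home)
link(sally, office)

object_streets = [a.street for a in sally.addresses]
object_residents = sorted(c.name for c in home.customers)

# The relational side needs a third table to hold the pairs of keys.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Address (AddressID INTEGER PRIMARY KEY, Street TEXT NOT NULL);
    CREATE TABLE CustomerAddress (
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
        AddressID  INTEGER NOT NULL REFERENCES Address (AddressID),
        PRIMARY KEY (CustomerID, AddressID)
    );
    INSERT INTO Customer VALUES (1, 'Sally'), (2, 'John');
    INSERT INTO Address VALUES (10, '1 Main St'), (20, '9 Office Park');
    INSERT INTO CustomerAddress VALUES (1, 10), (2, 10), (1, 20);
    """
)

row_streets = [
    r[0]
    for r in conn.execute(
        "SELECT a.Street FROM CustomerAddress ca "
        "JOIN Address a ON a.AddressID = ca.AddressID "
        "WHERE ca.CustomerID = 1 ORDER BY a.AddressID"
    )
]
row_residents = [
    r[0]
    for r in conn.execute(
        "SELECT c.Name FROM CustomerAddress ca "
        "JOIN Customer c ON c.CustomerID = ca.CustomerID "
        "WHERE ca.AddressID = 10 ORDER BY c.Name"
    )
]
```

The class model needs two classes and the data model needs three tables to express the same relationship, which is one reason the two schemas drift apart even when they describe the same domain.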
Figure 3 also reveals a schism within the object community. It is common practice to not show keys on class diagrams. However, for a relational database to store your objects each object must maintain the data to successfully write itself, and its relationships, to the database. This is something that I call “shadow information”, which you can see has been added in Figure 3 in the form of attributes with implementation visibility (no visibility symbol is shown). For example the Address class now includes the attribute addressID which corresponds to AddressID in the Address table. Similarly, the attributes customers, state, and zipCode maintain the relationships to the Customer, State, and ZipCode classes respectively.
Figure 3. A fully attributed UML class model.
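One way to picture shadow information in code is sketched below in Python, with sqlite3 standing in for the database. The address_id attribute plays the role of addressID from Figure 3; the Street and City columns, the save function, and the sample data are hypothetical. The key is not part of the business concept of an address, yet without it the object cannot tell whether it has been stored before or which row to update.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    street: str
    city: str
    # Shadow information: mirrors Address.AddressID and is None until first saved.
    address_id: Optional[int] = None


def save(conn: sqlite3.Connection, address: Address) -> None:
    if address.address_id is None:
        # Never stored before: insert a row and remember the generated key.
        cursor = conn.execute(
            "INSERT INTO Address (Street, City) VALUES (?, ?)",
            (address.street, address.city),
        )
        address.address_id = cursor.lastrowid
    else:
        # Already stored: the shadow key identifies the row to update.
        conn.execute(
            "UPDATE Address SET Street = ?, City = ? WHERE AddressID = ?",
            (address.street, address.city, address.address_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Address ("
    "AddressID INTEGER PRIMARY KEY, Street TEXT NOT NULL, City TEXT NOT NULL)"
)

home = Address("1 Main St", "Toronto")
save(conn, home)            # first save inserts and fills in address_id
home.street = "2 Main St"
save(conn, home)            # second save updates the same row

row_count = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]
stored_street = conn.execute(
    "SELECT Street FROM Address WHERE AddressID = ?", (home.address_id,)
).fetchone()[0]
```

Relationship attributes such as customers, state, and zipCode would need similar treatment, with foreign key values carried alongside the object references.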
The schism is that the object community has a tendency to underestimate the importance of object persistence. Symptoms of this problem include:
- The lack of an official data model in the UML (see The Unofficial UML Data Modeling Profile)
- The practice of not modeling keys on class diagrams
- The misguided belief that you can model the persistent aspects of your system by applying a few stereotypes to a UML class diagram
- Many popular OOA&D books spend little or no time discussing object persistence issues
Yet in reality object developers discover that they need to spend significant portions of their time making their objects persistent, perhaps because they’ve run into performance problems after improper mappings or perhaps because they’ve discovered that they didn’t take legacy data constraints into account in their design. My experience is that persistence is a significant blind spot for many object developers, one that promotes the cultural impedance mismatch.
3. Strategies for Overcoming the Object-Relational Impedance Mismatch
Object and relational technologies are real, you are very likely working with both, and they are here to stay. The two technologies differ, and those differences are referred to as “the object-relational impedance mismatch”. In this article you learned that there are two aspects to the impedance mismatch: technical and cultural.
The technical mismatch is overcome by ensuring that all team members understand the basics of both technologies. Furthermore, actively reduce coupling through encapsulating access to your database(s), through clean database design, and by keeping quality high through database refactoring.
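As one possible illustration of encapsulating database access, the sketch below puts all SQL for the Customer table behind a single class, in Python with sqlite3 as the stand-in database. The CustomerRepository name, its methods, and the column names are assumptions rather than a prescribed design; the same idea is often implemented with data access objects or a persistence framework.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    name: str
    customer_id: Optional[int] = None


class CustomerRepository:
    """The only place in the application that knows the Customer table's layout."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def add(self, customer: Customer) -> Customer:
        cursor = self._conn.execute(
            "INSERT INTO Customer (Name) VALUES (?)", (customer.name,)
        )
        customer.customer_id = cursor.lastrowid
        return customer

    def find(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT CustomerID, Name FROM Customer WHERE CustomerID = ?",
            (customer_id,),
        ).fetchone()
        if row is None:
            return None
        return Customer(name=row[1], customer_id=row[0])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
)

# Application code works with objects only and never sees SQL.
customers = CustomerRepository(conn)
sally = customers.add(Customer("Sally Jones"))
found = customers.find(sally.customer_id)
missing = customers.find(999)
```

With this arrangement a database refactoring, such as renaming a column, should only require changes inside the repository, which is what keeps the coupling between the object schema and the data schema low.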
4. The Cultural Impedance Mismatch
Less attention has been paid to the cultural differences between the developer and the data communities. This is unfortunate. These differences are often revealed when developers and data professionals argue about the approach that a team should take. For a detailed discussion, see The Cultural Impedance Mismatch Between Data Professionals and Application Developers.