Most modern business application development projects use object technology, such as Java or C#, to build the application software and relational databases to store the data. This isn't to say that you don't have other options: there are many applications built with procedural languages such as COBOL, and many systems use object databases or XML databases to store data. However, because object and relational technologies are by far the norm, that's what I assume you're working with in this article. If you're working with different storage technologies then many of these concepts are still applicable, albeit with modification (don't worry, Realistic XML covers mapping issues pertaining to objects and XML).
Unfortunately we need to deal with the object/relational (O/R) impedance mismatch, and to do so you need to understand two things: the process of mapping objects to relational databases and how to implement those mappings. In this article the term "mapping" will be used to refer to how objects and their relationships are mapped to the tables and relationships between them in a database. As you'll soon find out, it isn't quite as straightforward as it sounds, although it isn't too bad either.
Table of Contents
- Basic mapping concepts
- Mapping inheritance structures
- Mapping object relationships
- Mapping class-scope properties
- Performance tuning
- Implementation impact on your objects
- Implications for Model Driven Architecture (MDA)
- Patternizing what you have learned
When learning how to map objects to relational databases the place
to start is with the data attributes of a class.
An attribute will map to zero or more columns in a relational
database. Remember, not all attributes are persistent; some are used for temporary calculations.
For example, a Student
object may have an averageMark attribute that is needed within your
application but isn't saved to the database because it is calculated by the
application. Some attributes of an object are objects in their own right - for example, a Customer object has an Address object as an attribute. This actually reflects an association between the two classes that would likely need to be mapped, and the attributes of the Address class itself will need to be mapped as well. The important thing is that this is a recursive definition: at some point the attribute will be mapped to zero or more columns.
The easiest mapping you will ever have is a property mapping of a single attribute to a single column. It is even simpler when they each have similar basic types, e.g. they're both dates, the attribute is a string and the column is a char, or the attribute is a number and the column is a float.
Mapping Terminology
- Mapping (v). The act of determining how objects and their relationships are persisted in permanent data storage, in this case relational databases.
- Mapping (n). The definition of how an object's property or a relationship is persisted in permanent storage.
- Property. A data attribute, either implemented as a physical attribute, such as the string firstName, or as a virtual attribute implemented via an operation, such as getTotal() which returns the total of an order.
- Property mapping. A mapping that describes how to persist an object's property.
- Relationship mapping. A mapping that describes how to persist a relationship (association, aggregation, or composition) between two or more objects.
It can be tempting to think that classes map to tables, and in a way they do, but not always directly.
Except for very simple databases you will never have a one-to-one mapping
of classes to tables, something you will see later in this article with regards
to inheritance mapping.
However, a common theme that you will see throughout
this article is that a one class to one table mapping is preferable for your
initial mapping (performance tuning may
motivate you to refactor your mappings).
For now, let's keep things simple.
Figure 1 depicts two models, a
UML class diagram and a physical data model which follows the UML
data modeling profile. Both
diagrams depict a portion of a simple schema for an order system.
You can see how the attributes of the classes could be mapped to the
columns of the database. For
example, it appears that the dateFulfilled attribute of the Order class maps to the DateFulfilled column of the Order table and that
the numberOrdered attribute of the OrderItem class maps to the NumberOrdered
column of the OrderItem table.
Figure 1. Simple mapping example.

Note that these initial property mappings were easy to determine for several
reasons. First, similar naming
standards were used in both models, an aspect of Agile
Modeling (AM)'s
Apply Modeling Standards practice.
Second, it is very likely that the same people created both models.
When people work in separate teams it is quite common for their solutions
to vary, even when the teams do a very good job, because they make different
design decisions along the way. Third,
one model very likely drove the development of the other model.
In Different
Projects Require Different Strategies I argued that when you are building a
new system your
object schema should drive the
development of your database schema.
Even though the two schemas depicted in
Figure
1 are very similar there are differences.
These differences mean that the mapping isn't going to be perfect.
The differences between the two schemas are:
- There are several attributes for tax in the object schema yet only one in the data schema. The three attributes for tax in the Order class presumably should be added up and stored in the tax column of the Order table when the object is saved. When the object is read into memory, however, the three attributes would need to be calculated (or a lazy initialization approach would need to be taken and each attribute would be calculated when it is first accessed). A schema difference such as this is a good indication that the database schema needs to be refactored to split the tax column into three.
- The data schema indicates keys whereas the object schema does not. Rows in tables are uniquely identified by primary keys and relationships between rows are maintained through the use of foreign keys. Relationships to objects, on the other hand, are implemented via references to those objects, not through foreign keys. The implication is that, in order to fully persist the object data and the relationships the objects are involved in, the objects need to know about the key values used in the database to identify them. This additional information is called "shadow information".
- Different types are used in each schema. The subTotalBeforeTax attribute of Order is of the type Currency whereas the SubTotalBeforeTax column of the Order table is a float. When you implement this mapping you will need to be able to convert back and forth between these two representations without loss of information (a sketch of such a conversion follows this list).
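To make that last point concrete, here is a minimal Java sketch of such a conversion. Representing the Currency type with BigDecimal and rounding to two decimal places are my own assumptions; Figure 1 only tells us that subTotalBeforeTax is a Currency and SubTotalBeforeTax is a float.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public final class CurrencyMapper {
    // Object -> column: the double value written to SubTotalBeforeTax.
    public static double toColumn(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_UP).doubleValue();
    }

    // Column -> object: re-impose two decimal places so floating-point noise
    // (e.g. 19.989999...) is discarded on the way back in.
    public static BigDecimal toProperty(double column) {
        return BigDecimal.valueOf(column).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal subTotal = new BigDecimal("19.99");
        double stored = toColumn(subTotal);        // value written to the table
        BigDecimal restored = toProperty(stored);  // value read back into the object
        System.out.println(subTotal.equals(restored)); // true: nothing was lost
    }
}
```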
Shadow information is any data that objects need to maintain,
above and beyond their normal domain data, to persist themselves.
This typically includes primary key information, particularly when the
primary key is a surrogate key that has no business meaning, concurrency
control markings such as timestamps or incremental counters, and versioning
numbers. For example, in
Figure
1 you see that the Order table has an OrderID column used as a
primary key and a LastUpdate column that is used for optimistic
concurrency control that the Order class does not have.
To persist an order object properly the Order class would
need to implement shadow attributes that maintain these values.
Figure 2 shows a
detailed design class model for the Order and OrderItem classes.
There are several changes from Figure
1. First, the new diagram shows
the shadow attributes that the classes require to properly persist themselves.
Shadow attributes have an implementation visibility (there is a space in front of the name instead of a minus sign) and are assigned the stereotype <<persistence>> (this is not a UML standard). Second, it shows the scaffolding attributes required to implement the relationship between the two classes. Scaffolding attributes, such as the orderItems vector in Order, also have an implementation visibility. Third, a getTotalTax() operation was added to the Order class to calculate the value required for the tax column of the Order table. This is why I use the term property mapping instead of attribute mapping - what you really want to do is map the properties of a class, which are sometimes implemented as simple attributes and other times as one or more operations, to the columns of a database.
Figure 2. Including
"shadow information" on a class diagram.

One type of shadow information that I have not discussed yet is a boolean
flag to indicate whether an object currently exists in the database. The
problem is that when you save data to a relational database you need to use a
SQL update statement if the object was previously retrieved from the database
and a SQL insert statement if the data does not already exist. A common
practice is for each class to implement an isPersistent boolean flag, not
shown in Figure 2, that is set
to true when the data is read in from the database and set to false when the
object is newly created.
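As a minimal sketch of how such a flag might drive persistence, consider the Java fragment below. The save() method, the SQL strings, and the field set are illustrative assumptions, not the article's definitive implementation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Order {
    private long orderID;                 // shadow attribute: primary key value
    private java.sql.Date dateOrdered;
    private boolean isPersistent;         // shadow attribute: true once the row exists

    public void save(Connection db) throws SQLException {
        // Note: ORDER is an SQL keyword, so a real schema may need to quote the table name.
        String sql = isPersistent
            ? "UPDATE Order SET DateOrdered = ? WHERE OrderID = ?"
            : "INSERT INTO Order (DateOrdered, OrderID) VALUES (?, ?)";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setDate(1, dateOrdered);
            stmt.setLong(2, orderID);
            stmt.executeUpdate();
        }
        isPersistent = true;              // from now on, saves become UPDATEs
    }
}
```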
It is a common style
convention in the UML community to not show shadow information, such as keys
and concurrency markings, on class diagrams.
Similarly, the common convention is to not model scaffolding code either.
The idea is that everyone knows you need to do this sort of thing, so why waste
your time modeling the obvious?
Shadow information doesn't necessarily need to be implemented by the
business objects, although your application will need to take care of it
somehow. For example, with
Enterprise
JavaBeans (EJBs) you store primary key information outside of EJBs in
primary key classes; the individual object references a corresponding primary key object. The Java Data Objects (JDO) approach goes one step further and implements shadow information in the JDOs rather than in the business objects.
Figure 3 depicts the meta data representing
the property mappings required to persist the Order and OrderItem
classes of Figure 2.
Meta data is information about data.
Figure 3 is important for several reasons.
First, we need some way to represent mappings.
We could put two schemas side by side, as you see in
Figure
1, and then draw lines between them but that gets complicated very quickly.
Another option is a tabular representation that you see in
Figure
3. Second, the concept of
mapping meta data is critical to the functioning of persistence
frameworks which are a database
encapsulation strategy that can enable agile database techniques.
Figure 3. Meta data representing the
property maps.
| Property | Column |
| --- | --- |
| Order.orderID | Order.OrderID |
| Order.dateOrdered | Order.DateOrdered |
| Order.dateFulfilled | Order.DateFulfilled |
| Order.getTotalTax() | Order.Tax |
| Order.subtotalBeforeTax | Order.SubtotalBeforeTax |
| Order.shipTo.personID | Order.ShipToContactID |
| Order.billTo.personID | Order.BillToContactID |
| Order.lastUpdate | Order.LastUpdate |
| OrderItem.ordered | OrderItem.OrderID |
| Order.orderItems.position(orderItem) | OrderItem.ItemSequence |
| OrderItem.item.number | OrderItem.ItemNo |
| OrderItem.numberOrdered | OrderItem.NumberOrdered |
| OrderItem.lastUpdate | OrderItem.LastUpdate |
The naming convention that I'm using is reasonably
straightforward: Order.dateOrdered refers to the dateOrdered
attribute of the Order class. Similarly
Order.DateOrdered refers to the DateOrdered column of the Order
table. Order.getTotalTax()
refers to the getTotalTax() operation of Order and Order.billTo.personID
is the personID attribute of the Person object referenced by the Order.billTo
attribute. Likely the most
difficult property to understand is Order.orderItems.position(orderItem)
which refers to the position within the Order.orderItems vector of the
instance of OrderItem that is being saved.
Figure 3 hints at an important part
of the O/R impedance
mismatch between object technology and relational technology.
Classes implement both behavior and data whereas relational database
tables just implement data. The end
result is that when you're mapping the properties of classes into a relational
database you end up mapping operations such as getTotalTax() and position()
to columns. Although it didn't
happen in this example, you often need to map two operations that represent a
single property to a column - one operation to set the value, e.g. setFirstName(),
and one operation to retrieve the value, e.g. getFirstName().
These operations are typically called setters and getters respectively,
or sometimes mutators and accessors.
Whenever a key column is mapped to a property of a
class, such as the mapping between OrderItem.ItemSequence and Order.orderItems.position(orderItem),
this is really part of the effort of relationship mapping, discussed later in
this article. This is because keys
implement relationships in relational databases.
See the essay Evolutionary
Development.
Relational databases do not natively support inheritance,
forcing you to map the inheritance structures within your object schema to your
data schema. Although there is
somewhat of a backlash against inheritance within the object community, due in
most part to the fragile base class problem, my experience is that this problem has more to do with poor encapsulation practices among object developers than with the concept of inheritance itself. What I'm saying is that the fact that you need to do a little bit of work to map an inheritance
hierarchy into a relational database shouldn't dissuade you from using
inheritance where appropriate.
The concept of inheritance throws in several interesting
twists when saving objects into a relational DB.
How do you organize the inherited attributes within your data
model? In this section you'll see
that there are three primary solutions for mapping inheritance into a relational database, and a fourth supplementary technique that goes beyond inheritance mapping. These techniques are:
- Map the entire class hierarchy to a single table (one table per hierarchy)
- Map each concrete class to its own table
- Map each class to its own table
- Map the classes into a generic table structure (a generic schema)
To explore each technique I will discuss how to map the two
versions of the class hierarchy presented in Figure
4.
The first version depicts three classes - Person, an abstract
class, and two concrete classes, Employee and Customer.
You know that Person is abstract because its name is shown in
italics. In older versions of the
UML the constraint "{abstract}" would have been used instead.
The second version of the hierarchy adds a fourth concrete class to the
hierarchy, Executive. The idea is
that you have implemented the first class hierarchy and are now presented with a
new requirement to support giving executives, but not non-executive employees,
fixed annual bonuses. The Executive class was added to support this
new functionality.
For
the sake of simplicity I have not modeled all of the attributes of the classes,
nor have I modeled their full signatures, nor have I modeled any of the
operations.
This diagram is just barely good enough for my purpose, in
other words it is an agile model. Furthermore these hierarchies could be improved by applying the
Party analysis
pattern or the
Business Entity analysis pattern. I
haven't done this because I need a simple example to explain mapping
inheritance hierarchies, not to explain the effective application of analysis
patterns - I always follow
AM's
Model With A
Purpose principle.
Figure 4.
Two versions of a simple class hierarchy.
Inheritance can also be a problem when it's misapplied
- for example, the hierarchy in Figure 4 could be better modeled via
the Party (Hay
1996, Fowler
1997) or the Business Entity (Ambler
1997) patterns. For
example, if someone can be both a customer and an employee you would have two objects in memory for them, which may be problematic for your application. I've chosen this example because I needed a simple, easy to understand class hierarchy to map.
Following this strategy you store all the attributes of the classes in one
table.
Figure
5 depicts the data model for the class hierarchies of
Figure
4 when this approach is taken. The
attributes of each of the classes are stored in the Person table (a good table naming strategy is to use the name of the hierarchy's root class) in a very straightforward manner.
Figure 5. Mapping to a single table.

Two columns have been added to the table - PersonPOID and
PersonType. The first column
is the primary key for the table, you know this because of the <<PK>>
stereotype, and the second is a code indicating whether the person is a
customer, an employee, or perhaps both. PersonPOID
is a persistent object identifier (POID), often simply called an object
identifier (OID), which is a surrogate key.
I could have used the optional stereotype of <<Surrogate>> to
indicate this but chose not to as POID implies this, therefore indicating the
stereotype would only serve to complicate the diagram (follow the
AM
practice Depict Models Simply).
Data
Modeling 101 discusses surrogate
keys in greater detail.
The PersonType column is required to identify the type of
object that can be instantiated from a given row.
For example the value of E would indicate the person
is an employee, C would indicate customer, and B would indicate
both. Although this approach is
straightforward it tends to break down as the number of types and combinations
begin to grow. For example, when
you add the concept of executives you need to add a code value, perhaps X,
to represent this. Now the value of
B, representing both, is sort of goofy.
Furthermore you might have combinations involving executives now, for
example it seems reasonable that someone can be both an executive and a customer
so you'd need a code for this. When
you discover that combinations are possible you should consider applying the Replace
Type Code With Booleans database refactoring, as you see in
Figure
6.
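To illustrate, here is a hedged Java sketch of instantiating the right class from the PersonType code column. Only the code values E, C, and B come from the text; the Name column, the class constructors, and the handling of B are assumptions.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

abstract class Person {
    protected final String name;
    protected Person(String name) { this.name = name; }
}
class Employee extends Person { Employee(String name) { super(name); } }
class Customer extends Person { Customer(String name) { super(name); } }

public final class PersonFactory {
    // Instantiate the right class for a row read from the Person table.
    public static Person fromRow(ResultSet row) throws SQLException {
        String type = row.getString("PersonType");
        switch (type) {
            case "E": return new Employee(row.getString("Name"));
            case "C": return new Customer(row.getString("Name"));
            // "B" (both) is where the single-table code scheme starts to strain:
            // one strategy is to pick a primary role, another is to build two
            // objects that share the same row.
            case "B": return new Employee(row.getString("Name"));
            default:  throw new IllegalStateException("Unknown PersonType: " + type);
        }
    }
}
```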
For the sake of simplicity I did not include columns for concurrency control,
such as the time stamp column included in the tables of
Figure
2, nor did I include columns for data versioning.
Figure 6. A refactored
approach.

With this approach a table is created for each concrete class, each table
including both the attributes implemented by the class and its inherited
attributes.
Figure
7 depicts the physical data model for the class hierarchy of
Figure
4 when this approach is taken. There
are tables corresponding to each of the Customer and Employee
classes because they are concrete (objects are instantiated from them) but not for Person because it is abstract. Each table
was assigned its own primary key, customerPOID and employeePOID
respectively. To support the
addition of Executive all I needed to do was add a corresponding table
with all of the attributes required by executive objects.
Figure 7. Mapping concrete classes
to tables.

Following this strategy you create one table per class, with one
column per business attribute and any necessary identification information (as
well as other columns required for concurrency control and versioning).
Figure 8 depicts the physical
data model for the class hierarchy of Figure 4
when each class is mapped to a single table.
The data for the Customer class is stored in two tables, Customer
and Person, therefore to retrieve this data you would need to join the
two tables (or do two separate reads, one to each table).
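As a rough illustration, the following Java (JDBC) sketch reads a customer's data with one such join. Only the shared PersonPOID key comes from the figures; the Name and CustomerNumber columns are assumed for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public final class CustomerReader {
    public static void printCustomer(Connection db, long personPOID) throws SQLException {
        // One read via a join; two separate single-table reads would also work.
        String sql = "SELECT p.Name, c.CustomerNumber "
                   + "FROM Person p JOIN Customer c ON c.PersonPOID = p.PersonPOID "
                   + "WHERE p.PersonPOID = ?";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setLong(1, personPOID);
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    System.out.println(rs.getString("Name") + ", customer #"
                            + rs.getLong("CustomerNumber"));
                }
            }
        }
    }
}
```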
The application of keys is interesting.
Notice how personPOID is used as the primary key for
all of the tables. For the Customer, Employee, and Executive
tables the personPOID is both a primary key and a foreign key.
In the case of Customer, personPOID is its primary key and
a foreign key used to maintain the relationship to the Person table.
This is indicated by application of two stereotypes, <<PK>>
and <<FK>>. In older
versions of the UML
it wasn't permissible to assign several stereotypes to a single model element
but this restriction was lifted in UML version 1.4.
Figure 8. Mapping each class to
its own table.

A common modification that you may want to consider is the
addition of a type column, or boolean columns as the case may be, in the Person
table to indicate the applicable subtypes of the person.
Although this is additional overhead it makes some types of queries
easier. The addition of views is
also an option in many cases, an approach that I prefer over the addition of
type or boolean columns because they are easier to maintain.
A fourth option for mapping inheritance structures into a relational database
is to take a generic, sometimes called meta-data driven, approach to mapping your classes. This approach isn't specific to inheritance structures; it supports all forms of mapping.
In Figure 9 you see a data
schema for storing the value of attributes and for traversing inheritance
structures. The schema isn't complete - it could be extended to map associations, for example - but it's sufficient for our purposes. The
value of a single attribute is stored in the Value table, therefore to
store an object with ten business attributes there would be ten records, one for
each attribute. The Value.ObjectPOID
column stores the unique identifier for the specific object (this approach
assumes a common key strategy across all objects, when this isn't the case
you'll need to extend this table appropriately).
The AttributeType table contains rows for basic data
types such as date, string, money, integer, and so on.
This information is required to convert the value of the
object attribute into the varchar stored in Value.Value.
Figure 9. A generic data schema
for storing objects.

Let's work through an example of mapping a single class to this
schema. To store the OrderItem
class in Figure 2 there would
be three records in the Value table.
One to store the value for the number of items ordered, one to store the
value of the OrderPOID that this order item is part of, and one to store
the value of the ItemPOID that describes the order item.
You may decide to have a fourth row to store the value of the lastUpdated
shadow attribute if you're taking an optimistic locking approach to concurrency
control. The Class table would include a row for the OrderItem
class and the Attribute table would include one row for each attribute
stored in the database (in this case either 3 or 4 rows).
Now let's map the inheritance structure between Person
and Customer, shown in Figure 4, into
this schema. The Inheritance
table is the key to inheritance mapping. Each
class would be represented by a row in the Class table.
There would also be a row in the Inheritance table, the value of Inheritance.SuperClassPOID
would refer to the row in Class representing Person and Inheritance.SubClassPOID
would refer to the row in Class representing Customer. To map the
rest of the hierarchy you require one row in Inheritance for each
inheritance relationship.
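Here is a minimal Java (JDBC) sketch of the write side of this approach: one row is inserted into the Value table per attribute of the object being saved. The AttributePOID column name and the idea of passing in values already converted to their varchar form are assumptions layered on top of Figure 9.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public final class GenericPersister {
    // One Value row per attribute of the object being saved.
    public static void save(Connection db, long objectPOID,
                            Map<Long, String> valuesByAttributePOID) throws SQLException {
        String sql = "INSERT INTO Value (ObjectPOID, AttributePOID, Value) VALUES (?, ?, ?)";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            for (Map.Entry<Long, String> entry : valuesByAttributePOID.entrySet()) {
                stmt.setLong(1, objectPOID);
                stmt.setLong(2, entry.getKey());      // row in the Attribute table
                stmt.setString(3, entry.getValue());  // already converted to varchar form
                stmt.addBatch();
            }
            stmt.executeBatch();
        }
    }
}
```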
Until this point I have focused on mapping single inheritance hierarchies. Single inheritance occurs when a subclass such as Customer
inherits directly from a single parent class such as Person.
Multiple inheritance occurs when a subclass has two or more direct
superclasses, such as Dragon directly inheriting from both Bird
and Lizard in Figure 10.
Multiple inheritance is generally seen as a questionable feature of an object-oriented language - since 1990 I have only seen one domain problem where multiple inheritance made sense - and as a result most languages choose not to support it. However, languages such
as C++ and Eiffel do support it so you may find yourself in a situation where
you need to map a multiple inheritance hierarchy to a relational database.
Figure 10 shows
the three data schemas that would result from applying each of the three
inheritance mapping strategies. As
you can see, mapping multiple inheritance is fairly straightforward; there aren't any surprises in Figure 10.
The greatest challenge in my experience is to identify a reasonable table
name when mapping the hierarchy into a single table, in this case Creature
made the most sense.
Figure 10. Mapping multiple
inheritance.

None of these mapping strategies are ideal for all situations, as
you can see in Table 1.
My experience is that the easiest strategy to work with is to have one
table per hierarchy at first, and then refactor your schema accordingly if you need to. Sometimes I'll start
by applying the one table per class strategy whenever my team is motivated to
work with a "pure design approach". I
stay away from using one table per concrete class because it typically results
in the need to copy data back and forth between tables, forcing me to refactor
it reasonably early in the life of the project anyway. I rarely use the generic
schema approach because it simply doesn't scale very well.
It is important to understand that you can combine the first three
strategies - one table per hierarchy, one table per concrete class, and one
table per class - in any given application.
You can even combine these strategies in a single, large hierarchy.
Table 1. Comparing the inheritance mapping strategies.

| Strategy | Advantages | Disadvantages | When to Use |
| --- | --- | --- | --- |
| One table per hierarchy | Simple approach. Easy to add new classes: you just need to add new columns for the additional data. Supports polymorphism by simply changing the type of the row. Data access is fast because the data is in one table. Ad-hoc reporting is very easy because all of the data is found in one table. | Coupling within the class hierarchy is increased because all classes are directly coupled to the same table; a change in one class can affect the table, which can then affect the other classes in the hierarchy. Space is potentially wasted in the database. Indicating the type becomes complex when significant overlap between types exists. The table can grow quickly for large hierarchies. | This is a good strategy for simple and/or shallow class hierarchies where there is little or no overlap between the types within the hierarchy. |
| One table per concrete class | Easy to do ad-hoc reporting as all the data you need about a single class is stored in only one table. Good performance to access a single object's data. | When you modify a class you need to modify its table and the table of any of its subclasses; for example, if you were to add height and weight to the Person class you would need to add columns to the Customer, Employee, and Executive tables. Whenever an object changes its role (perhaps you hire one of your customers) you need to copy the data into the appropriate table and assign it a new POID value (or perhaps reuse the existing POID value). It is difficult to support multiple roles and still maintain data integrity; for example, where would you store the name of someone who is both a customer and an employee? | When changing types and/or overlap between types is rare. |
| One table per class | Easy to understand because of the one-to-one mapping. Supports polymorphism very well as you merely have records in the appropriate tables for each type. Very easy to modify superclasses and add new subclasses as you merely need to modify/add one table. Data size grows in direct proportion to growth in the number of objects. | There are many tables in the database, one for every class (plus tables to maintain relationships). Potentially takes longer to read and write data using this technique because you need to access multiple tables; this problem can be alleviated if you organize your database intelligently by putting each table within a class hierarchy on different physical disk-drive platters (this assumes that the disk-drive heads all operate independently). Ad-hoc reporting on your database is difficult, unless you add views to simulate the desired tables. | When there is significant overlap between types or when changing types is common. |
| Generic schema | Works very well when database access is encapsulated by a robust persistence framework. It can be extended to provide meta data to support a wide range of mappings, including relationship mappings; in short, it is the start of a mapping meta data engine. It is incredibly flexible, enabling you to quickly change the way that you store objects because you merely need to update the meta data stored in the Class, Inheritance, Attribute, and AttributeType tables accordingly. | Very advanced technique that can be difficult to implement at first. It only works for small amounts of data because you need to access many database rows to build a single object. You will likely want to build a small administration application to maintain the meta data. Reporting against this data can be very difficult due to the need to access several rows to obtain the data for a single object. | For complex applications that work with small amounts of data, or for applications where data access isn't very common or you can pre-load data into caches. |
In addition to property and inheritance mapping you
need to understand the art of relationship mapping. There are three types of object relationships that you need
to map: association, aggregation, and composition. For now, I'm going to treat these three types of
relationship the same - they are mapped the same way although there are
interesting nuances when it comes to referential
integrity.
There are two categories of object relationships that you
need to be concerned with when mapping. The
first category is based on multiplicity and it includes three types:
- One-to-one relationships. This is a relationship where the maximum of each of its multiplicities is one, an example of which is the holds relationship between Employee and Position in Figure 11. An employee holds one and only one position and a position may be held by at most one employee (some positions go unfilled).
- One-to-many relationships. Also known as a many-to-one relationship, this occurs when the maximum of one multiplicity is one and the other is greater than one. An example is the works in relationship between Employee and Division. An employee works in one division and any given division has one or more employees working in it.
- Many-to-many relationships. This is a relationship where the maximum of both multiplicities is greater than one, an example of which is the assigned relationship between Employee and Task. An employee is assigned one or more tasks and each task is assigned to zero or more employees.
The second category is based on directionality and it
contains two types: uni-directional relationships and bi-directional
relationships.
- Uni-directional relationships. A uni-directional relationship exists when an object knows about the object(s) it is related to but the other object(s) do not know of the original object. An example is the holds relationship between Employee and Position in Figure 11, indicated by the line with an open arrowhead on it. Employee objects know about the position that they hold, but Position objects do not know which employee holds them (there was no requirement to do so). As you will soon see, uni-directional relationships are easier to implement than bi-directional relationships.
- Bi-directional relationships. A bi-directional relationship exists when the objects on both ends of the relationship know of each other, an example of which is the works in relationship between Employee and Division. Employee objects know what division they work in and Division objects know what employees work in them.
Figure 11. Relationships between objects.

It is possible to have all six combinations of
relationship in object schemas. However, one aspect of the impedance
mismatch between object technology and relational technology is that
relational technology does not support the concept of uni-directional
relationships - in relational databases all associations are bi-directional
(relationships are implemented via foreign keys, which can be joined/traversed
in either direction).
Relationships in object schemas are implemented by a combination
of references to objects and operations. When
the multiplicity is one (e.g. 0..1 or 1) the relationship is implemented with a
reference to an object, a getter operation, and a setter operation.
For example in Figure 11 the
fact that an employee works in a single division is implemented by the Employee
class via the combination of the attribute division, the getDivision()
operation which returns the value of division, and the setDivision()
operation which sets the value of the division attribute. The
attribute(s) and operations required to implement a relationship are often
referred to as scaffolding.
When the multiplicity is many (e.g. N, 0..*, 1..*) the
relationship is implemented via a collection attribute, such as an Array
or a HashSet in Java, and operations to manipulate that collection.
For example the Division class implements a HashSet
attribute named employees, getEmployees() to get the value, setEmployees()
to set the value, addEmployee() to add an employee into the HashSet,
and removeEmployee() to remove an employee from the HashSet.
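Pulling the last few paragraphs together, here is a minimal Java sketch of this scaffolding, using a HashSet for the "many" side as described above; everything beyond the scaffolding itself is omitted.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Employee {
    private Division division;                 // scaffolding for "works in" (one)

    public Division getDivision() { return division; }
    public void setDivision(Division division) { this.division = division; }
}

class Division {
    private Set<Employee> employees = new HashSet<>();   // scaffolding (many)

    public Set<Employee> getEmployees() { return Collections.unmodifiableSet(employees); }
    public void setEmployees(Set<Employee> employees) { this.employees = employees; }
    public void addEmployee(Employee employee) { employees.add(employee); }
    public void removeEmployee(Employee employee) { employees.remove(employee); }
}
```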
When a relationship is uni-directional the code is implemented
only by the object that knows about the other object(s).
For example, in the uni-directional relationship between Employee
and Position only the Employee class implements the association.
Bi-directional associations, on the other hand, are implemented by both
classes, as you can see with the many-to-many relationship between Employee
and Task.
Relationships in relational databases are maintained through the use of
foreign keys. A foreign key is one or more data attributes that appear in one table and that may be part of, or coincide with, the key of another table.
With a one-to-one relationship the foreign key needs to be implemented by
one of the tables. In
Figure
12 you see that the Position table includes EmployeePOID, a
foreign key to the Employee table, to implement the association.
I could easily have implemented a PositionPOID column in Employee
instead.
Figure 12. Relationships
in a relational database.

To implement a one-to-many relationship you implement, in the "many table", a foreign key pointing to the "one table".
For example Employee includes a DivisionPOID column to
implement the works in relationship to Division.
You could also choose to overbuild your database schema and implement a
one-to-many relationship via an associative table, effectively making it a
many-to-many relationship.
There are two ways to implement many-to-many associations in a
relational database. The first one
is to implement the foreign key column(s) to the other table several times in each table. For example, to
implement the many-to-many relationship between Employee and Task
you could have five TaskPOID columns in Employee and the Task
table could include seven EmployeePOID columns.
Unfortunately you run into a problem with this approach when
you assign more than five tasks to an employee or more than seven employees to a
single task. A better approach is
to implement what is called an associative table, an example of which is EmployeeTask
in Figure 12, which includes the
combination of the primary keys of the tables that it associates.
With this approach you could have fifty people assigned to the same task,
or twenty tasks assigned to the same person, and it wouldn't matter. The
basic "trick" is that the many-to-many relationship is converted into
two one-to-many relationships, both of which involve the associative table.
Because foreign keys are used to join tables, all relationships in
a relational database are effectively bi-directional.
This is why it doesn't matter in which table you implement
a one-to-one relationship, the code to join the two tables is virtually the
same. For example, with the
existing schema in Figure 12 the SQL
code to join across the holds relationship would be
SELECT * FROM Position, Employee
WHERE Position.EmployeePOID = Employee.EmployeePOID
Had the foreign key been implemented in the Employee table the SQL
code would be
SELECT * FROM Position, Employee
WHERE Position.PositionPOID = Employee.PositionPOID
A consistent key strategy within your database can
greatly simplify your relationship mapping efforts. The first step is to prefer single-column keys.
The next step is to use a globally unique surrogate key, perhaps
following the GUID
or HIGH-LOW strategies, so you are always mapping to the same type of key
column.
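For illustration, here is a small Java sketch of the HIGH-LOW idea: a HIGH block is reserved once (how it is reserved - a key table, a sequence - is left out) and combined with an in-memory LOW counter so that minting a key needs no database round trip. The block size and the overflow handling are assumptions.

```java
public final class HighLowKeyGenerator {
    private static final long LOW_RANGE = 100_000;  // LOW values per HIGH block

    private final long high;   // reserved once, e.g. from a key table or sequence
    private long low = 0;      // in-memory counter

    public HighLowKeyGenerator(long high) { this.high = high; }

    public synchronized long nextKey() {
        if (low >= LOW_RANGE) {
            // a fuller implementation would reserve a fresh HIGH block here
            throw new IllegalStateException("HIGH block exhausted");
        }
        return high * LOW_RANGE + low++;
    }

    public static void main(String[] args) {
        HighLowKeyGenerator keys = new HighLowKeyGenerator(42);
        System.out.println(keys.nextKey()); // 4200000
        System.out.println(keys.nextKey()); // 4200001
    }
}
```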
Now that we understand how to implement relationships in each
technology, let's see how you map them. I
will describe the mappings from the point of view of mapping the object
relationships into the relational database.
An interesting thing to remember is that in some cases you
have design choices to make. Once
again beware of the "magic CASE tool button" that supposedly automates
everything for you.
A general rule of thumb with relationship mapping is that you
should keep the multiplicities the same. Therefore
a one-to-one object relationship maps to a one-to-one data relationship, a
one-to-many maps to a one-to-many, and a many-to-many maps to a many-to-many.
The fact is that this doesn't have to be the case; you can implement a one-to-one object relationship with a one-to-many or even a many-to-many data
relationship. This is because a
one-to-one data relationship is a subset of a one-to-many data relationship and
a one-to-many relationship is a subset of a many-to-many relationship.
Figure
13 depicts the property mappings between the object schema of
Figure
11 and the data schema of Figure 12.
Note how I have only had to map the business properties and the
shadow
information of the objects, but not scaffolding attributes
such as Employee.position
and Employee.tasks. These scaffolding attributes are represented via the
shadow information that is mapped into the database.
When the relationship is read into memory the values stored in the
primary key columns will be stored in the corresponding shadow attributes within
the objects. At the same time the
relationship that the primary key columns represent will be defined between the
corresponding objects by setting the appropriate values in their scaffolding
attributes.
Figure 13. Property mappings.

| Property | Column |
| --- | --- |
| Position.title | Position.Title |
| Position.positionPOID | Position.PositionPOID |
| Employee.name | Employee.Name |
| Employee.employeePOID | Employee.EmployeePOID |
| Employee.employeePOID | EmployeeTask.EmployeePOID |
| Division.name | Division.Name |
| Division.divisionPOID | Division.DivisionPOID |
| Task.description | Task.Description |
| Task.taskPOID | Task.TaskPOID |
| Task.taskPOID | EmployeeTask.TaskPOID |
Consider the one-to-one object relationship between Employee and Position.
Let's assume that whenever a Position or an Employee
object is read into memory the application will automatically traverse the holds relationship and read in the corresponding object.
The other option would be to manually traverse the
relationship in the code, taking a lazy read approach where the other
object is read at the time it is required by the application.
The trade-offs of these two approaches are discussed in Implementing
Referential Integrity.
Figure
14 shows how the object relationships are mapped.
Figure 14. Mapping the relationships.

| Object Relationship | From | To | Cardinality | Automatic Read | Column(s) | Scaffolding Property |
| --- | --- | --- | --- | --- | --- | --- |
| holds | Employee | Position | One | Yes | Position.EmployeePOID | Employee.position |
| held by | Position | Employee | One | Yes | Position.EmployeePOID | Employee.position |
| works in | Employee | Division | One | Yes | Employee.DivisionPOID | Employee.division |
| has working in it | Division | Employee | Many | No | Employee.DivisionPOID | Division.employees |
| assigned | Employee | Task | Many | No | Employee.EmployeePOID, EmployeeTask.EmployeePOID | Employee.tasks |
| assigned to | Task | Employee | Many | No | Task.TaskPOID, EmployeeTask.TaskPOID | Task.employees |
Let's work through the logic of retrieving a single Position object one step at a time:
1. The Position object is read into memory.
2. The holds relationship is automatically traversed.
3. The value held by the Position.EmployeePOID column is used to identify the single employee that needs to be read into memory.
4. The Employee table is searched for a record with that value of EmployeePOID.
5. The Employee object (if any) is read in and instantiated (due to the automatic read indicated in the held by row of Figure 14).
6. The value of the Employee.position attribute is set to reference the Position object.
Now let's work through the logic of retrieving a single Employee object one step at a time (a code sketch of this sequence follows the list):
1. The Employee object is read into memory.
2. The holds relationship is automatically traversed.
3. The value held by the Employee.EmployeePOID column is used to identify the single position that needs to be read into memory.
4. The Position table is searched for a row with that value of EmployeePOID.
5. The Position object is read in and instantiated (due to the automatic read indicated in the holds row).
6. The value of the Employee.position attribute is set to reference the Position object.
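Here is a hedged Java (JDBC) sketch of that sequence, reading an employee and automatically traversing holds in a single join. The class stubs and the use of a LEFT JOIN are my assumptions; the table and column names come from Figure 12.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class Position {
    private final String title;
    Position(String title) { this.title = title; }
}

class Employee {
    private final String name;
    private Position position;                      // scaffolding attribute
    Employee(String name) { this.name = name; }
    void setPosition(Position position) { this.position = position; }
}

public final class HoldsReader {
    // Steps 1-6: read the employee and traverse holds in one join.
    public static Employee readEmployee(Connection db, long employeePOID) throws SQLException {
        String sql = "SELECT e.Name, p.Title "
                   + "FROM Employee e LEFT JOIN Position p ON p.EmployeePOID = e.EmployeePOID "
                   + "WHERE e.EmployeePOID = ?";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setLong(1, employeePOID);
            try (ResultSet rs = stmt.executeQuery()) {
                if (!rs.next()) return null;                     // no such employee
                Employee employee = new Employee(rs.getString("Name"));
                String title = rs.getString("Title");
                if (title != null) {                             // defensive: should always exist
                    employee.setPosition(new Position(title));   // step 6
                }
                return employee;
            }
        }
    }
}
```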
Now let's consider how the objects would be saved to the
database. Because the relationship
is to be automatically traversed, and to maintain referential integrity, a transaction
is created. The next step is to add
update statements for each object to the transaction.
Each update statement includes both the business attributes and the key
values mapped in Figure 13.
Because relationships are implemented via foreign keys, and because those
values are being updated, the relationship is effectively being persisted.
The transaction is submitted to the database and run (see
Introduction
to Transaction Control for details).
There is one annoyance with the way the holds relationship has
been mapped into the database. Although
the direction of this relationship is from Employee to Position
within the object schema, it's been implemented from Position to Employee
in the database. This isn't a big
deal, but it is annoying. In the
data schema you can implement the foreign key in either table and it wouldn't
make a difference, so from a data point of view when everything else is equal
you could toss a coin. Had there
been a potential requirement for the holds relationship to turn into a
one-to-many relationship, something that a
change case
would indicate, then you would be motivated to implement the foreign key to
reflect this potential requirement. For
example, the existing data model would support an employee holding many
positions. However, had the object
schema been taken into account, and if there were no future requirements
motivating you to model it otherwise, it would have been cleaner to implement
the foreign key in the Employee table instead.
Now let's consider the works in relationship
between Employee and Division in
Figure
11. This is a one-to-many
relationship - an employee works in one division and a single division has
many employees working in it. As
you can see in Figure 13 an interesting
thing about this relationship is that it should be automatically traversed from Employee
to Division, something often referred to as a cascading read, but not in
the other direction.
Cascading saves and cascading deletes are also possible,
something covered in the discussion of referential
integrity.
When an employee is read into memory the relationship is automatically traversed to read in the division that they work in. You don't want several copies of the same division - for example, if you have ten employee objects that all work for the IT division you want them to refer to the same IT division object in memory. The implication is that you will need to implement a strategy for doing this; one option is to implement a cache that ensures only one copy of an object exists in memory, another is to simply have the Division class implement its own collection of instances in memory (effectively a mini-cache). If it needs to, the application will read the Division object into memory, then it will set the value of Employee.division to reference the appropriate Division object. Similarly the Division.addEmployee() operation will be invoked to add the employee object into its collection.
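One way to realize the mini-cache is an identity map keyed by POID, sketched below in Java. The findByPOID() and readFromDatabase() names are hypothetical; the point is simply that all lookups funnel through one map, so only one copy of each division exists in memory.

```java
import java.util.HashMap;
import java.util.Map;

class Division {
    private static final Map<Long, Division> cache = new HashMap<>();
    private final long divisionPOID;

    private Division(long divisionPOID) { this.divisionPOID = divisionPOID; }

    // All lookups funnel through the cache, so each division is in memory once.
    public static synchronized Division findByPOID(long poid) {
        return cache.computeIfAbsent(poid, Division::readFromDatabase);
    }

    private static Division readFromDatabase(long poid) {
        // placeholder for "SELECT ... FROM Division WHERE DivisionPOID = ?"
        return new Division(poid);
    }
}
```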
Saving the relationship works in the same way as it does
for one-to-one relationships - when the objects are saved so are their primary
and foreign key values so therefore the relationship is automatically saved.
Every example in this article uses foreign keys, such as Employee.DivisionPOID,
pointing to the primary keys of other tables, in this case Division.DivisionPOID.
This doesn't have to be the case, sometimes a foreign key can refer to
an alternate key. For example, if
the Employee table of Figure 12
were to include a SocialSecurityNumber column then that would be an
alternate key for that table (assuming all employees are American citizens).
If this were the case you would have the option to replace the Position.EmployeePOID
column with Position.SocialSecurityNumber.
To implement many-to-many relationships you need the concept of an
associative table, a data entity whose sole purpose is to maintain the
relationship between two or more tables in a relational database. In
Figure
11 there is a many-to-many relationship between Employee and Task.
In the data schema of Figure 12 I
needed to introduce the associative table EmployeeTask to implement a
many-to-many relationship between the Employee and Task tables. In relational databases the attributes contained in an associative table are traditionally the combination of the keys of the tables involved in the relationship, in this case EmployeePOID and TaskPOID.
The name of an associative table is typically either the combination of
the names of the tables that it associates or the name of the association that
it implements. In this case I chose EmployeeTask over Assigned.
Notice the multiplicities in
Figure
11. The rule is that the
multiplicities "cross over" once the associative table is introduced,
as indicated in Figure 12.
A multiplicity of 1 is always introduced on the outside edges of the
relationship within the data schema to preserve overall multiplicity of the
original relationship. The original
relationship indicated that an employee is assigned to one or more tasks and
that a task has zero or more employees assigned to it. In the data schema you
see that this is still true even with the associative table in place to maintain
the relationship.
Assume that an employee object is in memory and we need a list of all the tasks they have been assigned. The steps that the application would need to go through are (a code sketch follows the list):
1. Create a SQL Select statement that joins the EmployeeTask and Task tables together, choosing all EmployeeTask records with an EmployeePOID value the same as that of the employee whose task list we are putting together.
2. The Select statement is run against the database.
3. The data records representing these tasks are marshaled into Task objects. Part of this effort includes checking to see if the Task object is already in memory; if it is then we may choose to refresh the object with the new data values (this is a concurrency issue).
4. The Employee.addTask() operation is invoked for each Task object to build the collection up.
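A hedged Java (JDBC) sketch of these four steps follows. The in-memory check of step 3 is reduced to a comment, and the Task and Employee stubs exist only to make the sketch self-contained.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class Task {
    final long taskPOID;
    final String description;
    Task(long taskPOID, String description) {
        this.taskPOID = taskPOID;
        this.description = description;
    }
}

class Employee {
    private final List<Task> tasks = new ArrayList<>();   // scaffolding attribute
    void addTask(Task task) { tasks.add(task); }
}

public final class TaskListReader {
    public static void readTasks(Connection db, long employeePOID, Employee employee)
            throws SQLException {
        // Step 1: join EmployeeTask to Task for this employee's assignments.
        String sql = "SELECT t.TaskPOID, t.Description "
                   + "FROM EmployeeTask et JOIN Task t ON t.TaskPOID = et.TaskPOID "
                   + "WHERE et.EmployeePOID = ?";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setLong(1, employeePOID);
            try (ResultSet rs = stmt.executeQuery()) {     // step 2: run it
                while (rs.next()) {
                    // Step 3: marshal each row into a Task object. A real
                    // application would first check whether the task is already
                    // in memory and refresh it instead of creating a duplicate.
                    Task task = new Task(rs.getLong("TaskPOID"), rs.getString("Description"));
                    employee.addTask(task);                // step 4
                }
            }
        }
    }
}
```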
A similar process would have been followed to read in the employees involved in a given task. To save the relationship, still from the point of view of the Employee object, the steps would be (a transaction sketch follows the list):
1. Start a transaction.
2. Add Update statements for any task objects that have changed.
3. Add Insert statements for the Task table for any new tasks that you have created.
4. Add Insert statements for the EmployeeTask table for the new tasks.
5. Add Delete statements for the Task table for any tasks that have been deleted. This may not be necessary if the individual object deletions have already occurred.
6. Add Delete statements for the EmployeeTask table for any tasks that have been deleted, a step that may not be needed if the individual deletions have already occurred.
7. Add Delete statements for the EmployeeTask table for any tasks that are no longer assigned to the employee.
8. Run the transaction.
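Below is a minimal Java (JDBC) sketch of running such a transaction. For brevity it collapses steps 4 through 7 into a delete-then-reinsert of the EmployeeTask rows for this employee - a simplification of, not a substitute for, the statement-by-statement approach above - and the Task statements would be added to the same transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public final class AssignmentSaver {
    public static void saveAssignments(Connection db, long employeePOID,
                                       long[] currentTaskPOIDs) throws SQLException {
        boolean previousAutoCommit = db.getAutoCommit();
        db.setAutoCommit(false);                         // step 1: start the transaction
        try {
            // Steps 4-7 collapsed: remove the old associative rows...
            try (PreparedStatement del = db.prepareStatement(
                    "DELETE FROM EmployeeTask WHERE EmployeePOID = ?")) {
                del.setLong(1, employeePOID);
                del.executeUpdate();
            }
            // ...and insert one row per currently assigned task.
            try (PreparedStatement ins = db.prepareStatement(
                    "INSERT INTO EmployeeTask (EmployeePOID, TaskPOID) VALUES (?, ?)")) {
                for (long taskPOID : currentTaskPOIDs) {
                    ins.setLong(1, employeePOID);
                    ins.setLong(2, taskPOID);
                    ins.addBatch();
                }
                ins.executeBatch();
            }
            db.commit();                                 // step 8: run the transaction
        } catch (SQLException e) {
            db.rollback();                               // keep referential integrity intact
            throw e;
        } finally {
            db.setAutoCommit(previousAutoCommit);
        }
    }
}
```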
Many-to-many relationships are interesting because of
the addition of the associative table. Two
business classes are being mapped to three data tables to support this
relationship, so there is extra work to do as a result.
Figure 1
depicted a classic Order and OrderItem model with an aggregation
association between the two classes. An
interesting twist is the {ordered} constraint placed on the relationship -
users care about the order in which items appear on an order.
When mapping this to a relational database you need to add an additional
column to track this information. The
database schema, also depicted in Figure
1, includes the column OrderItem.ItemSequence to persist this
information. Although this mapping
seems straightforward on the surface, there are several issues that you need
to take into consideration. These
issues become apparent when you consider basic persistence functionality for the
aggregate:
- Read the data in the proper sequence. The scaffolding attribute that implements this relationship must be a collection that enables sequential ordering of references and it must be able to grow as new OrderItems are added to the Order. In Figure 2 you see that a Vector is used, a Java collection class that meets these requirements. As you read the order and order items into memory the Vector must be filled in the proper sequence. If the values of the OrderItem.ItemSequence column start from 1 and increase by 1 then you can simply use the value of the column as the position to insert order items into the collection. When this isn't the case you must include an ORDER BY clause in the SQL statement submitted to the database to ensure that the rows appear in order in the result set.
- Don't include the sequence number in the key. Say you have an order with five order items in memory and they have been saved into the database. You now insert a new order item in between the second and third order items, giving you a total of six order items. With the current data schema of Figure 1 you have to renumber the sequence numbers for every order item that appears after the new order item, and then write them all out, even though nothing has changed other than the sequence number in the other order items. Because the sequence number is part of the primary key of the OrderItem table this could be problematic if other tables, not shown in Figure 1, refer to rows in OrderItem via foreign keys that include ItemSequence. A better approach is shown in Figure 15 where the OrderItemID column is used as the primary key.
- When do you update sequence numbers after rearranging the order items? Whenever you rearrange order items on an order, perhaps you moved the fourth order item to be the second one on the order, you need to update the sequence numbers within the database. You may decide to cache these changes in memory until you decide to write out the entire order, although this runs the risk that the proper sequence won't be saved in the event of a power outage.
- Do you update sequence numbers after deleting an order item? If you delete the fifth of six order items, do you want to update the sequence number for what is now the fifth item, or do you want to leave it as is? The sequence numbers still work - the values are 1, 2, 3, 4, 6 - but you can no longer use them as the position indicators within your collection without leaving a hole in the fifth position.
- Consider sequence number gaps greater than one. Instead of assigning sequence numbers along the lines of 1, 2, 3, ... assign numbers such as 10, 20, 30 and so on. That way you don't need to update the values of the OrderItem.ItemSequence column every time you rearrange order items, because you can assign a sequence number of 15 when you move something between 10 and 20. You will need to change the values every so often; for example, after several rearrangements you may find yourself in the position of trying to insert something between 17 and 18. Larger gaps help to avoid this (e.g. 50, 100, 150, ...) but you'll never completely avoid the problem. A small sketch of this strategy follows the list.
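As promised, here is a small Java sketch of the gap strategy from the last point: new sequence numbers are minted halfway between their neighbours, and a renumbering pass is needed only when a gap closes. The gap size of 10 and the -1 sentinel are assumptions.

```java
public final class SequenceNumbers {
    // Returns a sequence number for an item inserted between two existing ones,
    // or -1 when the neighbours are adjacent and a renumbering pass is needed.
    public static int between(int before, int after) {
        if (after - before < 2) return -1;       // e.g. 17 and 18: no room left
        return before + (after - before) / 2;    // e.g. between 10 and 20 -> 15
    }

    public static void main(String[] args) {
        System.out.println(between(10, 20)); // 15
        System.out.println(between(17, 18)); // -1: time to renumber
    }
}
```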
Figure 15.
Improved data schema for persisting Order and OrderItem.

A recursive relationship, also called a reflexive relationship (Reed 2002; Larman 2002), is one where the same entity (class, data entity, table, and so on) is
involved with both ends of the relationship.
For example the manages relationship in
Figure
16 is recursive, representing the concept that an employee may manage
several other employees. The
aggregate relationship that the Team class has with itself is recursive
- a team may be a part of one or more other teams.
Figure 16 depicts a
class model that includes two recursive relationships and the resulting data
model that it would be mapped to. For
the sake of simplicity the class model includes only the classes and their
relationships and the data model includes only the keys.
The many-to-many recursive aggregation
is mapped to the Subteams associative table in the same way that you would map a
normal many-to-many relationship - the only difference is that both columns
are foreign keys into the same table. Similarly
the one-to-many manages association is
mapped in the same way that you would map a normal one-to-many relationship, the
ManagerEmployeePOID column refers to another row in the Employee
table where the manager's data is stored.
Figure 16. Mapping recursive
relationships.

Sometimes a class will implement a property that is
applicable to all of its instances and not just single instances.
The Customer class of Figure 17
implements nextCustomerNumber, a class attribute (you know this
because it's underlined) which stores the value of the next customer number to
be assigned to a new customer object. Because
there is one value for this attribute for the class, not one value per object,
we need to map it in a different manner. Table
2 summarizes the four basic strategies for mapping class scope properties.
Figure 17. Mapping class scope
attributes.

Table 2. Strategies for mapping class scope properties.

| Strategy | Example | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Single Column, Single-Row Table | The CustomerNumber table of Figure 17 implements this strategy. | Simple. Fast access. | Could result in many small tables. |
| Multi-Column, Single-Row Table for a Single Class | If Customer implemented a second class scope attribute then a CustomerValues table could be introduced with one column for each attribute. | Simple. Fast access. | Could result in many small tables, although fewer than the single column approach. |
| Multi-Column, Single-Row Table for All Classes | The topmost version of the ClassVariables table in Figure 17. This table contains one column for each class attribute within your application, so if the Employee class had a nextEmployeeNumber class attribute then there would be a column for this as well. | Minimal number of tables introduced to your data schema. | Potential for concurrency problems if many classes need to access the data at once. One solution is to introduce a ClassConstants table, as shown in Figure 17, to separate attributes that are read-only from those that can be updated. |
| Multi-Row Generic Schema for All Classes | The bottommost version of the ClassVariables and ClassConstants tables of Figure 17. The table contains one row for each class scope property in your system. | Minimal number of tables introduced to your data schema. Reduces concurrency problems (assuming your database supports row-based locking). | Need to convert between types (e.g. CustomerNumber is an integer but is stored as character data). The data schema is coupled to the names of your classes and their class scope properties; you could avoid this with an even more generic schema along the lines of Figure 9. |
One of the most valuable services that an Agile DBA can
perform on a development team is performance tuning.
A very good book is Database
Tuning by Shasha and Bonnet (2003).
When working with structured technology most of the performance tuning
effort was database-oriented, generally falling into one of two categories:
- Database performance tuning. This effort focuses on changing the database schema itself, often by denormalizing portions of it. Other techniques include changing the types of key columns (for example, an index is typically more effective when it is based on numeric columns instead of character columns), reducing the number of columns that make up a composite key, or introducing indices on a table to support common joins.
- Data access performance tuning. This effort focuses on improving the way that data is accessed. Common techniques include the introduction of stored procedures to "crunch" data in the database server to reduce the result set transmitted across the network, reworking SQL queries to reflect database features, clustering data to reflect common access needs, and caching data within your application to reduce the number of accesses.
In fact, although I haven't presented an example in this article, a common
strategy is to map an attribute of a class to a stored function. For
example, you could map the Customer.totalPortfolio attribute to the
calculateCustomerPortfolio() stored procedure. Granted, this may
introduce performance problems of its own (do you really want this stored
procedure to be invoked each time you read in a customer object?), so you
might instead map the Customer.totalPortfolio attribute to the
Customer.TotalPortfolio column, which would be calculated via a trigger
(or in batch).
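A minimal sketch of the first option, in Java with JDBC, appears below. The signature of calculateCustomerPortfolio() (customer identifier in, total out) is an assumption for illustration:

```java
import java.math.BigDecimal;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

// Minimal sketch: reading Customer.totalPortfolio by invoking the
// calculateCustomerPortfolio() stored function instead of reading a column.
class CustomerPortfolioReader {
    BigDecimal readTotalPortfolio(Connection c, long customerPoid) throws SQLException {
        // Standard JDBC escape syntax for calling a function that returns
        // a value; the parameter list is an assumption.
        try (CallableStatement cs =
                 c.prepareCall("{? = call calculateCustomerPortfolio(?)}")) {
            cs.registerOutParameter(1, Types.DECIMAL);
            cs.setLong(2, customerPoid);
            cs.execute();
            return cs.getBigDecimal(1);
        }
    }
}
```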
Neither of these needs goes away with object technology,
although as Figure 18 implies the situation is a little more complicated.
An important thing to remember is that your object schema also has
structure to it, so changes to your object schema can affect the database
access code that is generated based on the mappings to your database.
For example, assume that the Employee class has a homePhoneNumber
attribute. A new feature requires
you to implement phone-number-specific behavior (e.g. your application can call
people at home). You decide to
refactor homePhoneNumber into its own class, an example of third
object normal form (3ONF), and therefore update your mappings to reflect
this change. Performance degrades
as a result of this change, motivating you to change either your mappings, and
with them the data access paths, or the database schema itself.
The implication is that a change to your object source code
could motivate a change to your database schema.
Sometimes the reverse happens as well.
This is perfectly fine, because as an agile software developer you are
used to working in an evolutionary manner.
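In code, the refactoring might look something like the following sketch; the class names are hypothetical and the point is only that the object schema change ripples into the mappings:

```java
// Before the refactoring: the phone number is a simple attribute that
// maps directly to a single column (Employee.HomePhoneNumber).
class EmployeeBefore {
    private String homePhoneNumber;
}

// After the refactoring: the phone number becomes a class of its own,
// giving phone-number-specific behavior somewhere to live.  If
// PhoneNumber now maps to its own table, reading an Employee requires a
// join or a second read, which is where the performance degradation
// described above can come from.
class PhoneNumber {
    private String areaCode;
    private String localNumber;

    void dial() {
        // phone-number-specific behavior, e.g. calling people at home
    }
}

class EmployeeAfter {
    private PhoneNumber homePhoneNumber;
}
```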
Figure 18. Performance
tuning opportunities.

There are two main additions to performance tuning that
you need to be aware of: mapping tuning and
object schema tuning. Mapping
tuning is described below. When it
comes to object schema tuning most changes to your schema will be covered by
common refactorings. However, a technique
called lazy reading can help dramatically.
Throughout this article you have seen that there is more than one
way to map object schemas to data schemas - there are four ways to map inheritance
structures, two ways to map a one-to-one
relationship (depending on where you put the foreign key), and four ways to
map class-scope properties.
Because you have mapping choices, and because each mapping choice has its
advantages and disadvantages, there are opportunities to improve the data access
performance of your application by changing your choice of mapping. Perhaps you implemented the one
table per class approach to mapping inheritance only to discover that it's
too slow, motivating you to refactor it to use the one
table per hierarchy approach.
It is important to understand that changing a mapping strategy
will require you to change your object schema, your data schema, or both.
An important performance consideration is whether an attribute
should be automatically read in when the object is retrieved.
When an attribute is very large and rarely accessed (for example, the
picture of a person could be 100k whereas the rest of the attributes are less
than 1k), you may want to consider taking a lazy read approach.
The basic idea is that instead of automatically bringing the attribute
across the network when the object is read, you retrieve it only when the
attribute is actually needed. This
can be accomplished by a getter method, an operation whose purpose is to provide
the value of a single attribute, that checks to see whether the attribute has
been initialized and, if not, retrieves it from the database at that point.
Other common uses for lazy read are reporting
and retrieving objects as the results of searches,
where you need only a small subset of an object's data.
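Here is a minimal lazy read sketch in Java with JDBC. The Person table, its Picture column, and the way the object holds onto a connection are simplifying assumptions for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Minimal lazy read sketch: the large, rarely accessed picture is not
// fetched when the Person object is read; the getter retrieves it on
// first use and caches it in the object.
class Person {
    private final long poid;
    private final Connection connection;
    private byte[] picture;         // deliberately not loaded up front
    private boolean pictureLoaded;  // has the attribute been initialized?

    Person(long poid, Connection connection) {
        this.poid = poid;
        this.connection = connection;
    }

    byte[] getPicture() throws SQLException {
        if (!pictureLoaded) {
            String sql = "SELECT Picture FROM Person WHERE PersonPOID = ?";
            try (PreparedStatement s = connection.prepareStatement(sql)) {
                s.setLong(1, poid);
                try (ResultSet rs = s.executeQuery()) {
                    if (rs.next()) {
                        picture = rs.getBytes(1);
                    }
                }
            }
            pictureLoaded = true;
        }
        return picture;
    }
}
```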
The
O/R impedance mismatch forces you to map your object schema to your data schema.
To implement these mappings you will need to add code to your business objects,
code that impacts your application. These
impacts are the primary fodder for the argument that object purists make against
using object and relational technology together.
Although I wish the situation were different, the reality is that we're
using object and relational technology together and very likely will for many
years to come.
Like it or not we need to accept this fact.
I think that there is significant value in summarizing how
mapping impacts your objects. Some
of this material you have seen in this article and some you will see in other
chapters. The impacts on your code
include the need to:
-
Maintain shadow information.
-
Refactor
it to improve overall performance.
-
Work with legacy
data. It is common to work
with legacy databases, and there are often significant data quality,
design, and architectural problems associated with them.
The implication is that you often need to map your objects to legacy
databases and that your objects may need to implement integration and data
cleansing code to do so.
-
Encapsulate
database access. Your strategy for encapsulating database access
determines how you will implement your mappings.
Your objects will be impacted by your chosen strategy, anywhere from
including embedded SQL code to implementing a common interface that a
persistence framework requires.
-
Implement concurrency
control. Because most applications are multi-user, and because most
databases are accessed by several applications, you run the risk that two
different processes will try to modify the same data simultaneously.
Therefore your objects need to implement concurrency control
strategies that overcome these challenges (a minimal sketch follows this
list).
-
Retrieve
objects from a relational database. You
will want to work with collections of the same type of object at once;
perhaps you want to list all of the employees in a single division.
-
Implement
referential integrity.
There are several strategies for implementing referential
integrity between objects and within databases.
Although referential integrity is a business issue, and
therefore should be implemented within your business objects, the reality is
that many if not all referential integrity rules are implemented in the
database instead.
-
Implement
security access control.
Different people have different access to information.
As a result you need to implement security access control logic
within your objects and your database.
-
Implement
reporting. Do your business
objects implement basic reporting functionality, do you leave this effort
solely to reporting tools that go directly against your database, or do you
use a combination of the two?
-
Implement
object caches. Object
caches can be used to improve application performance and to ensure that
objects are unique within memory.
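To tie a couple of these impacts together, here is a minimal sketch, in Java with JDBC, of a business object that maintains shadow information (its key value and a version number) and uses it for optimistic concurrency control. Table and column names are assumptions for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Minimal sketch combining shadow information and optimistic concurrency
// control: the update succeeds only if the row's version is still the one
// this object read, and each successful save bumps the version.
class Customer {
    private long poid;    // shadow information: primary key value
    private int version;  // shadow information: concurrency column
    private String name;

    boolean save(Connection c) throws SQLException {
        String sql = "UPDATE Customer SET Name = ?, Version = Version + 1 "
                   + "WHERE CustomerPOID = ? AND Version = ?";
        try (PreparedStatement s = c.prepareStatement(sql)) {
            s.setString(1, name);
            s.setLong(2, poid);
            s.setInt(3, version);
            // Zero rows updated means another process changed (or deleted)
            // the row first: a concurrency collision the caller must resolve.
            if (s.executeUpdate() == 0) {
                return false;
            }
            version++;
            return true;
        }
    }
}
```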
The Model-Driven Architecture (MDA) defines an approach to modeling that separates the
specification of system functionality from the specification of its
implementation on a specific technology platform.
In short, it defines guidelines for structuring specifications expressed
as models. The MDA promotes an
approach where the same model specifying system functionality can be realized on
multiple platforms through auxiliary mapping standards, or through point
mappings to specific platforms. It
also supports the concept of explicitly relating the models of different
applications, enabling integration and interoperability and supporting system
evolution as platform technologies come and go.
Although the MDA is based on the Unified Modeling Language (UML),
which does not yet officially support a data model, my expectation is that
object-to-relational mapping will prove to be one of the most important features
that MDA-compliant CASE tools will support.
My hope is that the members of the OMG find a way to overcome the cultural
impedance mismatch and start working with data professionals to take issues
such as UML data modeling and object-to-relational mapping into account.
Time will tell.
In this article you learned the basics of mapping objects to
relational databases (RDBs), including some basic implementation techniques that
will be expanded on in following chapters.
You saw that there are several strategies for mapping
inheritance structures to RDBs and that mapping object relationships into RDBs
is straightforward once you understand the differences between the two
technologies. Techniques for
mapping both instance attributes and class attributes were presented, providing
you with strategies to completely map a class's attributes into an RDB.
This article included some methodology discussions that described
how mapping is one task in the iterative and incremental approach that is
typical of agile software development. A
related concept is that it is a fundamental mistake to allow your existing
database schemas or data models to drive the development of your object models.
Look at them, treat them as constraints, but don't let them negatively
impact your design if you can avoid it.
Throughout this article I have described mapping techniques in common prose;
some authors choose to write patterns instead. The first such
effort was the
Crossing Chasms pattern language and the latest effort is captured in the book
Patterns
of Enterprise Application Architecture. Table
3 summarizes the critical material presented in this article as patterns,
using the names suggested by other authors wherever possible.
Table 3. Mapping patterns.
| Pattern | Description |
| --- | --- |
| Class Table Inheritance | Map each individual class within an inheritance hierarchy to its own table. |
| Concrete Table Inheritance | Map each concrete class of an inheritance hierarchy to its own table. |
| Foreign Key Mapping | A relationship between objects is implemented in a relational database as foreign keys in tables. |
| Identity Field | Maintain the primary key of an object as an attribute. This is an example of Shadow Information. |
| Lazy Initialization | Read a high-overhead attribute, such as a picture, into memory when you first access it, not when you initially read the object into memory. |
| Lazy Read | Read an object into memory only when you require it. |
| Legacy Data Constraint | Legacy data sources are a constraint on your object schema but they should not drive its definition. |
| Map Similar Types | Use similar types in your classes and tables. For example, it is easier to map an integer to a numeric column than to a character-based column. |
| Map Simple Property to Single Column | Prefer to map the property of an object, such as the total of an order or the first name of an employee, to a single database column. |
| Mapping-Based Performance Tuning | To improve overall data access performance you can change your object schema, your data schema, or the mappings between the two. |
| Recursive Relationships Are Nothing Special | Map a recursive relationship exactly the same way that you would map a non-recursive relationship. |
| Representing Objects as Tables | Prefer to map a single class to a single table but be prepared to evolve your design to improve performance. |
| Separate Tables for Class-Scope Properties | Introduce separate tables to store class-scope properties. |
| Shadow Information | Classes need to maintain attributes that store the values of database keys (see Identity Field) and concurrency columns in order to persist themselves. |
| Single Column Surrogate Keys | The easiest key strategy that you can adopt within your database is to give all tables a single-column surrogate key that has a globally unique value. |
| Single Table Inheritance | Map all the classes of an inheritance hierarchy to a single table. |
| Table Design Time | Let your object schema form the basis from which you develop your data schema, but be prepared to iterate your design in an evolutionary manner. |
| Uni-directional Key Choice | When a one-to-one unidirectional association exists from class A to class B, put the foreign key that maintains the relationship in the table corresponding to class A. |