In the data world there is a common process called
normalization by which you organize data in such a way as to reduce and even
eliminate data redundancy, effectively increasing the cohesiveness of data
entities. Can the techniques of
data normalization be applied to object schemas?
Yes, but this isn't an ideal approach because data normalization deals
only with data and not with behavior. We
need to consider both when normalizing our object schema.
We need to rethink our approach. Class normalization is a process by which you reorganize the structure
of your object schema in such a way as to increase the cohesion of classes while
minimizing the coupling between them.
Unfortunately class normalization hasn't been adopted as widely
as I would have hoped. This
happened for a couple of reasons, but a big part of the problem was that class
normalization was clearly overshadowed by
design patterns at the time.
Although design patterns, which describe solutions to known problems
within a defined context, are very good things, they are a different and
complementary approach. An
important benefit of class normalization over design patterns is that the
concept is familiar to data professionals and thus provides a bridge to
help them learn object techniques (at least that's been my experience).
In this article, I discuss:
First Object Normal Form (1ONF)
Second Object Normal Form (2ONF)
Third Object Normal Form (3ONF)
Class Normalization and Other Object Design Techniques
1. First Object Normal Form (1ONF)
A class is in first object normal form (1ONF) when
specific behavior required by an attribute that is actually a collection of
similar attributes is encapsulated within its own class.
An object schema is in 1ONF when all of its classes are in 1ONF.
Consider the class Student
in Figure 1. You
can see that it implements the behavior for adding and dropping students to and
from seminars. The attribute seminars
is a collection of seminar information, perhaps implemented as an array of
arrays, that is used to track what seminars a student is assigned to.
The operation addSeminar() enrolls the student into another
seminar whereas dropSeminar() removes them from one.
The operation printSchedule() produces a list of all
the seminars the student is enrolled in so that the student can have a printed
schedule. The operations setProfessor()
and setCourseName() make the appropriate changes to data within the seminars
collection. This design is clearly
not very cohesive - this single class is implementing functionality that is
appropriate to several concepts.
Figure 1. The Student class in 0ONF.
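To make the cohesion problem concrete, here is a minimal Python sketch of what a 0ONF Student along these lines might look like. The figure is not reproduced here, so the layout of each seminar entry (a [course name, professor] pair), the snake_case method names (add_seminar for addSeminar(), and so on), and the sample values are assumptions made purely for illustration.

```python
class Student:
    """A 0ONF Student: it manages student data AND seminar data."""

    def __init__(self, name):
        self.name = name
        # Assumed layout: an "array of arrays", one
        # [course_name, professor] entry per seminar.
        self.seminars = []

    def add_seminar(self, course_name, professor):
        self.seminars.append([course_name, professor])

    def drop_seminar(self, index):
        del self.seminars[index]

    def set_professor(self, index, professor):
        # Seminar-specific behavior living inside Student.
        self.seminars[index][1] = professor

    def set_course_name(self, index, course_name):
        # More seminar-specific behavior living inside Student.
        self.seminars[index][0] = course_name

    def print_schedule(self):
        # Returns the schedule text; the caller decides where to print it.
        return "\n".join(f"{course} ({prof})" for course, prof in self.seminars)


# Hypothetical sample data.
student = Student("Sally")
student.add_seminar("CSC 100", "Prof. Smith")
student.set_professor(0, "Prof. Jones")
print(student.print_schedule())
```

Notice that every method except the constructor reaches into the structure of a seminar entry, which is the sign that a separate concept is hiding inside Student.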
Figure 2 depicts the object schema in 1ONF.
Seminar was introduced, having
both the data and the functionality required to keep track of when and where a
seminar is taught, as well as who teaches it and what course it is.
It also implements the functionality needed to add students to the
seminar and drop students from the seminar.
By encapsulating this behavior in Seminar we have increased the
cohesion of our design - Student now does student kinds of things and Seminar
does seminar types of things. In
the schema of Figure 1 Student did both.
Figure 2. The Student class in 1ONF.
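The move to 1ONF can be sketched in Python as follows. This is one possible reading of the design, not a transcription of the figure: the attribute names, the snake_case method names, and the choice to keep both sides of the student/seminar link in sync inside Seminar are assumptions.

```python
class Seminar:
    """Owns the seminar data and the behavior that works on it."""

    def __init__(self, course_name, professor):
        self.course_name = course_name
        self.professor = professor
        self.students = []

    def add_student(self, student):
        # Assumption: Seminar keeps both sides of the link consistent.
        self.students.append(student)
        student.seminars.append(self)

    def drop_student(self, student):
        self.students.remove(student)
        student.seminars.remove(self)


class Student:
    """Now does only student kinds of things."""

    def __init__(self, name):
        self.name = name
        self.seminars = []  # a collection of Seminar objects

    def print_schedule(self):
        return "\n".join(
            f"{s.course_name} ({s.professor})" for s in self.seminars
        )


# Hypothetical sample data.
seminar = Seminar("CSC 100", "Prof. Jones")
sally = Student("Sally")
seminar.add_student(sally)
print(sally.print_schedule())
```

Student no longer needs to know how a seminar stores its professor or course name; it only asks each Seminar for the values it wants to print.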
It should be clear that 1ONF is simply the object equivalent of data's first
normal form (1NF) - with 1NF you remove repeating groups of data from a data
entity and with 1ONF you remove repeating groups of behavior from a class.
2. Second Object Normal Form (2ONF)
A class is in second object normal form (2ONF) when it is in 1ONF
and when "shared" behavior that is needed by more than one instance of the
class is encapsulated within its own class(es).
An object schema is in 2ONF when all of its classes are in 2ONF.
Consider Seminar in
Figure 2. It implements the behavior
of maintaining both information about the course that is being taught in the
seminar and about the professor teaching that course.
Although this approach would work, it unfortunately doesn't work very
well. When the name of a course
changes you'd have to change the course name for every seminar of that course.
That's a lot of work.
Figure 3 depicts the object schema in 2ONF. To
improve the design of Seminar we have introduced two new classes, Course
and Professor which encapsulate the appropriate behavior needed to
implement course objects and professor objects.
As before, notice how it has been easy to introduce new functionality to
our application. Course now has methods to list the seminars that it is
being taught in (needed for scheduling purposes) and to create new seminars
because popular courses often need to have additional seminars added at the last
moment to meet student demand. The
Professor class now has the ability to produce a teaching schedule so
that the real-world person has the information needed to manage his or her time.
Figure 3. The object schema in 2ONF.
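A minimal Python sketch of the 2ONF idea follows. The method names (create_seminar, list_seminars, teaching_schedule) are hypothetical stand-ins for the operations described above, and the sample course and professor are made up.

```python
class Course:
    """Holds the course data once, shared by all of its seminars."""

    def __init__(self, name):
        self.name = name
        self.seminars = []

    def create_seminar(self, professor):
        # Popular courses can get extra seminars added on demand.
        seminar = Seminar(self, professor)
        self.seminars.append(seminar)
        professor.seminars.append(seminar)
        return seminar

    def list_seminars(self):
        return list(self.seminars)


class Professor:
    def __init__(self, name):
        self.name = name
        self.seminars = []

    def teaching_schedule(self):
        return [seminar.course.name for seminar in self.seminars]


class Seminar:
    def __init__(self, course, professor):
        # References to shared objects instead of copied names.
        self.course = course
        self.professor = professor


# Hypothetical sample data.
course = Course("Intro to Objects")
prof = Professor("Prof. Jones")
first = course.create_seminar(prof)
second = course.create_seminar(prof)
course.name = "Introduction to Objects"  # one change in one place
print(first.course.name, "|", second.course.name)
```

Because each Seminar refers to a shared Course object rather than holding its own copy of the name, renaming the course is a single assignment instead of an update to every seminar.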
3. Third Object Normal Form (3ONF)
Although putting the object schema in 2ONF is
definitely a step in the right direction, we can still improve the design
further. A class is in third object normal form
(3ONF) when it is in 2ONF and when it encapsulates only one
set of cohesive behaviors. An
object schema is in 3ONF when all of its classes are in 3ONF.
We're still not done, because the Seminar
class of Figure 3 implements "date range"
behavior - it has a start date and an end date, and it calculates the
difference between the two dates. Because
this sort of behavior forms a cohesive whole, and because it is more than likely
needed in other places, it makes sense to introduce the class DateRange
of Figure 4.
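As a rough illustration, DateRange could be sketched in Python like this. The field names, the days() method, the validation rule, and the sample dates are assumptions; the text only says that the class has a start date, an end date, and calculates the difference between them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    start: date
    end: date

    def __post_init__(self):
        # Assumed rule: a range may not end before it starts.
        if self.end < self.start:
            raise ValueError("end date must not precede start date")

    def days(self):
        """Difference between the two dates, in whole days."""
        return (self.end - self.start).days


class Seminar:
    def __init__(self, course_name, start, end):
        self.course_name = course_name
        # Seminar delegates all date-range behavior to DateRange.
        self.date_range = DateRange(start, end)


# Hypothetical sample data.
seminar = Seminar("CSC 100", date(2024, 9, 2), date(2024, 12, 13))
print(seminar.date_range.days())
```

Any other class that needs a start and end date can now reuse DateRange instead of reimplementing the same calculation.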
In Figure 3 the Student class
encapsulates the behavior for both students and addresses.
The first step would be to refactor Student into two classes, Student
and Address. This would make
our design more cohesive and more flexible because there is a very good chance
that students aren't the only things that have addresses.
However, this isn't enough because the Address class still needs
to be normalized. There is behavior
that is associated only with zip codes, formatting and validation to be
specific. For example, based on the
zip code it should be possible to determine whether or not the city and state of
an address are valid. This
realization leads to the class diagram presented in Figure
4 which implements addresses as four distinct classes: Address, ZipCode,
City, and State. The
advantage of this approach is twofold - first of all the zip code
functionality is implemented in one place, increasing the
cohesiveness of our model. Second,
by making zip codes, cities, and states their own separate classes we can now
easily group addresses based on various criteria for reporting purposes,
increasing the flexibility of our application.
The main drawback is that to build a single address we have to build it
from four distinct objects, increasing the code that we have to write, test, and
maintain.
Figure 4. The object schema in 3ONF.
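The four address classes might be sketched in Python as below. Everything specific here is an assumption for illustration: the US-style zip format, the method names is_valid, formatted, and is_consistent, and the known_cities lookup table, which stands in for whatever postal data a real system would use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    code: str


@dataclass(frozen=True)
class City:
    name: str
    state: State


@dataclass(frozen=True)
class ZipCode:
    value: str

    def is_valid(self):
        # Assumed format: 5 digits with an optional 4-digit suffix.
        return re.fullmatch(r"\d{5}(-\d{4})?", self.value) is not None

    def formatted(self):
        digits = self.value.replace("-", "")
        return digits if len(digits) == 5 else f"{digits[:5]}-{digits[5:]}"


@dataclass(frozen=True)
class Address:
    street: str
    city: City
    zip_code: ZipCode

    def is_consistent(self, known_cities):
        """known_cities: hypothetical lookup from 5-digit zip to City."""
        return known_cities.get(self.zip_code.value[:5]) == self.city


# Hypothetical sample data: four objects are needed for one address.
ny = State("NY")
albany = City("Albany", ny)
address = Address("1 Main St", albany, ZipCode("12207"))
lookup = {"12207": albany}
print(address.zip_code.is_valid(), address.is_consistent(lookup))
```

The usage lines show both sides of the trade-off: zip code logic lives in exactly one class, but a single address takes four objects to assemble.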
4. Class Normalization and Other Object Design Techniques
Fundamentally, class normalization is a technique for improving the
quality of your object schemas. The
exact same thing can be said of the application of common design patterns, such
as those defined by the "Gang of Four (GoF)" in Design Patterns
(Gamma et al.). Design
patterns are known solutions to common problems, examples of which include the Strategy
pattern for implementing a collection of related algorithms and the Singleton
pattern for implementing a class that only has one instance.
The application of common design patterns will often result in a highly
normalized object schema, although the overzealous application of design
patterns can result in you overbuilding your software unnecessarily.
As Agile Modeling (AM) suggests, you should follow the practice Apply Patterns
Gently and ease into a design pattern over time.
Another common approach to improving object schemas is refactoring
(Fowler 1999), an approach overviewed in Database
Refactoring. Refactoring is a disciplined way to restructure code by
applying small changes to your code to improve its design. Refactoring
enables you to evolve your design slowly over time.
Class normalization and refactoring fit together quite well - as
you're normalizing your classes you will effectively be applying many known
refactorings to your object schema. A
fundamental difference between class normalization and refactoring is that class
normalization is typically performed on your models whereas refactorings are
applied to your source code.
Do you need to understand all three techniques?
Yes. It is always beneficial
to have several techniques in your intellectual toolkit.
What would you think of a carpenter with only one type of saw, one type
of hammer, and one type of screwdriver in their toolkit?
My guess would be that they wouldn't be as effective as one with a
selection of tools. The same can
be said of agile software developers.
5. What Have You Learned?
Although these techniques aren't as popular as refactoring or the application of design
patterns, I believe that they are important because they provide a very good
bridge between the object and data paradigms.
The rules of class normalization provide advice that effective object
designers have been following for years, so there is really nothing new in that
respect. However, they describe
basic object design techniques in a manner that data professionals such as
agile data engineers can readily understand, helping to improve the communication within your teams.
My hope is that you have discovered that there is a fair bit to
OO. I also hope that you recognize
that there is some value in at least understanding the fundamentals of OO,
and better yet you may even decide to gain more experience in it. Object
technology is real, being used for mission-critical systems, and is here to
stay. At a minimum every IT
professional needs to be familiar with it.