As you can see on this site, I've written a fair bit about
leading-edge practices surrounding the development and evolution of
relational database management systems (RDBMSs). Granted, I've also strayed to other technologies such as
XML
and
enterprise issues, but for the most part the focus has been on RDBMSs. Be that as it may, one thing that I
haven't done is write about relational theory at all. The reason for this is twofold: first, my focus is on
practical matters, and second, relational theory really doesn't seem to have much to offer to database
practitioners.
In the 1980s I earned a degree in Computer Science at the
University of Toronto. I was lucky enough to realize that data-related topics were important and chose to
take as many classes as I could in the subject (they were all electives if memory serves). The most memorable
was a fourth-year course on RDBMSs where the focus was relational theory. The course was memorable not because
of the content, which was solid, but because none of it proved relevant on the job. In addition to a few problem
sets where the focus was on writing proofs using predicate calculus and set theory, the main assignment was the
development of a database engine which could process simple CRUD queries. Sadly, at the end of the course, I had
the skill to build an RDBMS engine to process SQL, but didn't know a thing about applying SQL in practice.
Over the years I worked on a variety of systems, almost all of which had RDBMSs on the back end. I did this
at banks, insurance companies, government agencies, telecommunications firms, and retail firms. I've worked with
a variety of technologies and with a range of people with different experiences. In all cases, although
relational theory was sometimes mentioned in conversation (more on this in a minute), it never proved directly
relevant in practice. Indirectly, I very likely did benefit from learning about selection, projection, union,
relations, relation variables (relvars), tuples, and all those other good things. These days, I'm occasionally
asked what I think about relational theory and where it
fits into practice. My answer is that it's important in a few niche situations, and that it does seem to provide a
foundation upon which you can build practical skills. Invariably, though, it seems to me that the person asking
isn't really interested in practice at all and is more likely looking for an excuse not to
improve their skillset.
From what I've seen over the years, relational theory is important when:
- You're developing a relational database engine. I haven't built a commercial database engine
myself, but I'm willing to go out on a limb and guess that the people working on such things at IBM,
Oracle, and Microsoft (to name a few) are interested in relational theory. However, so
much has been written about the fact that
RDBMS vendors are not remaining true to the
relational model, and
regularly go beyond it, that it's clear that although they may be interested in relational theory
they're only acting on it when it makes practical sense. More importantly, when you see the features
which have been added to mainstream RDBMSs, such as the
Java VM
in Oracle and the CLR in Microsoft SQL Server, it's clear that the vendors are moving away from basing
their products on relational theory. As
Dawn Wolthuis has said, this may seem controversial
to the theorists, but more and more it's the state of the industry.
- You're a computer science academic. Academics like to focus on theory, and many have literally
made a career out of it, and have even managed to sell a few books on the subject. Good for them, but
that's not practice.
- That's all you know, or at least that's what you prefer to focus on. Many people within IT are
overly specialized, a reflection of the
Tayloristic theories inflicted on IT
in the 1960s and 1970s, and as a result they have an unjustified belief in the importance of their
specialty. This is a completely natural thing to happen, albeit a dysfunctional one. Luckily there is a
clear and growing trend in the IT industry away from specialists towards
generalizing specialists, so the already niche specialty of relational theorists will surely shrink
over time.
- You're focused on past glories. Relational theory has had important impacts on the IT industry:
in particular, the
SQL language and RDBMSs are at least partially based upon
it, but that was way back in the 1970s. It has also had an impact on
data modeling
practices, including introducing the concepts of
data normalization and functional
dependencies, which are clearly valuable. But what has come of relational theory lately? Furthermore,
with data being only one of many aspects (e.g. functionality, security, hardware, network, user
interface, ...) facing IT professionals today, relational theory at best seems applicable only to a very
narrow sliver of software development and therefore doesn't appear to supply much of a basis for new
advances (and sure enough, it hasn't).
- You can't find anything better to talk about over a couple of beers. How sad.
Every so often someone asks me how techniques such as
database refactoring,
agile data modeling, or
database regression testing relate to relational theory. My answer is always the same: my focus is on
practice, not theory. As I indicate above, relational theory has provided the foundation for some important
practices, but I really can't recall a time when I saw a database practitioner stop to work out the relational
algebra behind whatever it was they were working on at the time. They just did the work.
The theorists like to claim that the reason why there are so many problems with existing database designs is
that practitioners don't understand relational theory, and in some ways they have a valid point. A lot of good
database developers that I know understand the theory, whether or not they've received any formal training in
it, but they also understand far more than just that. Unfortunately, the theorists struggle to make their ideas
attractive to practitioners, writing books which are either inaccessible to them or simply too far divorced from
the realities of software development, and in the end they exacerbate the problem they are trying to address. Worse
yet, the theorists seem to focus on modeling databases from scratch; they rarely seem to have advice for those
of us who are dealing with
existing legacy data sources and the
mission-critical systems using them. It typically isn't an option to start from scratch and rebuild them, so
where are the techniques to help us address the problems which we actually face? They don't seem to be coming
from the theory guys (NOTE: If you know of any writings from the theory folks among us which do address legacy
concerns, I'd appreciate it if you could send me an email). Techniques such as
database refactoring and
database regression testing aren't coming from the theory folks; they're coming from those of us in the
trenches who are trying to find ways to get the job done.
Why Even Mention Relational Theory in Conversation?
So why do people ask about relational theory? From what I've seen, there are several reasons:
- That's all they know, or at least that's what they focus on. They're one-trick ponies, and
they're desperate to convince you that the one trick that they know is impressive.
- They're looking for an excuse not to change. This is probably the most common problem: the person
fears the many changes which they're seeing in the IT community and is desperately trying to avoid
such changes. They're often looking to justify their unwillingness to change by claiming that the
promoter of a new idea doesn't understand relational theory (regardless of whether relational theory is
even applicable or whether the person actually does understand relational theory) or that, if there isn't
a mathematical proof supporting a concept, it couldn't be any good. The book
Fearless Change is a great resource, as is
Becoming Agile.
- They're a zealot. Unfortunately, we have them among us and there's rarely anything that we can do
to help them. They have their way of doing things and they're really not interested in hearing about
anything else. Concepts such as applying the
right software process/method for your
situation, or
applying the right model for your situation, are the antithesis of their "one size fits all"
theories. Worse yet, they're often in complete denial that their approaches aren't widely
adopted in practice, yet they seem to think that it's only a matter of time until this happens. If
practitioners didn't bother to learn relational theory at its height of popularity, it's doubtful that
many will bother now.
- They think that it drives development efforts. Some people have a nasty habit of making sweeping
statements about the importance and applicability of relational and/or mathematical theory, statements
which often make sense only to people who are narrowly focused on data-oriented activities. How many
times have you heard claims that a
solid grounding in relational
theory will result in great database designs, or will ensure data integrity? Shouldn't we worry
about great overall designs which look at the
entire picture, not just data?
Wouldn't a good
testing strategy do more to
help ensure quality, particularly when the traditional approach certainly seems to have resulted in some
questionable database designs over the years? When you start to look at the bigger picture and you
accept the fact that there is far more to development than just data, then you quickly realize that
relational theory is not as important as the theorists would like you to believe. Or at best, it's one
of many aspects of theory that you should learn. Call me a radical, but shouldn't we adopt techniques
which work in practice and which address the actual problems that we face, and worry a little bit less
about mathematical theory?
- They've been misguided by the one-trick ponies, the fearful, and the zealots among us. These
people we can actually help, which is one of the reasons why I wrote this article. Many people will
often listen to someone, and when they hear what they expect to hear from them, or more importantly what
they want to hear from them, then they pretty much leave it at that. They may not know that there are
other sides to the issue, or that perhaps these other sides have been misrepresented to them (if
mentioned at all) by these other people.
Why is relational theory an issue for someone who is clearly a practitioner? I've become concerned
because of the damage within the IT industry that I'm seeing caused in its name. As I noted, some people use
relational theory as an excuse not to change, but frankly that's their business and I'm happy to let them
travel along their own path. But other people, in particular college instructors and book writers,
needlessly inflict relational theory on people who are trying to learn how to become
effective data professionals, or better yet
effective IT professionals. Too much focus on theory can really make data-oriented development techniques
unattractive to practitioners, which is one of the reasons why I think so many
application developers seem to have few or
no skills in this area.
Coming full circle, what should I have learned in my university database course? If I were organizing
such a course today, the agenda would look something like this:
- The history of databases and data theory. There's always value in spending a few hours on
foundational concepts, including relational theory as well as
newer data-oriented theories.
- An overview of data storage technologies. Students should know the differences between the
various data storage options available to them. This would include the various database management
system approaches (relational, network, hierarchical, XML, and object, to name a few) as
well as file management strategies. Furthermore there should be a discussion of the trade-offs between
the approaches and advice for when to use each. An important message should be that RDBMSs and files are
the most common storage mechanisms in use today, and that XML is an important data transport
representation.
- Where data fits into the overall software development process. This is a message which is sorely
missing in many university curriculums and books (surprisingly, including the vast majority of data
books). The
Agile Data way of thinking (WoT),
look beyond data, says it well: data is only one of many important aspects of IT. As you can see at
Agile Models Distilled
and
Software Development Phases Examined, data-oriented techniques represent a small portion of the
knowledge which IT professionals require to be successful. Important knowledge to be sure, but only a
small sliver of the overall picture.
- Data modeling.
Data modeling
is one of many important skills a developer should have. Furthermore, they should have an understanding
of both traditional approaches to data modeling as well as
agile/evolutionary approaches to be effective.
- Database development techniques. Students should learn when, and how, to
implement functionality in relational databases. They should understand what triggers, stored
procedures/functions, and database objects are and how to develop them. Furthermore, they should
understand relevant application development issues such as
how to retrieve objects from
an RDB,
security access control,
transaction control, and
concurrency control.
- Database testing. Students should learn how to
test relational databases.
Data is an important corporate asset, and measures should be taken to ensure its quality. Similarly,
mission-critical functionality is often implemented in databases which should also be tested.
- Database refactoring. Just like you should
refactor your code to ensure that it's of the highest quality design at all times, you should do the
same for your database schema. Modern developers work in an
evolutionary, if not
agile manner, and so must people doing database work.
Database refactoring
enables them to do exactly that.
- Working with legacy data sources. An understanding of the challenges presented by
legacy data sources, and how
to overcome them, is critical knowledge. Legacy data sources are a fact of life: you might be able to
refactor
them over time, but the reality is that you'll need to learn to live with them and to deal with the data
quality, design, and architectural challenges which they suffer from.
- Object/relational development techniques. A common strategy in organizations today is to build
applications using a combination of
object and
relational technologies. Students should understand the
technical impedance
mismatch between the two technologies and understand the fundamentals of
O/R mapping.
- Reporting strategies. The course should include a discussion of the various strategies for
implementing reports, including the use of
data marts and
data warehouses.
- Data management within the enterprise. Although it will likely be difficult for students to grasp
due to lack of real-world experience, they should be given an appreciation of the need to
be enterprise aware
when developing systems. This includes having an appreciation for the importance of
enterprise architecture
and administration. Students should also learn about the
cultural impedance mismatch that they are likely to face in some organizations.
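To make the database testing item on the agenda above a little more concrete, here is a minimal sketch of what a database regression test might look like. It is only an illustration under stated assumptions, not something taken from a real system: the `account` and `account_audit` tables, the trigger, and the two check functions are all hypothetical, and an in-memory SQLite database stands in for whatever RDBMS you actually use. The idea is simply that rules and functionality implemented in the database get exercised by automated tests, just like application code.

```python
import sqlite3


def build_schema(conn):
    """Create a hypothetical schema with a rule and a trigger living in the database."""
    conn.executescript(
        """
        CREATE TABLE account (
            id      INTEGER PRIMARY KEY,
            owner   TEXT NOT NULL,
            balance INTEGER NOT NULL CHECK (balance >= 0)
        );
        CREATE TABLE account_audit (
            account_id  INTEGER NOT NULL,
            old_balance INTEGER NOT NULL,
            new_balance INTEGER NOT NULL
        );
        CREATE TRIGGER trg_account_audit
        AFTER UPDATE OF balance ON account
        BEGIN
            INSERT INTO account_audit VALUES (OLD.id, OLD.balance, NEW.balance);
        END;
        """
    )


def check_negative_balance_rejected(conn):
    """Data quality rule: the database itself must refuse a negative balance."""
    try:
        conn.execute("INSERT INTO account (owner, balance) VALUES ('sam', -1)")
    except sqlite3.IntegrityError:
        return True
    return False


def check_update_is_audited(conn):
    """Functionality in the database: a balance change must leave an audit row."""
    conn.execute("INSERT INTO account (id, owner, balance) VALUES (1, 'pat', 100)")
    conn.execute("UPDATE account SET balance = 60 WHERE id = 1")
    rows = conn.execute(
        "SELECT old_balance, new_balance FROM account_audit WHERE account_id = 1"
    ).fetchall()
    return rows == [(100, 60)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway database for the test run
    build_schema(conn)
    assert check_negative_balance_rejected(conn)
    assert check_update_is_audited(conn)
```

Checks like these can also serve as the safety net for database refactoring: if you change the schema and they still pass, you have some evidence that you haven't broken the behavior they cover.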
In short, relational theory does have its place in modern database practice; it's just that this place is
several orders of magnitude smaller than what the theorists among us would have us think. But they're welcome to
grind that axe if it makes them happy; they just shouldn't be surprised that the rest of us aren't paying much
attention to them. I also invite the theorists to get their hands dirty and gain some practical experience on a
modern software development team and see what actually happens.
Remember the adage:
In theory, practice and theory are one and the same.
In practice, they're not.
Acknowledgements
I'd like to thank Curt Monash,
Curt Sampson, and
Dawn Wolthuis for their feedback regarding this article.