For the sake of our discussion a relational database is a persistent storage
mechanism that enables you to both store data and optionally implement
functionality. The goal of this article is to overview relational database
management system (RDBMS) technology and to explore the
practical issues applicable to its use in modern
organizations; the goal is not to discuss
relational
theory. RDBMSs are used to store the information required by
applications built using procedural technologies such as COBOL or FORTRAN,
object technologies such as Java and C#, and component-based technologies such
as Visual Basic. Because RDBMSs are the dominant persistent storage
technology it is critical that all IT professionals understand at least the
basics of RDBMSs, the challenges surrounding the technology, and when it is
appropriate to use them.
Table of Contents
- RDBMS Technology
- Coupling: Your Greatest Enemy
- Additional Challenges With RDBMSs
- Encapsulation: Your Greatest Ally
- Beyond RDBMSs: You
Actually Have A Choice
- Closing Thoughts
Let's start by defining some common terminology. A database
management system (DBMS) is the software which controls the storage, retrieval,
deletion, security, and integrity of data within a database. An RDBMS is a
DBMS which manages a relational database. A relational database stores data in tables. Tables
are organized into columns, and each column stores one type of data (integer,
real number, character strings, date, "¦).
The data for a single "instance" of a table is stored as a row.
For example, the Customer table
would have columns such as CustomerNumber,
FirstName, and Surname, and a row within that table would be something like {1701,
"James", "Kirk"}.
Tables typically have
keys, one or more columns that uniquely
identify a row within the table, in the case of the Customer
table the key would be CustomerNumber.
To improve access time to a data table you define an index on the table.
An index provides a quick way to look up data based on one or more
columns in the table, just like the index of a book enables you to find specific
information quickly.
The most common use of RDBMSs is to implement
simple CRUD - Create, Read, Update, and Delete - functionality.
For example an application could create a new order and insert it into
your database. It could read an
existing order, work with the data, and then update the database with the new
information. It could also choose
to delete an existing order, perhaps because the customer has cancelled it.
The vast majority of your interaction with an RDB will likely be to
implement basic CRUD functionality.
The easiest way to manipulate a relational database is to submit
Structured Query Language (SQL) statements to it.
Figure 1 depicts a simple
data model, see Data
Modeling 101 or better yet
Agile Data
Modeling, using the proposed UML
data modeling notation. To create a row in the Seminar table you
would issue an INSERT statement, an example of which is shown in Code Figure
1.
Similarly, Code Figure 2 presents and example of how to read a row by
issuing a SELECT statement. Code
Figure 3 shows how to update an existing row via an UPDATE statement and Code
Figure 4 how to delete a row with a DELETE statement.
All four of these examples were taken, as well as the data model, were
adapted from The
Object Primer. A
very good resource for learning SQL is SQL Queries for Mere Mortals by
Micheal J. Hernandez and John L. Viescas.
Code Figure 1. SQL statement to insert a row into the
"Seminar" table.
INSERT INTO Seminar
(SEMINAR_ID,
COURSE_ID, OVERSEER_ID, SEMINAR_NUMBER)
VALUES (74656,
1234, "˜THX0001138', 2)
|
Code Figure 2. SQL statement to retrieve a row from the
"Seminar" table.
SELECT * FROM Seminar
WHERE SEMINAR_ID = 1701
|
Code Figure 3. SQL statement to update a row in a table
in the "Seminar" table.
UPDATE Seminar
SET
OVERSEER_ID = "˜NCC0001701', SEMINAR_NUMBER = 3
WHERE SEMINAR_ID = 1701
|
Code Figure 4. SQL statement to delete data from the
"Seminar" table.
DELETE FROM Seminar
WHERE SEMINAR_ID > 1701
AND
OVERSEER_ID = "˜THX0001138'
|
Figure 1. A simple UML data model.

There are several "advanced features" of RDBMSs that developers learn once they've familiarized themselves with
basic CRUD functionality. Each of
these features is so important, and often so complex, that they require their
own articles to cover them properly. So
for now I will introduce you to the concepts and then link to these other
articles for the details. These features include:
-
Object storage.
To store an object in a relational database you need to flatten it
- create a data representation of the object - because relational
databases only store data. To
retrieve the object you would read the data from the database and then
create the object, often referred to as restoring the object, based on that
data. Although storing objects
in a relational database sounds like a simple thing to achieve, practice
shows that it isn't. This is
due to the object-relational impedance mismatch, the fact that relational
database technology and object technology are based on different underlying
theories, a topic discussed in
The
Object-Relational (O/R) Impedance Mismatch.
To store objects successfully in relational databases you need to
learn how to map your object schema to your relational database
schema.
-
Implementing behavior within the database.
Behavior is implemented in a relational database via stored
procedures and/or stored functions that can be invoked internally within the
database and often by external applications.
Stored functions and procedures are operations that run within an
RDBMS, the difference being what the operation can return and whether it can
be invoked in a query. The
differences aren't important for our purposes so the term "stored
procedure" will be used to refer to both stored functions and stored
procedures. In the past stored procedures were written in a proprietary
language, such as Oracle's PL/SQL, although now Java is quickly becoming
the language of choice for database programming.
A stored procedure typically runs some SQL code, massages the data,
and then hands back a response in the form of zero or more records, or a
response code, or as a database error message.
Effective use of stored procedures is discussed in detail in
Implementation
Strategies for Persisting Objects in RDBs.
-
Concurrency control.
Consider an airline reservation system.
There is a flight with one seat left on it, and two people are trying
to reserve that seat at the same time. Both people check the flight status
and are told that a seat is still available.
Both enter their payment information and click the reservation button
at the same time. What should
happen? If the system is
working properly only one person should be given a seat and the other should
be told that there is no longer one available.
Concurrency control is what makes this happen.
Concurrency control must be implemented throughout your object source
code and within your database.
-
Transaction control.
A transaction is a collection of actions on your database - such as
the saving of, retrieval of, or deletion of data - which form a work unit.
A flat transactions is an "all-or-nothing" approach where all the
actions must either succeed or be rolled back (canceled).
A nested transaction is an approach where some of the actions are
transactions in their own right. These
sub-transactions are committed once successful and are not rolled back if
the larger transaction fails. Transactions
may also be short-lived, running in thousandths of a second, or long-lived,
taking hours, days, weeks, or even months to complete.
Transaction control is
a critical concept for all developers to understand.
-
Enforcing referential integrity.
Referential
integrity (RI) is the assurance that a reference from one entity to another
entity is valid. For example, if a customer references an address, then
that address must exist. If the address is deleted then all references
to it must also be removed, otherwise your system must not allow the
deletion. Contrary to popular belief, RI isn't just a database issue, it's
an issue for your entire system. A customer is implemented as an object within a Java
application and as one or more records in your database - addresses are
also implemented as objects and as rows. To delete an
address, you must remove the address object from memory, any direct or
indirect references to it (an indirect reference to an address would include
a customer object knowing the value of the AddressID, the
primary key of the address in the database), the
address row(s) from your database, and any references to it (via foreign
keys) in your database. To
complicate matters, if you have a farm of application servers that address
object could exist simultaneously on several machines.
Furthermore, if you have other applications accessing your database
then it is possible that they too have representations of the address in
their memory as well. Worse yet, if the address is stored in several places
(e.g. different databases) you should also consider taking this into
account. All developers should understand basic
strategies for implementing referential integrity.
Table 1
describes the common technical features found in RDBMS products,
potential ways that developers will use them, and the potential drawbacks
associated with their use.
Table 1. Common RDBMS
Technical Features.
Feature
|
Potential Usage
|
Potential Drawbacks
|
Database cursors - A database cursor is effectively a
handle to the results of a SQL query, enabling you to move forward and
backward through the result set one or more records at a time.
|
-
Accessing large results sets in smaller portions enables
your application to display initial results earlier, increasing response
time.
-
Performance is improved when a portion of a result set
is required because less data is transmitted across the network.
|
-
Application developers need to understand that the
underlying data can change between the times that data records are
accessed via the cursor: previously retrieved records may have been
deleted, records may have been inserted into previously retrieved portions
of the result set, or previously retrieved records may have been modified.
-
Not all cursors are created equal.
Some cursors only allow forward scrolling.
-
Cursors are a resource drain on the database because they
are memory intensive.
|
Java - Most database vendors support a Java VM within the
database.
|
-
Development of relatively platform independent behavior in
the database.
-
Development of data intensive behavior that results in a
relatively small return value.
-
Encapsulation of database access to support security access
control to information.
-
Implementation of shared behavior required by many
applications.
|
-
Different version of VMs between application server and
database server increases complexity of development.
-
Behavior implemented in the database can easily become a
bottleneck.
|
Triggers - A trigger is a procedure that is run either
before or after an action (such as a create, update, or delete) is
performed on a row in a database table.
|
-
Enforce referential integrity within your database.
These types of triggers can often be automatically generated by
your data modeling or database administration tool.
-
Often a lowest common denominator for implementing
referential integrity constraints.
-
Perform hand-crafted audit logging.
|
-
Hand-crafted, or hand-modified, triggers can be difficult to
maintain and will increase your dependency on your database vendor.
-
Triggers are typically implemented in a proprietary language
requiring an extra skillset on your team.
-
Because triggers are automatically invoked they can be very
dangerous (such as "uncontrolled" cascading deletions resulting from
chained delete triggers).
-
Behavior implemented in the database can easily become a
bottleneck if your database doesn't scale well.
|
2. Coupling: Your Greatest Enemy
Coupling is a measure of the degree of dependence between
two items - the more highly coupled two things are the greater the chance that
a change in one will require a change in another.
Coupling is the "root of all evil" when it comes to
software development, and the more things that your database schema is coupled
to the harder it is to maintain and to evolve.
Relational database schemas can be coupled to:
-
Your application source code. When you change
your database schema you must also change the source code within your
application that accesses the changed portion of the schema.
Figure 2 depicts
the best-case scenario - when it is only your application code that is
coupled to your database schema. This situation is traditionally referred to
as a stovepipe. These
situations do exist and are often referred to as stand-alone applications,
stovepipe systems, or greenfield initiatives.
Count yourself lucky if this is your situation because it is very
rare in practice.
-
Other application source code.
Figure 3 depicts the
worst-case scenario for relational databases - a wide variety of software
systems are coupled to your database schema, a situation that is quite
common with existing production databases.
It is quite common to find that in addition to the application that
your team is currently working on that other applications, some of which you
know about and some of which you don't, are also coupled to your database.
Perhaps an online system reads and writes to your database.
Perhaps a manager has written a spreadsheet, unbeknownst to you, that
reads data from your database that she uses to summarize information
critical to her job.
-
Data load source code.
Data loads from other sources, such as government provided tax
tables or your own test data, are often coupled to your database schema.
-
Data extract source code.
There may be data extraction scripts or programs that read
data from your database, perhaps to produce an XML data file or simply so
your data can be loaded into another database.
-
Persistence frameworks/layers.
A persistence framework encapsulates the logic for mapping
application classes to persistent storage sources such as your database.
When you refactor your database schema you will need to update the
meta data, or the source code as the case may be, which describes the
mappings.
-
Itself. Coupling
exists within your database. A
single column is coupled to any stored procedure that references it, other
tables that use the column as a foreign key, any view that references the
column, and so on. A simple
change could result in several changes throughout your database.
-
Data migration scripts.
Changes to your database schema will require changes to your
data migration scripts.
-
Test code.
Testing code includes any source code that puts your database into a
known state, that performs transactions that affect your database, and that
reads the results from the database to compare it against expected results.
Clearly this code may need to be updated to reflect any database
schema changes that you make.
-
Documentation.
Some of the most important documentation that you are likely to keep
pertains to your physical database schema, including but not limited to
physical data models and descriptive meta data.
When your database schema changes the documentation describing will
also need to change accordingly. Although
Agile Modeling
implores you to Update
Only When It Hurts, that your documentation doesn't have to be
perfectly in synch with your schema at all times, the reality is that you
will need to update your docs at some point.
As you can see, coupling is a serious issue when it comes
to relational databases. To make
matters worse the concept of coupling is virtually ignored within
database
theory circles. Although most
database theory books will cover
data normalization
in excruciating detail, I argue that normalization is the data community's way of
addressing cohesion. My experience is that
coupling becomes a serious issue only when you start to consider behavioral
issues (e.g. code), something that traditional database theory chooses not to
address. This is another reason to follow Agile
Modeling (AM)'s Multiple
Models principle and look beyond data.
Figure 2. The
best-case scenario.
Figure 3. The
worst-case scenario.

Coupling isn't the only challenge that you face with RDBMSs, although it is clearly an important one.
Other issues that you will face include:
-
Performance issues are difficult to predict.
When you are working with a shared database, such as the situation
implied in Figure 3, you may find
that the performance characteristics of your database are hard to predict
because each application accesses the database in its own unique way.
For example, perhaps one legacy application updates information
pertaining to items for sale sporadically throughout the month, enabling a
human operator to add new items or update existing ones, activity that
doesn't really affect your application's performance in a meaningful
way. However, this same
application also performs batch loads of items available for sale via other
companies that you have partnered with, items that you want to carry on your
web site as soon as they are available.
These batch loads can take several minutes, during which period the Item table is under heavy load and thus your online application is
potentially affected.
-
Data integrity is difficult to ensure with shared
databases. Because no
single application has control over the data it is very difficult to be sure
that all applications are operating under the same business principles.
For example, your application may consider an order as fulfilled once
it has been shipped and a payment has been received.
The original legacy application that is still in use by your
customer support representatives to take orders over the phone may consider
an order as fulfilled once it has been shipped, the payment received, and
the payment deposited into your bank account.
A slight difference in the way that a fundamental business rule has
been implemented may have serious business implications for any application
that accesses the shared databases. Less
subtly, imagine what would happen if your online order taking application
calculates the total for an order and stores it in the order table, whereas
the legacy application calculates the subtotals only for order items but does
not total the order. When
the order fulfillment application sees an order with no total it calculates
the total, and appropriate taxes, whereas if a total already exists it uses
the existing figure. If a
customer makes an order online and then calls back a few hours later and has
one of your customer service representatives modify their existing order,
perhaps to add several items to it, the order total is no longer current
because it has not been updated properly.
Referential integrity issues such as this are covered in detail in
the Implementing
Referential Integrity article.
-
Operational databases require different design
strategies than reporting databases.
The schemas of operational databases reflect the operational needs of
the applications that access them, often resulting in a reasonably normalized
schema with some portions of it denormalized for performance reasons.
Reporting databases, on the other hand, are typically highly denormalized
with significant data redundancy within them to support a wide range of
reporting needs.
Every technology has its strengths and weaknesses, and
RDBMS technology is not an exception to this rule.
Luckily there are ways that you can mitigate some of these challenges,
and encapsulation is an important technique to do so.
Encapsulation is a design issue that deals with how functionality
is compartmentalized within a system. You should not have to know how something
is implemented to be able to use it. The
implication of encapsulation is that you can build anything anyway you want, and
then you can later change the implementation and it will not affect other
components within the system (as long as the interface to that component did not
change). People often say that encapsulation is the act of painting the box
black - you are defining how something is going to be done, but you are not
telling the rest of the world how you're going to do it.
For example, consider your bank. How
do they keep track of your account information, on a mainframe, a mini, or a PC?
What database do they use? What
operating system? It doesn't
matter to you because the bank has encapsulated the way in which they perform
account services. You just walk up
to a teller and do whatever transactions you wish.
By encapsulating access to a database, perhaps through something
as simple as data access objects or something as complex as a persistence
framework, you can reduce the coupling that your database is involved with.
The Implementation
Strategies for Persisting Objects in RDBs chapter
compares and contrasts various encapsulation strategies that you have available
to you. For now assume that it is
possible to hide the details of your database schema from the majority of the
developers within your organization while at the same time giving them access to
your database. Some people, often
just the Agile data engineer(s) responsible for supporting the database, will need to
understand and work with the underlying schema to maintain and evolve the
encapsulation strategy.
One advantage of encapsulating access to your database is that it
enables
application programmers to focus on the business problem itself.
Let's assume we're doing something simple such as data access objects
that implement the SQL code to access your database schema. The
application programmers will work with these data access classes, not the
database. This enables your data engineers to evolve the database schema as they need to,
perhaps via database refactorings, and all they need to worry about is keeping the data access
classes up to date.
This reveals a second advantage to this approach - it
provides greater freedom to Agile data engineers to do their job.
Figure
4 depicts the concept of encapsulating access to your database, showing how
the best case scenario of Figure 2 and
the worst case scenario of Figure 3
would likely change. In the best-case scenario your business source code
would interact with the data access objects that in turn would interact with the
database. The primary advantage
would be that all of the data-related code would be in one place, making it
easier to modify whenever database schema changes occurred or to support
performance-related changes. It's
interesting to note that the business code that your application programmers are
writing would still be coupled to the data access objects.
Therefore they'd need to change their code whenever the interface of a
data access object changed. You'll
never get away from coupling. However,
from the point of view of the application programmer this is a much easier
change to detect and act on - with the database encapsulation strategy in
place the application programmers are only dealing with program source code
(e.g. Java) and not program source code plus SQL code.
Figure 4. The scenarios revisited.

Things aren't quite so ideal for the worst-case
scenario. Although it is possible
that all applications could take advantage of your encapsulation strategy the
reality is that only a subset will be able to.
Platform incompatibility can be an issue - perhaps your data access
objects are written in Java but some legacy applications are written using
technologies that can't easily access Java.
Perhaps you've chosen not to rework some of your legacy applications
simply to use the database encapsulation strategy. Perhaps some applications already have an encapsulation
strategy in place (if so, you might want to consider reusing the existing
strategy instead of building your own). Perhaps
you want to use technologies, such as a bulk load facility, that require direct
access to the database schema. The
point to be made is that a portion of your organization's application will be
able to take advantage of your encapsulation strategy and some won't.
There is still a benefit to doing this, you are reducing coupling and
therefore reducing your development costs and maintenance burden, the problem is
that it isn't a fully realized benefit.
Another advantage of encapsulating access to a database is that it gives you
a common place, in addition to the database itself, to implement data-oriented
business rules.
5. Beyond RDBMSs: You
Actually Have a Choice
Because there are some clear problems with relational
database technology you may decide to use another technology.
Yes, RDBMSs are the most commonly used type of persistence
mechanism but they are not the only option available to you.
These choices include:
-
Object/relational databases. Object/relational
databases (ORDBs), or more properly object/relational database management
systems (ORDBMSs), add new object storage capabilities to RDBMSs. ORDBs, add new
facilities to integrate management of traditional fielded data, complex
objects such as time-series and geo-spatial data and diverse binary media
such as audio, video, images, and (sometimes) Java applets.
ORDBMSs basically add to RDBMSs features such as defined data types, for
example you could define a data type called SurfaceAddress
that has all of the attributes and behaviors of an address, as well as the
ability to navigate objects in addition to an RDBMS's ability to join
tables. By implementing objects
within the database, an ORDBMS can execute complex analytical and data
manipulation operations to search and transform multimedia and other complex
objects. ORDBs support the
robust transaction and data-management features of RDBMSs while at the same
time offer a limited form of the flexibility of object-oriented databases.
Because of ORDBMSs relational foundation, database administrators work with
familiar tabular structures and data definition languages (DDLs) and
programmers access them via familiar approaches such as SQL3, JDBC, and
proprietary call interfaces.
-
Object databases.
Object databases (ODBs), also known as object-oriented databases (OODBs)
or object-oriented database management systems (OODBMSs), nearly seamlessly
add database/persistence functionality to object programming languages. In
other words, full-fledged objects are implemented in the database.
They bring much more than persistent storage of programming language
objects. ODBs extend the semantics of Java to provide full-featured database
programming capability, via new class libraries specific to the ODB vendor,
while retaining native language compatibility. A major benefit of this
approach is the unification of the application and database development into
a seamless model. As a result, applications require less code, use more
natural persistence modeling, and code bases are easier to maintain.
Object-oriented developers can write complete database applications with a
modest amount of additional effort without the need to marshal their
objects into flatten data structures for storage, as a result forgoing the
marshalling overhead inherent with other persistence mechanism technologies
(such as RDBs). This one-to-one mapping of application objects to database
objects provides higher performance management of objects and enables better
management of the complex interrelationships between objects.
-
Native XML databases. Native XML databases store
information as XML documents following one of two approaches: First, a
native XML database will either store a modified form of the entire XML
document in the file system, perhaps in a compressed or pre-parsed binary
form. Second, a native XML database may opt to map the structure of the
document to the database, for example mapping the Document Object Model
(DOM) to internal structures such as Elements, Attributes, and Text -
exactly what is mapped depends on the database.
The most important difference between these approaches, from the
point of view of an application developer, is the way they are accessed:
with the first approach the only interface to the data is XML and related
technologies, such as XPath (a language design specifically for addressing
parts of an XML document, visit www.w3.org
for details) or the DOM whereas with the second approach the database should
be accessible via standard technologies such as JDBC.
The important thing to understand about native XML databases is that
they work with the internal structures of the XML documents - they don't
just store them as a binary large object (BLOB) in the database.
-
Flat files.
Flat files, such as .TXT or. CSV (comma separated value) files, are
commonly used to store data. A
single file can be used to store one type of data structure, such as
customer information or sales transaction information, or through a coding
and formatting strategy the structures of several types of data structures.
One approach to flat file organization is either to have data values
separated by a pre-defined character, such as a comma, or tag, such as </FirstName>
in an XML document. Another
common approach is to delimit data values by size - the first 20
characters of the row represent the first name of the customer, the next 20
characters represent the surname, and so on.
-
Hierarchical databases. Hierarchical databases
link data structures together like a family tree such that each record type
has only one owner, for example an order is owned by only one customer.
Hierarchical structures were widely used in the first mainframe database
management systems, and are still a very common source of data in many large
organizations. Hierarchical
databases fell out of favor with the advent of relational databases due to
their lack of flexibility because it wouldn't easily support data access
outside the original design of the data structure.
For example, in the customer-order schema you could only access an
order through a customer, you couldn't easily find all the orders that
included the sale of a widget because the schema isn't designed to all
that.
-
Prevalence layer.
Klaus Wuestefeld defines prevalence as "transparent
persistence, fault-tolerance and load-balancing of the execution of the
business logic of an information system through the use of state snapshots
as well as command and query queuing or logging". A prevalence layer is effectively a simple persistence
framework that serializes objects and writes them to log files. From the point of view of developers all objects are cached
in memory and the persistence of the objects is truly treated as a
background task that the developers don't need to worry about.
Table 2 presents a
comparison of the various types of persistence mechanism and provides references
to vendors where applicable. Table
3 presents suggestions for when you might use each type of technology.
Large organizations will find that they are using several types of
persistence mechanism and will even install the products of several different
vendors. Not only do you have a
choice you might be forced to work with a wide range of databases whether you
want to or not.
Table 2. Comparing Types of
Persistence Mechanisms.
Mechanism
|
Advantages
|
Disadvantages
|
Common Products
|
Flat Files
|
- Supports simple approach to
persistence
- Good solution for smaller systems
- Most development languages have
built-in support for file streams
- No licensing costs
|
|
N/A
|
Hierarchical Databases
|
- Supports transaction-oriented
applications
|
- Not in common use for development
of new applications
|
IBM's IMS (www.ibm.com)
|
Object Databases
|
- "Pure" approach to persisting
objects
- Existing vendors have survived the
market shakeout and are likely here to stay
- Excellent option for an
application-specific database (e.g. the best-case scenario of Figure 7.2)
when object-technology is used
- Uniformity of approach towards
application and data storage.
- Facilitates
refactoring
because everything is an object
|
- Not well accepted in the market
place
- No single dominant vendor
- Standards defined, such as Object
Query Language (OQL), are still evolving
|
Computer Associate's Jasmine (www.ca.com)
Versant Developer Suite (www.versant.com)
Object Design's ObjectStore (www.objectdesign.com)
Objectivity/DB (www.objectivity.com)
Poet (www.poet.com)
|
Object/Relational Databases
|
- Relational vendors are slowly
adopting object-relational features
- Less of an impedance mismatch with
objects
-
|
- Not well accepted in the market
place
- No single dominant vendor
- Emerging standards, such as SQL3,
are not yet widely adopted
- Small experience base
|
Cloudscape (www.cloudscape.com)
Cincom UniSQL (www.cincom.com)
Relational vendors listed above
|
Prevalence Layer
|
- Transparent persistence of objects
- Performance
- Simplicity
|
|
Prevayler (www.prevayler.org)
|
Relational Databases
|
- Mature technology
- Dominate the persistence mechanism
market
- Several well-established vendors
- Standards, such as Structured
Query Language (SQL) and JDBC well defined and accepted
- Significant experience base of
developers
|
|
Oracle (www.oracle.com)
Sybase (www.sybase.com)
IBM DB2 (www.ibm.com)
Microsoft SqlServer (www.microsoft.com)
Informix (www.informix.com)
|
XML Databases
|
- Native support for persisting XML
data structures (not just as a blob)
- For XML intensive applications it
removes the need for marshalling between XML structures and the database
structure
|
- Emerging technology
- Standards, e.g. the XML equivalent
of SQL, not yet in place for XML data access
- Not well suited for transactional systems
|
dbXML (www.dbxml.org)
OpenLink Software's Virtuoso (www.openlinksw.com)
Software AG's Tamino (www.softwareag.com)
X-Hive/DB (www.x-hive.com)
|
Table 3.
Potential Applications for Data Storage Mechanisms.
Mechanism
|
Potential Application
|
Flat Files
|
- Simple applications, particularly
those with a "read all the information, manipulate it for awhile, and save
it to disk" paradigm such as a word processor or spreadsheet where a
relational database would be gross overkill
- Persistence of configuration
information
- Sharing of information with other
systems
- Audit logging/reporting
|
Hierarchical databases
|
- Transaction-oriented applications
- Common source of legacy data
|
Object/Relational Databases
|
- Complex, highly inter-related data
structures (e.g. CAD/CAM parts inventory)
- Complex and low-volume
transactions (e.g. CAD/CAM, GIS applications)
- Simple, high-volume transactions
(e.g. point of sales)
- Single application, or single
application family, software products
|
Object Databases
|
- Complex, highly inter-related data
structures (e.g. CAD/CAM parts inventory)
- Complex and low-volume
transactions (e.g. CAD/CAM, GIS applications)
- Simple, high-volume transactions
(e.g. point of sales)
- Single application, or single
application family, software products
|
Prevalence Layer
|
- Complex object structures
- Single
application, or single application family, software products
|
Relational Databases
|
- High-volume applications
- Transaction-oriented applications
- Simple-to-intermediate complexity
of data
- Data intensive applications
- Shared, operational database
- Reporting database
|
XML Databases
|
- Ideally suited for XML-intensive
applications such as enterprise integration portals or online reporting
facilities
|
RDBMSs technology isn't perfect, no technology is, but it's the
one in most common use so we need to learn how to work with it effectively.
The reason why I have spent so much effort discussing the drawbacks of
this technology is that it is important that you understand what it is that
you're working with.
Many writers will focus on the benefits of RDBMSs, and there are clearly many benefits, but ignore the drawbacks.
Other writers will focus on
academic issues such as the concept that
there is no "true relational database" that fulfills all of
E.F. Codd's
original twelve rules, not to mention the more finely defined features of his
later writings. That's an
interesting issue to discuss over beer but I prefer to focus on the practical
issues that developers face day to day when working with this technology. |
 |
Coupling is a serious issue for all IT professionals, including
both application developers and Agile data engineers.
Encapsulating access to your database can help to alleviate the problems
of coupling but it is only a partial solution.
It is also important to recognize that relational databases are only one
of several choices that you have available to you to persist your data.
Non-relational approaches are viable solutions for some situations and
should be given appropriate consideration.
Having said this my assumption throughout the rest of this book is that
you will be working with relational databases to persist your data.