A common programming task is to retrieve one or more
objects, the data for which is stored in a
relational database, into memory.
Perhaps you need to display a list of people that work in a department.
Perhaps you need to enable your users to define search criteria used to
list available inventory items. Perhaps
you need to implement a report. Although
these sound like easy tasks there are various
architectures,
implementation
core practices, and ways
to represent the results that you need to be aware of.
For the sake of convenience the term "find strategy" will refer to your
implementation strategy for finding the data which objects in relational
databases. The deciding factor in
choosing a find strategy is the level of database encapsulation that you wish to
have. In Encapsulating
Database Access you saw that there was four basic approaches for
implementing database access - brute
force, data
access objects (DAOs), persistence
frameworks, and services.
Similarly there are different find strategies that you may choose from:
- Brute force. With the "brute force" find strategy you simply embed database access code, such as Structured Query Language
(SQL) statements or Enterprise JavaBean (EJB) Query Language (EJB QL), in your business objects.
This is the simplest of the three basic approaches and the one taken the
most often. It unfortunately suffers from a high level of coupling
between your object and database schemas.
- Query objects (Brant
& Yoder 2000). This is the data access object (DAO)
strategy for retrieving objects.
Instead of embedding SQL code in your business objects you instead
encapsulate it in separate classes.
A simple approach would implement a single public operation that accepted the
criteria and returned a collection of zero or more objects representing the
result set. A more sophisticated
approach would enable you to work with the query result in a number of different
ways, such as collections of objects, as XML documents, or as simple data sets.
The various ways that query results can
be represented are discussed later.
- Metadata driven. A metadata-driven approach is the most sophisticated strategy
available to you and is typically implemented as part of a persistence
framework. The basic idea is that
you want to decouple your object schema from your data schema and the only way
to do this is to describe the
mappings
between them in meta data instead of in hard coded SQL.
Instead of defining a SQL SELECT statement that specifies the search in
terms of database columns your application must instead define the search in
terms of the object attributes. Figure 1 overviews how this
strategy would work (based on my
Design of a
Robust Persistence Layer developed in the mid-1990s). The business
object submits the meta data for a query, perhaps represented as an XML
document or as a full-fledged object, to a query processor. This
meta data would represent concepts such as return all customers whose name looks
like "˜Sc* A*', return the account with account number 1701-1234, and return
all employees whose hire date is between January 1 1987 and June 14 1995 that
work in the marketing department. The
query processor passes the query to a query builder that uses the mapping meta
data to build a SELECT statement which can then be submitted to the database.
The results come back from the database and are converted into the
appropriate representation (such as an XML structure, a collection of objects,
and so on). The representation is then
returned to the business object.
Figure 1. Overview of the meta data driven
strategy.

The primary advantage of this approach is that it enables
you to keep your object schema and data schema decoupled from one another.
As long as the query meta data reflects the structure of the current
object schema and the mapping meta data is current then you do not need to embed
SQL within your object schema to find objects stored in relational databases.
 |
Deciding on, and then implementing, your find strategy is just the first
step. You also need to resolve basic issues such as how to handle
errors, how many objects you expect to come back as the result of a find query,
when to bring the result across the network, and how to accept search criteria
from users. In this section I
describe several "best
practices" that I have found useful over the years to address
these issues. |
- Use the native error handling strategy.
Things don't always go right and therefore you need
to handle error conditions properly. Languages
offer two basic facilities for indicating error: exceptions and return codes.
Languages such as Java and C# support the ability to throw exceptions
from operations.
- Expect "logic" errors. You will need a strategy for handling logic errors. When many users can
access the database simultaneously, the norm for most applications, logic errors
will occur.
These logic errors often represent potential referential integrity
problems.
- Always return a collection. A
retrieval/find operation should always return a
collection, such as a vector or array, as the result.
This is a good strategy because it's a simple, consistent approach; You can easily determine the size of a collection,
simplifying logic error detection because you can determine if there are no
objects as the result of your query or several objects when you only
expected one; and in languages that don't support exceptions you can
simply use the first element in the collection as the location for the error
code/message/object. A more sophisticated approach is to develop a FindResult
class that includes a collection containing the result set and the error (if
any). This class would have the
ability to iterate over the collection, to answer basic queries such as isEmpty()
and isSingleResult().
- Use proxies and lazy
initialization for search lists. Lazy initialization is an object-level
technique for improving system performance via an object proxy.
An object proxy contains just enough information to identify the object
within the system, very likely the
primary key attributes, and enough
information for users to identify the object.
The columns that are displayed on a search result screen often determine
the latter information.
The basic idea is that instead of bringing all of the data
for every object in the result set you only bring across the identifying
information. This information is
displayed to the user, who then selects one of the proxies from the list to work
with it. The system then retrieves
all of the information for the selected business object and enables the user to
work with that object.
- Use lazy reads for high-overhead attributes. Lazy reading is an attribute-level
technique for improving system performance.
The
basic idea that the attribute's value is read from the database, or calculated
as the case may be, the first time it is needed instead of setting the value
when the object is first retrieved into memory.
Lazy read is a good option when an object's attribute is high-overhead,
perhaps it is very large and would be slow to transmit over the network or it
requires intensive calculations to compute, and when it is rarely accessed.
- Program for people. When you're building a search screen
your users need some way to indicate their search criteria.
The important thing to remember is that your users very likely aren't
computer professionals, they might be struggling with basic computer literacy,
and may not even be comfortable using computers.
Have you ever met someone, told them that you were a software architect
for a large and impressive firm, and the only question they have for you is how
much memory they should get when they buy a home computer next month?
This person is very likely one of your more advanced users, and they
clearly don't understand what it is that you do for a living. The point is that your search facility
will need to be
user friendly.
It
should follow accepted user interface standards, which your organization should
have in place and if they don't then your team should follow industry
standards.
Although this may sound like blasphemy to object purists, you don't always
need nor want objects as the result of a search.
The fact is that there are several ways that the results of a find can be
represented, as you see in Table 2 which
describes the various ways to represent customers.
You don't need to support all of these representations in your
application but you will likely find that you need several.
Table 2. Various ways to
represent the results of a find.
Approach
|
Description
|
Business objects
|
The result set is marshaled into a collection of Customer
objects.
|
Comma Separated Value (CSV) file
|
The result set is marshaled into a text file, with
one row in the file for each customer.
Commas separate the column values (e.g. Scott,William,Ambler).
|
Data structure
|
The result set is marshaled into a collection of data
structures. Each customer
data structure is typically just a collection of data values.
|
Data transfer objects
|
The result set is marshaled into a collect of objects
that just contain the data and the getters and setters to access the data.
These objects are serializable.
See Marinescu
(2002) and Fowler
et. al. (2003) for detailed discussions.
|
Dataset
|
The result set from the database, as it is returned
by your database access library (e.g. JDBC or ADO.NET).
|
Flat File
|
The result set is marshaled into a text file, with
one row in the file for each customer.
The data values are written into known positions (e.g. the first
name is written into positions 21 through 40).
|
Proxies
|
The result set is marshaled into a collection of
proxy objects that contain just enough information for both the system and
your users to identify the object.
|
Serialized business objects
|
The result set is marshaled into a collection of
business objects. This
collection is in turn converted into a single binary large object (BLOB),
or another similar format, which can be transmitted across the network as
a single entity and then converted back into the original collection of
objects by the receiver.
|
XML document
|
The result set is converted into a single
XML
document, which will contain zero or more customer structures.
|
4. Acknowledgements
I'd like to thank Josh Collier for his feedback regarding this article.