Data Abstraction and Encapsulation: Strategies to Reduce Architectural Coupling

by Scott W. Ambler

Encapsulation is a design issue that deals with how functionality is compartmentalized within a system: you should not have to know how something is implemented to be able to use it. The implication of encapsulation is that you can build something any way you want and later change the implementation without affecting other components within the system, as long as the interface to that component does not change. Data source encapsulation, also known as data abstraction, is the act of reducing the architectural coupling that your systems have with data sources.

People often say that encapsulation is the act of painting the box black – you are defining what is going to be done, but you are not telling the rest of the world how you’re going to do it. For example, consider your bank. How do they keep track of your account information: on a mainframe, a mini, or a PC? What database do they use? What operating system? It doesn’t matter to you because the bank has encapsulated the way in which they perform account services. You just walk up to a teller and perform whatever transactions you wish. In this article I describe the concept of a data abstraction layer, various encapsulation architectures, and implementation strategies for database encapsulation. This article assumes that you are working with relational databases, although if this isn’t the case much of the advice is still appropriate with slight modifications.

Data Abstraction Layers
Data Abstraction Layer Architectures
Implementation Strategies
Marshalling
Error Handling
References and Recommended Reading

1. Data Abstraction Layers

Figure 1 depicts a typical architectural layering strategy for building business applications, which includes a database encapsulation layer (also known as a persistence layer or simply a data layer). A database encapsulation layer hides the implementation details of your database(s), including their physical schemas, from your business code. In effect this layer provides your business objects with persistence services – the ability to read data from, write data to, and delete data from data sources. Ideally your business objects should know nothing about how they are persisted; it just happens. Database encapsulation layers aren’t magic and they aren’t academic theories; they are common practice in both large and small applications, as well as in both simple and complex ones. Database encapsulation layers are an important technique that every agile software developer should be aware of and be prepared to use.

Figure 1. A typical architectural layering strategy for an application.
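
The persistence services described above (read, write, and delete) can be sketched as a small interface that business code depends on. This is a minimal illustration, not a definitive design: the names CustomerStore, InMemoryStore, SqliteStore, and rename_customer, and the id and name columns, are all assumptions made for the example.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerStore(Protocol):
    """Hypothetical persistence interface offered to business code."""

    def read(self, customer_id: int) -> Optional[str]: ...
    def write(self, customer_id: int, name: str) -> None: ...
    def delete(self, customer_id: int) -> None: ...


class InMemoryStore:
    """One implementation: keeps customers in a dictionary."""

    def __init__(self):
        self._rows = {}

    def read(self, customer_id):
        return self._rows.get(customer_id)

    def write(self, customer_id, name):
        self._rows[customer_id] = name

    def delete(self, customer_id):
        self._rows.pop(customer_id, None)


class SqliteStore:
    """Another implementation: keeps customers in a relational table."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS T_Customer "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def read(self, customer_id):
        row = self._conn.execute(
            "SELECT name FROM T_Customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None

    def write(self, customer_id, name):
        self._conn.execute(
            "INSERT OR REPLACE INTO T_Customer (id, name) VALUES (?, ?)",
            (customer_id, name),
        )

    def delete(self, customer_id):
        self._conn.execute("DELETE FROM T_Customer WHERE id = ?", (customer_id,))


def rename_customer(store: CustomerStore, customer_id: int, new_name: str) -> bool:
    """Business logic: uses only the interface, never SQL or table names."""
    if store.read(customer_id) is None:
        return False
    store.write(customer_id, new_name)
    return True
```

Because rename_customer sees only the interface, either store can be passed in without changing the business code.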

An effective database encapsulation layer will provide several benefits. It will:

Reduce the coupling between your code and your data schema, increasing your ability to evolve either one.
Implement all data-related access code in one place.
Simplify the job of application programmers.
Allow developers to focus on the business problem while Agile data engineer(s) focus on the database.
Give you a common place to implement data-oriented business rules.
Take advantage of specific database features, increasing application performance.

There are potentially several disadvantages of database encapsulation layers:

They require investment.
They often require reasonably clean mappings.
They can provide too little control over database access.

2. Data Abstraction Layer Architectures

Figure 2 depicts the simplest architecture for encapsulating access to data source(s) – a single system working with one or more data sources that are the responsibility of the system team. In this situation there is the greatest potential for flexibility as your team should be able to choose the data abstraction layer implementation strategy, see below, that best fits your situation. Furthermore you should be in a position to evolve both your code and your data schema(s) as you implement new requirements.

Figure 2. Single system directly accessing “owned” data source(s).

A far more realistic situation is depicted in Figure 3, which shows multiple systems accessing your data sources. This architecture is common in organizations that have one or more centralized legacy database(s) with which all systems work. Another realistic situation is shown in Figure 4, where there are multiple applications working with multiple databases. In this case you are likely accessing your “preferred” database(s) as well as one or more legacy data sources.

Figure 3. Multi-application, single database architecture.

One interesting observation about both of these diagrams is that some systems may not take advantage of the encapsulation layer(s) and instead directly access data. There are several reasons for this:

Your data access layer is written in a language that some legacy applications can’t easily access (e.g. COBOL cannot easily access Java).
You’ve chosen not to rework some of your legacy applications to use the database encapsulation layers.
Several teams implemented their own data access layers.
You want to use technologies, such as a bulk load facility, that require direct access to the database schema. Note that this may motivate your team to sometimes go around the data access layer.

The point is that some systems will be able to take advantage of your data access layer(s) and some won’t. There are still benefits because you are reducing coupling and therefore reducing your development costs and maintenance burden.

Figure 4. Multi-application, multi-data source architecture.

Figure 4 makes it clear that some systems already have an encapsulation layer in place. If this is the case you might want to consider reusing the existing approach instead of developing your own. By having a single data layer that all applications use to access all data sources (where appropriate) you potentially reduce the effort it takes to evolve your data source schemas via database refactoring because there is only one data access layer to update. If you’ve purchased the data layer there is an opportunity to reduce overall licensing fees because you only need to deal with one vendor. The potential disadvantage is that the team responsible for maintaining the data layer could become a bottleneck if they are unable or unwilling to work in an agile manner.

3. Implementation Strategies For Data Abstraction Layers

Regardless of whether you intend to purchase, build, or download a data access layer, it is critical for both agile data engineers and application developers to understand the various implementation strategies. There are four basic strategies – brute force, data access objects, persistence frameworks, and services – that you should consider using.

3.1 Data Abstraction Layer Strategy: Brute Force

The basic strategy with the brute force approach is that business objects access data sources directly, typically submitting Structured Query Language (SQL) or Object Query Language (OQL) code to the database. In Java applications this will be done via the Java Database Connectivity (JDBC) class library, and in Microsoft-based applications via the Open Database Connectivity (ODBC) application programming interface (API). Note that Microsoft has newer approaches, such as the ActiveX Data Objects (ADO) and Microsoft Data Access Components (MDAC) libraries, which encapsulate and extend ODBC. Other environments, such as Ruby, have their own native APIs which often take advantage of existing ODBC or JDBC database drivers.

The brute force approach isn’t really a database encapsulation strategy; it is what you do when you don’t have a database encapsulation layer. However, it is a valid option for database access. Furthermore, it is likely the most common approach because it is simple and provides programmers with complete control over how their business objects interact with the database. Because of its simplicity this is a very good approach to take at the beginning of an initiative when your database access requirements are fairly straightforward. As your database access needs become more complex, data access objects or persistence frameworks are likely better options.
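
A minimal sketch of the brute force approach follows, using Python's built-in sqlite3 module in place of JDBC or ODBC. The T_Customer table layout (id and name columns) and the save and find method names are assumptions made for illustration.

```python
import sqlite3


class Customer:
    """Business object that talks to the database directly (brute force)."""

    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name

    def save(self, conn):
        # The SQL and the physical schema live inside the business class,
        # so a schema change means editing this class.
        conn.execute(
            "INSERT OR REPLACE INTO T_Customer (id, name) VALUES (?, ?)",
            (self.customer_id, self.name),
        )

    @classmethod
    def find(cls, conn, customer_id):
        row = conn.execute(
            "SELECT id, name FROM T_Customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return cls(*row) if row else None


# Usage: an in-memory database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
Customer(1, "Ann").save(conn)
found = Customer.find(conn, 1)
```

The code is short and gives full control over the SQL, which is the appeal of this approach; the cost is that Customer is coupled to the table and column names.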

3.2 Data Abstraction Layer Strategy: Data Access Objects (DAOs)

Data access objects (DAOs) encapsulate the database access logic required of business objects. The typical approach is for there to be one data access object for each business object; for example, the Customer class would have a Customer_Data class. The Customer_Data class implements the SQL/OQL code required to access the database, similar to the brute force approach. The main advantage of data access objects over the brute force approach is that your business classes are no longer directly coupled to the database; instead the data access classes are. It is quite common to simply develop your own data access objects, although you may also choose to follow industry-standard approaches such as Java Data Objects (JDO) and DataObjects.Net. Note: These “industry standards” change every couple of years.
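
The Customer and Customer_Data pairing described above might look like the following sketch. It uses Python's sqlite3 module, and the T_Customer columns and the method names (insert, find, update, delete) are assumptions, not a prescribed DAO interface.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Business object: no SQL and no knowledge of the schema."""
    customer_id: int
    name: str


class Customer_Data:
    """Data access object for Customer; all SQL lives here."""

    def __init__(self, conn):
        self._conn = conn

    def insert(self, customer):
        self._conn.execute(
            "INSERT INTO T_Customer (id, name) VALUES (?, ?)",
            (customer.customer_id, customer.name),
        )

    def find(self, customer_id):
        row = self._conn.execute(
            "SELECT id, name FROM T_Customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None

    def update(self, customer):
        cur = self._conn.execute(
            "UPDATE T_Customer SET name = ? WHERE id = ?",
            (customer.name, customer.customer_id),
        )
        return cur.rowcount == 1  # False when no such row exists

    def delete(self, customer_id):
        self._conn.execute("DELETE FROM T_Customer WHERE id = ?", (customer_id,))


# Usage: an in-memory database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
dao = Customer_Data(conn)
dao.insert(Customer(1, "Ann"))
```

If the table or its columns are renamed, only Customer_Data changes; the Customer class and the code that uses it stay as they are.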

3.3 Data Abstraction Layer Strategy: Persistence Frameworks

A persistence framework, often referred to as a persistence layer, fully encapsulates database access from your business objects. Instead of writing code to implement the logic required to access the database, you define meta data that represents the mappings. So, if the Customer class maps to the T_Customer table, part of the meta data would represent this mapping. Meta data representing the mappings of all business objects, as well as the associations between them, would also need to exist. Based on this meta data the persistence framework would generate the database access code it requires to persist the business objects. Depending on the framework, this code is either generated dynamically at run time or generated statically in the form of data access objects which are then compiled into the application. The first approach provides greater flexibility whereas the second provides greater performance.
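
To make the idea of mapping meta data concrete, here is a toy sketch that generates SQL at run time from a mapping of the Customer class to the T_Customer table. It is far simpler than any real persistence framework; MAPPINGS, MiniPersistence, and the column names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    name: str


# Mapping meta data: class -> table, key attribute, attribute -> column.
MAPPINGS = {
    Customer: {
        "table": "T_Customer",
        "key": "customer_id",
        "columns": {"customer_id": "id", "name": "name"},
    }
}


class MiniPersistence:
    """Generates SQL dynamically from the mapping meta data."""

    def __init__(self, conn, mappings):
        self._conn = conn
        self._mappings = mappings

    def save(self, obj):
        mapping = self._mappings[type(obj)]
        attrs = list(mapping["columns"])
        columns = ", ".join(mapping["columns"][a] for a in attrs)
        placeholders = ", ".join("?" for _ in attrs)
        # Identifiers come from trusted meta data; values are bound parameters.
        sql = f"INSERT OR REPLACE INTO {mapping['table']} ({columns}) VALUES ({placeholders})"
        self._conn.execute(sql, [getattr(obj, a) for a in attrs])

    def load(self, cls, key):
        mapping = self._mappings[cls]
        attrs = list(mapping["columns"])
        columns = ", ".join(mapping["columns"][a] for a in attrs)
        key_column = mapping["columns"][mapping["key"]]
        sql = f"SELECT {columns} FROM {mapping['table']} WHERE {key_column} = ?"
        row = self._conn.execute(sql, (key,)).fetchone()
        return cls(**dict(zip(attrs, row))) if row else None


# Usage: an in-memory database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
persistence = MiniPersistence(conn, MAPPINGS)
persistence.save(Customer(1, "Ann"))
```

Renaming a column now means editing one entry in the mapping rather than any SQL, which is the kind of flexibility the dynamic approach is meant to offer.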

Persistence frameworks will have a variety of features. Simple ones will support basic create, read, update, delete (CRUD) functionality for objects as well as basic transaction and concurrency control. Advanced features include robust error handling, database connection pooling, caching, XML support, schema and mapping generation capabilities, and support for industry standard technology such as EJB.

With the persistence framework approach, the job of an Agile data engineer becomes a little more complex but a lot less onerous. You will be expected to install, if necessary, the persistence framework. You will also need to work with the administration facility to define and maintain the mapping meta data. In the case of explicitly controlled persistence frameworks, application programmers will need mentoring in the use of the framework, which is often a very simple task.

3.4 Data Abstraction Layer Strategy: Services

For the sake of discussion, a service is an operation offered by a computing entity that can be invoked by other computing entities. At the time of this writing the most popular architectural strategy is web services. However, as you see in Table 2, it is only one of several common strategies available to you:

Common Object Request Broker Architecture (CORBA)
Customer Information Control System (CICS) Transaction
Electronic data interchange (EDI)
Stored procedures
Web Services

Services are typically used to encapsulate access to legacy functionality and data, and there is a clear preference within the industry to build new applications following a web services-based architecture to facilitate reuse via system integration.

When you identify services, you may find the potential need for business services such as checking to see if a student exists, verifying that a person is eligible to enroll, adding an applicant to the database, and calculating enrollment fees. To support these services you would need to access the database. Whether you want to do that directly through DAOs, or through a layer of “pure data services”, whatever that implies, is up to you. Arguably the first and third business services that I listed are mostly data services, at least on the surface.
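
As a rough illustration of the two mostly-data services mentioned above, the sketch below exposes "student exists" and "add applicant" as operations on an in-process facade. A real service would typically be invoked remotely (for example as a web service or stored procedure); the EnrollmentServices class and the T_Student and T_Applicant tables are assumptions for the example.

```python
import sqlite3


class EnrollmentServices:
    """Hypothetical service facade: callers see operations, not tables."""

    def __init__(self, conn):
        self._conn = conn
        # Assumed schema, created here only to keep the sketch self-contained.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS T_Student "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS T_Applicant "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def student_exists(self, student_id):
        """Service: check to see if a student exists."""
        row = self._conn.execute(
            "SELECT 1 FROM T_Student WHERE id = ?", (student_id,)
        ).fetchone()
        return row is not None

    def add_applicant(self, name):
        """Service: add an applicant and return the generated identifier."""
        cur = self._conn.execute(
            "INSERT INTO T_Applicant (name) VALUES (?)", (name,)
        )
        return cur.lastrowid


# Usage with an in-memory database.
services = EnrollmentServices(sqlite3.connect(":memory:"))
applicant_id = services.add_applicant("Ann")
```

The same operations could just as well delegate to DAOs instead of issuing SQL directly; the caller would not be able to tell the difference.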

4. Marshalling

Let’s start with some terminology. Marshalling is the conversion of an object into a data structure such as an XML document or a data set. Unmarshalling is the conversion of data to objects. However, it is common to refer to both types of conversion simply as marshalling. Don’t worry about it.

Because (un)marshalling occurs at boundary points within your system you need to consider validating the data. You want to ensure that your data is in a valid state, that individual values conform to business rules (e.g. someone’s age is less than 150) and to referential integrity rules. “The Joy of Legacy Data” describes many common data quality problems that you may want to try to detect. There are several data validation issues that you need to consider:

Do you validate the data at all?
Where is validation performed?
Do you validate automatically?
What do you do when you find a problem?
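
The following sketch shows marshalling an object to an XML document and unmarshalling it back, with validation performed automatically at the unmarshalling boundary. The Person class and element names are assumptions, and raising an exception is only one possible answer to the question of what to do when a problem is found.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


def marshal(person):
    """Convert a Person object into an XML document (as a string)."""
    root = ET.Element("person")
    ET.SubElement(root, "name").text = person.name
    ET.SubElement(root, "age").text = str(person.age)
    return ET.tostring(root, encoding="unicode")


def unmarshal(xml_text):
    """Convert an XML document back into a Person, validating on the way in."""
    root = ET.fromstring(xml_text)
    name = root.findtext("name")
    age_text = root.findtext("age")
    if not name:
        raise ValueError("name is required")
    try:
        age = int(age_text)
    except (TypeError, ValueError):
        raise ValueError(f"age is not an integer: {age_text!r}") from None
    # Business rule from the text: someone's age is less than 150.
    if not 0 <= age < 150:
        raise ValueError(f"age out of range: {age}")
    return Person(name, age)


# Usage: a round trip through XML.
document = marshal(Person("Ann", 30))
restored = unmarshal(document)
```

Validating only in unmarshal is a design choice for this sketch; validating in both directions, or in the database itself, are equally reasonable options.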

5. Error Handling

An important feature of a database encapsulation layer is its ability to handle database errors appropriately. Whenever the encapsulation layer interacts with a data source there is the potential for an error to occur. Common types of database-oriented errors include:

The database is not available
The network is not available
The request you made to the database is not correct (e.g. improperly formulated SQL code, you’re trying to invoke a stored procedure that doesn’t exist)
You are trying to work with data that doesn’t exist (e.g. trying to update a deleted record)
You are trying to insert existing data
The data you want to access is locked (e.g. you want to update a record that another user has write locked)

The list above is nowhere near complete (you only have to look at the list of error codes in the manuals for your database to see this), but it is a good start. The point is that errors happen and you need to be prepared to act on them. A good encapsulation layer should be able to:

Detect database-oriented errors and continue processing.
Optionally log the details pertaining to the error.
Report the error to the invoking application in an intelligible manner.
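
One way to sketch these three capabilities is a thin wrapper that catches driver exceptions, logs them, and re-raises them as errors defined by the encapsulation layer. The sketch uses Python's sqlite3 and logging modules; the exception class names and the execute helper are hypothetical, and a real layer would distinguish many more cases.

```python
import logging
import sqlite3

logger = logging.getLogger("persistence")


class PersistenceError(Exception):
    """Base class for errors reported by the encapsulation layer."""


class DataConflictError(PersistenceError):
    """A constraint was violated, for example inserting existing data."""


class InvalidRequestError(PersistenceError):
    """The request was not correct or could not be carried out."""


def execute(conn, sql, params=()):
    """Run a statement, translating driver errors into layer errors."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.IntegrityError as exc:
        logger.warning("constraint violated: %s", exc)
        raise DataConflictError("the data conflicts with existing data") from exc
    except sqlite3.OperationalError as exc:
        # In sqlite3 this covers malformed SQL, missing tables, and locks.
        logger.error("request failed: %s", exc)
        raise InvalidRequestError("the database request could not be run") from exc
    except sqlite3.Error as exc:
        logger.error("database error: %s", exc)
        raise PersistenceError("unexpected database error") from exc


# Usage: an in-memory database with an assumed schema.
conn = sqlite3.connect(":memory:")
execute(conn, "CREATE TABLE T_Customer (id INTEGER PRIMARY KEY, name TEXT)")
execute(conn, "INSERT INTO T_Customer (id, name) VALUES (?, ?)", (1, "Ann"))
```

The invoking application catches PersistenceError and its subclasses, so it never needs to know which database driver raised the original error, and the connection remains usable afterwards.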

6. Related Reading

The Design of a Robust Persistence Layer