Table of Contents
- Database Encapsulation Layers
- Encapsulation Layer Architecture
- Implementation Strategies
- Error Handling
- References and Recommended Reading
Figure 1 depicts a typical architectural layering strategy for building business applications which includes a database encapsulation layer (also known as a persistence layer or simply a data layer). A database encapsulation layer hides the implementation details of your database(s), including their physical schemas, from your business code. In effect this layer provides your business objects with persistence services – the ability to read data from, write data to, and delete data from – data sources. Ideally your business objects should know nothing about how they are persisted, it just happens. Database encapsulation layers aren’t magic and they aren’t academic theories; database encapsulation layers are commonly used practice by both large and small applications as well as in both simple and complex applications. Database encapsulation layers are an important technique that every agile software developer should be aware of and be prepared to use.
An effective database encapsulation layer will provide several benefits:
- It reduces the coupling between your object schema and your data schema, increasing your ability to evolve either one.
- It implements all data-related code in one place.
- It simplifies the job of application programmers.
- It allows application programmers to focus on the business problem and Agile data engineer(s) can focus on the database.
- It gives you a common place to implement data-oriented business rules.
- It takes advantage of specific database features, increasing application performance.
There are potentially several disadvantages of database encapsulation layers:
- They require investment.
- They often require reasonably clean mappings .
- Can provide too little control over database access (although in my paper The Design of a Robust Persistence Layer I show that this does not necessarily have to be so.
Figure 2 depicts the simplest architecture for encapsulating access to a relational database – a single application working with a single database. In this situation there is the greatest potential for flexibility as your team should be able to choose the implementation strategy, such as data access objects or a persistence framework, that best fits your situation. Furthermore you should be in a position to evolve both your object schema and your database schema as you implement new requirements.
A far more realistic situation to be in is Figure 3 that depicts a multiple application, single database architecture. This architecture is common in organizations that have a centralized legacy database with which all applications work. Another realistic situation is shown in Figure 4 where there are multiple applications working with multiple databases. In this case you are likely accessing your “preferred” database(s) as well as one or more legacy data sources.
One interesting observation about both of these diagrams is that some applications may not take advantage of the encapsulation layer(s) and instead directly access data. There are several reasons for this:
- Your data encapsulation layer is written in a language that some legacy applications can’t easily access (e.g. COBOL cannot easily access Java).
- You’ve chosen not to rework some of your legacy applications to use the database encapsulation layers.
- You want to use technologies, such as a bulk load facility or a reporting framework, that require direct access to the database schema. Note that this may motivate your team to sometimes go around the encapsulation layer.
The point is that some applications will be able to take advantage of your encapsulation layer(s) and some won’t. There are still benefits because you are reducing coupling and therefore reducing your development costs and maintenance burden.
Figure 4 makes it clear that some applications already have an encapsulation layer in place. If this is the case you might want to consider reusing the existing approach instead of developing your own. By having a single encapsulation layer that all applications use to access all data sources (where appropriate) you potentially reduce the effort it takes to evolve your database schemas via database refactoring because there is only one encapsulation layer to update. If you’ve purchased the encapsulation layer there is an opportunity to reduce overall licensing fees because you only need to deal with one vendor. The potential disadvantage is that the team responsible for maintaining the encapsulation layer could become a bottleneck if they are unable or unwilling to work in an agile manner.
Regardless of whether you intend to purchase, build, or download an a database encapsulation layer it is critical for both agile data engineers and application developers to understand the various implementation strategies. There are four basic strategies – brute force, data access objects, persistence frameworks, and services – that you should consider using.
The basic strategy with the brute force approach is that business objects access data sources directly, typically submitting Structure Query Language (SQL) or Object Query Language (OQL) code to the database. In Java applications this will be done via the Java Database Connectivity (JDBC) class library and via Open Database Connectivity (ODBC) application programming interface (API) in Microsoft-based applications. Note that Microsoft has newer approaches, such as the ActiveX Data Object (ADO) and Microsoft Data Access Component (MDAC) libraries, which encapsulate and extend ODBC. Other environments, such as Ruby, have their own native APIs which often take advantage of existing ODBC or JDBC database drivers.
The brute force approach isn’t really a database encapsulation strategy, it is what you do when you don’t have a database encapsulation layer. However, it is a valid option for database access. Furthermore it is likely the most common approach because it is simple and provides programmers with complete control over how their business objects interact with the database. Because of its simplicity this is a very good approach to take at the beginning of an initiative when your database access requirements are fairly straightforward. As your database access needs become more complex data access objects or persistence frameworks are likely better options.
Data access objects (DAOs) encapsulate the database access logic required of business objects. The typical approach is for there to be one data access object for each business object, for example the Customer class would have a Customer_Data class. The Customer_Data class implements the SQL/OQL code required to access the database, similar to the brute force approach. The main advantage of data access objects over the brute force approach is that your business classes are no longer directly coupled to the database, instead the data access classes are. It is quite common to simply develop your own data access objects, although you may also choose to follow industry-standard approaches such as Java Data Object (JDO)s and DataObjects.Net. Note: This “industry standards” change every couple of years.
A persistence framework, often referred to as a persistence layer, fully encapsulates database access from your business objects. Instead of writing code to implement the logic required to access the database you instead define meta data that represents the mappings. So, if the Customer class maps to the T_Customer table part of the meta data would represent this mapping. Meta data representing the mappings of all business objects, as well as the associations between them would also need to exist. Based on this meta data the persistence framework would generate the database access code it requires to persist the business objects. Depending on the framework, see this web search, this code is either generated dynamically at run time or it can be generated statically in the form of data access objects which are then compiled into the application. The first approach provides greater flexibility whereas the second provides greater performance.
Persistence frameworks will have a variety of features. Simple ones will support basic create, read, update, delete (CRUD) functionality for objects as well as basic transaction and concurrency control. Advanced features include robust error handling, database connection pooling, caching, XML support, schema and mapping generation capabilities, and support for industry standard technology such as EJB.
Taking the persistence framework approach the job of an Agile data engineer becomes a little more complex but a lot less onerous. You will be expected to install, if necessary, the persistence framework. You will also need to work with the administration facility to define and maintain the mapping meta data. In the case of explicitly controlled persistence frameworks application programmers will need mentoring in the use of the framework, often a very simple task.
For the sake of discussion a service is an operation offered by a computing entity that can be invoked by other computing entities. At the time of this writing the most popular architectural strategy is web services, however, as you see in Table 2 , it is only one of several common strategies available to you:
- Common Object Request Broker Architecture (CORBA)
- Customer Information Control System (CICS) Transaction
- Electronic data interchange (EDI)
- Stored procedures
- Web Services
Services are typically used to encapsulate access to legacy functionality and data, and there is a clear preference within the industry to build new applications following a web services-based architecture to facilitate reuse via system integration.
One way to identify services is overviewed here. As you can see, it indicates the potential need for business services such as check to see if student exists, verify that a person is eligible to enroll, add applicant to database, and calculate enrollment fees. To support these services you would need to access the database. Whether you want to do that directly through DAOs, or through a layer of “pure data services”, whatever that implies, is up to you. Arguably the first and third business service that I listed are mostly data services, at least on the surface.
Let’s start with some terminology. Marshalling is the conversion of an object into a data structure such as an XML document or a data set. Unmarshalling is the conversion of data to objects. However, it is common to refer to both types of conversion simply as marshalling. Don’t worry about it.
Because (un)marshalling occurs at boundary points within your system you need to consider validating the data. You want to ensure that your data is in a valid state, that individual values conform to business rules (e.g. someone’s age is less than 150) and to referential integrity rules. The Joy of Legacy Data described many common data quality problems that you may want to try to detect. There are several data validation issues that you need to consider:
- Do you validate the data at all?
- Where is validation performed?
- Do you validate automatically?
- What do you do when you find a problem?
An important feature of a database encapsulation layer is its ability to handle database errors accordingly. Whenever the encapsulation layer interacts with a data source there is a potential that an error can occur. Common types of database-oriented errors include:
- The database is not available
- The network is not available
- The request you made to the database is not correct (e.g. improperly formulated SQL code, you’re trying to invoke a stored procedure that doesn’t exist)
- You are trying to work with data that doesn’t exist (e.g. trying to update a deleted record)
- You are trying to insert existing data
- The data you want to access is locked (e.g. you want to update a record that another user has write locked)
The list described above is nowhere near complete, you only have to look at the list of error codes in the manuals for your database to see this, but it is a good start. The point is that errors happen and you need to be prepared to act on them. A good encapsulation layer should be able to:
- Detect database-oriented errors and continue processing.
- Optionally log the details pertaining to the error.
- Report the error to the invoking application in an intelligible manner.