Agile Data Logo

Implementing Reports

Follow @scottwambler on Twitter!

Reporting is a necessity within every organization and virtually within every business application. Your stakeholders will define some requirements that are best implemented as operational functionality, such as the definition and maintenance of customer information, and other requirements that are best implemented as reports. This article explores critical report implementation issues. How should you architect your database to support reporting? Should you build reports within your application or separately within another specialized reporting facility? Should you implement using object technology or with reporting tools? For our purposes a report is the read-only output of information, potentially including both "raw" base data and calculated/summarized values. Reports can be rendered in a variety of manners - printed, displayed on a screen, or an electronic file. Reports can be created in batch or in real time. A customer invoice is a report and so is a quarterly sales summary by division. In this article I discuss:

  1. Database deployment architecture
  2. Reporting within your application
  3. Reporting outside your application
  4. Database design strategies
  5. Implementation strategies
  6. Challenges that make reporting hard

1. Database Deployment Architecture

Most business applications, particularly those that edit and update data contained in a relational database, require relatively normalized data. This basic manipulation of data is often referred to as the operational features of an application. Data normalization is a design process where the goal is to ensure that data is stored in one and one place only. This results in cohesive tables and a database schema that is very flexible. Because reports often require a wide range of data, resulting in the need to join many normalized tables, a highly normalized database can be difficult to report from. This problem is exacerbated when a report needs "crunch" a large amount of data. The implication is that you need a database architecture that supports the operational needs of your application as well as its reporting needs.

My experience is that because the operational needs of an application are best served by a highly normalized schema, and that because reporting needs are best served by a denormalized data schema, that you want to consider implementing two separate schemas. So far the focus of this book has been on the operational needs of an application, not on reporting, and therefore I have not discussed the idea of separate data schemas until now. Figure 1 depicts a logical database deployment architecture, depicting the idea that your application will read and update information from an operational database. Data from the operational database will be used to load data marts, if any, as well as your corporate data warehouse. An operational database supports online transaction processing and analytical reporting. A data mart is a department/application specific database used for reporting. A data warehouse is collection of subject-oriented databases where each unit of data is relevant to a given moment in time. Table 1 compares and contrasts these types of databases.


Figure 1. Separate operational and reporting databases.


Table 1. Comparing Database Types.

Operational Database

Data Mart

Data Warehouse

Highly normalized

Some summary data for online reporting

Requirements driven

Supports read/write access by applications

Specific to one or more applications

Operational applications typically work with this type of database

Highly denormalized

Requirements driven

Specific to a single department and/or collection of application reports

Contains a snapshot of near-term information, typically less than a year old

Supports read-only access by applications

Often uses multi-dimensional database management system

Supports very flexible and unpredictable access to data

Ad-hoc reporting facilities typically work with this type of database

Highly normalized

May some summary data

Flexible design which stores granular data

Supports read-only access by applications

Not specific to any application or department

Driven by enterprise-level requirements

Contains massive amounts of data, typically an order of magnitude greater than a data mart or operational database

Often includes several (upwards to ten) years worth of data

"Standard" reports are developed and run on a regular basis against this type of database

Figure 1 depicts a logical architecture because a large organization is likely to have many physical operational databases, many physical data marts, and even several physical warehouses. A small company may have a single database that is used for all three purposes. Regardless of your physical implementation of this architecture, the basics still hold.

Inmon (2002) describes a process where the data from operational databases, including legacy databases, is denormalized and fed into the appropriate data marts. This data is loaded on a regular basis into the data marts, typically on a daily basis although periods of a few hours or even a few minutes is common. The data is also removed on a regular basis from the database to keep the data marts are a relatively stable size. Data is also loaded into your corporate data warehouse, although now this data will be highly denormalized and may need to be transformed to fit corporate data standards. The implication is that when an Agile data engineer evolves the data schema of your operational database, perhaps through database refactoring, they should always strive to ensure that the schema follows the enterprise data standards. This means they will need access to the standards, perhaps stored online in a meta data repository, and will need to work with the operations engineers to evolve the standards over time.

An advantage of separating operational databases from data marts and data warehouses is that it provides your team with the option to decouple your application from reporting-based data schemas. Although Figure 1 shows your application, via your encapsulation layer, accessing all three types of database you could choose to only interact directly with the operational database. This works when your application implements reports that are based only on data contained in the operational database. A better strategy is to externalize reports from your application and implement them via data mart or data warehouse-based reporting tools.


3. Reporting Within Your Application

Like any other functionality within your application, a report needs to be based on requirements. stakeholders should be working with application programmers and Agile data engineers to develop reports; the stakeholders provide the requirements and feedback on the work of the developers.

My implementation strategy for including reports in an application changes based on the development platform. When I'm building a fat-client application, perhaps building it with a Java Swing user interface or with Visual Basic, my preferred approach is to separate most if not all reports into their own application. In other words I build two applications, one that implements the operational logic and one that implements the reports. The operational application is typically implemented with an object-oriented language such as Java, C#, or Visual Basic and the reporting application is developed using a reporting tool (see Table 2). The reason for this is simple - I prefer to use the right development environment for the job. Sometimes I will invoke the reporting tool from the operational application so I can deploy a single, integrated application. Other times my stakeholders already have an existing reporting facility and they want the new reports to be added to it.

When I'm building a browser-based application I typically prefer to include reports in the operational application, although once again if my stakeholders want the reports to appear in a separate reporting application then that's what I'll do. My experience is that users of browser applications tend to want links to all related functionality within the application whereas users of fat client applications don't mind having a separate application. I'll implement a report as an HTML page that displays read-only information, or better it is implemented as an XML data structure than is then converted to HTML via XSLT (Extensible Stylesheet Language Transformation). Larry Greenfield presents a comprehensive list of reporting and query tools.

Table 2. Representative reporting frameworks.

Framework

Description

ActiveReports

ActiveReports includes a report wizard that steps you through creating simple reports without writing any code. Visual Basic and .NET versions of the product exist.

Crystal Reports

A reporting facility that can be integrated into Visual Studio .NET to create reports that can be invoked by .NET platform applications. It can also be integrated into Java applications via it's Java reporting SDK (software development kit), or use Java beans as data sources for reports.

Jasper Reports

A Java-based, open source report-generating tool that can deliver content onto the screen or printer, or into PDF, HTML, XLS, CSV and XML files.

Microsoft Access and Excel

A common approach to implementing reports within Microsoft applications is to simply invoke either Access or Excel.

Oracle Reports

A reporting tool for Oracle databases. A Java framework exists so that reports can be included in operation applications.

Report Builder Pro

A report-building IDE for Borland Delphi.

The logic to implement a report within your application code is fairly straightforward. You identify the selection rules for the information to appear in the report, for example "all employees with 5 or more years seniority that work in a Canadian subsidiary". You then obtain the data using one of the strategies - brute force, query objects, or persistent search criteria - described in Effective Practices for Retrieving Objects from Relational Databases. This data is then converted into a format that your report generation strategy can work with. If you are not using a reporting framework then you will need to code the report yourself. A good strategy is the Report Objects design pattern (Brant & Yoder 2000) which implement a report with objects that obtain the data, known as query objects, and with objects that output the data, known as viewing objects. Figure 2 depicts a UML Sequence diagram that overviews this strategy. The report object collaborates with the query object to obtain the data, marshalling the results into the format required by the viewing object. The viewing object works with the marshaled results to produce the outputted report.

A design rule of thumb is that reports that appear as part of your operational application should be based on your operational data and should answer a "what is happening right now" type of question. Examples of this type of question include "what is the current inventory levels of Blink 182 CDs?", "who is currently on call to answer Level 4 customer questions", and "when is a customer's order scheduled to be shipped?" When these two factors aren't true you should consider building the report as part of an external reporting application.

Figure 2. Report objects.

4. Reporting Outside Your Application

A common strategy is to implement reports outside of your application, typically using a reporting facility design for exactly that purpose. This strategy is often referred to as business intelligence reporting. Figure 1 depicts this concept, showing how an ad-hoc reporting facility is often used against data marts whereas pre-defined reports are often run against data warehouses. Ad-hoc reporting is typically performed for the specific purposes of a small group of users where it is common that the users write the report(s) themselves. Pre-defined reports are typically developed by the IT department in response to user request, sometimes within the scope of an application and sometimes as a small initiative in its own right.

Why separate ad-hoc reports from pre-defined reports? As you saw in Table 1 data marts are designed to support flexible, unpredictable access to data whereas data warehouses are not designed this way. So, we don't really need to separate the reports per se, just the databases. Representative vendors in the business intelligence reporting tool market include Cognos, Hummingbird, Information Builders, and Sagent.

The implication is that your organization may already have a reporting strategy in place. This strategy will be reflected in your enterprise's technical architecture and will encompass standard reporting tools, frameworks, and database technologies.


5. Database Design Strategies

What can you do design a database to be "performance friendly" to reports? In an ideal world you would like to have a perfectly normalized database, but it isn't an ideal world. To support reports you often need to consider the following changes to your database design:

There are other strategies that a good data engineer can implement toimprove reporting performance, strategies that Larry Greenfield nicely summarizes.


6. Implementation Strategies

When implementing reports, and the data extracts to support them, there are several implementation strategies you should adopt:


7. Challenges That Make Reporting Hard

Object technology doesn't readily lend itself to reporting. Although you have several implementation strategies available to you none of them are ideal. A "pure" object-oriented approach would be solely based on object, but because many reports require information from thousands and sometimes millions of object the database access and marshalling alone can be performance inhibitive. Luckily "non-pure" solutions exist, such as developing report objects or integrating with reporting tools, but they will most likely require you to break your database encapsulation strategy.

When reports are printed physical issues are brought into your design. Simple things such as aligning your output with the fields on pre-printed forms can be tedious when you don't have printer drivers specifically designed for those forms.

A more difficult issue to address is the fact that the owners of the data marts and your corporate data warehouse likely do not work in an agile manner, although this clearly does not need to be the case. You will need to find ways to work with them, perhaps a combination of helping them to become more agile and of learning to tolerate a little bit of bureaucracy. They may not be able to tolerate your team refactoring the layout of the data extract schema on a rapid basis, and may in fact require you to release on a much slower basis (quarterly instead of weekly). Furthermore they may not be able to respond quickly to your requests for schema changes within their databases. You'll need to find ways to work together effectively, something that can be particularly hard if your team is the first one in your organization to take an agile approach to development.

Similarly your operations department, who likely control access to your corporate printing facilities, may not work in an agile manner. Once again you'll need to find ways to work together.

Finally, don't allow your team to drift into analysis paralysis because of some misguided goal to ascertain "one truth".


8. What You Have Learned

This article focused on a basic but critical aspect of software-based systems - reporting. Your organization likely has a database architecture that takes reporting into consideration. This architecture may involve separate reporting databases, such as data marts and data warehouses, which you will need to export data too and then report from it. You learned that existing reporting tools and frameworks may exist for you to reuse on your initiative.

A fundamental decision that you need to make is whether reports are included as part of your application, if they are to be implemented in a separate reporting application/facility, or a combination thereof. When you are building them into your application you may choose to code them yourself or to use an integrated reporting tool.

My final word of advice is this: Don't underestimate the difficulties implementing reports. The technical issues are straightforward to overcome but the people-oriented issues can prove to be your downfall. Work closely with all of your stakeholders, not just the business stakeholders.