
Agile Database Tools and Scripts


To implement the Agile Data method within your organization you will need to adopt, build, and/or modify a collection of tools. Tools are just a start: you also need an effective technical environment in which to use them. This environment should comprise several "sandboxes" in which you will work. Finally, Agile data engineers will discover that they need several different types of scripts to support their development efforts.

1. Tools

Having an effective toolset is a critical success factor for any software development effort. Table 1 lists categories of tools, the target audience for the tool, how you would use the tool, and links to a representative sample of such tools. Chances are very good that you already have many of these tools in house, although you will undoubtedly need to obtain several of them.

Agile isn't in the tool, it's in the way that you use the tool.

Table 2 lists tools that, to my knowledge, do not yet exist at the time of this writing but are needed to support the Agile Data method. My hope is that we will see both commercial and open source tools available in the near future.

Table 1. Potential Tools That Support Agile Data Efforts.

| Tool Category | Target Audience | Purpose |
| --- | --- | --- |
| CASE Tool - Development Modeling | Application Developer, Agile data engineer | To support your application development efforts. |
| CASE Tool - Physical Data Modeling | Agile data engineer | To define and manage your physical database schema. Many data modeling tools support the generation and deployment of DDL code, making it easier to change your database schema. They also produce visual representations of your schema and support your documentation efforts. |
| Configuration Management | | You need to place all data definition language (DDL), source code, models, scripts, documents, … under version control. |
| Database Refactoring Tools | Application Developer, Agile data engineer | To evolve your database schema in small, safe steps. |
| Development IDE/Refactoring Browser | Application Developer, Agile data engineer | To support your programming and testing efforts. |
| Extract Transform Load (ETL) | Agile data engineer, Operations engineer | ETL tools can automate the data cleansing and migration efforts needed as you evolve your database schema. |
| Persistence Frameworks | Application Developer, Agile data engineer | Persistence frameworks/layers encapsulate your database schema, minimizing the chance that database refactorings will force code refactorings in external applications. |
| Release Tools | Application Developer, Agile data engineer | You need to deploy your database between sandboxes, including production. |
| Test Data Generator | Application Developer, Agile data engineer | Developers need test data against which to validate their systems. Test data generators can be particularly useful when you need large amounts of data, perhaps for stress and load testing. |
| Testing tools for load testing, user interface testing, system testing, … | Application Developer, Agile data engineer | You will need to go beyond unit testing to perform a more robust set of tests. The Full Lifecycle Object-Oriented Testing (FLOOT) method encapsulates a wide range of traditional and agile testing techniques. |
| Traceability Management/Repository | | |
| Unit testing tools for your applications | Application Developer | Developers must be able to unit test their work, and to support iterative development they must be able to easily regression test. |
| Unit testing tools for your database | Agile data engineer | Whenever you change your database schema, perhaps as the result of a database refactoring, you must be able to regression test your database to ensure that it still works. |
| Other | Agile data engineer | |

Table 2. Future Tools.

Tool Category


Automated Schema Traceability Management Tools

Although Table 1 includes traceability management tools, the reality is that most tools are geared either towards requirements traceability or data access traceability (as in the case of repositories such as Rochade and Advantage). Neither is suited to the fine-grained traceability required for database refactoring. Ideally you need a tool that can trace a wide range of application features, such as COBOL procedures and Java operations, to database features such as stored procedures and table columns. Because of the complexity of this task, the less manual intervention the better - ideally it should be able to parse your application and database code and create the traceability matrix automatically.
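
As a rough illustration of the automatic approach described above, the following Python sketch builds a traceability matrix by plain text matching. The feature names, the source snippets, and the `build_traceability_matrix` helper are all hypothetical, and whole-word matching is only a stand-in for real parsing: it would miss dynamically built SQL and could report false matches.

```python
import re

# Hypothetical application features mapped to their source text.
APPLICATION_SOURCES = {
    "CustomerDao.save (Java operation)":
        'stmt = "UPDATE customer SET full_name = ? WHERE id = ?";',
    "UPDATE-BALANCE (COBOL procedure)":
        "EXEC SQL CALL sp_update_balance(:ACCT-ID) END-EXEC.",
}

# Hypothetical database features: two columns and one stored procedure.
DATABASE_FEATURES = ["full_name", "legacy_name", "sp_update_balance"]


def build_traceability_matrix(sources, db_features):
    """Map each application feature to the database features its source mentions."""
    matrix = {}
    for app_feature, text in sources.items():
        matrix[app_feature] = {
            feature
            for feature in db_features
            # Whole-word, case-insensitive match; an assumption, not real parsing.
            if re.search(rf"\b{re.escape(feature)}\b", text, flags=re.IGNORECASE)
        }
    return matrix


matrix = build_traceability_matrix(APPLICATION_SOURCES, DATABASE_FEATURES)
for app_feature, db_refs in matrix.items():
    print(app_feature, "->", sorted(db_refs))

# Database features that no scanned application feature references.
unreferenced = set(DATABASE_FEATURES) - set().union(*matrix.values())
print("Unreferenced:", sorted(unreferenced))
```

In this toy data set `legacy_name` is referenced by nothing that was scanned, which is the kind of signal you would want before refactoring a column, provided every application that touches the database was included in the scan.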

2. Sandboxes

This section has been replaced by the Sandboxes Core Practice article.

3. Scripts


Pramod Sadalage and Peter Schuh (2002) suggest that data engineers maintain what they call a database change log and an update log, the minimum that you require for simple stovepipe initiatives where a single application accesses your database. However, to support more complex environments where many applications access your database, you also require a data migration log. Let's explore how you use each log:

  1. Database change log. This log contains the data definition language (DDL) source code that implements all database schema changes in the order that they were applied throughout the course of an initiative. This includes structural changes such as adding, dropping, renaming, or modifying things such as tables, views, columns, and indices.
  2. Update log. This log contains the source code for future changes to the database schema that are to be run after the deprecation period for database changes. The Process of Database Refactoring argues that changing your database schema is inherently more difficult than changing application source code: other developers on your team need time to update their own code and, worse yet, other applications may access your database and therefore need to be modified and deployed as well. Therefore you will find that you need to maintain both the original and changed portions of your schema, as well as any scaffolding code to keep your data in sync, for a period of time called the "deprecation period."
  3. Data migration log. This log contains the data manipulation language (DML) to reformat or cleanse the source data throughout the course of your initiative. You may choose to implement these changes using data cleansing utilities, often the heart of extract-transform-load (ETL) tools, examples of which are listed in Table 1.
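
The deprecation period described for the update log can be sketched with a small example. It uses Python's built-in `sqlite3` module purely for illustration; the `customer` table, its columns, and the trigger name are hypothetical, and the scaffolding only covers inserts (a fuller version would also handle updates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Original schema: existing applications read and write `fname`.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT);

    -- Change log entry: add the new column alongside the old one.
    ALTER TABLE customer ADD COLUMN first_name TEXT;

    -- Scaffolding: keep both columns in sync during the deprecation period.
    CREATE TRIGGER sync_first_name_insert AFTER INSERT ON customer
    BEGIN
        UPDATE customer
        SET first_name = COALESCE(NEW.first_name, NEW.fname),
            fname = COALESCE(NEW.fname, NEW.first_name)
        WHERE id = NEW.id;
    END;
""")

# An application that has not been updated yet still writes the old column...
conn.execute("INSERT INTO customer (fname) VALUES ('Ada')")
# ...while an already updated application writes the new column.
conn.execute("INSERT INTO customer (first_name) VALUES ('Grace')")

rows = conn.execute(
    "SELECT fname, first_name FROM customer ORDER BY id"
).fetchall()
print(rows)  # [('Ada', 'Ada'), ('Grace', 'Grace')]

# Update log entries, held back until the deprecation period ends.
# They are listed here only; this sketch does not execute them.
UPDATE_LOG = [
    "DROP TRIGGER sync_first_name_insert",
    "ALTER TABLE customer DROP COLUMN fname",
]
```

Both old and new applications see consistent data while the two columns coexist, and the update log records what to remove once every application has moved to the new column.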
You may choose to implement each logical script as a collection of physical scripts, perhaps one for each development iteration or even one for each individual database refactoring, or you may choose to implement it as a single script that includes the ability to run only a portion of the changes. You need to be able to apply subsets of your changes so that you can put your database schemas into known states. For example, you may find yourself in development iteration 10 and discover that you want to roll back your schema to the way it was at the beginning of iteration 8.
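
One way to get this behavior is to tag each logged change with its iteration and rebuild a fresh database from the logs, stopping at the iteration you want. The sketch below assumes that approach (rebuilding, not running reverse scripts) and uses Python's built-in `sqlite3` module; the table, the log contents, and the helper names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    iteration: int  # development iteration in which the change was made
    sql: str        # DDL (change log) or DML (data migration log)


# Hypothetical database change log: DDL in the order it was applied.
CHANGE_LOG = [
    Change(7, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    Change(8, "ALTER TABLE customer ADD COLUMN full_name TEXT"),
    Change(9, "CREATE INDEX idx_customer_full_name ON customer (full_name)"),
]

# Hypothetical data migration log: DML that reformats existing data.
DATA_MIGRATION_LOG = [
    Change(8, "UPDATE customer SET full_name = name WHERE full_name IS NULL"),
]


def changes_through(iteration):
    """Return the SQL for all changes up to and including `iteration`.

    Within an iteration, schema changes run before data migrations.
    """
    tagged = [(c.iteration, 0, i, c.sql) for i, c in enumerate(CHANGE_LOG)]
    tagged += [(c.iteration, 1, i, c.sql) for i, c in enumerate(DATA_MIGRATION_LOG)]
    return [sql for it, _, _, sql in sorted(tagged) if it <= iteration]


def build_database(through_iteration):
    """Create a fresh in-memory database in the state after `through_iteration`."""
    conn = sqlite3.connect(":memory:")
    for sql in changes_through(through_iteration):
        conn.execute(sql)
    conn.commit()
    return conn


def column_names(conn, table):
    """List a table's column names using SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# The schema as it was at the beginning of iteration 8, i.e. after iteration 7.
start_of_iteration_8 = build_database(through_iteration=7)
print(column_names(start_of_iteration_8, "customer"))  # ['id', 'name']

latest = build_database(through_iteration=9)
print(column_names(latest, "customer"))  # ['id', 'name', 'full_name']
```

Rebuilding from the logs suits development sandboxes, where the data is disposable. A database whose data must be preserved would need a different strategy, such as explicit reverse scripts.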