Why use subset databases?

Databases in business today

Databases today are growing in both size and complexity to meet the ever-increasing demands of business. Applications to process the data in these databases are also increasing in size and complexity. Alongside this, organisations want a rapid turn around for changes to these systems. Testing becomes more important than ever but timescales are squeezed. Ideally we would like to test all changes against up-to-date production data, so the "simple" answer is use a copy of the production database for all testing.

The Problem

When a database exceeds a certain size it becomes very expensive (in terms of elapsed time, hardware and manpower) to provide full-size copies of the production database for development, testing and training.

One resolution to this problem is to have fewer full size copies of the production database than are really needed, commonly only one, and ask the various development and testing teams to share.

Clearly this is far from optimal. A great deal of effort is required to manage and schedule usage of the production copy. Data in the database is left in an unknown state when passed from one team to the next. It takes a long time, frequently more than a day, to provide a refresh of the production copy when itís required. Always having an up-to-date production copy is virtually impossible.

The Solution

The reality is that databases required by training and application development teams rarely need to be full size, performance and volume testing are possible exceptions. In fact it is often easier to work on a small copy as response times and batch runtimes are much quicker. What is often required is a smaller version of the production database that correctly replicates the database structure and content of the larger database.


The main requirements for these "subset databases" are:

  • Referentially correct - there is no "missing data" for any particular data relationship
  • Correctly sized - they only take up the amount of disk space that is appropriate
  • Quick and easy to refresh from production
  • Easy to specify the data required for each refresh


The use of subset databases in preference to full-size copies of production provides a number of key benefits including:

  • Reduced hardware costs in terms of disk storage
  • More productive use of technical support staff (less time spent refreshing test databases)
  • Developers and application support staff are more productive.
  • Improved application quality through better testing

Creating a smaller, but referentially correct, subset database is not necessarily a quick or easy task. However the easier and more automated the process is, the better it is for everyone. This is where SelExL becomes an indispensable tool.

SelExL Logo - click to visit the SelExL page