Introduction to Synthetic Test Data Generation

This Knowledge Base section describes in detail how to generate VIP Synthetic Test Data, part of VIP Test Data Management. It mainly uses an example-based approach to cover the important topics. Synthetic Test Data helps you develop systems that are as robust as possible in terms of data quantity and variability.

…but first, a few questions

Why do I want to generate Synthetic Test Data?

Primarily to be able to test software systems or applications. Software is created with a set of assumptions and constraints in mind. Once it is developed, we can use it, right? Not so fast. We first need to test the developed system using data not created by the developers. VIP has a feature that allows the developers of the software to create ‘synthetic’ data, which is generated using a set of predefined criteria.
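To make the idea of "predefined criteria" concrete, here is a minimal sketch in plain Python. It is not VIP's own feature or API: the CRITERIA structure, the rule types and the generate_rows function are all hypothetical names chosen for illustration. It only shows the general principle that each column is described by a rule and rows are generated from those rules.

```python
import random

# Hypothetical criteria (not a VIP format): each column maps to a rule
# that describes which values are allowed for that column.
CRITERIA = {
    "customer_id": {"type": "sequence", "start": 1000},
    "age": {"type": "int_range", "min": 18, "max": 95},
    "status": {"type": "choice", "values": ["active", "suspended", "closed"]},
}


def generate_rows(criteria, count, seed=0):
    """Generate `count` synthetic rows that satisfy the given criteria."""
    rng = random.Random(seed)  # fixed seed makes the data reproducible
    rows = []
    for i in range(count):
        row = {}
        for column, rule in criteria.items():
            if rule["type"] == "sequence":
                # Unique, increasing values, e.g. for a primary key
                row[column] = rule["start"] + i
            elif rule["type"] == "int_range":
                # Random integer between min and max, both inclusive
                row[column] = rng.randint(rule["min"], rule["max"])
            elif rule["type"] == "choice":
                # Random pick from a fixed list of allowed values
                row[column] = rng.choice(rule["values"])
            else:
                raise ValueError(f"Unknown rule type: {rule['type']}")
        rows.append(row)
    return rows


rows = generate_rows(CRITERIA, 5)
for row in rows:
    print(row)
```

Because the generator is seeded, the same criteria always produce the same rows, which keeps tests repeatable. Raising the count produces larger volumes without touching any real data.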

Reasons why Synthetic Test Data is a good idea

  • Testers often spend many hours trying to find the correct test data for testing.
  • Data is often required to be consistent across multiple applications.
  • Many reported bugs are actually caused by incorrect data being used in a test, not by a defect in the application.
  • Test Data often changes and becomes invalid for the specific test.
  • Testers often cannibalise each other's data.
  • Incorrect data destabilises automated testing and creates automated test failures.
  • Each tester hunts for their own data and there is little reuse of previous data finds.
  • Test data requires high volumes of storage and is time-consuming to provision. It is often quicker to synthesise data and provision it in parallel.
  • Last, but not least, Synthetic Data complies with privacy and data protection laws, so there’s no need to worry about illegal use of personal data.

What will I gain from generating Synthetic Test Data?

You will gain the peace of mind that the system has been tested with data that is representative of production and that also goes well beyond it in terms of test coverage.

How long will it take me to generate Synthetic Test Data?

If the application tables and their relationships have already been created, then generating Synthetic Test Data should only take a short time (one or more hours).