Today, more than 50% of organisations are using full-size copies of production data in database development and testing [1], while 54% of test teams still depend on production data copies [2]. This use of low-variety production data undermines test coverage, along with the software quality that depends on it.
Too often, test data practices overlook questions of test coverage. Yet, achieving the right coverage is paramount to successful testing. This is because test coverage focuses on mitigating the risk of costly bugs, by testing the system’s logic as rigorously as needed in-sprint.
Poor test coverage, by contrast, increases the risk of defects getting past testing and into production. This in turn increases the time and cost to fix the bugs, as they are detected too late in the software delivery lifecycle.
This blog will explore common causes of low test data coverage, before offering 5 techniques for overcoming these issues. These techniques have been chosen to help you consider a new and transformative approach to test data.
This blog is part 3/4 in a series focusing on test data modernization. Check out the other 3 parts below:
Testers and developers manage test data using a range of techniques, including generation, masking and subsetting. However, many legacy TDM practices persist across the industry. These hinder test coverage. Four such practices are summarised below:
Copying raw or masked production data is simply not good enough for rigorous testing. This is because production data rarely covers negative scenarios, edge cases, or data to test new functionality. By contrast, rigorous testing requires a spectrum of data combinations with which to execute each test:
Manually copying complex data across environments and systems is slow and error-prone, often breaking relationships in the data. Furthermore, databases are likely to change during refreshes, which causes data sets to become unaligned.
Testing with out-of-date and misaligned data in turn undermines test coverage and causes time-consuming test failures. In fact, 61% of respondents in the latest World Quality Report cite “maintaining test data consistency across different systems under test” as a test data challenge [2].
Subsetting test data is valuable for lowering storage costs, data provisioning time, and the time required to execute tests. However, simplistic subsetting techniques can damage both the relationships and coverage of data.
For instance, simply taking the first 1000 rows of each table will not respect the relationships between data that exists across tables. Nor will it typically provide the data needed to execute every test in a suite.
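To make the problem concrete, here is a minimal sketch (with invented customer/order tables, not any particular product's schema) contrasting naive "first N rows" subsetting with a subset that follows foreign keys:

```python
# Two tiny in-memory tables: orders reference customers via customer_id.
customers = [{"id": i} for i in range(1, 7)]
orders = [{"id": i, "customer_id": (i % 6) + 1} for i in range(1, 13)]

def naive_subset(table, n):
    # Takes the first n rows of a table, ignoring cross-table relationships.
    return table[:n]

def subset_with_references(customers, orders, n_orders):
    # Keeps a slice of orders, then pulls in exactly the customers those
    # orders reference, so no foreign key is left dangling.
    kept_orders = orders[:n_orders]
    needed_ids = {o["customer_id"] for o in kept_orders}
    kept_customers = [c for c in customers if c["id"] in needed_ids]
    return kept_customers, kept_orders

# Naive subsetting orphans data: the kept orders reference customers
# that were cut from the customer subset.
naive_customers = naive_subset(customers, 2)
naive_orders = naive_subset(orders, 2)
kept_ids = {c["id"] for c in naive_customers}
orphaned = [o for o in naive_orders if o["customer_id"] not in kept_ids]
```

The relationship-aware version produces a smaller data set whose references all still resolve, which is the property simplistic subsetting loses.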
To boost test coverage, testers are often required to manually create the complex data needed to fulfil their test cases. However, manual data creation is slow and error-prone, often producing inconsistent or incorrect data that causes time-consuming test failures.
These outdated TDM practices hold both testers and test coverage back. They call for new, structured and efficient techniques for test data generation, maintenance, and management.
Five different techniques for boosting test data coverage are set out below.
Synthetic test data is artificially created data that can be used for developing and testing applications, and it is typically key to enhancing overall test coverage. A modern synthetic test data generation solution can create missing combinations of test data on-demand. This means testers no longer need to create data manually. Nor do they use potentially-sensitive and incomplete production data.
Testers can use synthetic test data to fill the gaps in data not found in existing production data, including negative scenarios and edge cases needed for rigorous testing. Synthetic data can be created algorithmically, using coverage analysis to find and fill gaps.
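As a simple illustration of the algorithmic approach, the sketch below enumerates every combination of a few hypothetical test dimensions, deliberately including negative and boundary values that rarely appear in production data. The dimensions and values are assumptions chosen for the example:

```python
import itertools

# Hypothetical test dimensions; the values (including the negative balance,
# zero boundary, and empty country) are assumptions for illustration.
account_types = ["standard", "premium"]
balances = [-0.01, 0.0, 0.01, 9_999_999.99]   # negative and edge values
countries = ["GB", "US", ""]                  # "" = missing-value scenario

def generate_synthetic_rows():
    # Enumerate every combination so no pairing of values is missed.
    return [
        {"account_type": a, "balance": b, "country": c}
        for a, b, c in itertools.product(account_types, balances, countries)
    ]

rows = generate_synthetic_rows()  # 2 * 4 * 3 = 24 combinations
```

A real generation tool would layer coverage analysis and constraints on top of this, but the core idea is the same: derive the combinations systematically rather than hoping production data happens to contain them.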
Though synthetic data creation is a powerful tool for driving higher test coverage, the latest World Quality Report found that only around half of test teams create and maintain synthetic data for testing [2].
Data analysis and comparisons give test teams the ability to measure coverage and compare it across different environments, identifying gaps in data density and variety before filling them with synthetic test data generation.
Automated data analysis has compared data across two environments, identifying missing values in each.
Data coverage analysis tools can automatically identify gaps in existing test data, ensuring that test data can fulfil every scenario needed for rigorous test coverage. This might be performed, for example, by linking test cases to data and performing data lookups based on those tests.
Automated analysis today can therefore help identify the missing data needed to produce complete test data, before using data generation to improve test coverage.
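A minimal sketch of such a comparison is a set difference over the value combinations present in each environment. The environment contents and column names here are invented for illustration:

```python
def combinations(rows, keys):
    # The distinct value combinations present in a data set,
    # projected onto the chosen columns.
    return {tuple(row[k] for k in keys) for row in rows}

env_a = [{"type": "standard", "status": "active"},
         {"type": "premium", "status": "closed"}]
env_b = [{"type": "standard", "status": "active"},
         {"type": "standard", "status": "closed"}]

keys = ["type", "status"]
missing_in_b = combinations(env_a, keys) - combinations(env_b, keys)
missing_in_a = combinations(env_b, keys) - combinations(env_a, keys)
```

Each "missing" set is a concrete worklist for synthetic data generation: the combinations an environment needs before its tests can run.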
With on-the-fly test data finds and makes, parallel teams and frameworks can create data automatically as tests run. "Finds" look for data based on the test case requirements, while "makes" use integrated test data generation to create the missing combinations needed in testing, improving overall test coverage.
Integrating the automated find and makes with test automation frameworks and CI/CD pipelines lets tests self-provision the data they need on-the-fly, rapidly running the rigorous and targeted tests needed for optimal in-sprint coverage.
Techniques used today for finding data can be standardised and automated, rapidly building a catalogue of reusable data “finds”. Manual or automated tests can then parameterise and reuse these automated finds whenever they need data, with integrated data generation to create missing combinations on-the-fly:
On-the-fly “find and makes” ensure that every tester, developer and automated test comes equipped with the data they need.
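The find-and-make pattern can be sketched as a single lookup-or-generate function. The in-memory "store" and the generation step below are deliberately simplistic stand-ins for a real test data catalogue and generator:

```python
import itertools

_id_counter = itertools.count(1)

def find_or_make(store, criteria):
    """Return a row matching the test's criteria; generate one if none exists.

    'store' stands in for a catalogue of reusable test data, and the
    "make" branch stands in for integrated data generation."""
    for row in store:
        if all(row.get(k) == v for k, v in criteria.items()):
            return row  # "find": reuse existing data
    made = {"id": next(_id_counter), **criteria}  # "make": generate on-the-fly
    store.append(made)
    return made

store = [{"id": 0, "type": "premium", "status": "active"}]
found = find_or_make(store, {"type": "premium", "status": "active"})  # reused
made = find_or_make(store, {"type": "standard", "status": "closed"})  # created
```

Parameterising tests over a function like this is what lets them self-provision data inside a CI/CD pipeline: a test states what it needs, and either finds it or makes it.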
Data cloning is another technique for boosting test coverage.
Data combination cloning creates multiple sets of a given combination, assigning unique identifiers to each clone. It duplicates data with the same characteristics, allowing parallel testers and tests to work without using up or editing one another’s data.
Data cloning ensures that all your tests can run in parallel and without failures, as it multiplies the data needed for test scenarios that require the same or similar data combinations. Cloning is particularly useful for automated testing that burns rapidly through data, as it ensures that new data is always readily available. This boosts in-sprint test coverage, as every test in a suite runs with the data it needs.
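A minimal sketch of combination cloning, assuming a simple dictionary row and an integer identifier, looks like this:

```python
import copy
import itertools

_ids = itertools.count(1000)

def clone_combination(row, copies):
    """Duplicate a data combination, giving each clone a unique identifier
    so parallel tests never consume or edit each other's data."""
    clones = []
    for _ in range(copies):
        clone = copy.deepcopy(row)   # same characteristics...
        clone["id"] = next(_ids)     # ...but a distinct identity
        clones.append(clone)
    return clones

template = {"id": 1, "type": "premium", "status": "active"}
clones = clone_combination(template, 5)
```

Each clone shares the template's characteristics but owns a unique id, so five parallel tests that all need a "premium, active" record can each consume their own copy.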
Test data subsetting, performed correctly, extracts compact, consistent, and intact data sets. “Covered” subsetting is further designed to retain coverage, reducing the volume of data copies while retaining data variety.
Extracting “covered” subsets provisions complete copies of data to multiple teams and frameworks. This avoids the delays caused by cross-team constraints, while reducing the cost of maintaining multiple data copies. Maintaining the variety and relationships of data further means that every test runs smoothly using consistent data, unlocking optimal coverage levels.
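The "covered" idea can be sketched as a greedy pass that keeps the smallest set of rows still containing every distinct combination of the chosen columns, so variety survives while volume shrinks. The rows and columns are invented for illustration:

```python
def covered_subset(rows, keys):
    # Keep one exemplar row per distinct combination of the key columns.
    seen = set()
    subset = []
    for row in rows:
        combo = tuple(row[k] for k in keys)
        if combo not in seen:
            seen.add(combo)
            subset.append(row)
    return subset

rows = [{"type": t, "status": s, "row": i}
        for i, (t, s) in enumerate(
            [("std", "open"), ("std", "open"), ("prm", "open"),
             ("std", "closed"), ("prm", "open")])]
subset = covered_subset(rows, ["type", "status"])  # 5 rows shrink to 3
```

A production subsetter would additionally preserve referential integrity across tables, but the coverage property is the one shown here: the subset is smaller, yet no combination of values is lost.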
Using Enterprise Test Data, covered test data subsetting can be integrated with the different techniques set out in this article. Each technique is furthermore reusable on-the-fly, automatically allocating coverage-optimised data to parallel teams and frameworks:
Integrated test data technologies can be reused on-the-fly to ensure that every tester and test is equipped with the data they need.
The automated test data techniques outlined in this article enable organisations to create and allocate the data they need for every test scenario, boosting test coverage drastically. Furthermore, these techniques form part of an integrated and automated test data suite, Curiosity’s Enterprise Test Data.
Enterprise Test Data combines all the techniques covered in this article and more, enabling parallel teams and frameworks to stream the data they need, when and where they need it. Rather than a blocker to speed and test coverage, test data instead becomes available on demand, at all times, across the whole SDLC.
[1] Redgate (2021), The 2021 State of Database DevOps Report. Retrieved from https://www.red-gate.com/solutions/database-devops/report-2021
[2] Capgemini, Sogeti (2021), World Quality Report 2021-22. Retrieved from https://www.capgemini.com/gb-en/research/world-quality-report-wqr-2021-22/