5 Ways to Keep Your Test Data Compliant

Mantas Dvareckas

Constantly evolving global data protection legislation has made test data management increasingly complex. Organisations must have adequate plans for data protection and data use in place, to protect not only themselves but also their customers.

Yet, despite the compliance risks, the majority of organisations are continuing to use full-size copies of sensitive production data in less secure database development and testing environments [1].

This blog will explore common reasons why organisations are not meeting data compliance and legislative requirements, before offering 5 solutions for keeping test data compliant. These solutions have been chosen to help you consider a new and transformative approach to test data.

This blog is part 4/4 in a series focusing on test data modernization. Check out the first three parts below:

  1. Five Test Data Challenges That Every CTO Should Know About.
  2. 5 Techniques for Overcoming Test Data Bottlenecks.
  3. 5 Solutions to Test Data Coverage Issues.

Test Data Compliance Challenges Today

Evolving Data Protection Requirements

Failing to comply with data protection legislation can be a potentially devastating oversight: fines under the EU GDPR can reach €20 million or 4% of global annual turnover, whichever is higher. However, data compliance requirements extend beyond the EU, with legislation including Canada’s PIPEDA, India’s PDPB, the California Consumer Privacy Act (CCPA) and Brazil’s LGPD. Global organisations today must consider a range of legislation when managing their data.

Given that testers at 45% of organisations do not always comply with security and privacy regulations for test data [2], the distribution of sensitive data to non-production environments should be a chief concern. Non-production environments are typically also less secure than production. Spreading sensitive information across non-production can in turn add to the risk of data breaches, along with the associated risks of legislative fines, customer churn, and reputational damage.

Using sensitive data in non-production environments also poses particular logistical challenges for legislative compliance. For example, organisations today might need to locate, copy and delete every copy of a person’s data “without undue delay”. They might furthermore need to demonstrate that they know exactly how sensitive data is being used, that the processing has a legal basis, and that data is not being used by more people than necessary or for longer than necessary.

This level of control and oversight typically exceeds the capabilities of non-production infrastructure today.

The Need for Data in Parallel

Data leaks and breaches are often internal and non-malicious, occurring due to the accidental mismanagement of sensitive data. The safest way to minimize this risk is to reduce the spread of sensitive information. Yet, today, test data is typically needed by more teams and frameworks, in more environments than ever before.

The parallelisation of teams has become common across software delivery, creating more environments that require data. This can exacerbate the risks of using production data in testing and development, as organisations might be tempted to spread production data across even more non-production environments. This in turn increases the risk of exposing data that is both commercially and personally sensitive.

Furthermore, teams often share environments. Potentially sensitive data might then be edited, moved or deleted without sufficient tracking. This poses a compliance risk, as it can undermine an organisation’s ability to comply with legislation like the Right to Erasure or Data Portability.

Download our free Test Data-as-a-Service Solution Brief to learn how Test Data Automation can help you transform the relationship that your teams and frameworks share with data.

Solutions for Compliant Test Data

One of the most significant consequences of a data breach is its impact on customer trust and loyalty. Conversely, an organisation that takes the correct steps towards data compliance can boost trust and credibility with its customers.

The five solutions outlined below showcase how legal and compliance leaders can build a software delivery pipeline consisting of responsible data use, reducing the risk of data breaches and supporting legislative compliance.

Rapid and Reliable Data Masking

Data masking offers a way to meet many compliance requirements when testing, as well as to mitigate the risk of a data breach.

Alongside data masking, data profiling, or the ability to rigorously scan for and highlight potentially sensitive data for masking, is key. This gives testers and DBAs the ability to identify sensitive information across databases and files, before masking it to reduce compliance risks.

To support rapid data allocation, data profiling should feed central data dictionaries and a catalogue of associated masking rules, creating masking jobs that can be run on demand. The maintenance of audit logs for this masking can be useful for demonstrating compliance if called upon.
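
As a rough illustration of how profiling can feed a catalogue of masking rules and an audit log, here is a minimal Python sketch. It is not how Test Data Automation works internally: the pattern names, the `demo-salt` value, the sample rows and every function name are assumptions made for this example, and a real profiler would sample values and use far richer detection than two regular expressions.

```python
import hashlib
import re
from datetime import datetime, timezone

# Hypothetical profiling rules: value patterns that suggest sensitive data.
PROFILE_RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,}$"),
}


def _token(value, salt="demo-salt"):
    # Deterministic token: the same input always masks to the same output,
    # so relationships between masked rows are preserved.
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()[:12]


# Hypothetical catalogue of masking rules, keyed by profiling rule name.
MASKING_RULES = {
    "email": lambda v: f"user_{_token(v)}@example.com",
    "phone": lambda v: f"+00{int(_token(v), 16) % 10**9:09d}",
}


def profile(rows):
    """Return {column: rule_name} for columns whose values all match a rule."""
    if not rows:
        return {}
    findings = {}
    for column in rows[0]:
        values = [str(row[column]) for row in rows]
        for rule_name, pattern in PROFILE_RULES.items():
            if all(pattern.match(v) for v in values):
                findings[column] = rule_name
                break
    return findings


def run_masking_job(rows, audit_log):
    """Mask every profiled column and record what was masked, and when."""
    findings = profile(rows)
    masked = []
    for row in rows:
        new_row = dict(row)
        for column, rule_name in findings.items():
            new_row[column] = MASKING_RULES[rule_name](row[column])
        masked.append(new_row)
    for column, rule_name in findings.items():
        audit_log.append({
            "column": column,
            "rule": rule_name,
            "rows_masked": len(rows),
            "masked_at": datetime.now(timezone.utc).isoformat(),
        })
    return masked


# Fictitious sample rows standing in for a production extract.
production_rows = [
    {"id": 1, "email": "ada@corp.test", "phone": "+44 20 7946 0958"},
    {"id": 2, "email": "bob@corp.test", "phone": "020-7946-0123"},
]
audit_log = []
masked_rows = run_masking_job(production_rows, audit_log)
```

Because the masking is deterministic, the job can be re-run on demand and will produce the same masked values, while the audit log records which columns were masked under which rule.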

Synthetic Test Data Generation

Effective data masking identifies sensitive information across databases and files, anonymising it before the data is moved to non-production environments. However, a more beneficial solution to test data compliance issues involves synthetic test data generation.

Taking a hybrid approach and replacing masked data with synthetic data over time not only supports test data compliance; it can also increase the rigour of testing. A hybrid approach allows organisations to move away from masked production data over time, while filling gaps in test data coverage by generating new data from scratch.

Synthetic test data is artificially created data that can be used for the development and testing of applications. A modern synthetic test data generation solution can create missing combinations of test data on demand and remove risky live data from test environments.

Automating data generation means that testers no longer need to create data manually, nor do they need to use potentially sensitive and incomplete production data. Synthetic test data can also help your organisation take a data protection by design approach to development. In fact, the EU is advocating for its use in testing and quality assurance.

Synthetic data generation offers an opportunity to turn data compliance requirements into more rigorous testing, filling gaps in test data that production data may not, including negative scenarios, outliers and edge cases.
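
To make the idea of "filling gaps" concrete, the following Python sketch enumerates every combination of a few field values, including boundary and negative cases, and generates fictitious records for the combinations that existing data does not cover. The field names, the value lists and the `synthesise` helper are hypothetical and chosen only for illustration; a real generator would model far more fields and rules.

```python
import itertools
import random

# Hypothetical equivalence classes for a simple "customer order" record.
FIELD_VALUES = {
    "account_type": ["personal", "business"],
    "country": ["GB", "DE", "BR"],
    # Includes boundary values and a negative (invalid) total.
    "order_total": [0.00, 0.01, 49.99, 1_000_000.00, -5.00],
}


def all_combinations(field_values):
    """Yield one dict per combination of the listed field values."""
    names = list(field_values)
    for combo in itertools.product(*(field_values[n] for n in names)):
        yield dict(zip(names, combo))


def missing_combinations(existing_rows, field_values):
    """Return the combinations that no existing row already covers."""
    names = list(field_values)
    seen = {tuple(row[n] for n in names) for row in existing_rows}
    return [
        combo for combo in all_combinations(field_values)
        if tuple(combo[n] for n in names) not in seen
    ]


def synthesise(combo, rng):
    """Attach fictitious identifying fields to a combination."""
    n = rng.randrange(10**6)
    return {
        "customer_name": f"Test Customer {n:06d}",
        "email": f"customer{n:06d}@example.com",
        **combo,
    }


# Two rows standing in for masked production data with limited coverage.
existing_rows = [
    {"account_type": "personal", "country": "GB", "order_total": 49.99},
    {"account_type": "business", "country": "DE", "order_total": 49.99},
]

rng = random.Random(42)  # seeded so the generated data set is reproducible
gaps = missing_combinations(existing_rows, FIELD_VALUES)
synthetic_rows = [synthesise(combo, rng) for combo in gaps]
```

In this sketch the existing rows cover 2 of the 30 possible combinations, so 28 synthetic records are generated, including the zero-value and negative-total scenarios that production data would rarely contain.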

Containerised Test Databases

Today, copies of sensitive production data are often shared across parallel teams and environments, potentially increasing the exposure of data to breaches and making sufficient tracking harder.

In addition to masking and generating data, organisations can further implement database virtualisation and containerisation to match the demand for data from parallel teams and frameworks.

Virtualising databases allows testers, frameworks, and CI/CD pipelines to spin up the databases they need in seconds, at a fraction of the cost when compared to copying physical data. Rather than sharing copies of sensitive production data, testers can spin up traceable, masked and synthetic databases on demand.

While testers and developers can access the data they need on demand, organisations can retain full control. A supervisory service and control plane manages the database clones, including access controls and audit logs to support legislative compliance.

Test Data Containerisation using Curiosity’s Test Data Automation and Windocks.

Rapid and Coherent Data Subsetting

“Data minimisation” and “Purpose Limitation” are key tenets of legislation like the EU GDPR. In a nutshell, they dictate that only as much data as needed should be used to fulfil tightly-defined, and legally justified, purposes. This includes minimising the volumes of data that are used, but also sharing data with only as many people as needed to fulfil a legitimate purpose.

For organisations using (masked) production data in non-production environments, data subsetting offers a tool for data minimisation. Effectively subsetting a database produces smaller, more compact data sets that retain the characteristics of the source data.

Creating “covered” or “scenario” subsets that are tailored to fulfil particular testing and development needs can in turn support data minimisation, as it enables organisations to provision only as much data as needed to test and develop against.

Subsetting production data prior to masking can also reduce data provisioning times, as less data requires anonymisation. In addition to the benefits for compliance, subsetting can support greater parallelisation and overall agility in testing and development, creating compact data sets to support parallel teams and frameworks.
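
A "coherent" subset is one where the smaller data set still hangs together: child rows are copied only alongside the parent rows they reference. The Python sketch below shows the principle with two in-memory SQLite tables. The schema, the sample rows and the `subset` function are assumptions for illustration only, and a real subsetter would walk the foreign-key graph rather than hard-coding two tables.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
"""


def build_source():
    """Create a small in-memory database standing in for the full source."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript("""
        INSERT INTO customers VALUES (1, 'GB'), (2, 'DE'), (3, 'GB'), (4, 'BR');
        INSERT INTO orders VALUES
            (10, 1, 20.0), (11, 1, 35.5), (12, 2, 12.0), (13, 3, 99.0), (14, 4, 5.0);
    """)
    return conn


def subset(source, country):
    """Copy customers in `country`, plus only the orders that reference them."""
    target = sqlite3.connect(":memory:")
    target.executescript(SCHEMA)
    customers = source.execute(
        "SELECT id, country FROM customers WHERE country = ?", (country,)
    ).fetchall()
    target.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    ids = [row[0] for row in customers]
    if ids:
        placeholders = ",".join("?" * len(ids))
        orders = source.execute(
            "SELECT id, customer_id, total FROM orders "
            f"WHERE customer_id IN ({placeholders})",
            ids,
        ).fetchall()
        target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()
    return target


source_db = build_source()
gb_subset = subset(source_db, "GB")

customer_count = gb_subset.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = gb_subset.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
# Orders whose customer is missing from the subset would break test runs.
orphan_count = gb_subset.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE customer_id NOT IN (SELECT id FROM customers)"
).fetchone()[0]
```

Selecting parents first and then following the foreign key keeps the subset free of orphaned rows, and any masking that runs afterwards only has to process the rows that were actually provisioned.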

Test Data Allocation - Automatic Data Find and Makes

Test data allocation provides another technique for ensuring that only as much data as needed is used in testing, and no more. Allocation matches data combinations from across sources to tests, ensuring that every test is equipped with the data it needs.

Additionally, data within a database, file, API or message can then be locked for use by an individual test, or can be set to have “read” privileges if the test does not need to edit it. Read privileges allow other tests to access the data without editing it, maintaining clear oversight of how data is being used.

At the same time, parallel tests still enjoy all the data they need, by virtue of integrated test data “find and makes”. With on-the-fly test data find and makes, parallel teams and frameworks can create synthetic data automatically as tests run. Finds look for data based on the test case requirements, while makes use integrated test data generation to create any data that cannot be found:

On-The-Fly data “find and makes” using Curiosity’s Test Data Automation.
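
The find-and-make pattern, together with locking, can be sketched in a few lines of Python. This is a simplified illustration and not Test Data Automation's API: the `DataPool` class, its method names and the sample rows are all hypothetical, and the "read" mode here is reduced to simply not taking a lock.

```python
import itertools


class DataPool:
    """A toy in-memory pool of test data rows with exclusive locks."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.locks = {}  # row id -> name of the test holding the row
        first_free_id = max((row["id"] for row in self.rows), default=0) + 1
        self._ids = itertools.count(start=first_free_id)

    def find(self, criteria, test_name, exclusive=True):
        """Return the first unlocked row matching all criteria, or None."""
        for row in self.rows:
            if row["id"] in self.locks:
                continue  # already reserved by another test
            if all(row.get(key) == value for key, value in criteria.items()):
                if exclusive:
                    self.locks[row["id"]] = test_name
                return row
        return None

    def make(self, criteria):
        """Create a new synthetic row that satisfies the criteria."""
        row = {"id": next(self._ids), "synthetic": True, **criteria}
        self.rows.append(row)
        return row

    def find_or_make(self, criteria, test_name, exclusive=True):
        """Find matching data if it exists, otherwise generate it."""
        row = self.find(criteria, test_name, exclusive)
        if row is None:
            row = self.make(criteria)
            if exclusive:
                self.locks[row["id"]] = test_name
        return row


pool = DataPool([{"id": 1, "status": "active", "country": "GB"}])

# test_a finds the existing row and locks it for editing.
row_a = pool.find_or_make({"status": "active", "country": "GB"}, "test_a")
# test_b needs the same kind of data; the only match is locked, so one is made.
row_b = pool.find_or_make({"status": "active", "country": "GB"}, "test_b")
# test_c only reads its data, so the generated row is left unlocked.
row_c = pool.find_or_make(
    {"status": "closed", "country": "DE"}, "test_c", exclusive=False
)
```

Each parallel test receives data matching its requirements without colliding with another test, and the lock table gives a simple record of which test is using which row.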

Compliance and Quality at the Speed of Delivery

The solutions discussed in this article integrate to reduce data compliance risks, while simultaneously boosting testing quality and accelerating test data provisioning. Best of all, these solutions are all part of one complete toolset, Curiosity’s Test Data Automation!

Test Data Automation enables parallel teams and frameworks to stream compliant test data on demand. Rather than a costly data compliance risk, test data becomes a tool for faster development, enabling organisations to create and allocate the data they need on demand.

To learn more about Test Data Automation and how it can help overcome your test data compliance challenges, download our free Test Data-as-a-Service Solution Brief!

Footnotes:

[1] Redgate (2021), The 2021 State of Database DevOps Report. Retrieved from https://www.red-gate.com/solutions/database-devops/report-2021  

[2] Capgemini, Sogeti (2021), World Quality Report 2021-22. Retrieved from https://www.capgemini.com/gb-en/research/world-quality-report-wqr-2021-22/