“We mustn’t use live data for testing.” This directive is why most organizations reach for superficial solutions to challenges that are ingrained in their DNA. For years, this aversion has shaped the way organizations change their “best” practices, as they struggle to wean themselves off deep-set habits.
These organizations often start with the low-hanging fruit and create a capability to replace live data with either masked/obfuscated data or synthetic alternatives. They then believe that’s “job done”! It isn’t. It doesn’t tackle, or even reduce, many of the core challenges associated with using production data in test, let alone the systemic problems that led the organization to test with production data in the first place.
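To make that first step concrete, a masking pass often looks something like the minimal Python sketch below. The field names, hashing choice, and masking rules here are hypothetical; real masking tools offer far richer, policy-driven transformations.

```python
import hashlib
import random

def mask_customer_record(record: dict) -> dict:
    """Return a copy of a customer record with direct identifiers
    replaced, while preserving format so downstream tests still run."""
    masked = dict(record)
    # Deterministic pseudonym: the same input always maps to the same
    # token, so referential integrity across tables is preserved.
    masked["name"] = "cust_" + hashlib.sha256(record["name"].encode()).hexdigest()[:8]
    # Format-preserving fake: keep a valid-looking email address.
    masked["email"] = masked["name"] + "@example.com"
    # Randomize the account number but keep its length and shape.
    masked["account_no"] = "".join(random.choice("0123456789") for _ in record["account_no"])
    return masked

print(mask_customer_record(
    {"name": "Jane Doe", "email": "jane@bank.com", "account_no": "12345678"}
))
```

Even a sketch like this hints at why masking alone isn’t “job done”: it says nothing about where the data flows, who can access it, or why production data was needed in the first place.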
Most teams do not take this narrow course of action by choice; using production data is typically born out of necessity. Live data copied from the production system is seen as the only way to understand the profile of our customers, the business rules we’ve lost control of, and the erroneous data that we know is in production but have never had the time to remediate.
At the same time, the organization demands that (IT) change happens now. “Our customers demand feature x yesterday”, “we can’t get left behind by our competitors” – these are all very valid concerns.
Yet when does the organization stop to take stock and address its technical debt, asking whether the merry-go-round of how it works and delivers change is compounding the risk of something going wrong?
In the case of data, that might mean data loss, incorrect data processing, or failed data/systems migrations. There have been a number of widely reported instances in the past few years of such problems occurring.
Should we really be surprised, if our system predates the introduction of the EU GDPR, that it might not be compliant with that particular piece of regulation? Or, at the very least, that we’d need to do some work to demonstrate that it is?
In terms of GDPR compliance, ask yourself: do you have an up-to-date and maintained data model? Do you track all of the data flows in your organization? And did you go back and retrospectively redesign your systems to ensure you could demonstrate “data protection by design”?
More specifically, do you have a data dictionary (or similar) that both maps all of the data in your organization and classifies it as PII, PSI, or even PCI if you are going broader than GDPR?
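For illustration only, a data dictionary entry might start as simply as the Python sketch below; the systems, fields, and classifications are invented, and a real data catalog would carry far more metadata.

```python
from dataclasses import dataclass
from enum import Enum

class Sensitivity(Enum):
    PII = "personally identifiable information"
    PCI = "payment card information"
    NONE = "no special handling"

@dataclass
class DataElement:
    system: str                   # where the field lives
    field: str                    # the field name
    classification: Sensitivity   # sensitivity class
    flows_to: list                # downstream systems that receive it

# A toy slice of a data dictionary; entries are illustrative.
DATA_DICTIONARY = [
    DataElement("crm", "customer.email", Sensitivity.PII, ["billing", "marketing"]),
    DataElement("billing", "card.pan", Sensitivity.PCI, ["payment_gateway"]),
    DataElement("crm", "customer.segment", Sensitivity.NONE, ["analytics"]),
]

# Which systems end up holding PII? Exactly the question you need
# answered before copying anything into a test environment.
pii_systems = {e.system for e in DATA_DICTIONARY if e.classification is Sensitivity.PII}
pii_systems |= {s for e in DATA_DICTIONARY
                if e.classification is Sensitivity.PII for s in e.flows_to}
print(pii_systems)  # {'crm', 'billing', 'marketing'}
```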
All of these requirements are a tall order for change teams to grasp, let alone resolve, on top of the business demand to get the next feature into the wild.
Data gets into every aspect of an organization and its working practices. When that data is live data, the challenges of controlling it, and subsequently removing it, must be taken seriously.
The “bow tie” diagram below is a technique used in risk management to visualize and articulate the complex, causal control weaknesses that probably exist in your organization.
This diagram indicates the types of prior causes and knock-on risks that are commonly associated with using live data. The final “cause” before the devastating non-compliance event lies with the decision to allow the use of live data in non-production environments.
Though often made on a project-by-project basis, such decisions are typically thematic. They reflect a pattern of behavior in which management allows actions it perceives as inconsequential, creating a “slippery slope” on which risks pile up over time.
It then only takes one incremental risk to fail, causing a catastrophic event. A decision in one project, in one part of the organization, might have allowed the use of live data to expedite delivery. That decision was made to hit a date that wasn’t really needed, while access controls were compromised as people moved transiently between projects.
Figure 1 – A bowtie diagram showing the systemic causes leading to a catastrophic “Event” of a data breach.
With loose access to various systems and a weak culture of retaining data management experts, ways of working and data movement become an organic ball of mud. The ball just gets bigger as time goes by, and it will only stop growing if you recognize these sliding-door events and take steps to address them, stopping the risks from overlapping.
So what can you do? The first step is to recognize the problem, perhaps using the bowtie above to spot these events happening in your organization. Seeing the potential impacts of risky decisions helps crystallize their significance for leaders, especially if they are the material risk takers within your organization. This might include the Data Protection Officer (DPO), a role the GDPR requires for many organizations.
Next, understand the context of the specific challenge in your organization and why it occurs. This will allow you to get to the root of the problem. Remember that a lot of the time, where a team uses live data, the live data itself is a symptom of the root problem.
Just as the individual problems converged over time to create your big ball of mud, fixing them must start by pulling the problems apart again. You need to separate the challenges and work on solutions in isolation. Isolation is key, and should be echoed through architecture best practices (like loose coupling) and controls (like role-based access control, RBAC).
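On the controls side, even a toy RBAC model captures the principle that access to live data should be an explicit, reviewable grant rather than a default. The roles and permissions in this sketch are hypothetical:

```python
# Hypothetical roles and permissions; in practice these would live in
# your identity provider, not in application code.
ROLE_PERMISSIONS = {
    "developer":    {"read:test_data"},
    "tester":       {"read:test_data"},
    "data_steward": {"read:test_data", "read:production_data"},
}

def can_access(role: str, permission: str) -> bool:
    """An access check that fails closed: unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())

assert can_access("data_steward", "read:production_data")
assert not can_access("developer", "read:production_data")
```

The transient project moves described earlier are exactly what erodes a model like this, which is why such grants need regular review.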
When it comes to data and data flows, much of decoupling the problems comes down to analysis. Simply put, you have known unknowns, and you need analysis to surface them. This means becoming aware of the data within a system, its sensitivity classification, and whether it should be there at all. You must also know which systems that data flows to, and where that data is incorrect.
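A first pass at that analysis can be surprisingly simple. The Python sketch below scans rows of data for values that look sensitive; the patterns are illustrative only, and real data discovery tooling uses far richer rules and statistical profiling.

```python
import re

# Hypothetical detection patterns for common sensitive data shapes.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "uk_ni_number": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_rows(rows):
    """Flag column values that look sensitive: a first step in turning
    known unknowns into classified entries in a data dictionary."""
    findings = []
    for row in rows:
        for column, value in row.items():
            for label, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    findings.append((column, label))
    return findings

sample = [{"notes": "call jane@bank.com re: card 4111 1111 1111 1111"}]
print(scan_rows(sample))  # [('notes', 'email'), ('notes', 'card_number')]
```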
This is not a problem that can be resolved in a day, and there is no AI silver bullet. It’s probably taken you years to build up this spiraling debt, and, just like anyone struggling with debt, you need to put a repayment plan in place.
Technology can help. You almost certainly have numerous untapped sources of information where a version of the truth exists: production itself, out-of-date documentation, and the numerous different understandings in different people’s heads. Data analysis tools, ML capabilities, and modelling can start to build a picture from those disparate versions of the truth.
As these pictures mature and grow, you converge on a single version of the truth, picturing how the system should actually work. You’ll be surprised to find that production doesn’t work the way you think it does.
At this point, you have a living specification, or a Master Data Management system, for the organization. You now have a central control focus for much of the subject of data within your organization, even when you have federated and probably siloed teams. You are finally taking data seriously.
Regulation, by this point, isn’t a hurdle; it’s an accelerator. The structured, considered approach accelerates your ability to deliver – go slower to go faster really is true!
Figure 2 – A dedicated data capability for servicing federated development teams.
The fundamental principle of agile delivery (and DevOps) is small, iterative delivery. Facing into this problem with data is therefore key if your organization is serious about going further with agile. Otherwise, you are probably doing agile in a silo, then inevitably bumping into the big ball of mud that the rest of the organization is struggling with.
There is an interesting evolutionary journey in culturally facing into these challenges and using them as opportunities to turn your team or organization into one that is efficient and effective at learning. Data with context becomes knowledge that proliferates across team boundaries – much more than a means of scrubbing sensitive information from non-production environments.
Figure 3 – Data with context becomes knowledge and reduces the chaos across an organisation.
Learn more about the relationship between data, compliance and understanding in Rich's Test Data at the Enterprise video series.