The Test Data Management methodology lays a foundation for identifying and generating appropriate datasets for software application testing. Test Data Management (TDM) is critical for Quality Assurance teams to ensure high data quality and expected application performance in production environments. The better a test environment reflects a production environment, the more confident you can be in every release. Test data can either be generated or obfuscated from the source system. This article walks through common considerations in a test data management implementation.
- Test Data Shape and Size
As any good tester knows, a four-hundred-case test plan is not always better than a fifty-case one. Each test case brings a cost and a diminishing benefit. The same is true of datasets: a five-million-row dataset might not bring more testing value than ten thousand rows.
It is important to identify your testing goals and curate your dataset to meet those goals. Think critically about what characteristics of production data are important to your testing.
If you do not need a data element for your testing, the cost of properly obfuscating it may not be worth it. Some data elements may not be included at all in testing due to legal restrictions or data use agreements. Why take the risk if there is no benefit to your testing? Take what you need and no more. Bonus: it helps save space in your testing environments.
On the other hand, if you are testing performance, larger datasets that approximate production volumes yield more meaningful results. Testers should determine the correct shape and size of their dataset to achieve their testing goals.
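One way to "take what you need and no more" is to sample a representative subset of a large extract rather than copying it wholesale. The sketch below uses reservoir sampling over a CSV file; the function name and file layout are illustrative assumptions, not part of any particular TDM tool.

```python
import csv
import random

def sample_rows(source_path, dest_path, sample_size, seed=42):
    """Reservoir-sample `sample_size` data rows from a large CSV so the
    test dataset stays small but statistically representative.
    A fixed seed keeps the sample reproducible between test runs."""
    rng = random.Random(seed)
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)          # preserve the column header
        reservoir = []
        for i, row in enumerate(reader):
            if i < sample_size:
                reservoir.append(row)
            else:
                # Each row seen so far has an equal chance of staying.
                j = rng.randint(0, i)
                if j < sample_size:
                    reservoir[j] = row
    with open(dest_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)
```

Reservoir sampling reads the source once, so it works even when the production extract is too large to load into memory.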
- Test Data Sources
As you develop your testing strategy, it is important to define your test data source(s). Test data can be obfuscated from the production environment or generated by an outside tool. It is important to identify your testing goals and curate your dataset to meet your needs. As a part of this process, review design assumptions. It would be disheartening to spend time crafting a dataset that does not meet the requirements of your project. Measure twice, cut once.
Let’s look at two different testing examples.
If you are testing data element validations, knowing what values exist in production is crucial to a successful test plan. It is appropriate to pull obfuscated data from the production system to meet this testing goal.
However, if your production environment is a tenth the size you expect it to be in six months, duplicating or generating data that reflects the production data would be more meaningful to meet performance testing needs.
Consider leveraging tools (like Informatica Test Data Management) to generate test data, with options like random, goal-oriented, and pathwise to speed up the process.
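A random-generation strategy like the one such tools offer can also be sketched in a few lines. The schema below (customer ID, name, status, balance) is a hypothetical example, not a real application's data model.

```python
import random
import string

def generate_customers(n, seed=0):
    """Generate n synthetic customer records using a simple random
    strategy. Field names and value ranges are illustrative only."""
    rng = random.Random(seed)  # seeded for repeatable test runs
    statuses = ["active", "suspended", "closed"]
    rows = []
    for i in range(n):
        rows.append({
            "customer_id": i + 1,
            "name": "".join(rng.choices(string.ascii_uppercase, k=8)),
            "status": rng.choice(statuses),
            "balance": round(rng.uniform(0, 10_000), 2),
        })
    return rows
```

Because generation is driven by a seed, the same dataset can be recreated on demand, which is useful when a performance test needs to be rerun against identical data.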
- Obfuscation and Data Masking
Certain types of data must be handled in compliance with applicable regulations. Testers can achieve this through Data Masking, the process of creating a “structurally similar but inauthentic version of an organization's data that can be used for purposes such as software testing and user training. The purpose is to protect the actual data while having a functional substitute for occasions when the real data is not required.” (Source: TechTarget.com) Data masking is most commonly performed on the following types of data:
- Personally Identifiable Information (PII)
- Social Security numbers
- Credit Card numbers
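As a minimal sketch of the idea, the function below deterministically masks a Social Security number using an HMAC: the same real value always maps to the same fake value, so joins between tables still line up, but the original digits cannot be recovered without the key. The key name is an assumption; in practice it would live in a secrets manager, not in source code.

```python
import hashlib
import hmac

# Assumption: in a real implementation this key comes from a secrets
# manager, never from source control.
SECRET_KEY = b"replace-with-a-managed-secret"

def mask_ssn(ssn: str) -> str:
    """Deterministically mask an SSN while preserving its format.
    Identical inputs always produce identical masked outputs, so
    referential integrity across tables is preserved."""
    digest = hmac.new(SECRET_KEY, ssn.encode(), hashlib.sha256).hexdigest()
    # Map the first nine hex characters onto decimal digits.
    digits = "".join(str(int(c, 16) % 10) for c in digest[:9])
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:9]}"
```

Note that format-preserving masking like this keeps validation logic working in test environments, since the masked value still looks like a valid SSN.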
- Data Integrity of your Test Data
As part of the testing process, test data may need to be altered. It is important to back up the initial dataset and keep versioned copies as you test. That way there is always a source of truth, and testers don’t need to go back to the well of production data when they need a fresh copy of the test dataset.
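A lightweight way to keep versioned copies is to snapshot the dataset into a content-addressed folder, so any prior state can be restored without a new production extract. The function and directory layout below are a sketch under those assumptions, not a prescribed tool.

```python
import hashlib
import shutil
from pathlib import Path

def snapshot_dataset(dataset_path, versions_dir):
    """Copy the dataset file into a versions folder, naming the copy
    by a hash of its contents. Identical content is stored only once,
    and any version can be restored by copying it back."""
    src = Path(dataset_path)
    data = src.read_bytes()
    digest = hashlib.sha256(data).hexdigest()[:12]
    dest = Path(versions_dir) / f"{src.stem}.{digest}{src.suffix}"
    if not dest.exists():          # skip if this exact version exists
        shutil.copy2(src, dest)
    return dest
```

Naming snapshots by content hash also makes it obvious whether a test run actually changed the data, since an unchanged file produces the same snapshot name.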
The testing needs of your organization will influence your Test Data Management process. Don’t worry about boiling the ocean. Focus on common, recurring test data needs and design a repeatable process around those needs. These considerations will help you understand your organization’s recurring test data needs.