Good test data is the very foundation of good testing. The best test data is production data, but using it for testing is typically not legal. How can test data be derived from production data in a legal and secure way?
Good test data is also hard to get. If you create it manually, or build a script or program to generate it, the test data will probably reflect your understanding and expectations of production data rather than the actual properties of the production data. For that reason, it is unfortunately not uncommon to use production data, or data trivially derived from production data, for testing.
Using production data for testing has problems of its own. GDPR (the new EU privacy law) applies to such data. It obviously applies when production data is used directly. But, surprisingly to many, GDPR also applies in almost all situations where test data is based on scrambled or anonymized production data.
Overall content of the talk:
- The importance of good, representative and “fresh” test data and the importance of fast, cheap and low-friction access to the test data.
- Metrics for test data (how to measure test data quality).
- The compliance and security challenges (GDPR, Segregation of Duties, data loss prevention, corporate policies, etc.).
- A helicopter view of the most relevant articles of GDPR.
- A helicopter view of the techniques that can be used to protect data, such as anonymization, pseudonymization, synthetic data, tokenization, and format-preserving encryption.
- Strategies for generating test data while respecting privacy and security.
- How to ensure GDPR compliance.
- What to do and where to start.
Martin will also make sure to address some of the most prominent and serious misconceptions, such as the beliefs that data can easily be anonymized (and thus fall outside the scope of GDPR) and that hash functions can ensure privacy.
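To illustrate why hashing alone is a weak privacy measure, here is a minimal Python sketch. It is not taken from the talk; it assumes a hypothetical six-digit customer number that has been "anonymized" with an unsalted SHA-256 hash. The helper names (sha256_hex, six_digit_ids, recover_by_enumeration) are invented for this illustration. Because the space of possible inputs is small, anyone can hash every candidate and compare.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(value: str) -> str:
    """Unsalted SHA-256 of a string, as a naive 'anonymization' step."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def six_digit_ids() -> Iterable[str]:
    """All values of a hypothetical six-digit customer number (assumption)."""
    return (f"{n:06d}" for n in range(1_000_000))


def recover_by_enumeration(target_hash: str,
                           candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate whose hash matches target_hash, or None."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# A "protected" value as it might appear in a test database.
hashed_id = sha256_hex("042317")

# The original value is recovered by trying every possible input.
recovered = recover_by_enumeration(hashed_id, six_digit_ids())
print(recovered)
```

The same enumeration idea applies to other identifiers drawn from a small or predictable set, such as phone numbers or dates of birth, which is one reason hashed production data can often still be linked back to individuals.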
Without good test data, your tests are not representative of the real-life production situation.