Derive test data from production data while respecting GDPR

30-minute Talk

Good test data is the very foundation of good testing. The best test data is production data, but using it for testing is typically not legal. How can test data be derived from production data in a legal and secure way?

Timetable

10:25 a.m. – 10:55 a.m. Thursday 7th

Room

Room F3 - Track 3: Talks

Audience

General Interest - An introduction to the topic

Key-Learning

  • Inspiration on how to improve test data quality.
  • Your existing solution for generating test data from production data is most likely not GDPR compliant.
  • What it takes to generate high-quality test data from production data in a compliant way.
  • Be prepared to discuss test data with the IT Security Department and the GDPR Data Protection Officer.

Good test data is the very foundation of good testing. But good test data is hard to get. If you create it manually or build a script or program to generate it, the test data will probably reflect your understanding and expectations of production data rather than the actual properties of the production data. For that reason, it is unfortunately not uncommon to use production data, or data trivially derived from production data, for testing.

Using production data for testing has problems of its own. GDPR (the new EU privacy law) applies to such data. It obviously applies when production data is used directly. But, surprisingly to many, GDPR also applies in almost all situations where test data is based on scrambled or anonymized production data.

Overall content of the talk:

- The importance of good, representative and “fresh” test data and the importance of fast, cheap and low-friction access to the test data.

- Metrics for test data (how to measure test data quality).

- The compliance and security challenges (GDPR, Segregation of Duties, data loss prevention, corporate policies, etc.).

- A helicopter view of the most relevant articles of GDPR.

- A helicopter view of the techniques that can be used to protect data, such as anonymization, pseudonymization, synthetic data, tokenization, and format-preserving encryption.

- Strategies for generating test data while respecting privacy and security.

- How to ensure GDPR compliance.

- What to do and where to start.
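To make one of the protection techniques in the outline concrete, here is a minimal Python sketch of keyed pseudonymization. It is an illustration under stated assumptions, not the approach presented in the talk: the HMAC-SHA256 choice, the key handling, and the table and column names (`customers`, `orders`, `customer_id`) are all hypothetical. Because the mapping is deterministic, joins across tables survive. However, whoever holds the key can recompute the mapping and re-link records, so under GDPR the result is pseudonymized data, which remains personal data.

```python
import hashlib
import hmac
import secrets

# Assumption: in practice this key would be held outside the test environment
# (e.g. by the production data owner) and never shipped with the test data.
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace an identifier with a stable keyed pseudonym (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical production extract: two tables linked by customer_id.
customers = [{"customer_id": "C-1001", "city": "Aarhus"}]
orders = [{"order_id": 1, "customer_id": "C-1001", "amount": 250}]

# Pseudonymize the shared key column in both tables with the same key.
test_customers = [{**c, "customer_id": pseudonymize(c["customer_id"])} for c in customers]
test_orders = [{**o, "customer_id": pseudonymize(o["customer_id"])} for o in orders]

# The join still works because the mapping is deterministic.
assert test_orders[0]["customer_id"] == test_customers[0]["customer_id"]
```

Only the identifier column is transformed here. Attributes such as `city` pass through unchanged and can still contribute to re-identification, which is one reason pseudonymization alone rarely takes data out of GDPR scope.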

Martin will also address some of the most prominent and serious misconceptions, such as the beliefs that data can easily be anonymized (and thus taken out of GDPR scope) and that hash functions can ensure privacy.
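The hash-function misconception can be illustrated with a short Python sketch. Assume, hypothetically, that a six-digit customer number was "anonymized" by replacing it with its unsalted SHA-256 hash. Since there are only one million possible numbers, anyone with access to the test data can hash every candidate and look for a match. The identifier format and the value below are invented for illustration.

```python
import hashlib
from typing import Optional


def sha256_hex(value: str) -> str:
    """Unsalted, unkeyed SHA-256 of a string, as a hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def reverse_by_enumeration(target_hash: str, digits: int = 6) -> Optional[str]:
    """Recover a zero-padded numeric identifier by hashing every candidate."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None  # no identifier of this length produces the hash


# Hypothetical "anonymized" test record: the customer number was replaced by its hash.
masked = sha256_hex("048213")
print(reverse_by_enumeration(masked))  # prints 048213
```

Adding a secret key blocks this particular attack only for parties who do not hold the key, which is why such data is generally treated as pseudonymized rather than anonymized.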

Without good test data, your tests are not representative of the real-life production situation.
