What are the consequences of bad data, and how do you solve data quality issues? These were the challenges I faced when I joined a team building a new reporting platform. Here's what I learned along the way.
"Why are there so many bugs in our analytics product?!" That question was heard far too often in my workplace, and it led some of our customers to lose faith in the product as a whole. As an outsider to the team, I assumed the cause was software quality issues. It wasn't really my problem though... until it became my problem.
I decided to join the analytics team to help test a new reporting platform, which meant getting up to speed on testing big data, and fast. I soon learned the problem had nothing to do with how the analytics team was building software. It was the quality of the data generated throughout the entire system; the reports were just shining a spotlight on these data quality issues. Garbage in, garbage out! The question now was how to detect, correct and prevent these data-related issues. Building a new reporting platform was a challenge in itself, but it also presented a great opportunity: we had the chance to build a testable, observable system with data quality in mind from the outset.
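To give a flavour of the "detect" step, here is a minimal sketch of rule-based data quality checks. The field names ("user_id", "revenue"), thresholds and record shape are hypothetical examples for illustration, not taken from the actual platform.

```python
def check_completeness(records, required_fields):
    """Return records that are missing a required field or hold a null value."""
    return [
        r for r in records
        if any(f not in r or r[f] is None for f in required_fields)
    ]

def check_range(records, field, lo, hi):
    """Return records whose numeric field falls outside [lo, hi].
    Non-numeric/null values are skipped; completeness checks catch those."""
    return [
        r for r in records
        if isinstance(r.get(field), (int, float)) and not (lo <= r[field] <= hi)
    ]

# Hypothetical sample data with two deliberately bad records.
records = [
    {"user_id": 1, "revenue": 19.99},
    {"user_id": 2, "revenue": None},   # incomplete: null revenue
    {"user_id": 3, "revenue": -5.00},  # inaccurate: negative revenue
]

incomplete = check_completeness(records, ["user_id", "revenue"])
out_of_range = check_range(records, "revenue", 0.0, 10_000.0)
```

Checks like these can run automatically on each batch of data, flagging bad records for correction before they ever reach a report.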
In this talk I'll walk through the strategy we developed for testing big data at the start of the project: how we implemented tests for data quality, accuracy and availability throughout the system, and how we ensured the system remained performant and robust when dealing with large volumes of data. Finally, I'll highlight the importance of getting talented developers and software architects on board when you want to create a testable system.