Test with Real Data and Avoid Mock Data Pitfalls!
Have you ever tested your system with scenario-based, manually generated mock data, only to find that it doesn’t work in production? If so, you are not alone. Many agile teams face this problem because mock data often differs from real data in subtle ways. Most people assume that the logic in code is independent of the data it runs on. In this article, I will show you how to test with real data and avoid mock data pitfalls.
Say you have a rule-based engine that buckets orders for prioritization based on product types and demographics. You think through all possible scenarios and create mock data to test positive, negative, exception, and rare cases. You run the tests, every test case passes, and you label the rules as QC PASSED. The rules go live, but in production none of the rules are firing!
What went wrong? You investigate and find that the production data is ever so slightly different from your mock data. One field used in a rule may contain trailing spaces and upper-case strings, whereas the mock data had lower-case strings with no whitespace! "Ah, no problem with the testing method," you say, "just the wrong mock data." It can be even worse if only some rules fail silently, and you don’t find out until it is too late.
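To make the mismatch concrete, here is a minimal Python sketch of the situation described above. The rule, the field name `product_type`, the value `electronics`, and the bucket names are all hypothetical, chosen only for illustration. It shows a rule that passes on hand-written mock data, misses a production-style record, and matches again once the field is normalized.

```python
# Hypothetical example: field names, values, and buckets are assumptions,
# not taken from any real rule engine.

def bucket_naive(order: dict) -> str:
    """Rule written against clean mock data: exact string comparison."""
    if order["product_type"] == "electronics":
        return "high"
    return "unmatched"


def normalize(value: str) -> str:
    """Strip surrounding whitespace and lower-case the value."""
    return value.strip().lower()


def bucket_normalized(order: dict) -> str:
    """Same rule, but tolerant of the formatting seen in production."""
    if normalize(order["product_type"]) == "electronics":
        return "high"
    return "unmatched"


# Hand-written mock record: lower case, no whitespace.
mock_order = {"product_type": "electronics"}
# Production-style record: upper case with trailing spaces.
real_order = {"product_type": "ELECTRONICS  "}

print(bucket_naive(mock_order))       # rule fires on mock data
print(bucket_naive(real_order))       # rule silently misses real data
print(bucket_normalized(real_order))  # rule fires after normalization
```

Normalizing the field fixes this one rule, but the broader point stands: only a test fed with production-shaped values would have exposed the gap before go-live.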
This classic scenario shows why testing with manually created mock data so often fails. But is there a better way? Yes: use real data wherever possible. It is easy once you know the methods for the steps below.
Testing with real data:
Why does mocking from real data perform better?
Some may object to testing with real data because of the perceived cost of copying production data into a test environment, along with concerns about storage, network load, data security, and privacy. However, these issues can be mitigated with simple and effective methods. Moreover, the cost of poor test quality far exceeds the cost of testing with real data. Testing with real data gives you much stronger evidence that your system will work as expected in production and helps prevent costly errors and rework.
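As one possible illustration of such mitigation (the article does not name specific methods, so this is an assumption), the sketch below takes a small random sample of production records to limit storage and network cost, and replaces sensitive fields with a short hash to reduce privacy exposure. The field names in `SENSITIVE_FIELDS` and the function names are hypothetical. Note that the non-sensitive fields are copied untouched, so quirks such as trailing spaces and casing survive into the test data.

```python
import hashlib
import random

# Hypothetical list of fields to mask; adjust to your own schema.
SENSITIVE_FIELDS = {"customer_name", "email"}


def mask(value) -> str:
    """Replace a sensitive value with a short, stable hash.

    Plain hashing is only a sketch here; a real project should follow
    its own security and privacy requirements.
    """
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def build_test_sample(records, sample_size, seed=0):
    """Sample production records and mask the sensitive fields."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    chosen = rng.sample(records, min(sample_size, len(records)))
    return [
        {
            key: mask(val) if key in SENSITIVE_FIELDS else val
            for key, val in record.items()
        }
        for record in chosen
    ]


production_records = [
    {"email": "a@example.com", "product_type": "ELECTRONICS  "},
    {"email": "b@example.com", "product_type": "Books"},
    {"email": "c@example.com", "product_type": " toys"},
]

for row in build_test_sample(production_records, sample_size=2):
    print(row)
```

Because the same input always hashes to the same output, masked values can still be used for joins or duplicate checks in tests without exposing the original data.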
But what if you don’t have real data yet? Start with manual mock data, go live, and keep analyzing production scenarios for a while as you collect real data. Then update your mock test data with real data using the method above. There are very few real limitations, and they can be mitigated; I shall discuss those in another article.
In conclusion, testing with real data is a better way to ensure the quality of your system than testing with mock data. Have you tried testing with real data? What issues have you encountered with manually generated test data?