Apache Spark is a general engine for large-scale data processing. Employing it in a distributed environment to process large data sets is undeniably beneficial.
But what about a fast feedback loop while developing such an application with Apache Spark? Testing it on a cluster is essential, but it is not what most developers accustomed to a TDD workflow would like to do day to day.
In the talk, Łukasz will share some tips on how to write unit and integration tests, and how Docker can be applied to test a Spark application on a local machine, as sketched below.
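As a taste of the unit-testing part, here is a minimal sketch of a Spark test that runs against a local master, so no cluster is needed. It uses ScalaTest's AnyFunSuite; the `Transformations.evens` function is an illustrative example, not taken from the talk:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite

// Illustrative transformation under test: keeps only even numbers.
object Transformations {
  def evens(xs: RDD[Int]): RDD[Int] = xs.filter(_ % 2 == 0)
}

class TransformationsSpec extends AnyFunSuite with BeforeAndAfterAll {
  // A local master gives the fast feedback loop: the whole suite
  // runs in-process, no cluster required.
  private lazy val spark = SparkSession.builder()
    .master("local[2]")
    .appName("unit-tests")
    .getOrCreate()

  test("evens keeps only even numbers") {
    val input = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))
    assert(Transformations.evens(input).collect().sorted.toSeq == Seq(2, 4))
  }

  override def afterAll(): Unit = spark.stop()
}
```

Because the SparkSession is created with a `local[2]` master, such a test runs in seconds inside sbt or an IDE, which is what makes a TDD-style workflow practical.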
The examples will be presented with the ScalaTest framework and should be easy to grasp for anyone who knows Scala or another JVM language.
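For the integration side, one common approach (an assumption here, not necessarily the talk's exact setup) is to start a Spark master in a Docker container on the local machine and point the test's master URL at it; the image name and `SPARK_MASTER_URL` variable below are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

// Targets a Spark master running in Docker, started with e.g.:
//   docker run -p 7077:7077 bitnami/spark    (image name is illustrative)
// SPARK_MASTER_URL lets the same test run locally or on CI.
class WordCountIntegrationSpec extends AnyFunSuite {

  private val masterUrl =
    sys.env.getOrElse("SPARK_MASTER_URL", "spark://localhost:7077")

  test("word count against a Dockerized Spark master") {
    val spark = SparkSession.builder()
      .master(masterUrl)
      .appName("integration-tests")
      .getOrCreate()
    try {
      val counts = spark.sparkContext
        .parallelize(Seq("spark docker spark"))
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collectAsMap()
      assert(counts("spark") == 2)
    } finally {
      spark.stop()
    }
  }
}
```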