From the course: Spark for Machine Learning & AI

Tips for using Spark MLlib

- [Instructor] Let's review some tips for working with Spark MLlib. There are three basic stages of building machine learning models. There's a pre-processing phase, where we collect, reformat, and transform the data. Once we have that data, we can build our models using a variety of machine learning algorithms. Then we evaluate the models to assess the quality of what we built. With that framework in mind, let's look at some tips to make each of these stages go smoothly. When we're pre-processing, we want to first load our data into DataFrames. If you're working with text files, it helps to have headers or column names in the text file. And when you read a file, make sure you use the inferSchema=True option, so that things like dates and numeric values get mapped to their appropriate data types rather than being read as strings. Use the VectorAssembler to create feature vectors and the StringIndexer to map from strings to numeric indexes. During the model building…
