From the course: Spark for Machine Learning & AI
Tips for using Spark MLlib - Apache Spark Tutorial
- [Instructor] Let's review some tips for working with Spark MLlib. There are three basic stages of building machine learning models. There's a pre-processing phase, where we collect, reformat, and transform the data. Once we have that data, we can build our models using a variety of machine learning algorithms. And then we want to evaluate the models we built to assess their quality. With that framework in mind, let's look at some tips to make each of these stages go smoothly.

First, when we're pre-processing, we want to load our data into DataFrames. If you're working with text files, it helps to have headers or column names in the text file. And when you read a file, make sure you use the inferSchema=True option. That'll make sure that things like dates and numeric values get mapped to their appropriate data type. Use the VectorAssembler to create feature vectors and the StringIndexer to map from strings to numeric indexes.

During the model building…