From the course: Spark for Machine Learning & AI

Linear regression

- [Instructor] Now that we have some data to work with, let's look at linear regression. So, I'm just going to verify I'm in the right directory. Great, now I'm going to start pyspark. Okay, the first thing I want to do is import some code for linear regression. So I'll use my from pyspark.ml.regression command, and I'll import LinearRegression. Now, in our last video, we downloaded and preprocessed a file called power_plant.csv. I'm going to read that into a data frame, and I'll call it pp, for power plant data frame. I'll reference the Spark session and read the CSV file, which is in my home directory and is called power_plant.csv. And let's just take a look at the structure. I forgot to indicate that there was a header in that file. As you may recall, the first row of the file holds the column names. So I'm going to re-execute this read and I'm going to say header=True. Another thing you'll notice is that the column data types are all string. That's because I forgot to indicate that I…
