Building a Simple Regression Model

Regression analysis is one of the most fundamental techniques in machine learning and statistics. It is used to predict a continuous outcome variable based on one or more predictor variables. In this blog, we’ll walk through the process of building a simple linear regression model using Python. By the end, you’ll have a clear understanding of how to implement and interpret a regression model.


What is Simple Linear Regression?

Simple linear regression is a statistical method that models the relationship between a dependent variable (target) and a single independent variable (predictor). The goal is to find the best-fitting straight line that describes the relationship between the two variables. The equation of the line is:

y = mx + b

Where:

  • y = Dependent variable (target)
  • x = Independent variable (predictor)
  • m = Slope of the line (coefficient)
  • b = Y-intercept
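
Under the hood, ordinary least squares picks m and b to minimize the sum of squared differences between the observed and predicted values of y. As a minimal illustration on a tiny made-up dataset (NumPy only, no scikit-learn), the closed-form estimates look like this:

python

import numpy as np

# Tiny made-up dataset: x = predictor, y = target
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Ordinary least squares in closed form:
# slope = covariance(x, y) / variance(x); intercept keeps the line through the means
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

print(f"slope m = {m:.3f}, intercept b = {b:.3f}")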


Steps to Build a Simple Regression Model

1. Import Required Libraries

We’ll use Python libraries like pandas, numpy, matplotlib, and scikit-learn for data manipulation, visualization, and modeling.

python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score        

2. Load and Explore the Dataset

For this example, let’s use the classic Boston Housing dataset. It used to ship with scikit-learn as load_boston, but that loader was removed in version 1.2, so we fetch the same data from OpenML instead; any dataset with one numeric feature and a numeric target works just as well.

python

# Load the Boston Housing data from OpenML
# (load_boston was removed from scikit-learn in version 1.2)
from sklearn.datasets import fetch_openml
boston = fetch_openml(name="boston", version=1, as_frame=True)
data = boston.data.copy()
data['PRICE'] = boston.target

# Display the first few rows
print(data.head())

# Basic statistics
print(data.describe())        

3. Select Features and Target

For simple linear regression, we’ll use one feature (independent variable) to predict the target (dependent variable). Let’s use RM (average number of rooms per dwelling) as the predictor and PRICE as the target.

python

# Select feature and target
X = data[['RM']]  # Independent variable
y = data['PRICE']  # Dependent variable        
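
A quick way to sanity-check this choice is to look at how strongly the candidate feature correlates with the target; in the Boston data, RM is typically the feature most positively correlated with PRICE (a Pearson correlation of roughly 0.7).

python

# How strongly does the chosen predictor move with the target?
print(data['RM'].corr(data['PRICE']))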

4. Visualize the Data

Before building the model, it’s helpful to visualize the relationship between the feature and the target.

python

# Scatter plot
plt.scatter(X, y, color='blue')
plt.title('Room Count vs House Price')
plt.xlabel('Average Number of Rooms (RM)')
plt.ylabel('House Price (PRICE)')
plt.show()        

5. Split the Data into Training and Testing Sets

We’ll split the data into a training set (to train the model) and a testing set (to evaluate the model).

python

# Split the data (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)        

6. Train the Regression Model

Now, we’ll create and train a simple linear regression model using the training data.

python

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)        
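
With the model fitted, the slope and intercept of the line y = mx + b can be read directly from the estimator; the exact values depend on the random train/test split.

python

# Inspect the fitted line: PRICE ≈ coef_ * RM + intercept_
print(f"Slope (coefficient): {model.coef_[0]:.3f}")
print(f"Intercept: {model.intercept_:.3f}")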

7. Make Predictions

Use the trained model to make predictions on the test data.

python

# Predict on the test set
y_pred = model.predict(X_test)        

8. Evaluate the Model

Evaluate the model’s performance using metrics like Mean Squared Error (MSE) and R-squared (R²).

python

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")        

9. Visualize the Regression Line

Plot the regression line to see how well it fits the data.

python

# Plot the regression line
plt.scatter(X_test, y_test, color='blue', label='Actual Prices')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted Prices')
plt.title('Room Count vs House Price (Test Set)')
plt.xlabel('Average Number of Rooms (RM)')
plt.ylabel('House Price (PRICE)')
plt.legend()
plt.show()        

10. Interpret the Results

  • Slope (Coefficient): Indicates the change in the target variable for a one-unit change in the predictor. Here PRICE is measured in thousands of dollars, so a slope of 10 means that each additional room is associated with an increase of about $10,000 in predicted price.
  • Intercept: The value of the target variable when the predictor is zero.
  • R-squared: Represents the proportion of variance in the target variable that is explained by the predictor. A value closer to 1 indicates a better fit.
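
To put the interpretation to work, the fitted line can also score a new observation. A minimal sketch for a hypothetical house with 6 rooms (recall that PRICE is in thousands of dollars):

python

# Predict the price of a hypothetical 6-room house
new_house = pd.DataFrame({'RM': [6.0]})
predicted = model.predict(new_house)[0]
print(f"Predicted price for a 6-room house: ${predicted * 1000:,.0f}")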


Full Code Example

python

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load dataset (load_boston was removed from scikit-learn in 1.2, so fetch from OpenML)
from sklearn.datasets import fetch_openml
boston = fetch_openml(name="boston", version=1, as_frame=True)
data = boston.data.copy()
data['PRICE'] = boston.target

# Select feature and target
X = data[['RM']]
y = data['PRICE']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

# Visualize the regression line
plt.scatter(X_test, y_test, color='blue', label='Actual Prices')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted Prices')
plt.title('Room Count vs House Price (Test Set)')
plt.xlabel('Average Number of Rooms (RM)')
plt.ylabel('House Price (PRICE)')
plt.legend()
plt.show()        

Conclusion

Building a simple linear regression model is a great way to understand the basics of predictive modeling. By following these steps, you can create, train, and evaluate a regression model using Python. As you progress, you can explore more advanced techniques like multiple linear regression, polynomial regression, and regularization.
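
For example, multiple linear regression uses the same API with more than one feature column. A minimal sketch, reusing the data and imports from above and adding LSTAT (% lower-status population) as a second predictor:

python

# Multiple linear regression: same workflow, more feature columns
X_multi = data[['RM', 'LSTAT']]
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(X_multi, y, test_size=0.2, random_state=42)

multi_model = LinearRegression()
multi_model.fit(X_train_m, y_train_m)
print(f"R-squared with two features: {r2_score(y_test_m, multi_model.predict(X_test_m)):.3f}")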

Happy modeling! 🚀
