Linear Regression Performance Metrics

In analytics and data science, we constantly use Linear Regression to understand the relationship between two variables. Most analysts use it to uncover stories hidden in the data, and metrics help us understand what the Linear Regression is telling us.

This article is a continuation of Linear Regression in Python. Here we cover metrics that help us assess a linear regression model: Mean Squared Error (MSE), R-squared (R²), and Mean Absolute Error (MAE), for starters.

I will show Python code snippets for calculating each metric. Instead of using Python libraries such as sklearn, I will calculate the metrics from scratch so you can understand the process.

Here is the data we are going to use:

# data example 
actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]        

  • Mean Squared Error (MSE): MSE measures the average squared difference between actual and predicted values. A lower MSE indicates a better fit. It's calculated by summing the squared differences and dividing by the number of observations.

# MSE = (1/n) * Σ(actual - predicted)^2

mse = sum((actual - predicted)**2 for actual, predicted in zip(actual_sales, predicted_sales)) / len(actual_sales)

print("Mean Squared Error (MSE):", mse)

[Image: VS Code result, MSE]

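When interpreting MSE, lower is better: a smaller value means the predictions are, on average, closer to the actual values. Because the errors are squared, MSE is expressed in squared units of the target and is especially sensitive to large errors.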
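To see where the MSE comes from, it can help to look at each error and its square before averaging. The sketch below reuses the sales lists from above; the names `errors` and `squared_errors` are introduced here only for illustration.

```python
# Same example data as above
actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]

# Error (actual - predicted) for each observation
errors = [a - p for a, p in zip(actual_sales, predicted_sales)]
# Squaring removes the sign and weights large errors more heavily
squared_errors = [e ** 2 for e in errors]

# MSE is the average of the squared errors
mse = sum(squared_errors) / len(squared_errors)

print("Errors:", errors)                  # [-7, 10, -10, 10]
print("Squared errors:", squared_errors)  # [49, 100, 100, 100]
print("MSE:", mse)                        # 87.25
```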

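  • R-squared (R²): R-squared is the proportion of the variance in the dependent variable that is predictable from the independent variable. A higher R-squared indicates a better fit. For a least-squares fit with an intercept, evaluated on the data it was fitted to, it ranges from 0 to 1, with 1 signifying a perfect fit. For other predictions, the formula can fall below 0 when the model does worse than simply predicting the mean.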

# R² = 1 - Σ(actual - predicted)^2 / Σ(actual - mean_actual)^2

mean_actual = sum(actual_sales) / len(actual_sales)
ss_total = sum((actual - mean_actual)**2 for actual in actual_sales)
ss_residual = sum((actual - predicted)**2 for actual, predicted in zip(actual_sales, predicted_sales))
r_squared = 1 - (ss_residual / ss_total)

print("R-squared (R²):", r_squared)        
[Image: VS Code result, R²]

R² describes how well Sales is explained by the independent variables. A value of about 0.98 indicates a very good fit for our linear regression model. A high R² is a good sign. (We'll learn about situations where a high R² could pose challenges, but for beginners, it's a useful metric.)
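One way to build intuition for R² is to compare our predictions against two reference cases. The sketch below wraps the formula in a helper, `compute_r_squared` (a name chosen here for illustration), and applies it to the original predictions, to a baseline that always predicts the mean, and to a set of deliberately poor predictions that are made up for this example.

```python
def compute_r_squared(actual, predicted):
    """R² = 1 - SS_residual / SS_total, computed from scratch."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total

actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]

# Baseline: always predict the mean of the actual values (312.5)
mean_baseline = [sum(actual_sales) / len(actual_sales)] * len(actual_sales)

# Hypothetical poor predictions, invented for illustration only
poor_predictions = [400, 350, 300, 200]

print(round(compute_r_squared(actual_sales, predicted_sales), 4))  # 0.984
print(compute_r_squared(actual_sales, mean_baseline))              # 0.0
print(compute_r_squared(actual_sales, poor_predictions) < 0)       # True
```

Predicting the mean gives exactly 0, so R² can be read as the improvement over that baseline. Predictions that are worse than the mean push the value below 0.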

  • Mean Absolute Error (MAE): MAE is the average absolute difference between actual and predicted values. Like MSE, a lower MAE suggests a better model fit.

# MAE = (1/n) * Σ|actual - predicted| --- (absolute) 

mae = sum(abs(actual - predicted) for actual, predicted in zip(actual_sales, predicted_sales)) / len(actual_sales)

print("Mean Absolute Error (MAE):", mae)        
[Image: VS Code result, MAE]

An MAE of 9.25 means that, on average, the predictions are off by 9.25 sales units. A lower MAE is better, and since 9.25 is small relative to sales values of 200 to 400, our linear regression fits the Sales example well.
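MAE and MSE react differently to a single large miss, which may matter when choosing between them. The sketch below compares both metrics on the original predictions and on a hypothetical variant in which the last prediction is off by 60 units. The helpers `calc_mae` and `calc_mse` and the list `predicted_with_outlier` are assumptions made for this illustration.

```python
def calc_mae(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def calc_mse(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]

# Hypothetical variant: the last prediction misses by 60 instead of 10
predicted_with_outlier = [207, 290, 360, 340]

print(calc_mae(actual_sales, predicted_sales))         # 9.25
print(calc_mae(actual_sales, predicted_with_outlier))  # 21.75
print(calc_mse(actual_sales, predicted_sales))         # 87.25
print(calc_mse(actual_sales, predicted_with_outlier))  # 962.25
```

With one large error, the MAE a little more than doubles while the MSE grows roughly elevenfold, because squaring amplifies the largest misses.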

Putting All the Metrics Together:

These metrics provide a solid foundation for evaluating and understanding Linear Regression models, enabling analysts to derive meaningful stories from their data. I use them every day, whether I am doing homework for my Machine Learning or Statistics classes or working through Kaggle datasets for my GitHub. Here is a table to wrap up the article so you can refer to it while working with these metrics.

[Image: table created using Microsoft Word]
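For day-to-day use, the three calculations can be collected into one helper. The following is a minimal sketch, not a replacement for a library implementation; the function name `regression_metrics` and the dictionary keys are chosen here for illustration.

```python
def regression_metrics(actual, predicted):
    """Return MSE, R-squared and MAE for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Sum of squared residuals, shared by MSE and R-squared
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the actual values around their mean
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MSE": ss_residual / n,
        "R2": 1 - ss_residual / ss_total,
        "MAE": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
    }

actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]

metrics = regression_metrics(actual_sales, predicted_sales)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# MSE: 87.2500
# R2: 0.9840
# MAE: 9.2500
```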

If you would like to follow more coding projects, you can take a peek at my projects on GitHub.
