Linear Regression Performance Metrics
In analytics and data science, we constantly use Linear Regression to understand the relationship between a dependent variable and one or more independent variables. Analysts rely on it to uncover the stories hidden in data, and performance metrics tell us how far we can trust what the model is saying.
This article is a continuation of Linear Regression in Python. Here we cover three metrics that help us assess a Linear Regression model: Mean Squared Error (MSE), R-squared (R²), and Mean Absolute Error (MAE).
All three can be calculated in Python, and I will show you code snippets for each. Instead of using Python libraries such as sklearn, I will calculate the metrics from scratch so you can follow the process.
Here is the data we are going to use:
# data example
actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]
# MSE = (1/n) * Σ(actual - predicted)^2
mse = sum((actual - predicted)**2 for actual, predicted in zip(actual_sales, predicted_sales)) / len(actual_sales)
print("Mean Squared Error (MSE):", mse)
When interpreting MSE, lower is better: a smaller value means the predictions sit closer to the actual values. For this data the MSE is 87.25, expressed in squared sales units.
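To make the arithmetic behind the MSE visible, here is a minimal sketch that lists each squared error before averaging. The helper name `mean_squared_error` is defined here purely for illustration; it is not imported from any library.

```python
# Sketch: break the MSE into its per-row pieces.
# mean_squared_error is an illustrative helper defined in this snippet.
actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(squared)


# Per-row squared errors: (-7)^2, 10^2, (-10)^2, 10^2
squared_errors = [(a - p) ** 2 for a, p in zip(actual_sales, predicted_sales)]
print("Squared errors:", squared_errors)  # [49, 100, 100, 100]

mse = mean_squared_error(actual_sales, predicted_sales)
print("MSE:", mse)  # 349 / 4 = 87.25
```

Because every error is squared, the result is in squared units of the target, which is why 87.25 looks large next to errors of 7 to 10 sales units.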
# R² = 1 - Σ(actual - predicted)^2 / Σ(actual - mean_actual)^2
mean_actual = sum(actual_sales) / len(actual_sales)
ss_total = sum((actual - mean_actual)**2 for actual in actual_sales)
ss_residual = sum((actual - predicted)**2 for actual, predicted in zip(actual_sales, predicted_sales))
r_squared = 1 - (ss_residual / ss_total)
print("R-squared (R²):", r_squared)
R² measures how much of the variation in Sales is explained by the independent variables. The value of 0.98 indicates a very high goodness of fit for our linear regression model. A high R² is a good sign. (We'll learn about situations where a high R² could pose challenges, but for beginners, it's a useful metric.)
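One way to get a feel for the R² scale is to compare the model against a baseline that always predicts the mean of the actual values. The sketch below does that; the helper `r_squared` and the `poor_predictions` list are made up for illustration and are not part of the original example.

```python
# Sketch: R² for the model, for a "predict the mean" baseline,
# and for a deliberately poor set of predictions (hypothetical values).
actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]


def r_squared(actual, predicted):
    """1 - SS_residual / SS_total, computed from scratch."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


mean_sales = sum(actual_sales) / len(actual_sales)  # 312.5
baseline_predictions = [mean_sales] * len(actual_sales)
poor_predictions = [400, 200, 400, 200]  # hypothetical, worse than the mean

r2_model = r_squared(actual_sales, predicted_sales)
r2_baseline = r_squared(actual_sales, baseline_predictions)
r2_poor = r_squared(actual_sales, poor_predictions)

print("Model R²:", round(r2_model, 2))  # 0.98
print("Baseline R²:", r2_baseline)      # 0.0
print("Poor R²:", round(r2_poor, 2))    # negative
```

Always predicting the mean scores exactly 0, and predictions that do worse than the mean score below 0, so 0.98 means the model removes almost all of the error the baseline would make. Note that this formula divides by zero if every actual value is identical.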
# MAE = (1/n) * Σ|actual - predicted|  (absolute differences)
mae = sum(abs(actual - predicted) for actual, predicted in zip(actual_sales, predicted_sales)) / len(actual_sales)
print("Mean Absolute Error (MAE):", mae)
An MAE of 9.25 means that, on average, the predicted sales differ from the actual sales by 9.25 units. A lower MAE is better, so for the Sales example our linear regression fits well.
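MAE and MSE react differently to a single large miss, which is worth seeing once in numbers. The sketch below reuses the sales data and adds a hypothetical variant, `predictions_with_miss`, in which the last prediction is off by 60 units; both helper functions are defined here for illustration.

```python
# Sketch: how one large miss moves MAE versus MSE.
# predictions_with_miss is a hypothetical variant of the article's data.
actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]
predictions_with_miss = [207, 290, 360, 340]  # last prediction off by 60


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


mae_original = mean_absolute_error(actual_sales, predicted_sales)
mae_miss = mean_absolute_error(actual_sales, predictions_with_miss)
mse_original = mean_squared_error(actual_sales, predicted_sales)
mse_miss = mean_squared_error(actual_sales, predictions_with_miss)

print("MAE:", mae_original, "->", mae_miss)  # 9.25 -> 21.75
print("MSE:", mse_original, "->", mse_miss)  # 87.25 -> 962.25
```

The MAE a little more than doubles, while the MSE grows roughly elevenfold, because squaring gives the one large error far more weight.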
Putting All the Metrics Together:
These metrics provide a solid foundation for evaluating and understanding Linear Regression models, enabling analysts to derive meaningful stories from their data. I use them every day, whether I am doing homework for my Machine Learning or Statistics classes or working on Kaggle datasets for my GitHub. Here is a table to wrap up the article so you can refer to it while understanding metrics.
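For a code-side recap, the three calculations can also be bundled into a single helper. This is only a sketch: the function name `regression_metrics`, the dictionary keys, and the input check are my own choices rather than anything standard.

```python
# Sketch: compute MSE, MAE, and R² in one pass (names are illustrative).
def regression_metrics(actual, predicted):
    """Return MSE, MAE, and R² for two equal-length lists of numbers."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e ** 2 for e in errors)
    # Undefined (division by zero) when every actual value is identical.
    r2 = 1 - ss_residual / ss_total

    return {"MSE": mse, "MAE": mae, "R2": r2}


actual_sales = [200, 300, 350, 400]
predicted_sales = [207, 290, 360, 390]

metrics = regression_metrics(actual_sales, predicted_sales)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Keeping the residuals in one list means each metric is computed from the same errors, which makes it harder for the three numbers to drift out of sync.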
If you would like to follow more coding projects, you can take a peek at my projects on GitHub.