Precision, Recall, F1: The Pillars of Performance in Classification
Precision, recall, and F1 are among the most important topics in machine learning, and they were also among the most confusing for me every time I tried to learn them.
So, in one of my interviews, the interviewer asked me what precision and recall are and to explain them with an example. My stupid ass started blabbing whatever I had rote-learned from different data science sources, but that didn't impress him much and I didn't get selected :(.
So, I thought this time, I would just not "learn" it. I would "understand" it completely.
Why do we even need these metrics?
You might have heard many times that if your target class is imbalanced, accuracy might not work well and it is better to look at other metrics.
But that advice alone doesn't help you visualize what it actually means, so let's look at a very intuitive example.
Problem: Mr. X wants to build a classification model that will predict whether his crush accepts his proposal or not.
Assumption: X has tried multiple times in the past with his other crushes. He has also gathered data showing that if he tries certain things it results in "Acceptance" (1), and if he tries other things it results in "Not Acceptance" (0).
Execution: It's a simple binary classification model. He builds a model, and now it's time to look into how his model has performed.
Classification report: Let's say Mr. X's model produces the following numbers.
To give you an idea, Mr. X is quite the playboy type :p. He has tried 1000 times.
The current model ends up predicting "Not accepted" for every input. The data is clearly imbalanced (i.e., X got rejected 999 times and only once did someone accept his proposal).
Since the data is so imbalanced, the model has simply learned to predict the majority class, and it gets the one actual acceptance wrong.
X went one step further and calculated the "accuracy" of his model (Mr. X is a noob at data science :p). He got an accuracy of 99.9%. Happy with that, he started feeding inputs to his model to predict whether his crush would accept his proposal. But did you figure out what is wrong with his model?
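To make this concrete, here is a minimal sketch (assuming, as in the story, a model that predicts "Not accepted" for every single proposal) showing how a useless model can still score 99.9% accuracy:

```python
from sklearn.metrics import accuracy_score

# Toy data mirroring Mr. X's history: 999 rejections (0) and 1 acceptance (1)
y_true = [0] * 999 + [1]

# A "model" that always predicts rejection, as in the story
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.999 -> looks great, yet the model never predicts an acceptance
```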
In the meantime, his friend Mr. Y (an expert in data science) told him to wait as he was doing things wrong and introduced him to the concepts of Precision, Recall, and F1.
Precision
It tells you, out of all the values the model predicted as true, how many are actually true.
So, in professional terminology, precision is the ratio: Precision = True Positives / (True Positives + False Positives).
We always try to keep precision as high as possible, and hence we try to keep false positives (Type 1 errors) as low as possible.
As per our example, X's model has a precision of 0, which means not a single one of its predicted acceptances is actually correct. So he has to check what he has done wrong and correct his model (maybe apply some resampling techniques to handle the imbalanced data).
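To make the definition concrete, here is a minimal sketch in plain Python (the first call mirrors Mr. X's model; the second uses purely hypothetical counts, just to show a non-zero case):

```python
def precision(tp, fp):
    # Precision = TP / (TP + FP)
    # If the model predicts no positives at all, the ratio is undefined;
    # we report 0 in that case (scikit-learn does the same, with a warning).
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

# Mr. X's model never predicts "Accepted", so it has no true or false positives
print(precision(tp=0, fp=0))   # 0.0

# A hypothetical model that predicts 10 acceptances, only 2 of them correct
print(precision(tp=2, fp=8))   # 0.2
```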
Recall
It tells you, out of all the values that are actually true, how many the model correctly predicted as true. (See how this differs from precision? The two are easy to confuse, but hopefully not anymore.)
So, recall is the ratio: Recall = True Positives / (True Positives + False Negatives).
We always try to keep recall as high as possible, and hence we try to keep false negatives (Type 2 errors) as low as possible.
As per our example, X's model has a recall of 0, which means it misses the one actual acceptance entirely (it ends up as a false negative). So he has to check what he has done wrong and correct his model.
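The same kind of sketch for recall (again, the second call uses made-up counts, just for illustration):

```python
def recall(tp, fn):
    # Recall = TP / (TP + FN)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# Mr. X's model misses the single actual acceptance: 0 true positives, 1 false negative
print(recall(tp=0, fn=1))   # 0.0

# A hypothetical model that catches 7 out of 10 actual acceptances
print(recall(tp=7, fn=3))   # 0.7
```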
F1 Score
There might be situations where you don't want to look at precision and recall separately. In such cases, the F1 score comes in handy.
It is just the harmonic mean of Precision (P) and Recall (R), i.e., F1 = 2 × P × R / (P + R), and it should be as high as possible.
[Pro tip: always remember that the harmonic mean is pulled towards the smaller of the two values.]
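A quick illustration of that pro tip (the precision and recall values here are arbitrary, chosen only to show the effect):

```python
def f1(p, r):
    # F1 = harmonic mean of precision and recall = 2PR / (P + R)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# Great precision but terrible recall:
print(f1(0.9, 0.1))   # 0.18 -> dragged towards the weaker metric
# The arithmetic mean (0.9 + 0.1) / 2 = 0.5 would have hidden the problem.

# Both decent:
print(f1(0.7, 0.7))   # 0.7
```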
Precision-Recall Trade-off
As you have read through this article, these two metrics often feel inversely related: when one increases, the other tends to decrease, and vice versa.
So, which of the two you should push higher, precision or recall, depends on the kind of problem you are solving.
If we take our example, Mr. X should try to build a model with high precision (few false positives, since a false positive means the model tells him an acceptance is coming when it actually isn't). He can compromise on false negatives, because those are the cases where a proposal predicted as rejected actually gets accepted, which works out in his favour anyway.
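One common way to see this trade-off in practice is to move the decision threshold of a probabilistic classifier: raising the threshold usually increases precision and decreases recall, and lowering it does the opposite. Here is a rough sketch using synthetic data and a logistic regression as a stand-in model (not Mr. X's actual model), just to show the mechanics:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic, imbalanced data loosely mimicking Mr. X's 1000 attempts
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)
model = LogisticRegression(max_iter=1000).fit(X, y)
proba = model.predict_proba(X)[:, 1]  # predicted probability of "Accepted"

for threshold in (0.2, 0.5, 0.8):
    pred = (proba >= threshold).astype(int)
    p = precision_score(y, pred, zero_division=0)
    r = recall_score(y, pred, zero_division=0)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# Typically, the higher thresholds give higher precision but lower recall.
```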
Next Step
In a future article in this series, we will look at how these metrics are calculated for multiclass classification problems.
Thanks
Puru Bhatnagar