Precision, Recall, F1: The Pillars of Performance in Classification

Precision, Recall, and F1 are among the most important topics in machine learning, and they were also some of the most confusing for me every time I tried to learn them.

So, in one of my interviews, the interviewer asked me what precision and recall are and to explain them via an example. I started blabbering about whatever I had rote-learned from different data science sources, but this didn't impress him much, and I didn't get selected :(.

So, I thought that this time, I would not just "learn" it. I would "understand" it completely.

Why do we even need these metrics?

So, you might have heard many times that if your target class is imbalanced, then accuracy might not work, and it is better to look at other metrics.

But that statement alone doesn't help you visualize what it actually means. So, let's look at a very intuitive example.

Problem: Mr. X wants to build a classification model that will predict whether his crush accepts his proposal or not.

Assumption: X has tried multiple times in the past on his other crushes. He has also gathered data: if he tries certain things, it results in "Acceptance" (1), and if he tries other things, it results in "Not Acceptance" (0).

Execution: It's a simple binary classification model. He builds a model, and now it's time to look into how his model has performed.

Classification report: Let's say you have the following classification metrics:

[Image: Mr. X's classification report, showing accuracy 99.9% with precision and recall both 0 over his 1000 attempts]

So, to give you an idea, Mr. X is quite the playboy type :p. He has tried 1000 times.

So, the current model predicts "Not Acceptance" for every input. The data is clearly imbalanced (i.e., X got rejected 999 times, and only once did someone accept his proposal).

Since the data is imbalanced, the model has simply learned to predict the majority class, so it gets the one actual acceptance wrong.

X went one step further and calculated the "accuracy" of his model (Mr. X is a noob in data science :p). He got an accuracy of 99.9%. Happy, he started feeding inputs to his model to predict whether his crush would accept his proposal. But did you figure out what is wrong with his model?
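To see the trap concretely, here is a minimal sketch of X's situation (I'm assuming scikit-learn here purely for illustration, and the data is made up to match the story): a "model" that always predicts rejection still scores 99.9% accuracy.

```python
# 1000 past attempts: 1 acceptance, 999 rejections.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1] + [0] * 999   # actual outcomes
y_pred = [0] * 1000        # the model always predicts "Not Acceptance"

print(accuracy_score(y_true, y_pred))    # 0.999
print(confusion_matrix(y_true, y_pred))  # [[999   0]
                                         #  [  1   0]]
```

The model never predicts a single acceptance, yet the accuracy looks fantastic. That is exactly how accuracy misleads on imbalanced data.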

In the meantime, his friend Mr. Y (an expert in data science) told him to wait as he was doing things wrong and introduced him to the concepts of Precision, Recall, and F1.

Precision

It tells you, out of all the values the model predicted as true, how many are actually true.

So, in professional terminology, precision is the ratio True Positives / (True Positives + False Positives).

Precision = TP / (TP + FP)

We always try to keep precision as high as possible, and hence we try to keep false positives (Type I errors) as low as possible.

So, as per our example, X's model has a precision of 0, which means none of its positive predictions are correct (in fact, it never predicts a positive at all). He has to check what he has done wrong and correct his model (maybe apply some resampling technique to handle the imbalanced data).
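A quick sanity check on the same toy data as above (again, scikit-learn is my assumption for illustration): since the model predicts no positives at all, TP = FP = 0.

```python
from sklearn.metrics import precision_score

y_true = [1] + [0] * 999
y_pred = [0] * 1000

# TP = 0 and FP = 0, so TP / (TP + FP) is 0/0; zero_division=0
# tells scikit-learn to report 0.0 instead of warning.
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
```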

Recall

It tells you, out of all the actually true values, how many of them were correctly predicted as true. (See how it differs from precision? The two are easy to confuse, but not anymore.)

So, it is the ratio True Positives / (True Positives + False Negatives).

Recall = TP / (TP + FN)

We always try to keep recall as high as possible, and hence we try to keep false negatives (Type II errors) as low as possible.

So, as per our example, X's model has a recall of 0, which means a high false-negative rate. He has to check what he has done wrong and correct his model.
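And the matching check for recall on the same toy data (scikit-learn assumed, as before): the single actual acceptance was predicted as a rejection, so it is a false negative.

```python
from sklearn.metrics import recall_score

y_true = [1] + [0] * 999
y_pred = [0] * 1000

# TP = 0 and FN = 1, so recall = 0 / (0 + 1) = 0.0.
print(recall_score(y_true, y_pred))  # 0.0
```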

F1 Score

There might be situations where you don't want to look at recall and precision separately. In such cases, the F1 score comes in handy.

It is just the harmonic mean of Precision (P) and Recall (R), and it should be as high as possible.

F1 = 2 × (P × R) / (P + R)

[Pro tip: Always remember that the harmonic mean tends toward the smaller of the two values.]
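Here is a tiny sketch of that pro tip in plain Python (the precision and recall numbers are made up just to show the effect):

```python
# F1 is the harmonic mean of precision and recall; it gets dragged
# toward the smaller of the two values.
def f1(p, r):
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

print(f1(0.9, 0.1))  # 0.18  (the arithmetic mean would be 0.5)
print(f1(0.5, 0.5))  # 0.5   (balanced precision and recall)
```

Notice how one bad metric tanks the F1 score even when the other is excellent; that is the whole point of using the harmonic mean instead of the arithmetic one.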


Precision-Recall Trade-off

As you have read through the article, these two metrics feel somewhat inversely related: if one increases, the other tends to decrease, and vice versa.

So, which metric to keep high, precision or recall, depends on the kind of problem you are solving.

So, taking our example, Mr. X should try to build a model that has high precision (low false positives) and can compromise on false negatives (those cases actually work in his favor: if the model predicts "no" but the answer would have been "yes", that is not really a problem, as per me).
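One common place to see this trade-off is the decision threshold on predicted probabilities. A small sketch (scikit-learn assumed, and the scores are purely made up for illustration): raising the threshold typically raises precision and lowers recall.

```python
from sklearn.metrics import precision_score, recall_score

y_true  = [0, 0, 0, 1, 0, 1, 1, 1]                   # toy labels
y_proba = [0.1, 0.3, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9]  # toy model scores

for threshold in (0.4, 0.65):
    y_pred = [int(p >= threshold) for p in y_proba]
    print(threshold,
          round(precision_score(y_true, y_pred), 2),
          round(recall_score(y_true, y_pred), 2))
# 0.4  -> precision 0.67, recall 1.0
# 0.65 -> precision 1.0,  recall 0.75
```

In Mr. X's case, he would pick a higher threshold: fewer predicted acceptances, but the ones the model does predict are more likely to be real.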


Next Step

In a future article in this series, we will look at how these metrics are calculated for multiclass classification problems.


Thanks

Puru Bhatnagar



