✅Naive Bayes Algorithm - Explained💯
Naive Bayes is a probabilistic algorithm that is typically used for classification problems. It relies on conditional probability: the probability of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) already occurred. It is simple and intuitive, yet performs surprisingly well in many cases. It is based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Assumptions made by Naive Bayes
The fundamental Naïve Bayes assumption is that each feature makes an:
- Independent
- Equal
contribution to the outcome.
For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple and that is why it is known as ‘Naive’.
Note - The Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, Naive Bayes can outperform far more sophisticated classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c). Look at the equation below (a short code sketch of this calculation follows the list):
P(c|x) = P(x|c) * P(c) / P(x)
Above,
- P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
- P(c) is the prior probability of class.
- P(x|c) is the likelihood which is the probability of predictor given class.
- P(x) is the prior probability of predictor.
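As referenced above, here is a minimal sketch of Bayes' theorem as code; the probability values are placeholders chosen purely for illustration, not estimates from any dataset.

```python
# Bayes' theorem: posterior = likelihood * prior / evidence.
def posterior(likelihood, prior, evidence):
    """P(c|x) = P(x|c) * P(c) / P(x)"""
    return likelihood * prior / evidence

# Placeholder values for illustration only:
# P(x|c) = 0.5, P(c) = 0.4, P(x) = 0.25
print(posterior(0.5, 0.4, 0.25))  # 0.8
```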
This is a rather simple transformation, but it bridges the gap between what we want to do and what we can do. We can't get P(c|x) directly, but we can estimate P(x|c) and P(c) from the training data. Here's an example:
In this case, X = (Outlook, Temperature, Humidity, Windy) and Y = Play. P(X|Y) and P(Y) can be estimated from the data, but estimating P(X|Y) = P(Outlook, Temperature, Humidity, Windy | Y) directly would require a probability for every possible combination of feature values.
Having this many parameters in the model is impractical. To solve this problem, a naive assumption is made: we pretend all features are independent. What does this mean? Conditioned on the class, knowing one feature's value tells us nothing about any other feature's value.
Now, with the help of this naive assumption (naive because features are rarely independent in reality), we can classify with far fewer parameters:
P(X|Y) = P(Outlook|Y) * P(Temperature|Y) * P(Humidity|Y) * P(Windy|Y)
This is a big deal. We changed the number of parameters from exponential to linear in the number of features. This means that Naive Bayes can handle high-dimensional data well.
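To make the factorization concrete, here is a from-scratch sketch in Python on a tiny made-up categorical dataset (the rows are invented for illustration, not the article's original table). It estimates P(Y) and each per-feature P(x_i|Y) from counts and multiplies them, which is exactly the linear-parameter model described above.

```python
from collections import Counter, defaultdict

# Invented toy rows: (Outlook, Windy) -> Play
data = [
    (("Sunny", "False"), "No"),
    (("Sunny", "True"), "No"),
    (("Overcast", "False"), "Yes"),
    (("Rainy", "False"), "Yes"),
    (("Rainy", "True"), "No"),
    (("Sunny", "False"), "Yes"),
]

class_counts = Counter(y for _, y in data)
# feature_counts[class][feature_index][value] = count
feature_counts = defaultdict(lambda: defaultdict(Counter))
for x, y in data:
    for i, v in enumerate(x):
        feature_counts[y][i][v] += 1

def score(x, c):
    """P(c) * product over i of P(x_i | c), estimated from counts."""
    p = class_counts[c] / len(data)
    for i, v in enumerate(x):
        p *= feature_counts[c][i][v] / class_counts[c]
    return p

query = ("Sunny", "False")
print({c: score(query, c) for c in class_counts})
# {'No': 0.111..., 'Yes': 0.166...} -> predict "Yes"
```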
Another Example with Mathematics -
Problem: Players will play if the weather is sunny. Is this statement correct?
We can solve it using the posterior-probability method discussed above.
- P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
- Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64
- Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is higher than P(No | Sunny) = 0.40, so we predict that play happens on a sunny day (the snippet after this list reproduces the arithmetic).
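The snippet below reproduces this arithmetic; the counts (3/9, 5/14, 9/14) come from the classic 14-row weather dataset that this example assumes.

```python
# P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_sunny_given_yes = 3 / 9   # 3 of the 9 "Yes" days are sunny
p_sunny = 5 / 14            # 5 of the 14 days are sunny
p_yes = 9 / 14              # 9 of the 14 days are "Yes"

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6
```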
Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and with problems having multiple classes.
The Naive Bayes classifier assumes that all features are unrelated to each other: the presence or absence of one feature does not influence the presence or absence of any other feature.
In real-world datasets, we test a hypothesis against multiple pieces of evidence (the features), so the calculations become quite complicated. To simplify the work, the feature-independence assumption is used to uncouple the pieces of evidence and treat each one as independent.
The zero-frequency problem
One of the disadvantages of Naive Bayes is that if a class label and a certain attribute value never occur together in the training data, the frequency-based probability estimate for that combination will be zero. Because all the probabilities are multiplied together, a single zero wipes out the entire product.
Solution - An approach to overcome this 'zero-frequency problem' in a Bayesian setting is Laplace (add-one) smoothing: add one to the count of every attribute value-class combination, so that no estimate is ever exactly zero (see the sketch below).
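A minimal sketch of add-one smoothing, assuming a hypothetical attribute with three possible values; the counts are invented for illustration.

```python
# Laplace (add-one) smoothing for P(value | class):
# (count + alpha) / (class_total + alpha * number_of_possible_values)
def smoothed_prob(count_value_in_class, class_total, n_values, alpha=1):
    return (count_value_in_class + alpha) / (class_total + alpha * n_values)

# Hypothetical: "Overcast" never occurs with class "No" (0 of 5 rows),
# and Outlook has 3 possible values. The raw estimate would be 0:
print(0 / 5)                   # 0.0 -- zeroes out the whole product
print(smoothed_prob(0, 5, 3))  # 0.125 -- stays positive
```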
There are three types of Naive Bayes model under the scikit-learn library:
Gaussian: Used for classification with continuous features; it assumes that the features follow a normal distribution.
Multinomial: Used for discrete counts. For example, in a text classification problem, instead of the binary "word occurs in the document" we track "how often the word occurs in the document"; you can think of each count as "the number of times outcome x_i is observed over n trials".
Bernoulli: Useful if your feature vectors are binary (i.e. zeros and ones). One application is text classification with a 'bag of words' model where the 1s and 0s mean "word occurs in the document" and "word does not occur in the document", respectively.
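A minimal sketch of the three variants with scikit-learn; all the toy arrays and labels below are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # two made-up classes

# Gaussian: continuous features assumed roughly normal per class
X_continuous = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
print(GaussianNB().fit(X_continuous, y).predict([[1.1, 2.0]]))

# Multinomial: discrete counts, e.g. how often each word occurs
X_counts = np.array([[2, 0, 1], [3, 1, 0], [0, 4, 2], [1, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[2, 0, 0]]))

# Bernoulli: binary features, e.g. word present / absent
X_binary = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_binary, y).predict([[1, 0, 0]]))
```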
What are the Pros and Cons of Naive Bayes?
Pros:
- Simple and fast; training and prediction are cheap enough for real-time use.
- Performs well on multi-class problems and on high-dimensional data such as text.
- Needs relatively little training data to estimate its parameters.
Cons:
- The independence assumption rarely holds in real data, which can hurt accuracy.
- The zero-frequency problem: an unseen attribute value-class combination gets probability zero unless smoothing is applied.
- Its predicted probabilities are rough estimates and should not be read as calibrated confidences.
Tips to improve the power of the Naive Bayes Model
Here are some tips for improving the power of the Naive Bayes model:
- If continuous features are not normally distributed, transform them (e.g. with a log or Box-Cox transform) before using Gaussian Naive Bayes.
- Apply Laplace smoothing to handle the zero-frequency problem discussed above.
- Remove highly correlated features; counting the same evidence twice inflates its weight.
- Naive Bayes has few hyperparameters to tune, so focus your effort on preprocessing and feature selection.
Applications of Naive Bayes Algorithms
Real-time Prediction: Naive Bayes is an eager learning classifier, and it is very fast, so it can be used for making predictions in real time.
Multi-class Prediction: This algorithm is also well known for its multi-class prediction capability; it can predict the probability of multiple classes of the target variable.
Text classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are widely used in text classification (thanks to strong results on multi-class problems and the independence assumption) and often achieve a higher success rate there than other algorithms. As a result, they are popular for spam filtering (identifying spam e-mail) and sentiment analysis (e.g. identifying positive and negative customer sentiment in social media); a minimal sketch follows this list.
Recommendation System: A Naive Bayes classifier combined with collaborative filtering can build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource.
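As mentioned in the list above, here is a hedged spam-filter sketch using scikit-learn's CountVectorizer and MultinomialNB; the four messages and their labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training messages and labels
messages = [
    "win a free prize now",
    "limited offer, claim your reward",
    "meeting rescheduled to friday",
    "lunch tomorrow with the team?",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts feeding a multinomial Naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)
print(model.predict(["free reward inside"]))  # likely ['spam']
```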
Thanks for reading. Like, comment, and share if you found it useful.