Data Science Basics: Supervised Learning
One of the most significant classification techniques is the Naive Bayes algorithm. It is a method of creating classifiers: models that assign labels to instances based on certain features. Naive Bayes classifiers may also be called “Simple Bayes” or “Independence Bayes”. There are several different ways to build a Naive Bayes classifier, including simple probabilistic models as well as Gaussian and multinomial models, among others.
A Naive Bayes Classifier is a program that predicts a class value given a set of attributes.
For each known class value,
-- Calculate probabilities for each attribute, conditional on the class value.
-- Use the product rule to obtain a joint conditional probability for the attributes.
-- Use Bayes’ rule to derive conditional probabilities for the class variable.
Once this has been done for all class values, output the class with the highest probability.
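To make these steps concrete, here is a minimal sketch in Python of a hand-rolled Naive Bayes classifier for categorical attributes. The toy dataset, function names, and variable names are purely illustrative assumptions, not part of any particular library.

from collections import Counter, defaultdict

def train(instances, labels):
    # Prior probability P(h): relative frequency of each class value.
    priors = {h: n / len(labels) for h, n in Counter(labels).items()}
    # Conditional probabilities P(attribute = value | h), per class value.
    class_totals = Counter(labels)
    counts = defaultdict(Counter)
    for x, h in zip(instances, labels):
        for attribute, value in x.items():
            counts[(h, attribute)][value] += 1
    def conditional(h, attribute, value):
        return counts[(h, attribute)][value] / class_totals[h]
    return priors, conditional

def predict(x, priors, conditional):
    scores = {}
    for h, prior in priors.items():
        # Product rule under the independence assumption, then Bayes' rule;
        # the shared denominator P(d) can be ignored when ranking classes.
        score = prior
        for attribute, value in x.items():
            score *= conditional(h, attribute, value)
        scores[h] = score
    # Output the class with the highest probability.
    return max(scores, key=scores.get)

# Hypothetical toy data: two categorical attributes describing the weather.
instances = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["play", "play", "stay home", "stay home"]
priors, conditional = train(instances, labels)
print(predict({"outlook": "sunny", "windy": False}, priors, conditional))  # -> "play"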
Bayes’ Theorem is stated as:
P(h|d) = (P(d|h) * P(h)) / P(d)
Where
-- P(h|d) is the probability of hypothesis h given the data d. This is called the posterior probability.
-- P(d|h) is the probability of data d given that the hypothesis h was true.
-- P(h) is the probability of hypothesis h being true (regardless of the data). This is called the prior probability of h.
-- P(d) is the probability of the data (regardless of the hypothesis).
You can see that we are interested in calculating the posterior probability P(h|d) from the prior probability P(h) together with P(d) and P(d|h).
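As a quick sanity check, the theorem can be applied directly in a couple of lines of Python. The numbers below are hypothetical and chosen only to illustrate the arithmetic.

def posterior(prior, likelihood, evidence):
    # Bayes' Theorem: P(h|d) = P(d|h) * P(h) / P(d)
    return likelihood * prior / evidence

# Hypothetical values: P(h) = 0.3, P(d|h) = 0.9, P(d) = 0.45.
print(posterior(prior=0.3, likelihood=0.9, evidence=0.45))  # -> 0.6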
For our example in this article, we’re going to focus on the simple probabilistic model. For this model, our instances are represented as a vector of independent variables called features. For each feature, the model stores a probability conditional on each class. For example, if we had the two classes “Male” and “Female”, the Boolean feature “beard” might have an 80% probability for men and a 1% probability for women. Thus, the single-feature vector (true) would be classified as “Male”, since a beard is far more likely under that class; the exact confidence follows from Bayes’ Theorem, as in the short calculation below. These probabilities are learned from training instances that have already been marked with the correct classes.
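Plugging the numbers from the beard example into Bayes’ Theorem makes the resulting confidence explicit. The equal 50% prior for each class is an assumption added here for illustration.

p_male, p_female = 0.5, 0.5                  # assumed equal priors P(h)
p_beard_male, p_beard_female = 0.80, 0.01    # P(d|h) from the example above

# P(d): total probability of observing a beard across both classes.
p_beard = p_beard_male * p_male + p_beard_female * p_female

# P(Male | beard) by Bayes' Theorem.
print(p_beard_male * p_male / p_beard)  # -> about 0.988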
To better understand Naive Bayes, we can say that it is a classification algorithm for binary (two-class) and multi-class classification problems. The technique is easiest to understand when the input values are binary or categorical. It is also called “Idiot Bayes” because the calculation of the probabilities for each hypothesis is simplified to make it tractable. Rather than attempting to calculate the joint probability of all attribute values, P(d1, d2, d3|h), the attributes are assumed to be conditionally independent given the target value, so the probability is calculated as P(d1|h) * P(d2|h) * P(d3|h), and so on.
This is a very strong assumption that rarely holds in real data, i.e. that the attributes do not interact. Nevertheless, the approach performs surprisingly well on data where this assumption does not hold.
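Under this independence assumption, extending the earlier example to several features only multiplies more per-feature probabilities together. The second feature and its probabilities below are hypothetical, added purely to show the factored calculation P(d1|h) * P(d2|h) * P(h) for each class.

# Hypothetical per-feature probabilities, conditional on each class.
p_given_male = {"beard": 0.80, "long_hair": 0.10}
p_given_female = {"beard": 0.01, "long_hair": 0.75}
priors = {"Male": 0.5, "Female": 0.5}

# Observed instance: beard = true, long_hair = true.
score_male = priors["Male"] * p_given_male["beard"] * p_given_male["long_hair"]
score_female = priors["Female"] * p_given_female["beard"] * p_given_female["long_hair"]

# Pick the class with the larger (unnormalised) score.
print("Male" if score_male > score_female else "Female")  # -> "Male"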
Since our classifications are created using these feature vectors, it is important to make sure that your features are appropriate for your classes. For example, the feature “beard” is much more useful in classifying “Male” vs. “Female” than the feature “eye color” might be. It is also important to make sure your training data is an accurate representation of your instances. For example, if our training data happens to contain only one male and one female, and by chance the female has a beard while the male does not, then our classifier will “think” that 100% of females have a beard, while no males ever will.