Naive Bayes
Introduction
The Naive Bayes algorithm is a simple and efficient probabilistic classification algorithm that is based on Bayes' theorem. It is widely used in various machine learning applications, particularly in natural language processing and text classification.
Maths behind it
The key idea behind Naive Bayes is to use Bayes' theorem to calculate the probability that a given data point belongs to a particular class. Bayes' theorem is expressed as:

P(C|X) = P(X|C) * P(C) / P(X)

Here,
- P(C|X) is the posterior probability that data point X belongs to class C
- P(X|C) is the likelihood of observing data point X given that it belongs to class C
- P(C) is the prior probability of class C
- P(X) is the marginal likelihood of observing data point X
In the context of Naive Bayes, we make a naive assumption that the features used to describe the data are conditionally independent given the class label. This simplifies the calculation of P(X|C) to the product of the probabilities of the individual features:

P(X|C) = P(x1|C) * P(x2|C) * ... * P(xn|C)
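The independence assumption can be illustrated with a tiny sketch (the likelihood values below are made up purely for illustration):

```python
from math import prod

# Hypothetical per-feature likelihoods P(x1|C), P(x2|C), P(x3|C) for one class C
feature_likelihoods = [0.5, 0.2, 0.8]

# Under the naive independence assumption, P(X|C) is simply their product
p_x_given_c = prod(feature_likelihoods)  # 0.5 * 0.2 * 0.8 = 0.08
```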
How the classification works
- Calculate the prior probabilities P(C) for each class based on the training data
- For each feature xi in the data point X, calculate P(xi|C) for each class C based on the training data
- Use Bayes' theorem to calculate the posterior probabilities P(C|X) for each class C
- Assign the data point X to the class with the highest posterior probability
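The four steps above can be sketched as a minimal categorical Naive Bayes classifier in Python (function and variable names are illustrative, and no smoothing is applied to keep the sketch close to the hand calculation):

```python
from collections import Counter, defaultdict

def train(X, y):
    """Estimate priors P(C) and per-feature likelihoods P(xi|C) from data."""
    n = len(y)
    class_counts = Counter(y)
    priors = {c: cnt / n for c, cnt in class_counts.items()}
    # counts[(i, value, c)] = number of rows of class c whose feature i equals value
    counts = defaultdict(int)
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            counts[(i, v, c)] += 1
    likelihoods = {k: cnt / class_counts[k[2]] for k, cnt in counts.items()}
    return priors, likelihoods

def predict(x, priors, likelihoods):
    """Return the class with the highest (unnormalised) posterior P(C) * prod P(xi|C)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= likelihoods.get((i, v, c), 0.0)  # unseen combinations get 0
        scores[c] = score
    return max(scores, key=scores.get)
```

Note that P(X) is omitted in `predict`: it is the same for every class, so it does not change which class wins.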
Example (Most important)
Let's assume the following dataset records whether someone can play a match on a particular day:
Weather | Can Play? |
---|---|
Sunny | Yes |
Windy | Yes |
Sunny | Yes |
Rain | No |
Windy | No |
Windy | No |
Windy | No |
Rain | No |
Sunny | No |
Rain | No |
Likelihood table
Weather | Yes | No | Probability |
---|---|---|---|
Sunny | 2 | 1 | 3/10 = 0.3 |
Windy | 1 | 3 | 4/10 = 0.4 |
Rain | 0 | 3 | 3/10 = 0.3 |
All | 3/10 = 0.3 | 7/10 = 0.7 | |
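The likelihood table can be rebuilt programmatically with plain counting. This is a minimal sketch in which the list below encodes the ten observations exactly as the table tallies them:

```python
from collections import Counter

# The ten (weather, can_play) observations, as tallied in the likelihood table
data = [("Sunny", "Yes"), ("Windy", "Yes"), ("Sunny", "Yes"),
        ("Rain", "No"), ("Windy", "No"), ("Windy", "No"),
        ("Windy", "No"), ("Rain", "No"), ("Sunny", "No"), ("Rain", "No")]

pair_counts = Counter(data)                         # e.g. ("Sunny", "Yes") -> 2
class_counts = Counter(label for _, label in data)  # {"Yes": 3, "No": 7}
weather_counts = Counter(w for w, _ in data)        # {"Sunny": 3, "Windy": 4, "Rain": 3}

n = len(data)
for weather in ("Sunny", "Windy", "Rain"):
    yes = pair_counts[(weather, "Yes")]
    no = pair_counts[(weather, "No")]
    print(f"{weather}: Yes={yes}, No={no}, P={weather_counts[weather] / n:.1f}")
```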
So let's say we want to find the probability that a player plays on a sunny day, P(Yes|Sunny).
Recall that P(C|X) is the posterior probability that data point X belongs to class C; here X is Sunny and C is the class Yes.
Here,
P(Yes) = 0.3 # row all
P(Sunny) = 0.3 # row sunny
P(Sunny|Yes) = Number of times it was Sunny with Yes/ Total number of Yes
= 2 / 3 ≈ 0.67
Thus,
P(Yes|Sunny) = 0.67 * 0.3 / 0.3 ≈ 0.67
Similarly, for the probability of the player not playing on a sunny day, P(No|Sunny):
Here,
P(No) = 0.7 # row all
P(Sunny) = 0.3 # row sunny
P(Sunny|No) = Number of times it was Sunny with No/ Total number of No
= 1 / 7 ≈ 0.14
Thus,
P(No|Sunny) = 0.14 * 0.7 / 0.3 ≈ 0.33 (exactly 1/3 with unrounded fractions)
Since P(Yes|Sunny) > P(No|Sunny), the classifier predicts that the player will play on a sunny day.
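The arithmetic above can be checked directly, keeping the fractions exact rather than rounding early:

```python
# Quantities read off the likelihood table
p_yes, p_no = 0.3, 0.7       # priors P(Yes), P(No)
p_sunny = 0.3                # evidence P(Sunny)
p_sunny_given_yes = 2 / 3    # P(Sunny|Yes)
p_sunny_given_no = 1 / 7     # P(Sunny|No)

# Bayes' theorem for each class
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny  # ≈ 0.67
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny     # ≈ 0.33

# The two posteriors sum to 1, and Yes wins
print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))
```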
Cost function
Unlike many other machine learning algorithms, Naive Bayes has no cost function that is optimized during training. Instead, it relies on probability estimates and Bayes' theorem to make classification decisions: training amounts to counting frequencies, and prediction simply assigns each data point to the class with the highest posterior probability P(C|X).