Naive Bayes

Maths behind Naive Bayes

Introduction

The Naive Bayes algorithm is a simple and efficient probabilistic classification algorithm that is based on Bayes' theorem. It is widely used in various machine learning applications, particularly in natural language processing and text classification.

Maths behind it

The key idea behind Naive Bayes is to use Bayes' theorem to calculate the probability that a given data point belongs to a particular class. Bayes' theorem is expressed as:

P(C|X) = {P(X|C) . P(C)} / P(X)

Here,
  • P(C|X) is the posterior probability that data point X belongs to class C
  • P(X|C) is the likelihood of observing data point X given that it belongs to class C
  • P(C) is the prior probability of class C
  • P(X) is the marginal likelihood of observing data point X

In the context of Naive Bayes, we make a naive assumption that the features used to describe the data are conditionally independent given the class label. This simplifies the calculation of P(X|C) as the product of the probabilities of individual features:

P(X|C) = P(x1|C) . P(x2|C) ... P(xn|C)
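As a quick illustration of this product rule, here is a minimal sketch with made-up per-feature likelihoods (the values are purely illustrative):

```python
from math import prod

# Hypothetical per-feature likelihoods P(x1|C), P(x2|C), P(x3|C) for one class
feature_likelihoods = [0.5, 0.2, 0.8]

# Naive-independence assumption: P(X|C) is the product of the per-feature terms
likelihood = prod(feature_likelihoods)
print(likelihood)  # ~0.08
```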

How the classification works

  1. Calculate the prior probabilities P(C) for each class based on the training data
  2. For each feature xi in the data point X, calculate P(xi|C) for each class C based on the training data
  3. Use Bayes' theorem to calculate the posterior probabilities P(C|X) for each class C
  4. Assign the data point X to the class with the highest posterior probability
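The four steps above can be sketched as a tiny classifier for categorical features. This is a minimal, illustrative implementation (the function names `train` and `predict` are my own); P(X) is dropped because it is the same constant for every class and does not affect which class wins:

```python
from collections import Counter, defaultdict

def train(rows):
    """rows: list of (feature_tuple, label). Step 1 and 2: count priors and likelihoods."""
    priors = Counter(label for _, label in rows)          # class counts for P(C)
    counts = defaultdict(Counter)                         # (feature index, label) -> value counts
    for features, label in rows:
        for i, value in enumerate(features):
            counts[(i, label)][value] += 1
    return priors, counts, len(rows)

def predict(features, priors, counts, total):
    """Steps 3 and 4: score each class by P(C) * product of P(xi|C), pick the max."""
    best_label, best_score = None, -1.0
    for label, label_count in priors.items():
        score = label_count / total                       # prior P(C)
        for i, value in enumerate(features):
            score *= counts[(i, label)][value] / label_count  # likelihood P(xi|C)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# The weather dataset from the example below
data = [(("Sunny",), "Yes"), (("Windy",), "Yes"), (("Sunny",), "Yes"),
        (("Rain",), "No"), (("Windy",), "No"), (("Windy",), "No"),
        (("Windy",), "No"), (("Rain",), "Yes"), (("Sunny",), "No"),
        (("Rain",), "No")]
model = train(data)
print(predict(("Sunny",), *model))  # Yes
```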

Example (Most important)

Let's assume the following dataset records whether someone played a match on a particular day:

Weather   Can Play?
Sunny     Yes
Windy     Yes
Sunny     Yes
Rain      No
Windy     No
Windy     No
Windy     No
Rain      Yes
Sunny     No
Rain      No


Likelihood table

Weather   Yes         No          Probability
Sunny     2           1           3/10 = 0.3
Windy     1           3           4/10 = 0.4
Rain      1           2           3/10 = 0.3
All       4/10 = 0.4  6/10 = 0.6
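These counts can be verified straight from the ten dataset rows (a quick sketch using Python's `collections.Counter`):

```python
from collections import Counter

# The (Weather, Can Play?) rows from the dataset above
data = [("Sunny", "Yes"), ("Windy", "Yes"), ("Sunny", "Yes"),
        ("Rain", "No"), ("Windy", "No"), ("Windy", "No"),
        ("Windy", "No"), ("Rain", "Yes"), ("Sunny", "No"),
        ("Rain", "No")]

weather_counts = Counter(data)                    # joint (weather, label) counts
label_counts = Counter(label for _, label in data)

print(weather_counts[("Sunny", "Yes")])   # 2
print(label_counts["Yes"] / len(data))    # 0.4, the prior P(Yes)
```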


Let's say we want to find the probability that the player plays on a sunny day, P(Yes|Sunny).
Recall that P(C|X) is the posterior probability that data point X belongs to class C; here X is Sunny and C is the class Yes.
P(Yes|Sunny) = {P(Sunny|Yes) . P(Yes)} / P(Sunny)
Here,
P(Yes) = 0.4 # row All
P(Sunny) = 0.3 # row Sunny
P(Sunny|Yes) = Number of times it was Sunny with Yes / Total number of Yes
= 2 / 4 = 0.5

Thus,
P(Yes|Sunny) = 0.5 * 0.4 / 0.3 = 0.67

Similarly, the probability of the player not playing on a sunny day is P(No|Sunny):
P(No|Sunny) = {P(Sunny|No) . P(No)} / P(Sunny)
Here,
P(No) = 0.6 # row All
P(Sunny) = 0.3 # row Sunny
P(Sunny|No) = Number of times it was Sunny with No / Total number of No
= 1 / 6 = 0.17

Thus,
P(No|Sunny) = 0.17 * 0.6 / 0.3 = 0.33
Since P(Yes|Sunny) > P(No|Sunny), i.e. 0.67 > 0.33, the player will play. (As a sanity check, the two posteriors sum to 1.)
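The calculation can be checked in a few lines, reading the priors and likelihoods straight off the dataset counts (2 of the 4 "Yes" days and 1 of the 6 "No" days were Sunny):

```python
# Posterior for each class on a Sunny day, via Bayes' theorem
p_yes, p_no = 4 / 10, 6 / 10   # priors P(Yes), P(No)
p_sunny = 3 / 10               # evidence P(Sunny)
p_sunny_given_yes = 2 / 4      # likelihood P(Sunny|Yes)
p_sunny_given_no = 1 / 6       # likelihood P(Sunny|No)

posterior_yes = p_sunny_given_yes * p_yes / p_sunny
posterior_no = p_sunny_given_no * p_no / p_sunny
print(round(posterior_yes, 2), round(posterior_no, 2))  # 0.67 0.33
```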

Cost function

Unlike many other machine learning algorithms, Naive Bayes does not have a cost function that is optimized during training. Instead, it relies on probability calculations and Bayes' theorem to make classification decisions. The goal of Naive Bayes is to maximize the posterior probability P(C|X) over the classes C to make accurate predictions.
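Because P(X) is identical for every class, it can be dropped from the comparison: the classifier simply picks the class with the largest unnormalized posterior P(X|C) . P(C). A minimal sketch using the dataset counts from the example (2 of 4 "Yes" days and 1 of 6 "No" days were Sunny):

```python
# Decision rule: argmax over unnormalized posteriors; P(Sunny) cancels out
scores = {
    "Yes": (2 / 4) * (4 / 10),  # P(Sunny|Yes) * P(Yes)
    "No":  (1 / 6) * (6 / 10),  # P(Sunny|No)  * P(No)
}
prediction = max(scores, key=scores.get)
print(prediction)  # Yes
```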

Note: Parts of this article were developed using ChatGPT.
