Math behind Simple and Multiple Linear Regression

Introduction

Linear regression is a fundamental machine learning algorithm used for predicting a continuous target variable based on one or more input features. It's a simple yet powerful method for modeling the relationship between variables. In simple linear regression, we have one independent variable, while in multiple linear regression, we have multiple independent variables.

1. Simple Linear Regression

In simple linear regression we have only one feature (X1), so the dataset looks like this:

Cost of House (Y)    Area of House (X)
150,000              1500
450,000              2000
2,000,000            10000
350,000              3000
...                  ...

Equation

Thus, the equation for Simple Linear Regression becomes:
Y = θ0 + θ1X + ϵ
Here,
  • Y is the predicted value
  • θ0 is the intercept
  • θ1 is the slope
  • ϵ is the error
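
As a quick numeric sketch of this equation (the parameter values below are made up for illustration, not fitted to the table above):

    # Hypothetical parameters, chosen only to illustrate the equation
    theta0 = 50_000                      # intercept
    theta1 = 150                         # slope: extra cost per unit of area

    area = 2000                          # Area of House (X)
    predicted_cost = theta0 + theta1 * area
    print(predicted_cost)                # 350000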

Cost Function

The cost function (J) using the Mean Squared Error (MSE) can be defined as:
J(θ0, θ1) = (1/2n) * Σ_{i=1}^{n} (Yi − (θ0 + θ1Xi))²
Here,
  • n is the number of data points
  • Yi is the actual value
  • (θ0 + θ1Xi) is the predicted value for data point i
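
A minimal NumPy sketch of this cost function, using the values from the table above (a sketch only, assuming NumPy is available):

    import numpy as np

    # Toy data from the table above
    X = np.array([1500, 2000, 10000, 3000], dtype=float)                 # Area of House
    Y = np.array([150_000, 450_000, 2_000_000, 350_000], dtype=float)    # Cost of House

    def cost(theta0, theta1, X, Y):
        # J(θ0, θ1) = (1/2n) * Σ (Yi − (θ0 + θ1Xi))²
        n = len(Y)
        predictions = theta0 + theta1 * X
        return np.sum((Y - predictions) ** 2) / (2 * n)

    print(cost(50_000, 150, X, Y))   # cost for one hypothetical (θ0, θ1) pair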

2. Multiple Linear Regression

In multiple linear regression we have more than one feature (X1 .. Xp), so the dataset looks like this:

Cost of House (Y)    Area of House (X1)    Number of Rooms (X2)    Year Built (X3)
150,000              1500                  3                       1950
450,000              2000                  4                       1990
2,000,000            10000                 10                      2019
350,000              3000                  2                       2010
...                  ...                   ...                     ...

Equation

Thus, the equation for Multiple Linear Regression becomes:
Y = θ0 + θ1X1 + θ2X2 + ... + θpXp + ϵ
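
In vector form this prediction is just a dot product between the feature values and the coefficients. A small sketch with hypothetical coefficients for the three features above:

    import numpy as np

    theta0 = 20_000                            # intercept (hypothetical)
    theta = np.array([120.0, 15_000.0, 50.0])  # hypothetical weights: area, rooms, year built

    x = np.array([2000.0, 4.0, 1990.0])        # one house: Area, Number of Rooms, Year Built
    prediction = theta0 + np.dot(theta, x)     # θ0 + θ1X1 + θ2X2 + θ3X3
    print(prediction)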

Cost Function

The cost function (J) using the Mean Squared Error (MSE) can be defined as:
J(θ0, θ1, ..., θp) = (1/2n) * Σ_{i=1}^{n} (Yi − (θ0 + θ1X1i + θ2X2i + ... + θpXpi))²
Here,
  • p is the number of features
  • (θ0 + θ1X1i + θ2X2i + ... + θpXpi) is the predicted value for data point i
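
Collecting the features into a design matrix X (one row per house, with a leading column of ones for the intercept) lets the same cost be written as a single vectorized expression. A sketch, assuming NumPy:

    import numpy as np

    # Rows: houses from the table above; columns: 1 (intercept), area, rooms, year built
    X = np.array([[1, 1500, 3, 1950],
                  [1, 2000, 4, 1990],
                  [1, 10000, 10, 2019],
                  [1, 3000, 2, 2010]], dtype=float)
    Y = np.array([150_000, 450_000, 2_000_000, 350_000], dtype=float)

    def cost(theta, X, Y):
        # J(θ) = (1/2n) * ||Y − Xθ||²
        n = len(Y)
        residuals = Y - X @ theta
        return residuals @ residuals / (2 * n)

    print(cost(np.zeros(X.shape[1]), X, Y))   # cost with all coefficients set to zero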

3. Polynomial Regression

Polynomial regression is a powerful extension of linear regression. While linear regression models relationships using straight lines, polynomial regression can capture curved and nonlinear relationships between variables. We use only one feature here; as the number of features increases, the equation becomes more complex.

Equation

The equation for Polynomial Regression is:
Y = θ0 + θ1X1 + θ2X1^2 + ... + θmX1^m + ϵ
Here,
  • θ0, θ1 ... θm are the coefficients to be estimated
  • ϵ is the error term
  • m is the degree of the polynomial
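
In practice the powers X1, X1^2, ..., X1^m are generated as extra columns and then fitted with an ordinary linear regression. A sketch using scikit-learn (degree 2 is an arbitrary choice here):

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    X = np.array([[1500], [2000], [10000], [3000]], dtype=float)   # single feature: area
    Y = np.array([150_000, 450_000, 2_000_000, 350_000], dtype=float)

    poly = PolynomialFeatures(degree=2, include_bias=False)   # adds X1 and X1^2 columns
    X_poly = poly.fit_transform(X)

    model = LinearRegression().fit(X_poly, Y)
    print(model.intercept_, model.coef_)                      # θ0 and (θ1, θ2)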

Cost Function

The cost function (J) using the Mean Squared Error (MSE) remains the same:
J(θ0, θ1, ..., θm) = (1/2n) * Σ_{i=1}^{n} (Yi − θ0 − Σ_{j=1}^{m} θj(X1i)^j)²
Note: the outer summation (Σ) still runs over the n data points, while the inner Σ runs over the m polynomial terms.
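
The double summation can be evaluated directly by building the powers of X1 with np.vander. A minimal sketch (degree m = 2 and the coefficient values are hypothetical):

    import numpy as np

    X1 = np.array([1500, 2000, 10000, 3000], dtype=float)
    Y = np.array([150_000, 450_000, 2_000_000, 350_000], dtype=float)

    m = 2                                            # degree of the polynomial
    theta = np.array([10_000.0, 100.0, 0.01])        # hypothetical θ0, θ1, θ2

    def poly_cost(theta, X1, Y, m):
        # J = (1/2n) * Σ_i (Yi − Σ_j θj * X1i^j)²,  j = 0..m
        n = len(Y)
        design = np.vander(X1, m + 1, increasing=True)   # columns: 1, X1, X1^2, ...
        predictions = design @ theta
        return np.sum((Y - predictions) ** 2) / (2 * n)

    print(poly_cost(theta, X1, Y, m))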

4. Lasso and Ridge Regression

Lasso

Lasso (Least Absolute Shrinkage and Selection Operator) regression is a regularization technique that can not only predict but also select important features. It works by finding the best linear equation (a combination of features with coefficients) that fits your data and predicts the target variable. However, it adds a twist: it penalizes the absolute values of the coefficients of the features.

Cost function

The Lasso Regression cost function adds the penalty term λ * Σ_{j=1}^{p} |θj|:
J(θ0, θ1, ..., θp) = (1/2n) * Σ_{i=1}^{n} (Yi − (θ0 + θ1X1i + θ2X2i + ... + θpXpi))² + λ * Σ_{j=1}^{p} |θj|
The penalty term λ * Σ_{j=1}^{p} |θj| encourages some coefficients to become exactly zero, effectively performing feature selection. Lasso helps simplify complex models by removing less important features. This is called L1 regularization because the penalty is the L1 norm (the sum of absolute values) of the coefficients.
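
A short sketch of this cost plus the equivalent scikit-learn estimator (the regularization strength λ/alpha below is arbitrary, and the intercept is left unpenalized, which is the usual convention):

    import numpy as np
    from sklearn.linear_model import Lasso

    X = np.array([[1500, 3, 1950],
                  [2000, 4, 1990],
                  [10000, 10, 2019],
                  [3000, 2, 2010]], dtype=float)
    Y = np.array([150_000, 450_000, 2_000_000, 350_000], dtype=float)

    def lasso_cost(theta0, theta, X, Y, lam):
        # MSE term plus λ * Σ |θj| (intercept not penalized)
        n = len(Y)
        residuals = Y - (theta0 + X @ theta)
        return np.sum(residuals ** 2) / (2 * n) + lam * np.sum(np.abs(theta))

    model = Lasso(alpha=1.0).fit(X, Y)   # scikit-learn's L1-regularized linear regression
    print(model.coef_)                   # some coefficients may be shrunk to exactly zero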

Ridge

Ridge regression uses L2 regularization, which adds the squares of the coefficients as a penalty term to the cost function. L2 regularization helps prevent overfitting by shrinking the coefficients towards zero but doesn't force them to become exactly zero. It retains all features in the model.

Cost Function

The Ridge Regression cost function adds the penalty term λ * Σ_{j=1}^{p} θj²:
J(θ0, θ1, ..., θp) = (1/2n) * Σ_{i=1}^{n} (Yi − (θ0 + θ1X1i + θ2X2i + ... + θpXpi))² + λ * Σ_{j=1}^{p} θj²
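
As with Lasso, a short sketch of this cost plus the scikit-learn estimator (again, the regularization strength is arbitrary and the intercept is left unpenalized):

    import numpy as np
    from sklearn.linear_model import Ridge

    X = np.array([[1500, 3, 1950],
                  [2000, 4, 1990],
                  [10000, 10, 2019],
                  [3000, 2, 2010]], dtype=float)
    Y = np.array([150_000, 450_000, 2_000_000, 350_000], dtype=float)

    def ridge_cost(theta0, theta, X, Y, lam):
        # MSE term plus λ * Σ θj² (intercept not penalized)
        n = len(Y)
        residuals = Y - (theta0 + X @ theta)
        return np.sum(residuals ** 2) / (2 * n) + lam * np.sum(theta ** 2)

    model = Ridge(alpha=1.0).fit(X, Y)   # scikit-learn's L2-regularized linear regression
    print(model.coef_)                   # coefficients are shrunk but stay non-zero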
