Linear Regression

Linear Regression is a supervised learning algorithm that assumes a linear relationship between the input variables (x) and the continuous output variable (y). In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. When there is one input (explanatory or independent) variable, the process is called simple linear regression; for more than one, it is called multiple linear regression.

Different techniques can be used to prepare or train the linear regression equation from data, the most common of which is Ordinary Least Squares (OLS).

Table of Contents

  1. Overview
  2. Types of Linear Regression
  3. Assumptions
  4. Video
  5. Evaluation (To be added)
  6. Overfitting in Regression (To be added)
  7. Code (To be added)


The objective of Linear Regression is to predict a dependent variable value (y) from given independent variables (x) using a best fit line. We assume there is a linear relationship between the dependent and independent variables, and the dependent variable must be continuous in nature.

Let’s understand the theory of linear regression using an example:

We want to predict house prices (y) using a single independent feature, “size in square feet” (x).

[Figure: Housing Data Set – size (sq. ft.) (x) vs. price ($) (y), with the best fit line]

Linear regression can help us predict house prices from the above data set. Suppose the pink line in the image above is the best fit line, with a corresponding mathematical equation. If we know the equation of that line, then for any given house size, i.e. input (x), we can find the house price, i.e. output (y).
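
To make this concrete, here is a minimal sketch of fitting a simple linear regression with scikit-learn; the sizes and prices below are made-up illustration values, not the data set from the figure.

# Fit a simple linear regression on a toy housing data set.
# The numbers are illustrative, not real housing data.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[650], [785], [1200], [1500], [1850], [2100]])  # size in sq. ft.
y = np.array([70000, 88000, 130000, 165000, 200000, 228000])  # price in $

model = LinearRegression().fit(X, y)
print("intercept (theta0):", model.intercept_)
print("slope (theta1):", model.coef_[0])

# Predict the price of a 1000 sq. ft. house
print("predicted price:", model.predict([[1000]])[0])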

Hypothesis Function

Let’s use the hypothesis function:

hθ(x) = θ0 + θ1x

– where θ0 is the intercept (c) and θ1 is the slope (m) of the line.
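
As a quick sketch, the hypothesis is a one-liner in Python (NumPy-style, so it works on scalars and arrays alike):

def hypothesis(theta0, theta1, x):
    # h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x
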
Cost Function

For m training examples, the cost function is the (halved) mean of the squared errors:

J(θ0, θ1) = (1/2m) Σ (hθ(x_i) − y_i)²

– where the sum runs over the m training examples (x_i, y_i).
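
A minimal sketch of this cost function in NumPy (the 1/2 factor only simplifies the derivatives and does not change the minimizer):

import numpy as np

def cost(theta0, theta1, x, y):
    m = len(y)
    errors = (theta0 + theta1 * x) - y   # h_theta(x_i) - y_i for every example
    return np.sum(errors ** 2) / (2 * m)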

Optimization – Gradient Descent

The vertical distances between the data points and the fitted line (best fit line) are called errors, or residuals. The idea of the objective function is to fit the regression line by minimizing the sum of squares of these errors, i.e. the cost function. This is also known as the principle of least squares.

minimize J(θ0, θ1) over θ0, θ1

– where J(θ0, θ1) is the cost function defined above.
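
Gradient descent minimizes J(θ0, θ1) by repeatedly nudging both parameters against the gradient. A minimal sketch, assuming the learning rate alpha and iteration count below (both illustrative hyperparameters):

import numpy as np

def gradient_descent(x, y, alpha=0.01, iterations=1000):
    m = len(y)
    theta0, theta1 = 0.0, 0.0                 # start from an arbitrary line
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y     # h_theta(x) - y
        grad0 = np.sum(error) / m             # dJ/d(theta0)
        grad1 = np.sum(error * x) / m         # dJ/d(theta1)
        theta0 -= alpha * grad0               # simultaneous update
        theta1 -= alpha * grad1
    return theta0, theta1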

[Animation: Univariate Linear Regression – gradient descent iterations, updating m (θ1) and c (θ0)]

Visit this link for regression notes: Univariate Linear Regression

Assumptions of Linear Regression

1. Linearity – A linear relationship should be present between the dependent and independent variables.
   How to check: (a) Visualization – scatter plot between the dependent and independent variables; (b) correlation with the dependent variable.

2. Mean of residuals should be zero – Residuals are the differences between the true values and the predicted values; their mean should be (close to) zero.

3. Homoscedasticity – The residuals should have equal or almost equal variance across the regression line. Plotting the error terms against the predicted values should show no pattern; there should be no heteroscedasticity.
   How to check: (a) Visualization – residuals vs fitted values plot; (b) Goldfeld–Quandt test; (c) Bartlett’s test.

4. Normality of residuals – The distribution of the residuals should be normal.
   How to check: Visualization – histogram of the residuals.

5. No autocorrelation of residuals – When the residuals are autocorrelated, the current value depends on the previous (historic) values, and there is a definite unexplained pattern in the Y variable that shows up in the error terms. This is more common in time series data.
   How to check: (a) Visualization – residuals vs fitted values line plot; (b) Ljung–Box test; (c) autocorrelation plot (ACF); (d) partial autocorrelation plot (PACF).

6. No perfect multicollinearity – Multicollinearity is the presence of high correlations among two or more independent variables.
   How to check: Variance Inflation Factor (VIF).

A few of these checks are sketched in code below.
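
A minimal sketch of some of these checks in Python, assuming statsmodels is available; the toy data, variable names, and chosen lags are illustrative assumptions:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Toy data: two features, both informative (illustration values only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=100)

X_const = sm.add_constant(X)          # add the intercept column
model = sm.OLS(y, X_const).fit()
residuals = model.resid

# Assumption 2: mean of residuals should be close to zero
print("mean of residuals:", residuals.mean())

# Assumption 5: Ljung-Box test for autocorrelation of residuals
print(acorr_ljungbox(residuals, lags=[10]))

# Assumption 6: Variance Inflation Factor per column (index 0 is the intercept)
for i in range(X_const.shape[1]):
    print("VIF:", variance_inflation_factor(X_const, i))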

Read more: Link1 Link2


Linear Regression 8 Videos Playlist
