
Multiple Linear Regression (MLR)

Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that predicts the outcome of a response variable by combining several explanatory variables. The aim of multiple linear regression is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable. In essence, multiple regression extends ordinary least-squares (OLS) regression to the case of more than one explanatory variable.

Key Points in Multiple Linear Regression (MLR)

  • Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.
  • Multiple regression is an extension of simple linear (OLS) regression, which uses just one explanatory variable.
  • MLR is widely used in econometrics and financial analysis.

Assumptions of Multiple Linear Regression

Multiple linear regression makes all of the same assumptions as simple linear regression:

Homogeneity of variance (homoscedasticity): The size of the error in our prediction does not vary significantly across independent variable values.

Independence of observations: The dataset's observations were gathered using statistically valid sampling methods, and there are no hidden relationships between variables.

Normality: The data follow a normal distribution.

Linearity: The line of best fit through the data points is a straight line, rather than a curve or some sort of grouping factor.
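To make these checks concrete, here is a minimal sketch of two common residual-based diagnostics: a Shapiro-Wilk test for normality and a Breusch-Pagan test for homoscedasticity. The data are synthetic illustration values, not from any particular dataset, and this is only one way such checks might be done.

```python
# A minimal sketch of checking two of the assumptions on a fitted
# model's residuals; the data here are synthetic and purely illustrative.
import numpy as np
import statsmodels.api as sm
from scipy.stats import shapiro
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                   # two explanatory variables
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

X = sm.add_constant(X)                          # intercept column
resid = sm.OLS(y, X).fit().resid

# Normality: Shapiro-Wilk test on the residuals
# (a large p-value means no evidence against normality).
print("Shapiro-Wilk p-value:", shapiro(resid).pvalue)

# Homoscedasticity: Breusch-Pagan test
# (a large p-value means no evidence of heteroscedasticity).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)
```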

In multiple linear regression, some of the independent variables may be correlated with one another, so it is important to check for this before developing the regression model. If two independent variables are too highly correlated (r² > ~0.6), then only one of them should be used in the regression model.
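As a quick illustration, pairwise correlations between candidate predictors can be screened with pandas before fitting; the column names and values below are hypothetical.

```python
# A minimal sketch of screening predictors for multicollinearity;
# the column names and values are made up for the example.
import pandas as pd

df = pd.DataFrame({
    "interest_rate": [1.5, 2.0, 2.5, 3.0, 3.5, 4.0],
    "unemployment":  [6.1, 5.8, 5.7, 5.9, 6.2, 6.5],
    "inflation":     [1.2, 1.4, 1.6, 1.9, 2.1, 2.4],
})

# Pearson correlation matrix between the candidate predictors.
r = df.corr()

# Squaring gives r^2; pairs with r^2 > ~0.6 are candidates for
# dropping one of the two variables.
print((r ** 2).round(2))
```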


How to Perform a Multiple Linear Regression


The formula for a multiple linear regression is:

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + … + βₚxᵢₚ + ϵ

where, for i = 1, …, n observations:

yᵢ = dependent variable

xᵢ = explanatory variables

β₀ = y-intercept (constant term)

βₚ = slope coefficients for each explanatory variable

ϵ = the model's error term (also known as the residuals)
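As a rough sketch of how these coefficients are estimated in practice, the example below fits an OLS model with statsmodels on synthetic data; the true coefficients (2.0, 1.5, -0.7) are assumptions baked into the simulation, so the fitted values should land near them.

```python
# A minimal sketch of estimating the regression coefficients above by
# ordinary least squares; the data are simulated for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))             # two explanatory variables x_i1, x_i2
y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.5, size=n)

X = sm.add_constant(X)                  # adds the intercept column for beta_0
model = sm.OLS(y, X).fit()

print(model.params)                     # estimates of [beta_0, beta_1, beta_2]
print(model.summary())                  # full fit diagnostics
```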