Supervised Learning: Regression

Supervised learning is a type of machine learning where the algorithm is trained on labelled data, with inputs and outputs specified, to learn a mapping function from the input to the output. Regression is a type of supervised learning where the goal is to predict a continuous output variable based on one or more input variables.

In regression, the algorithm looks for the relationship between the input and output variables that gives the most accurate predictions on new data. The output is a continuous quantity, such as price, temperature, or weight, which the algorithm predicts from input features such as age, height, or location.

To perform regression analysis, we need a dataset with labelled data. The dataset is usually divided into a training set and a test set. The training set is used to train the model, and the test set is used to evaluate the performance of the model.
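The split described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed method: the function name `train_test_split`, the 80/20 ratio, the fixed seed, and the toy data are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled data: (input feature, output value) pairs.
data = [(x, 2.0 * x + 1.0) for x in range(10)]
train, test = train_test_split(data, test_fraction=0.2, seed=42)
print(len(train), "training rows,", len(test), "test rows")
```

Shuffling before splitting matters when the rows are stored in some order (for example by date or by value); without it the test set may not resemble the training set.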

The performance of the regression model is measured using metrics such as mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R-squared).
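As a rough sketch, the four metrics named above can be computed directly from their standard definitions. The function name `regression_metrics` and the sample values are assumptions made for the example. For MSE, MAE, and RMSE lower is better, while R-squared is better the closer it is to 1.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R-squared for paired true/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all true values are equal")
    mse = ss_res / n
    return {
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical true values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE is in the same units as the output variable, which often makes it easier to interpret than MSE.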

There are various regression algorithms that can be used for supervised learning, such as linear regression, polynomial regression, decision trees, random forests, support vector regression, and neural networks. The choice of algorithm depends on the nature of the problem, the type and size of the dataset, and the performance requirements.

What is Regression?

Regression is a statistical method used in data analysis and machine learning for modelling the relationship between a dependent variable (also known as the response variable or the outcome variable) and one or more independent variables (also known as explanatory variables, predictor variables, or features). The goal of regression analysis is to find a mathematical equation that can predict the value of the dependent variable based on the values of the independent variables.
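As an illustration, the simplest such equation is the linear model. The notation here is an assumption made for the example and does not come from the text above: y is the dependent variable, x_1 to x_p are the independent variables, the β terms are coefficients to be estimated, and ε is an error term.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

Fitting the model means choosing the coefficients so that the predicted values are as close as possible to the observed values of y.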

The dependent variable is typically a continuous numerical variable, such as temperature, sales, or height, that we want to predict or explain. The independent variables can be either continuous or categorical variables, such as age, gender, income, or geographical location, that we believe may influence the value of the dependent variable.

Regression models are often used in predictive modelling, where we want to estimate the value of the dependent variable for new observations based on the values of the independent variables. There are various types of regression models, including linear regression, polynomial regression, ridge regression, Lasso regression, and elastic net regression, among others. Logistic regression is often listed alongside them, but despite its name it models the probability of a categorical outcome and is normally used for classification. The choice of the regression model depends on the nature of the problem, the type of data, and the assumptions made about the relationship between the variables.

Common Regression Algorithms

The most common regression algorithms are:

  • Simple linear regression
  • Multiple linear regression
  • Polynomial regression
  • Multivariate adaptive regression splines
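  • Logistic regression (despite the name, used for classification: it models the probability of a categorical outcome)
  • Maximum likelihood estimation (a method for fitting models rather than a model itself; for linear regression with normally distributed errors it gives the same result as least squares)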
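To make the first item in the list concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares, using the standard closed-form formulas for the slope and intercept. The function name `fit_simple_linear_regression` and the sample data are assumptions made for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_linear_regression(xs, ys)

# Predict the output for a new, unseen input.
prediction = intercept + slope * 6.0
print(slope, intercept, prediction)
```

Multiple and polynomial regression follow the same least-squares idea with more than one input column, which in practice is usually solved with a linear algebra library rather than by hand.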