Selection of Model

Machine learning approaches fall into three broad categories, each suited to different types of problems:

  1. Supervised

(a) Classification

(b) Regression

  2. Unsupervised

(a) Clustering

(b) Association analysis

  3. Reinforcement

The model that needs to be built and trained differs in each case. Several factors come into play when choosing a model for a machine learning problem. The most important are (i) the kind of problem we want to solve using machine learning and (ii) the nature of the underlying data.

The discussion below covers two of these types: models for supervised learning, which primarily solve predictive problems, and models for unsupervised learning, which solve descriptive problems.


Predictive Models

Models for supervised learning, or predictive models, try to predict a certain value using the values in an input data set, as the name suggests. The learning model attempts to establish a relation between the target feature, i.e., the feature being predicted, and the predictor features. Predictive models have a clear focus on what they want to learn and how they want to learn it. In some cases, a predictive model needs to predict the category or class to which a data instance belongs. Below are some examples:

(i) Predicting win/loss in a cricket match

(ii) Predicting whether a transaction is fraudulent

(iii) Predicting whether a customer may move to another product

Models used to predict a target feature with a categorical value are known as classification models. The target feature is known as a class, and the categories into which a class is divided are called levels. Popular classification models include k-Nearest Neighbor (kNN), Naive Bayes, and Decision Tree. Predictive models may also be used to predict the numerical value of a target feature based on the predictor features.
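To make the idea of a classification model concrete, the following is a minimal sketch of a k-Nearest Neighbor classifier in plain Python. The function name `knn_predict`, the toy fraud data, and the choice of Euclidean distance are assumptions made for illustration only; they do not come from the text above.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training instance.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest instances and count their class levels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical predictor features: (amount in thousands, transactions in the last hour).
train_points = [(0.2, 1), (0.5, 2), (0.3, 1), (9.0, 8), (8.5, 9), (9.5, 7)]
# Target feature (the class) with two levels: "genuine" and "fraud".
train_labels = ["genuine", "genuine", "genuine", "fraud", "fraud", "fraud"]

prediction = knn_predict(train_points, train_labels, query=(8.8, 8), k=3)
print(prediction)
```

Here the query lies close to the three instances labelled "fraud", so the majority vote should return that level. In practice, features are usually scaled before distances are computed, and k is chosen by validation.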

Below are some examples of predicting a numerical value:

(i) Prediction of revenue growth in the succeeding year

(ii) Prediction of rainfall amount in the coming monsoon

(iii) Prediction of potential flu patients and demand for flu shots next winter

Models used to predict the numerical value of the target feature of a data instance are known as regression models. Linear Regression is a popular regression model. Logistic Regression, despite its name, predicts a categorical target and is therefore typically used as a classification model.
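As an illustration of a regression model, the sketch below fits a straight line to a single predictor feature using ordinary least squares. The function name `fit_line` and the revenue figures are hypothetical; the data is deliberately exactly linear so the fitted line is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: year index (predictor) and revenue in millions (numerical target).
years = [1, 2, 3, 4, 5]
revenue = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(years, revenue)
forecast = slope * 6 + intercept  # predicted revenue for the succeeding year
print(slope, intercept, forecast)
```

With real data the points would not fall exactly on a line, and the fitted line would be the one that minimises the sum of squared errors.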


Descriptive Models

To describe a data set or to draw conclusions from it, models for unsupervised learning, or descriptive models, are used. In unsupervised learning, there is no target feature or single feature of interest. Instead, patterns or insights about the data set are discovered based on the values of all features.

Descriptive models that group together similar data instances, i.e., data instances with similar values for the different features, are called clustering models.

Examples of clustering include:

(i) Customer grouping or segmentation based on social, demographic, ethnic, and other factors

(ii) Clustering of music based on different aspects like language, genre, time-period, etc.

(iii) Clustering of commodities in an inventory
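A minimal sketch of how such grouping might work, assuming the k-means algorithm and two numerical features per customer. The helper names `assign` and `k_means`, the customer data, and the choice to seed the centroids with the first k points (for reproducibility) are assumptions for illustration only.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def k_means(points, k, iterations=10):
    """Cluster `points` into k groups; returns (centroids, labels)."""
    # Assumption: use the first k points as initial centroids so runs are repeatable.
    centroids = list(points[:k])
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Move the centroid to the mean of the instances assigned to it.
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[i])
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical customers: (age, monthly spend). No target feature is supplied.
customers = [(22, 30), (58, 90), (25, 35), (24, 28), (60, 95), (62, 88)]
centroids, labels = k_means(customers, k=2)
print(labels)
```

Note that no class labels are given to the algorithm; the groups emerge only from the similarity of feature values. Real implementations usually pick the initial centroids more carefully and stop once assignments no longer change.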

k-Means is one of the most widely used clustering models.

For market basket analysis of transactional data, descriptive models related to pattern discovery, known as association analysis, are used. Market basket analysis estimates how likely a customer is to purchase one product given the purchase of another, based on the purchase patterns in the transactional data. For example, transactional data may reveal that customers who purchase milk also tend to purchase biscuits. This can be useful for targeted promotions or for setting up in-store displays: promotions for biscuits can be sent to customers who buy milk products, and vice versa, and milk-related products can be placed near biscuits in the store.
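The milk and biscuit pattern can be quantified with two standard association-rule measures, support and confidence, which the text above does not name explicitly. The sketch below computes them on a small hypothetical set of transactions; the function names and the data are assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated chance of buying `consequent` given that `antecedent` was bought."""
    both = set(antecedent) | set(consequent)
    # Assumes the antecedent appears in at least one transaction.
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical transactional data: one set of purchased items per basket.
transactions = [
    {"milk", "biscuit"},
    {"milk", "biscuit", "bread"},
    {"milk", "bread"},
    {"biscuit"},
    {"milk", "biscuit", "eggs"},
]

milk_support = support(transactions, {"milk"})
rule_confidence = confidence(transactions, {"milk"}, {"biscuit"})
print(milk_support, rule_confidence)
```

In this toy data, milk appears in four of five baskets and three of those four also contain biscuits, so the rule "milk implies biscuit" has a confidence of about 0.75. A retailer might act only on rules whose support and confidence exceed chosen thresholds.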