Random forest is an ensemble classifier, i.e. a classifier that combines the outputs of many decision tree classifiers. The ensembling is usually done through bagging, with each tree trained on a different bootstrap sample and restricted to a random subset of features. A large number of trees is used so that every feature gets the chance to contribute to a number of the individual models. Once the forest has been built, the outputs of the individual trees are combined by majority vote. The result from the ensemble model is usually better than that from any individual decision tree model.
The random forest algorithm works as follows:
1. Draw a bootstrap sample (a random sample taken with replacement) from the original training data.
2. Grow a decision tree on the bootstrap sample, considering only a random subset of the features when choosing the best split at each node.
3. Repeat steps 1 and 2 until the desired number of trees has been grown.
4. Combine the predictions of all the trees by majority vote to obtain the prediction of the forest.
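To make the procedure concrete, here is a minimal sketch that follows these steps directly, assuming NumPy arrays for the data and scikit-learn's DecisionTreeClassifier for the individual trees; the function names (fit_random_forest, predict_random_forest) and parameter choices (n_trees, the square-root feature subset size) are illustrative, not prescribed by the text above.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def fit_random_forest(X, y, n_trees=100, random_state=0):
    """Grow n_trees decision trees, each on a bootstrap sample with random feature subsets."""
    rng = np.random.RandomState(random_state)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        idx = rng.randint(0, n_samples, n_samples)          # step 1: bootstrap sample (with replacement)
        tree = DecisionTreeClassifier(
            max_features="sqrt",                            # step 2: random feature subset at each split
            random_state=rng.randint(1 << 30))
        trees.append(tree.fit(X[idx], y[idx]))              # step 3: repeat for n_trees trees
    return trees

def predict_random_forest(trees, X):
    """Step 4: combine the trees' predictions by majority vote."""
    votes = np.array([tree.predict(X) for tree in trees])   # shape: (n_trees, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```

A call such as trees = fit_random_forest(X_train, y_train) followed by predict_random_forest(trees, X_test) then mirrors the bagging-plus-majority-vote procedure described above.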
Out-of-Bag (OOB) Error in Random Forest
We have seen that in random forests, each tree is built using a different bootstrap sample drawn from the original data. The samples left out of a bootstrap sample, and hence not used in constructing the i-th tree, can be used to measure the performance of the model. At the end of the run, the predictions made for each sample whenever it was out-of-bag are tallied, and the final prediction for that sample is obtained by majority vote. The overall error rate of these predictions is termed the out-of-bag (OOB) error rate.
The error rate reported in the confusion matrix of a random forest is computed from these OOB predictions rather than from predictions on the training data. For this reason, the displayed error rate often looks surprisingly high; it is, however, a more realistic estimate of how the model will perform on unseen data.
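As a concrete illustration, the sketch below reads the OOB error rate and an OOB-based confusion matrix off a fitted model, assuming scikit-learn's RandomForestClassifier; the dataset (load_iris) and the parameter values are illustrative stand-ins.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)                        # illustrative stand-in dataset
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

print("OOB error rate:", 1 - forest.oob_score_)          # oob_score_ is the OOB accuracy

# Majority OOB vote for each training sample, then a confusion matrix built from
# those OOB predictions (not from predictions on the training data itself).
oob_pred = forest.classes_[np.argmax(forest.oob_decision_function_, axis=1)]
print(confusion_matrix(y, oob_pred))
```

Note that with very few trees some samples may never be left out of bag, so the OOB estimate is meaningful only for reasonably large forests.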
Strengths of Random Forest
Weaknesses of Random Forest
Application of Random Forest
Random forest is a powerful and effective classifier that combines the versatility of many decision tree models into a single ensemble model. Because of its consistently strong results, it has become popular among machine learning practitioners for addressing a wide variety of classification problems.