An ensemble technique combines several models (called weak learners) into a single model in order to produce much better accuracy and more reliable results. Weak models, when combined carefully, can give rise to more robust and accurate models.
Ensemble techniques broadly fall into three buckets: bagging, boosting, and stacking. This article will focus on boosting algorithms. Bagging and boosting decrease the variance of a single estimate because they combine several estimates from different models, so the result can be a model with higher stability.
Before we start, this article assumes you have some prior knowledge of decision trees and random forests, and that you are familiar with the bias-variance trade-off. Gradient boosting for regression is not the same as linear regression, so don't confuse the two.
If the problem is that a single model performs poorly, bagging will rarely improve its bias. Boosting, however, can generate a combined model with lower error, as it builds on the strengths of each single model while correcting its weaknesses.
You can refer to my other article on bagging-based algorithms to understand them further!
So how does Boosting actually work?
Boosting works by repeatedly sampling datasets from the training data. The objective of boosting is to train weak learners sequentially, each trying to correct its predecessor. A model is built on the selected features and then tested against the full training dataset; data points that were wrongly classified by the previous model are given higher priority when the next dataset is drawn. In this manner, boosting gradually builds M datasets, each informed by the mistakes of the previous models. This is often described as building up multiple decision stumps that successively improve on the training dataset.
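The sequential reweighting idea above can be sketched in a few lines of Python. This is an illustrative, AdaBoost-style sketch on a synthetic dataset, not any library's internal implementation; all variable names here are our own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data; labels mapped to {-1, +1}
X, y = make_classification(n_samples=200, random_state=0)
y_signed = np.where(y == 1, 1, -1)

M = 5                                       # number of weak learners
weights = np.full(len(X), 1 / len(X))       # start with uniform sample weights
learners, alphas = [], []

for m in range(M):
    stump = DecisionTreeClassifier(max_depth=1)       # a decision stump
    stump.fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)
    err = weights[pred != y_signed].sum()             # weighted training error
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))   # this learner's vote weight
    weights *= np.exp(-alpha * y_signed * pred)       # boost misclassified points
    weights /= weights.sum()                          # renormalise
    learners.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted vote of all stumps
ensemble = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, learners)))
print("training accuracy:", (ensemble == y_signed).mean())
```

Each round, misclassified points gain weight, so the next stump is forced to focus on the "difficult" examples, which is exactly the sequential correction described above.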
Types of Boosting Algorithms
AdaBoost was the first successful boosting algorithm developed for binary classification, and it is best used to boost the performance of decision trees on binary classification problems. Its weak learners are decision stumps (one-level decision trees that decide based on a single feature), which individually achieve accuracy only just above random chance. Decision stumps are used to classify the data points and are iteratively improved by increasing the priority of misclassified data points. Aggregating many decision stumps also helps the model fit the data without overfitting.
An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
The AdaBoost algorithm can be accessed in the sklearn library from sklearn.ensemble as AdaBoostRegressor and AdaBoostClassifier.
Here is a code sample of AdaBoost:
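The snippet below is a minimal sketch using sklearn's AdaBoostClassifier on a built-in dataset; the dataset choice and parameter values are illustrative assumptions, not fixed requirements.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A built-in binary classification dataset, split into train and test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# n_estimators = number of stumps; learning_rate scales each stump's contribution
clf = AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=42)
clf.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

By default the base estimator is a decision stump, matching the description above; you can pass a deeper tree as the base estimator if your data calls for it.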
You can learn all the methods and parameters involved in AdaBoost here.
Gradient Boosting produces a prediction model in the form of an ensemble of weak prediction models, usually decision trees, combining these weak learners into a single strong learner in an iterative manner. Gradient boosting focuses on minimising the model's loss function: once a loss function is defined for a particular prediction model, each subsequent tree is constructed to reduce the value of that function, thereby minimising the error while adjusting the weights associated with data points in the dataset. Gradient boosting is a greedy algorithm and can overfit the training dataset quickly. Here are three types of enhancements to basic gradient boosting that can improve performance:
1. Tree Constraints: such as the depth of the trees and the number of trees used in the ensemble.
2. Weighted Updates: such as a learning rate used to limit how much each tree contributes to the ensemble.
3. Random Sampling: such as fitting trees on random subsets of features and samples.
Gradient boosting can be implemented from sklearn using sklearn.ensemble's GradientBoostingRegressor and GradientBoostingClassifier.
Here is a code sample of Gradient Boosting:
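A minimal sketch with sklearn's GradientBoostingClassifier is shown below; the parameter values are illustrative, chosen to demonstrate the three enhancements listed above (tree constraints, weighted updates, random sampling).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = GradientBoostingClassifier(
    n_estimators=200,      # tree constraint: number of trees in the ensemble
    max_depth=3,           # tree constraint: depth of each tree
    learning_rate=0.1,     # weighted updates: shrink each tree's contribution
    subsample=0.8,         # random sampling: fit each tree on 80% of the rows
    max_features="sqrt",   # random sampling: subset of features per split
    random_state=42,
)
clf.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Lowering learning_rate generally requires raising n_estimators; the two are usually tuned together.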
You can also learn more about the Gradient Boosting technique here.
The Extreme Gradient Boosting (XGBoost) algorithm is an improved implementation of gradient boosting. It is a technique in which new models are added to mitigate the errors made by previous models, and this process is repeated until accuracy reaches an optimum level. Of late, XGBoost has become the go-to boosting algorithm for structured and tabular datasets in classification and regression predictive modelling problems, as it improves on standard gradient boosting and often provides better results on the same test data.
XGBoost is a separate software library that needs to be downloaded and installed (e.g. using sudo pip install xgboost); follow the official xgboost documentation for all the steps to use it here.
Here is a sample code snippet for XGBoost:
Boosting techniques reduce the bias introduced by weak models, and we learnt about three different boosting techniques in this blog. Boosting techniques aggregate weak learners into a strong model that gives us more accurate results. AdaBoost makes predictions by applying several decision stumps to every sample and aggregating their weighted results. Gradient boosting provides much more flexibility over AdaBoost and is also able to handle missing data properly. However, gradient boosting is resource-intensive, and XGBoost addresses these challenges better.