Bagging
I want to go through the Wikipedia series on Machine Learning and Data mining. Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of ML classification and regression algorithms. It also reduces variance and helps avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method.
There are three types of datasets in bootstrap aggregating: the original, bootstrap, and out-of-bag datasets. The bootstrap dataset is made by randomly picking objects from the original dataset with replacement. It must be the same size as the original dataset, so the bootstrap dataset can contain duplicate objects. The out-of-bag dataset contains the instances that were not picked for the bootstrap dataset. The next step of the algorithm is to generate a decision tree from each bootstrapped dataset.
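The split into bootstrap and out-of-bag datasets can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `bootstrap_split` and the toy dataset are my own choices.

```python
import random

def bootstrap_split(data, seed=None):
    """Draw a bootstrap sample (picked with replacement, same size as
    the original dataset) and return it with the out-of-bag remainder."""
    rng = random.Random(seed)
    n = len(data)
    picked = [rng.randrange(n) for _ in range(n)]  # indices, may repeat
    bootstrap = [data[i] for i in picked]
    out_of_bag = [data[i] for i in range(n) if i not in set(picked)]
    return bootstrap, out_of_bag

original = list(range(10))
boot, oob = bootstrap_split(original, seed=42)
# boot has the same size as original and may contain duplicates;
# oob holds exactly the objects that were never picked.
```

In a full bagging run, one such split is drawn per ensemble member, and a decision tree is then trained on each bootstrap sample.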
Advantages:
- Many weak learners aggregated typically outperform a single learner over the entire set, and overfit less
- Reduces variance in high-variance, low-bias weak learners, which can improve efficiency
- Can be performed in parallel, as each separate bootstrap can be processed on its own before aggregation
Disadvantages:
- For a weak learner with high bias, bagging will also carry high bias into its aggregate
- Loss of interpretability of a model
- Can be computationally expensive depending on the dataset
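The aggregation step itself is simple: classifiers are combined by majority vote (regressors by averaging). Here is a minimal sketch using threshold "stumps" as stand-ins for decision trees; the names `bag_predict` and `make_stump` are illustrative, not from any library.

```python
from collections import Counter

def bag_predict(models, x):
    """Aggregate classifier predictions by majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

def make_stump(threshold):
    """Toy weak learner: a one-split classifier, as a decision tree
    trained on one bootstrap sample might produce."""
    return lambda x: 1 if x > threshold else 0

# Three stumps, imagined as trained on three different bootstrap samples.
models = [make_stump(t) for t in (0.4, 0.5, 0.6)]
print(bag_predict(models, 0.55))  # two of the three stumps vote 1
```

Because each model depends only on its own bootstrap sample, the training loop parallelizes trivially; only this final vote needs all the models together.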