The Random Subspace Method for Constructing Decision Forests
Looking for more research papers to read, I scanned my Hands-On Machine Learning notes for the many papers referenced there; this is one of them. These papers are mainly on machine learning and deep learning topics.
Reference: The Random Subspace Method for Constructing Decision Forests paper
Introduction
Much of the previous attention on decision trees focuses on splitting criteria and optimization of tree size. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. In this paper, a method to construct a decision-tree-based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudo-randomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods in experiments on publicly available datasets, where the method's superiority is demonstrated.
A persistent problem in using decision trees for classification is how to avoid overfitting a set of training data while achieving maximum accuracy. The author proposes that combining multiple trees constructed in randomly selected subspaces can achieve a nearly monotonic increase in generalization accuracy while preserving perfect accuracy on the training data.
Notes
Methods for Constructing a Decision Tree
The majority of tree construction methods use linear splits at each internal node. A typical method selects a hyperplane or multiple hyperplanes at each internal node, and samples are assigned to branches representing the different regions of the feature space bounded by those hyperplanes. The methods differ in the number and orientation of the hyperplanes. They can be categorized by the types of splits they produce:
- axis-parallel linear splits: A threshold is chosen on the values of a particular feature dimension, and samples are assigned to branches according to whether the corresponding feature value exceeds the threshold.
- oblique linear splits: Samples are assigned to branches according to which side of a hyperplane, or which region bounded by multiple hyperplanes, they fall in; the hyperplanes are not necessarily parallel to any axis of the feature space.
- piecewise linear splits: Branches represent a Voronoi tessellation of the feature space. Samples are assigned by nearest-neighbor matching to chosen anchor points.
Within each category, the splitting function can be obtained in many ways. Stopping criteria are a big issue in tree construction. Each splitting function defines a model for projecting classification from the training samples to unclassified points in the space. From this point of view, it is not surprising that no method can be universally best for an arbitrary, finite training set.
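To make the three split families concrete, here is a minimal Python sketch of each one as a routing function that sends a sample to a branch index. The function and parameter names (feature_index, weights, anchor_points) are my own illustrations, not notation from the paper.

```python
import numpy as np

def axis_parallel_split(x, feature_index, threshold):
    """Branch on whether a single feature value exceeds a threshold."""
    return 0 if x[feature_index] <= threshold else 1

def oblique_split(x, weights, bias):
    """Branch on which side of an arbitrary hyperplane w.x + b = 0 the sample falls."""
    return 0 if np.dot(weights, x) + bias <= 0 else 1

def piecewise_linear_split(x, anchor_points):
    """Branch to the Voronoi cell of the nearest anchor point."""
    distances = np.linalg.norm(anchor_points - x, axis=1)
    return int(np.argmin(distances))
```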
Systematic Construction of a Forest
The author’s method is another example of improving accuracy by combining the power of multiple classifiers. Here, each classifier is a decision tree, and the combined classifier is called a decision forest. The method relies on an autonomous, pseudorandom procedure to select a small number of dimensions from a given feature space. In each pass, such a selection is made and a subspace is fixed in which all points have a constant value (say, zero) in the unselected dimensions. All samples are projected onto this subspace, and a decision tree is constructed from the projected training samples.
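Below is a minimal sketch of that procedure, assuming scikit-learn's DecisionTreeClassifier as the base tree and combining the trees by averaging their class-probability estimates; the class and parameter names (RandomSubspaceForest, n_selected_features) are my own, not the paper's.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomSubspaceForest:
    def __init__(self, n_trees=100, n_selected_features=8, random_state=0):
        self.n_trees = n_trees
        self.n_selected_features = n_selected_features
        self.rng = np.random.default_rng(random_state)
        self.trees, self.subspaces = [], []

    def fit(self, X, y):
        n_features = X.shape[1]
        for _ in range(self.n_trees):
            # Pseudo-randomly pick a subset of feature dimensions; projecting the
            # samples onto them is equivalent to zeroing the unselected dimensions.
            dims = self.rng.choice(n_features, size=self.n_selected_features, replace=False)
            self.trees.append(DecisionTreeClassifier().fit(X[:, dims], y))
            self.subspaces.append(dims)
        return self

    def predict(self, X):
        # Average the per-tree class probability estimates, then take the top class.
        probs = np.mean(
            [t.predict_proba(X[:, d]) for t, d in zip(self.trees, self.subspaces)],
            axis=0,
        )
        return self.trees[0].classes_[np.argmax(probs, axis=1)]
```

Usage follows the familiar fit/predict pattern, e.g. `RandomSubspaceForest(n_trees=50).fit(X_train, y_train).predict(X_test)`.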
For a forest to achieve better accuracy than its individual trees, it is critical that there be sufficient independence or dissimilarity between the trees. A simple measure of the similarity between two decision trees is the amount by which their decision regions overlap.
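One rough way to estimate that overlap is by Monte Carlo: sample points from the feature space and count how often the two trees assign them the same class. The helper below assumes trees trained on feature subspaces as in the sketch above, and a caller-supplied per-feature (low, high) range; it illustrates the idea rather than reproducing the paper's exact measure.

```python
import numpy as np

def decision_region_overlap(tree_a, dims_a, tree_b, dims_b, low, high,
                            n_samples=10_000, seed=0):
    """Fraction of uniformly sampled points on which the two trees agree."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(low, high, size=(n_samples, len(low)))
    return np.mean(tree_a.predict(points[:, dims_a]) == tree_b.predict(points[:, dims_b]))
```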