Extremely Randomized Trees
Looking for more research papers to read, I went back through my Hands-On Machine Learning notes and collected the papers referenced there; this is one of them. These papers mainly cover machine learning and deep learning topics.
Reference: Extremely Randomized Trees Paper
Introduction
This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of randomizing strongly both attribute and cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to the problem specifics by the appropriate choice of a parameter. This paper evaluates the robustness of the default choice of this parameter and provides insight on how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency.
One can view Bagging as a particular instance of a broader family of ensemble methods called randomization methods. These methods explicitly introduce randomization into the learning algorithm and/or exploit at each run a different randomized version of the original learning sample, so as to produce an ensemble of more or less strongly diversified models. Predictions of these models are aggregated by a simple average in regression or a majority vote in classification. These generic randomization methods often considerably improve the accuracy of decision or regression trees, which, otherwise, are often not competitive with other supervised learning algorithms like neural networks or support vector machines.
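To make the aggregation step concrete, here is a minimal sketch (my own illustration, not code from the paper) that averages regression predictions and takes a majority vote over class labels with NumPy; the prediction arrays are made-up values.

```python
import numpy as np

# Hypothetical predictions from an ensemble of 5 models on 4 samples.
# Regression: aggregate by a simple arithmetic average over models.
reg_preds = np.array([
    [2.1, 0.4, 1.7, 3.0],
    [1.9, 0.5, 1.6, 2.8],
    [2.3, 0.3, 1.8, 3.1],
    [2.0, 0.6, 1.5, 2.9],
    [2.2, 0.4, 1.7, 3.2],
])
ensemble_regression = reg_preds.mean(axis=0)

# Classification: aggregate by majority vote over predicted class labels.
clf_preds = np.array([
    [0, 1, 1, 2],
    [0, 1, 0, 2],
    [1, 1, 1, 2],
    [0, 0, 1, 2],
    [0, 1, 1, 1],
])
ensemble_classification = np.array([
    np.bincount(clf_preds[:, i]).argmax() for i in range(clf_preds.shape[1])
])

print(ensemble_regression)       # per-sample averaged predictions
print(ensemble_classification)   # per-sample majority-vote labels
```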
The very high variance of decision and regression tree splits suggests investigating whether higher randomization levels could improve accuracy with respect to existing ensemble methods. To this end, the paper proposes and studies a new algorithm that, for a given numerical attribute, selects its cut-point fully at random, i.e., independently of the target variable. At each tree node, this is combined with a random choice of a certain number of attributes among which the best one is determined. In the extreme case, the method randomly picks a single attribute and cut-point at each node, and hence builds totally randomized trees whose structures are independent of the target variable values of the learning sample.
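scikit-learn ships an implementation of this algorithm as ExtraTreesClassifier (and ExtraTreesRegressor). The snippet below is a rough sketch of how the extreme case could be approximated by setting max_features=1, so a single randomly chosen attribute is considered at each split, while bootstrap=False keeps each tree grown on the whole learning sample; the synthetic dataset and parameter values are illustrative, not from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

# Illustrative synthetic dataset (not from the paper).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# max_features=1 approximates the "totally randomized trees" extreme:
# one attribute is drawn at each node, and its cut-point is drawn at
# random; bootstrap=False means every tree sees the full learning sample.
extra_trees = ExtraTreesClassifier(
    n_estimators=100,
    max_features=1,
    bootstrap=False,
    random_state=0,
)

scores = cross_val_score(extra_trees, X, y, cv=5)
print(scores.mean())
```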
Extra-Trees Algorithm
Definitions
- attribute: denotes a particular input variable used in a supervised learning problem
- candidate attributes: denote all input variables that are available for a given problem
- output: refers to the target variable that defines the supervised learning problem
- learning sample: denotes the observations used to build a model
- test sample: refers to the observations used to compute the model's accuracy
- N: refers to the size of the learning sample
The Extra-Trees algorithm builds an ensemble of unpruned decision or regression trees according to the classical top-down procedure. Its two main differences from other tree-based ensemble methods are that it splits nodes by choosing cut-points fully at random and that it uses the whole learning sample (rather than a bootstrap replica) to grow the trees.
The Extra-Trees splitting procedure for numerical attributes is given by the image above. It has two parameters: K, the number of attributes randomly selected at each node, and n_min, the minimum sample size for splitting a node. It is used several times with the original learning sample to generate an ensemble model. The predictions of the trees are aggregated to yield the final prediction, by majority vote in classification problems and by arithmetic average in regression problems.
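Since the figure itself isn't reproduced here, a minimal Python sketch of the splitting step might look like the following. The function names and NumPy usage are my own; K and n_min correspond to the two parameters described above, and the split score (e.g. information gain for classification, variance reduction for regression) is left abstract as a function passed in by the caller.

```python
import numpy as np

rng = np.random.default_rng(0)

def pick_a_random_split(X_node, attribute):
    """Draw a cut-point uniformly between the min and max of the
    attribute's values in the current node's sample."""
    lo, hi = X_node[:, attribute].min(), X_node[:, attribute].max()
    return rng.uniform(lo, hi)

def split_a_node(X_node, y_node, K, n_min, score):
    """Return (attribute, cut_point) for the best of K random splits,
    or None if the node should become a leaf."""
    # Leaf conditions: too few samples or constant output.
    if len(X_node) < n_min or np.all(y_node == y_node[0]):
        return None
    # Candidate attributes that are not constant in this node.
    candidates = [a for a in range(X_node.shape[1])
                  if X_node[:, a].max() > X_node[:, a].min()]
    if not candidates:
        return None
    # Select up to K attributes at random, draw one random split each,
    # and keep the split with the highest score.
    attributes = rng.choice(candidates, size=min(K, len(candidates)),
                            replace=False)
    splits = [(a, pick_a_random_split(X_node, a)) for a in attributes]
    return max(splits, key=lambda s: score(X_node, y_node, *s))
```

With K=1 this degenerates into the totally randomized case: a single attribute and cut-point are drawn and the score is never compared against an alternative.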