Extremely Randomized Trees

Looking for more research papers to read, I scanned my Hands-On Machine Learning notes for the papers referenced there; this is one of them. These papers are mainly on machine learning and deep learning topics.

Reference: Extremely Randomized Trees Paper


Introduction

This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It consists of strongly randomizing both the attribute and the cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to the problem specifics through the appropriate choice of a parameter. The paper evaluates the robustness of the default choice of this parameter and provides insight into how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency.

One can view Bagging as a particular instance of a broader family of ensemble methods called randomization methods. These methods explicitly introduce randomization into the learning algorithm and/or exploit, at each run, a different randomized version of the original learning sample, so as to produce an ensemble of more or less strongly diversified models. The predictions of these models are aggregated by a simple average in regression or a majority vote in classification, as sketched below. These generic randomization methods often considerably improve the accuracy of decision or regression trees, which otherwise are often not competitive with other supervised learning algorithms such as neural networks or support vector machines.
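
As a concrete illustration, here is a minimal sketch of that aggregation step (the helper names are my own; it assumes each model exposes a `predict` method and, for classification, that class labels are non-negative integers):

```python
import numpy as np

def aggregate_regression(models, X):
    # Regression: simple average of the individual model predictions.
    return np.mean([m.predict(X) for m in models], axis=0)

def aggregate_classification(models, X):
    # Classification: majority vote, the most frequent predicted class wins.
    preds = np.stack([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    vote = lambda col: np.bincount(col).argmax()      # assumes integer class labels
    return np.apply_along_axis(vote, 0, preds)
```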

The very high variance of decision and regression tree splits suggests investigating whether higher randomization levels could improve accuracy relative to existing ensemble methods. To this end, the paper proposes and studies a new algorithm that, for a given numerical attribute, selects its cut-point fully at random, i.e., independently of the target variable. At each tree node, this is combined with a random choice of a certain number of attributes, among which the best one is determined. In the extreme case, the method randomly picks a single attribute and cut-point at each node, and hence builds totally randomized trees whose structures are independent of the target variable values of the learning sample.
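
A rough sketch of this node-splitting rule in Python (the function names are my own, the score function is left abstract, and the stop-splitting tests from the paper are omitted):

```python
import random

def pick_random_split(sample, attribute):
    """Draw a cut-point uniformly between the attribute's min and max in the local sample.
    `sample` is a list of (inputs, output) pairs; inputs are indexable by attribute."""
    values = [x[attribute] for x, _ in sample]
    cut = random.uniform(min(values), max(values))
    return attribute, cut

def split_node(sample, candidate_attributes, K, score):
    """Choose K attributes at random, draw one random cut-point for each,
    and keep the split with the best score on the local sample."""
    chosen = random.sample(candidate_attributes, K)
    splits = [pick_random_split(sample, a) for a in chosen]
    return max(splits, key=lambda s: score(sample, s))
```

Note that with K = 1 the score is never really consulted, since the single candidate split is accepted as-is; this is exactly the totally randomized case described above.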

Extra-Trees Algorithm

Definitions

  • attribute: a particular input variable used in a supervised learning problem
  • candidate attributes: all input variables that are available for a given problem
  • output: the target variable that defines the supervised learning problem
  • learning sample: the observations used to build a model
  • test sample: the observations used to estimate the accuracy of a model
  • N: the size of the learning sample

The Extra-Trees algorithm builds an ensemble of unpruned decision or regression trees according to the classical top-down procedure. Its two main differences from other tree-based ensemble methods are that it splits nodes by choosing cut-points fully at random and that it uses the whole learning sample (rather than a bootstrap replica) to grow the trees.

[Figure: the Extra-Trees splitting procedure for numerical attributes]

The Extra-Trees splitting procedure for numerical attributes is given in the figure above. It has two parameters: K, the number of attributes randomly selected at each node, and n_min, the minimum sample size for splitting a node. The procedure is used several times with the original learning sample to generate an ensemble model. The predictions of the trees are aggregated to yield the final prediction: by majority vote in classification problems and by arithmetic average in regression problems.
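
For reference, scikit-learn ships an implementation of this algorithm. A minimal usage sketch: its `max_features` and `min_samples_split` parameters play the roles of K and n_min, and `bootstrap=False` (the default) means each tree is grown on the whole learning sample, matching the paper.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

clf = ExtraTreesClassifier(
    n_estimators=100,     # number of trees in the ensemble
    max_features="sqrt",  # K: attributes randomly drawn at each node
    min_samples_split=2,  # n_min: minimum sample size for splitting a node
    bootstrap=False,      # grow each tree on the whole learning sample
    random_state=42,
)
print(cross_val_score(clf, X, y, cv=5).mean())
```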
