Ensembles on Random Patches

Looking for more research papers to read, I scanned my Hands-On Machine Learning notes for the many papers that were referenced there. This is one of those papers. These papers are mainly on machine learning and deep learning topics.

Reference Ensembles on Random Patches Paper


Introduction

This paper considers supervised learning under the assumption that the available memory is small compared to the dataset size. This general framework is relevant in the context of big data, distributed databases and embedded systems. This paper investigates a very simple, yet effective, ensemble framework that builds each individual model of the ensemble from a random patch of data obtained by drawing random samples from both instances and features from the whole dataset. Using decision-tree based estimators, this paper shows that this proposed method provides on par performance in terms of accuracy while simultaneously lowering the memory needs and attains significantly better performance when memory is severely constrained.

Notes

This paper considers supervised learning problems for which the dataset is so large that it cannot be loaded into memory. A previous paper proposed the Pasting method to tackle this problem: learn an ensemble of estimators, each built on a random subset of the training examples. This alleviates the memory requirements, since the base estimators are built on only small parts of the whole dataset. Another paper proposed to learn an ensemble of estimators individually built on random subspaces (random subsets of the features), which can also be seen as a way to reduce the memory requirements of building individual models. This paper proposes to combine both approaches: learn an ensemble of estimators on random patches, that is, on random subsets of both the samples and the features. The paper shows that this approach achieves accuracy comparable to other ensemble approaches that build base estimators on the whole training set, while drastically lowering the memory requirements and allowing an equivalent reduction of the global computing time.
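To make the memory argument concrete, here is a minimal sketch that counts how many values a single base estimator has to hold in memory under each scheme. The dataset size and the 10% fractions are made-up numbers for illustration, not figures from the paper, and `cells_loaded` is a hypothetical helper.

```python
def cells_loaded(n_samples, n_features, sample_frac=1.0, feature_frac=1.0):
    """Number of values one base estimator must hold in memory.

    sample_frac and feature_frac are the fractions of rows and columns
    drawn for that estimator (hypothetical names for this sketch).
    """
    return round(sample_frac * n_samples) * round(feature_frac * n_features)


# Hypothetical dataset: one million rows, one hundred features.
n_samples, n_features = 1_000_000, 100

full = cells_loaded(n_samples, n_features)                          # whole dataset
pasting = cells_loaded(n_samples, n_features, sample_frac=0.1)      # subsample rows only
subspaces = cells_loaded(n_samples, n_features, feature_frac=0.1)   # subsample columns only
patches = cells_loaded(n_samples, n_features, 0.1, 0.1)             # subsample both

for name, cells in [("full", full), ("pasting", pasting),
                    ("subspaces", subspaces), ("patches", patches)]:
    print(f"{name:>10}: {cells:>11,} values ({cells / full:.0%} of the full dataset)")
```

Because the two fractions multiply, subsampling both rows and columns shrinks the per-estimator footprint faster than subsampling either one alone.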

The Random Patches algorithm in this work is a wrapper ensemble method that can be described in the following terms. Let R(p_s, p_f, D) be the set of all random patches of size p_s N_s × p_f N_f that can be drawn from the dataset D, where N_s (resp. N_f) is the number of samples in D (resp. the number of features in D) and where p_s ∈ [0,1] (resp. p_f ∈ [0,1]) is a hyperparameter that controls the number of samples in a patch (resp. the number of features). That is, R(p_s, p_f, D) is the set of all possible subsets containing p_s N_s samples (among N_s) with p_f N_f features (among N_f). The method then works as follows:

  1. Draw a patch r ~ U(R(p_s, p_f, D)) uniformly at random
  2. Build an estimator on the selected patch r
  3. Repeat 1-2 for a preassigned number T of estimators
  4. Aggregate the predictions by voting (in case of classifiers) or averaging (in case of regressors) the predictions of the T estimators
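
The four steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the paper uses decision-tree based estimators, whereas the `Stump` class here is a hypothetical one-split stand-in chosen to keep the example short, and the function names, the `seed` argument, and the rounding of patch sizes with `int` are choices made for this sketch rather than details taken from the paper.

```python
import random
from collections import Counter


class Stump:
    """Hypothetical base estimator: a single-feature threshold classifier."""

    def fit(self, X, y):
        majority = Counter(y).most_common(1)[0][0]
        best = None
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[j] <= t]
                right = [lab for row, lab in zip(X, y) if row[j] > t]
                left_lab = Counter(left).most_common(1)[0][0]
                right_lab = Counter(right).most_common(1)[0][0] if right else majority
                errors = (sum(lab != left_lab for lab in left)
                          + sum(lab != right_lab for lab in right))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left_lab, right_lab)
        _, self.j, self.t, self.left_lab, self.right_lab = best
        return self

    def predict_one(self, row):
        return self.left_lab if row[self.j] <= self.t else self.right_lab


def fit_random_patches(X, y, p_s, p_f, T, seed=0):
    """Steps 1-3: build T estimators, each on its own random patch."""
    rng = random.Random(seed)
    n_s, n_f = len(X), len(X[0])
    k_s = max(1, int(p_s * n_s))  # samples per patch
    k_f = max(1, int(p_f * n_f))  # features per patch
    ensemble = []
    for _ in range(T):
        # Step 1: draw rows and columns uniformly, without replacement.
        rows = rng.sample(range(n_s), k_s)
        cols = rng.sample(range(n_f), k_f)
        # Step 2: build an estimator on the selected patch only.
        X_patch = [[X[i][j] for j in cols] for i in rows]
        y_patch = [y[i] for i in rows]
        ensemble.append((cols, Stump().fit(X_patch, y_patch)))
    return ensemble


def predict(ensemble, row):
    """Step 4: aggregate the T predictions by majority vote."""
    votes = Counter(model.predict_one([row[j] for j in cols])
                    for cols, model in ensemble)
    return votes.most_common(1)[0][0]
```

Each estimator remembers which columns it was trained on, so at prediction time a new row is projected onto the same feature subset before voting. For a regressor, the vote in `predict` would be replaced by an average of the T predictions.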

Results

The results show first and foremost that ensembles of randomized trees nearly always beat ensembles of standard decision trees. As off-the-shelf methods, the paper advocates that ensembles of such randomized trees should be preferred to ensembles of standard decision trees. The study also validates the Random Patches approach.
