Pasting Small Votes for Classification in Large Databases and On-Line

Looking for more research papers to read, I scanned my Hands-On Machine Learning notes for the many papers that were referenced there. This is one of those papers. These papers are mainly on machine learning and deep learning topics.

Reference Pasting Small Votes for Classification in Large Databases and On-Line Paper

Date Created:
Last Edited:
1 3

Introduction

Many databases have grown to the point where they cannot fit into the fast memory of even large memory machines, to say nothing of current workstations. IF what we want to do is to use these databases to construct predictions of various characteristics, then since the usual methods require that all data be held in fast memory, various work-arounds have to be used. This paper studies one such class of methods which give accuracy comparable to that which could have been obtained if all data could have been held in core and which are computationally fast. The procesure takes small pieces of data, grows a predictor on each small piece and then pasts these predictions together. A version is given that scales to terabyte data sets.

Suppose that database D is a large collection of examples (yn,xn) where xn are input vectors and the yn are class labels. What we want to do is use this data to construct a classifier which can accurately assign a class label to future inputs x. But D is too large to hold in the memory of any available computer. What is shown in this paper is that the aggregation of many classifiers, each grown on a training set of modest size N selected from the data base D, can achieve almost optimum classification accuracy. The procedure is refered to as pasting votes together, and has two key items in its implementation:

  1. Suppose that up to the present, k predictors have been constructed. A new training set of size N is selected from D either by random or importance sampling. The (k + 1)st predictor is grown on this new training set and aggregated with the previous k. The aggregation is by unweighted plurality voting. If random sampling is used to select the training set, the training set is called an Rprecinct and the procedure is called pasting Rvotes (R=random). If is done by importance sampling, the training set is called Iprecinct and the procedure as pasting Ivotes (I=importance). The importance sampling used samples more heavily from the instances more likely tp be misclassified.
  2. An estimate e(k) of the generalization error for the kth aggregation of e(k) is updated. The pasting stops when e(k) stops decreasing. The estimate of e(k) can be gotten using a test set, but our implementation uses out-of-baf estimation.

The simplest version of pasting is to select each training set of size N by random sampling from the data base D, grow the classifier, repeat a preassigned number of K times, stop and aggregate the classifiers by voting. If, after aggregation, accuracy is checked on the test set, then further runs can be used to optimize the values of K and N. A more sophisticated version estimates e(k) after the kth aggregation and stops after e(k) stops decreasing.

The prime conclusion is that pasting small Ivotes together is a promising approach to constructing classifiers for large data sets and for on-line learning.

Comments

You have to be logged in to add a comment

User Comments

Insert Math Markup

ESC
About Inserting Math Content
Display Style:

Embed News Content

ESC
About Embedding News Content

Embed Youtube Video

ESC
Embedding Youtube Videos

Embed TikTok Video

ESC
Embedding TikTok Videos

Embed X Post

ESC
Embedding X Posts

Embed Instagram Post

ESC
Embedding Instagram Posts

Insert Details Element

ESC

Example Output:

Summary Title
You will be able to insert content here after confirming the title of the <details> element.

Insert Table

ESC
Customization
Align:
Preview:

Insert Horizontal Rule

#000000

Preview:


View Content At Different Sizes

ESC

Edit Style of Block Nodes

ESC

Edit the background color, default text color, margin, padding, and border of block nodes. Editable block nodes include paragraphs, headers, and lists.

#ffffff
#000000

Edit Selected Cells

Change the background color, vertical align, and borders of the cells in the current selection.

#ffffff
Vertical Align:
Border
#000000
Border Style:

Edit Table

ESC
Customization:
Align:

Upload Lexical State

ESC

Upload a .lexical file. If the file type matches the type of the current editor, then a preview will be shown below the file input.

Upload 3D Object

ESC

Upload Jupyter Notebook

ESC

Upload a Jupyter notebook and embed the resulting HTML in the text editor.

Insert Custom HTML

ESC

Edit Image Background Color

ESC
#ffffff

Insert Columns Layout

ESC
Column Type:

Select Code Language

ESC
Select Coding Language

Insert Chart

ESC

Use the search box below

Upload Previous Version of Article State

ESC