Supervised Learning

I want to go through the Wikipedia series on Machine Learning and Data Mining. These notes cover supervised learning (SL), the paradigm in which a model is trained on input objects paired with desired output values.



Notes


Supervised learning (SL) is a paradigm in machine learning where a model is trained on input objects (for example, vectors of predictor variables) paired with desired output values (also known as human-labeled supervisory signals). The learning algorithm processes the training data and builds a function that maps new inputs to expected output values. In the optimal scenario, the algorithm correctly determines output values for unseen instances. This requires the learning algorithm to generalize from the training data in a "reasonable" way. The statistical quality of an algorithm is measured by its generalization error.
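One common way to write the generalization error down is sketched below. The symbols are assumptions for illustration and do not appear in the notes above: P is the unknown distribution that generates input-output pairs, L is a loss function, and f is the learned function.

```latex
R(f) = \mathbb{E}_{(x, y) \sim P}\left[ L\big(f(x), y\big) \right],
\qquad
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} L\big(f(x_i), y_i\big)
```

Here R(f) is the expected loss on unseen data, and \hat{R}_n(f) is the average loss over n samples. Since P is unknown, R(f) is usually estimated by evaluating \hat{R}_n on data that was not used for training.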

Steps in supervised learning

  1. Determine the type of training samples.
  2. Gather a training set that is representative of the real-world use of the function.
  3. Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object.
  4. Determine the structure of the learned function and corresponding learning algorithm, e.g., SVM or Decision Trees.
  5. Complete the design. Run the learning algorithm on the gathered training set.
  6. Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set.
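
The steps above can be sketched end to end on a toy problem. This is a minimal illustration, not a recommended pipeline: the synthetic data (y = 2x + 1 plus Gaussian noise), the choice of a least-squares line as the learned function, and all function names are assumptions made for the example.

```python
import random

def make_dataset(n, seed):
    """Steps 1-2: gather samples. Here they are synthetic: y = 2x + 1 plus noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data

def fit_line(train):
    """Steps 4-5: ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, data):
    """Step 6: average squared prediction error on the given data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)

data = make_dataset(200, seed=0)
train, test = data[:150], data[150:]   # the test set is kept separate from training
model = fit_line(train)
test_mse = mean_squared_error(model, test)
print("slope, intercept:", model)
print("test MSE:", test_mse)
```

Step 3 (feature representation) is trivial here because the input is already a single number; with real data that step usually takes the most care.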

Algorithm Choice

These are the four major issues to consider in supervised learning:

Bias-variance tradeoff

Imagine that several different, but equally good, training data sets are available. A learning algorithm is biased for a particular input x if, when trained on each of these data sets, it is systematically incorrect when predicting the correct output for x. A learning algorithm has high variance for a particular input x if it predicts different output values when trained on different training sets. The prediction error of a learned classifier is related to the sum of the bias and the variance of the learning algorithm. Generally, there is a tradeoff between bias and variance.
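
Bias and variance at a single input can be estimated empirically by training the same learner on many independently drawn training sets. The sketch below does this for two deliberately extreme learners. Everything in it is an assumption chosen for illustration: the true function sin(pi * x), the noise level, the query point x0 = 0.5, and the two learners (a constant mean predictor and a 1-nearest-neighbor predictor).

```python
import math
import random

def true_f(x):
    return math.sin(math.pi * x)

def sample_training_set(rng, n=30, noise=0.3):
    """Draw one training set of n noisy samples of true_f on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, noise)) for x in xs]

def predict_mean(train, x0):
    """Inflexible learner: ignores x0 and predicts the mean training output."""
    return sum(y for _, y in train) / len(train)

def predict_1nn(train, x0):
    """Flexible learner: returns the output of the closest training input."""
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def bias_and_variance(predict, x0, trials=500, seed=0):
    """Estimate bias and variance of the prediction at x0 across training sets."""
    rng = random.Random(seed)
    preds = [predict(sample_training_set(rng), x0) for _ in range(trials)]
    mean_pred = sum(preds) / trials
    bias = mean_pred - true_f(x0)
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias, variance

x0 = 0.5
bias_mean, var_mean = bias_and_variance(predict_mean, x0)
bias_nn, var_nn = bias_and_variance(predict_1nn, x0)
print("mean predictor: bias=%.3f variance=%.3f" % (bias_mean, var_mean))
print("1-NN predictor: bias=%.3f variance=%.3f" % (bias_nn, var_nn))
```

Under these assumptions the mean predictor should show the larger bias and the 1-NN predictor the larger variance, which is the tradeoff described above.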

Function Complexity and Amount of Training Data

If the true function (the function that accurately captures the data) is simple, then an "inflexible" learning algorithm with high bias and low variance will be able to learn it from a small amount of data. But if the true function is highly complex, then it can only be learned from a large amount of training data paired with a "flexible" learning algorithm with low bias and high variance.
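
A small simulation can make this concrete. It is a sketch under stated assumptions: the "inflexible" learner is a least-squares line, the "flexible" learner is 1-nearest-neighbor, the simple true function is 2x, the complex one is sin(3 * pi * x), and the sample sizes and noise level are arbitrary choices for the example.

```python
import math
import random

def fit_linear(train):
    """Inflexible learner: least-squares line through the training data."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx
    return lambda x: my + slope * (x - mx)

def fit_1nn(train):
    """Flexible learner: predict the output of the nearest training input."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def average_test_mse(fit, true_f, n_train, trials, seed=0, n_test=200, noise=0.3):
    """Average test MSE over several independently drawn train/test sets."""
    rng = random.Random(seed)

    def draw(n):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return [(x, true_f(x) + rng.gauss(0.0, noise)) for x in xs]

    total = 0.0
    for _ in range(trials):
        model = fit(draw(n_train))
        test = draw(n_test)
        total += sum((model(x) - y) ** 2 for x, y in test) / n_test
    return total / trials

def simple_f(x):
    return 2.0 * x

def wiggly_f(x):
    return math.sin(3.0 * math.pi * x)

# Simple true function, only 10 training points per trial.
simple_linear = average_test_mse(fit_linear, simple_f, n_train=10, trials=30)
simple_nn = average_test_mse(fit_1nn, simple_f, n_train=10, trials=30)

# Complex true function, 500 training points per trial.
complex_linear = average_test_mse(fit_linear, wiggly_f, n_train=500, trials=5)
complex_nn = average_test_mse(fit_1nn, wiggly_f, n_train=500, trials=5)

print("simple, small data:  linear=%.3f  1-NN=%.3f" % (simple_linear, simple_nn))
print("complex, large data: linear=%.3f  1-NN=%.3f" % (complex_linear, complex_nn))
```

In this setup the line should win on the simple function with little data, while the nearest-neighbor learner should win on the complex function once data is plentiful.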

Dimensionality of the input space

If the input feature vectors have many dimensions, learning the function can be difficult even if the true function depends on only a small number of those features. This is because the many extra dimensions can confuse the learning algorithm and cause it to have high variance. Removing irrelevant features will likely improve the accuracy of the learned function.
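
The effect of irrelevant features is easy to see with a distance-based learner. The sketch below is an illustration under assumptions: the label depends only on the sign of the first feature, 20 further features are pure noise, and the learner is a hand-written 1-nearest-neighbor classifier. Which features are irrelevant is known here by construction; in practice it has to be discovered.

```python
import random

def make_points(rng, n, n_irrelevant):
    """Each point has 1 relevant feature (index 0) and n_irrelevant noise features."""
    points = []
    for _ in range(n):
        features = [rng.uniform(-1.0, 1.0) for _ in range(1 + n_irrelevant)]
        label = 1 if features[0] > 0 else 0
        points.append((features, label))
    return points

def nn_accuracy(train, test, keep):
    """1-NN accuracy when distances use only the feature indices in `keep`."""
    def dist2(a, b):
        return sum((a[i] - b[i]) ** 2 for i in keep)

    correct = 0
    for features, label in test:
        nearest = min(train, key=lambda p: dist2(p[0], features))
        correct += nearest[1] == label
    return correct / len(test)

rng = random.Random(0)
train = make_points(rng, 100, n_irrelevant=20)
test = make_points(rng, 200, n_irrelevant=20)

acc_all = nn_accuracy(train, test, keep=range(21))   # all 21 features
acc_relevant = nn_accuracy(train, test, keep=[0])    # irrelevant features removed

print("accuracy with all features:     %.3f" % acc_all)
print("accuracy with relevant feature: %.3f" % acc_relevant)
```

With all 21 features the noise dimensions dominate the distance, so the nearest neighbor is often on the wrong side of the decision boundary.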

Noise in the Output Values

If the desired output values are often incorrect, then the learning algorithm should not attempt to find a function that exactly matches the training examples. Attempting to fit the data too carefully leads to overfitting. Early stopping and removing the noisy training examples prior to training are two ways to alleviate the problems that arise from noise in the output values.
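
Early stopping can be sketched as follows: train iteratively, monitor the error on a held-out validation set, and keep the parameters from the epoch where that error was lowest. The details are assumptions for illustration, including the degree-9 polynomial model, full-batch gradient descent, the learning rate, the `patience` rule, and the synthetic noisy data.

```python
import math
import random

DEGREE = 9

def features(x):
    """Polynomial features 1, x, x^2, ..., x^DEGREE."""
    return [x ** k for k in range(DEGREE + 1)]

def predict(w, x):
    return sum(wk * fk for wk, fk in zip(w, features(x)))

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def train_with_early_stopping(train, val, lr=0.05, max_epochs=1000, patience=50):
    """Gradient descent that returns the weights with the lowest validation MSE."""
    w = [0.0] * (DEGREE + 1)
    best_w, best_val, best_epoch = list(w), mse(w, val), 0
    for epoch in range(1, max_epochs + 1):
        # Full-batch gradient of the training MSE.
        grad = [0.0] * len(w)
        for x, y in train:
            err = predict(w, x) - y
            for k, fk in enumerate(features(x)):
                grad[k] += 2.0 * err * fk / len(train)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]

        val_loss = mse(w, val)
        if val_loss < best_val:
            best_w, best_val, best_epoch = list(w), val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_w, best_val, best_epoch

def noisy_samples(rng, n, noise=0.3):
    """Outputs are sin(pi * x) corrupted by Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, math.sin(math.pi * x) + rng.gauss(0.0, noise)) for x in xs]

rng = random.Random(0)
train, val = noisy_samples(rng, 20), noisy_samples(rng, 20)
best_w, best_val, best_epoch = train_with_early_stopping(train, val)
print("best epoch:", best_epoch, "validation MSE: %.3f" % best_val)
```

The validation set plays the role of unseen data during training, so stopping at its minimum avoids chasing the noise in the training outputs.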

