Supervised Learning
I want to go through the Wikipedia series on Machine Learning and Data mining.
Notes
Supervised learning (SL) is a paradigm in machine learning where input objects (for example, a vector of predictor variables) and a desired output value (also known as a human-labeled supervisory signal) train a model. The algorithm processes the training data and builds a function that maps new inputs to expected output values. In the optimal scenario, the algorithm correctly determines the output values for unseen instances, which requires it to generalize from the training data in a "reasonable" way. The statistical quality of an algorithm is measured through its so-called generalization error.
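To make this concrete, here is a minimal sketch of the train-then-generalize loop. The use of scikit-learn, a decision tree, and the bundled iris dataset are my choices for illustration, not something the article specifies:

```python
# Minimal supervised-learning loop: fit a model on labeled pairs,
# then estimate generalization error on data the model never saw.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # input objects and desired output values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)  # build the mapping function

# Error on held-out data approximates the generalization error.
test_error = 1 - model.score(X_test, y_test)
print(f"estimated generalization error: {test_error:.3f}")
```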
Steps in Supervised Learning
- Determine the type of training samples.
- Gather a training set that is representative of the real-world use of the function.
- Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object.
- Determine the structure of the learned function and corresponding learning algorithm, e.g., SVM or Decision Trees.
- Complete the design. Run the learning algorithm on the gathered training set.
- Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set. (The sketch after this list walks through these steps end to end.)
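These steps map directly onto a short script. A sketch, again assuming scikit-learn; the breast-cancer dataset, feature scaling, and the SVM are illustrative choices:

```python
# The steps end to end: training samples, a representative split,
# feature representation, algorithm choice (here an SVM), training,
# and evaluation on a separate test set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)          # labeled training samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Feature representation: scale each feature to zero mean, unit variance.
# Structure of the learned function: an SVM with an RBF kernel.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

model.fit(X_train, y_train)                         # run the learning algorithm

# Measure accuracy on data kept separate from training.
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```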
Algorithm Choice
These are the four major issues to consider in supervised learning:
Bias-Variance Tradeoff
A learning algorithm is biased for a particular input x if, when trained on different training sets, it is systematically incorrect when predicting the correct output for x. A learning algorithm has high variance for a particular input x if it predicts different output values when trained on different training sets. The prediction error of a learned classifier is related to the sum of the bias and the variance of the learning algorithm. Generally, there is a tradeoff between bias and variance.
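Bias and variance can be probed empirically by retraining the same algorithm on many freshly drawn training sets and inspecting its predictions at one fixed input. A sketch, where the ground-truth function, the noise level, and the shallow decision tree are all my assumptions:

```python
# Empirically probe bias and variance at one input x0: train the same
# algorithm on many resampled training sets and inspect its predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(x)            # assumed ground-truth function
x0 = np.array([[1.5]])                  # the particular input we probe

preds = []
for _ in range(200):
    X = rng.uniform(0, 5, size=(30, 1))  # a fresh training set each round
    y = true_f(X).ravel() + rng.normal(0, 0.3, size=30)
    model = DecisionTreeRegressor(max_depth=2).fit(X, y)  # rigid learner
    preds.append(model.predict(x0)[0])

preds = np.array(preds)
bias = preds.mean() - true_f(x0).item()  # systematic error at x0
variance = preds.var()                   # spread across training sets
print(f"bias: {bias:+.3f}  variance: {variance:.3f}")
```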
Function Complexity and Amount of Training Data
If the true function - the function that accurately captures the data - is simple, then an "inflexible" learning algorithm with high bias and low variance will be able to learn it from a small amount of data. But if the true function is highly complex, then it will only be learnable from a large amount of training data paired with a flexible learning algorithm with low bias and high variance.
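One way to see the interaction is to compare a rigid learner against a flexible one as the training set grows; with a complex true function, only the flexible learner benefits from more data. A sketch, with the target function, the linear model, and k-nearest neighbors all chosen for illustration:

```python
# With a complex true function, the flexible learner keeps improving as the
# training set grows, while the rigid high-bias learner plateaus early.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(3 * x)          # a fairly complex target function

def test_mse(model, n_train):
    X = rng.uniform(0, 3, size=(n_train, 1))
    y = true_f(X).ravel() + rng.normal(0, 0.1, size=n_train)
    X_test = np.linspace(0, 3, 300).reshape(-1, 1)
    pred = model.fit(X, y).predict(X_test)
    return np.mean((pred - true_f(X_test).ravel()) ** 2)

for n in (15, 1000):
    rigid = test_mse(LinearRegression(), n)                   # high bias, low variance
    flexible = test_mse(KNeighborsRegressor(n_neighbors=3), n)  # low bias, high variance
    print(f"n={n:4d}  linear MSE: {rigid:.3f}  kNN MSE: {flexible:.3f}")
```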
Dimensionality of the Input Space
If the input feature vectors are high-dimensional, learning the function can be difficult even if the true function only depends on a small number of those features. This is because the many extra dimensions can confuse the learning algorithm and cause it to have high variance. Removing irrelevant features will likely improve the accuracy of the learned function.
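A quick way to see this effect is to pad informative inputs with irrelevant dimensions and watch a distance-based learner degrade. The synthetic data and the k-nearest-neighbors classifier below are my choices:

```python
# Padding the inputs with irrelevant dimensions degrades a distance-based
# learner even though the true function ignores them; dropping them helps.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
n = 300
X_useful = rng.normal(size=(n, 2))                  # the 2 features that matter
y = (X_useful[:, 0] + X_useful[:, 1] > 0).astype(int)
X_noisy = np.hstack([X_useful, rng.normal(size=(n, 50))])  # plus 50 irrelevant dims

knn = KNeighborsClassifier(n_neighbors=5)
print("2 relevant features:", cross_val_score(knn, X_useful, y).mean())
print("plus 50 noise dims: ", cross_val_score(knn, X_noisy, y).mean())
```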
Noise in the Output Values
If the desired output values are often incorrect, then the learning algorithm should not attempt to find a function that exactly matches the training examples. Attempting to fit the data too carefully leads to overfitting. Early stopping and removing the noisy training examples prior to training are two ways to alleviate the problems that arise from noise in the output values.
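A sketch of the overfitting risk: after corrupting some training labels, a tree that fits the training set exactly memorizes the noise, while a capacity-limited tree (standing in here for early stopping or noise removal) generalizes better. The dataset, noise rate, and depth limit are illustrative assumptions:

```python
# With noisy labels, a model that fits the training data exactly memorizes
# the noise; limiting capacity (akin to early stopping) generalizes better.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

rng = np.random.default_rng(3)
flip = rng.random(len(y_tr)) < 0.15             # corrupt 15% of training labels
y_tr_noisy = np.where(flip, 1 - y_tr, y_tr)

exact = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr_noisy)
limited = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr_noisy)

print("exact fit, test accuracy:    ", exact.score(X_te, y_te))
print("depth-limited, test accuracy:", limited.score(X_te, y_te))
```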