Notes / Resources on Machine Learning
I'm creating this page to keep track of helpful machine learning resources that I come across, and to remind myself of key concepts as I work through machine learning projects.
Imputation of Missing Values
For various reasons, many real-world datasets contain missing values, often encoded as blanks, NaNs, or other placeholders. Such datasets are incompatible with most scikit-learn estimators, which assume that all values in an array are numerical, and that all have and hold meaning.
- You can either drop the rows (or columns) that contain missing values, at the cost of losing data, or impute the missing values, that is, infer them from the known part of the data.
- Univariate imputation fills in a column's missing values using only the non-missing values in that same column (SimpleImputer). Multivariate imputation uses the entire set of available feature dimensions to estimate the missing values (IterativeImputer).
- SimpleImputer lets you replace missing values with a provided constant, or with a statistic of the column (mean, median, or most frequent value).
- KNNImputer fills in missing values using the k-Nearest-Neighbors approach: each missing value is imputed from the values of that feature in the nearest samples that have it.
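To remind myself how the three imputers above are called, here is a minimal sketch. The toy array X and the variable names are made up for illustration, and the settings (strategy="mean", n_neighbors=2, random_state=0) are just example choices, not recommendations. Note that IterativeImputer is still marked experimental in scikit-learn, so it needs the extra enable_iterative_imputer import before it can be used.

```python
import numpy as np

# IterativeImputer is experimental: this import must come first to enable it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

# Toy data, invented for this example; np.nan marks the missing entries.
X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
    [np.nan, 11.0, 12.0],
])

# Univariate: each missing value is replaced by the mean of its own column.
mean_imputer = SimpleImputer(strategy="mean")
X_mean = mean_imputer.fit_transform(X)

# Neighbor-based: each missing value is averaged over the 2 nearest rows
# that have a value for that feature.
knn_imputer = KNNImputer(n_neighbors=2)
X_knn = knn_imputer.fit_transform(X)

# Multivariate: each feature with missing values is modeled as a function
# of the other features, and the estimates are refined over several rounds.
iter_imputer = IterativeImputer(random_state=0)
X_iter = iter_imputer.fit_transform(X)

print(X_mean)
print(X_knn)
print(X_iter)
```

With SimpleImputer the filled-in values are simply the column means of the observed entries (4.0 for the first column and 7.0 for the second in this toy array). In a real project the imputer should be fit on the training data only and then applied to the test data with transform, so that no information leaks from the test set.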
Cross Validation
- Something you need to review
Precision vs Recall of Classifiers
- Something you need to review
Bagging and Pasting
- Something you need to review
Kernel Trick
- Something you need to review
Random Forests
- Something you need to review