Notes / Resources on Machine Learning

I created this page to keep track of helpful machine learning resources that I come across, and to remind myself of key concepts as I work through machine learning projects.


Imputation of Missing Values


For various reasons, many real-world datasets contain missing values, often encoded as blanks, NaNs, or other placeholders. Such datasets are incompatible with most scikit-learn estimators, which assume that all values in an array are numerical and that all of them carry meaning.
  • You could drop the rows (or columns) that contain missing values, at the cost of losing data, or you could impute the missing values, i.e. infer them from the known part of the data.
  • Imputation can be univariate, filling a column using only the non-missing values of that same column (SimpleImputer), or multivariate, using the entire set of available feature dimensions to estimate the missing values (IterativeImputer).
  • SimpleImputer replaces missing values with a provided constant, or with a statistic of the column in which they occur (mean, median, or most frequent value).
  • KNNImputer fills in missing values using the k-Nearest-Neighbors approach: each missing entry is estimated from the values of that feature in the nearest samples that have it.
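
To remind myself how the imputers above are used, here is a minimal sketch, assuming scikit-learn and NumPy are installed. The toy array X and the parameter choices (n_neighbors=2, fill_value=0.0, random_state=0) are made up for illustration and are not recommendations. Note that IterativeImputer is still marked experimental in scikit-learn, so it needs the extra enable_iterative_imputer import.

```python
import numpy as np

# IterativeImputer is experimental: this import must come before it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

# Toy data (made up for illustration): two features, two missing entries.
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [3.0, np.nan],
    [5.0, 8.0],
])

# Univariate: replace each missing value with the mean of its own column.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# Univariate: replace each missing value with a provided constant.
X_const = SimpleImputer(strategy="constant", fill_value=0.0).fit_transform(X)

# Multivariate: average the feature over the 2 nearest rows that have it.
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

# Multivariate: model each feature with missing values from the others.
X_iter = IterativeImputer(random_state=0).fit_transform(X)

print("mean:\n", X_mean)
print("constant:\n", X_const)
print("knn:\n", X_knn)
print("iterative:\n", X_iter)
```

In a real project the imputer should be fit on the training data only and then applied to the test data with transform, so that statistics from the test set do not leak into training.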


Cross Validation


  • Something you need to review



Precision vs Recall of Classifiers


  • Something you need to review



Bagging and Pasting


  • Something you need to review


Kernel Trick


  • Something you need to review


Random Forests


  • Something you need to review

