Rest of Notes for Hands-On Machine Learning with Scikit-Learn and TensorFlow

I decided to skim the rest of the textbook and try to become aware of concepts and tools that may be useful. I think I will learn more by starting to train models on Kaggle competitions and other datasets.

Chapter 5 - Support Vector Machines

A Support Vector Machine (SVM) is a very powerful and versatile Machine Learning model, capable of performing linear or nonlinear classification, regression, and even outlier detection. It is one of the most popular models in Machine Learning, and anyone interested in Machine Learning should have it in their toolbox. SVMs are particularly well suited for classification of complex but small- or medium-sized datasets.

  1. What is the fundamental idea behind Support Vector Machines?
  2. What is a support vector?
  3. Why is it important to scale the inputs when using SVMs?
  4. Can an SVM classifier output a confidence score when it classifies an instance? What about a probability?
  5. Should you use the primal or dual form of the SVM problem to train a model on a training set with millions of instances and hundreds of features?
  6. Say you trained an SVM classifier with an RBF kernel. It seems to underfit the training set: should you increase or decrease gamma? What about C?
  7. Train a LinearSVC on a linearly separable dataset. Then train an SVC and an SGDClassifier on the same dataset. See if you can get them to produce roughly the same model (QUESTION 1 in the code below).
  8. Train an SVM classifier on the MNIST dataset. Since SVM classifiers are binary classifiers, you will need to use one-versus-all to classify all 10 digits. You may want to tune the hyperparameters using small validation sets to speed up the process. What accuracy can you reach? (QUESTION 2 in the code below.)
import pandas as pd 
import numpy as np
from sklearn.datasets import load_iris, load_digits, fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt 
from typing import Tuple
import os
np.random.seed(42) # make random numbers reproducible
rand_num = np.random.randint(0,100) # shared random_state for the splits and models below
os.system('cls') # clear the console (Windows; use 'clear' on Linux/macOS)

print("-"*20 + "QUESTION 1" + "-" * 20)
iris = load_iris(as_frame=True)
iris: pd.DataFrame = iris.frame
target = iris['target']
data = iris.drop(columns=['target'])
# Pass in random_state for reproducible output across multiple function calls
X_train, X_test, y_train, y_test = train_test_split(data,target,random_state=rand_num,test_size=0.2)

C = 5
alpha = 1 / (C * len(X_train)) # SGDClassifier regularizes via alpha; alpha = 1/(m*C) makes it roughly match an SVM with this C
# You could run GridSearchCV to tune the hyperparameters and improve these models further

svc_clf = Pipeline([
    ('scaler',StandardScaler()),
    ('svc', SVC(kernel="linear",C=C,random_state=rand_num))
])
lin_svc_clf = Pipeline([
    ('scaler',StandardScaler()),
    ('lin_svc',LinearSVC(loss="hinge",C=C,random_state=rand_num)),
])
sgd_clf = Pipeline([
    ('scaler',StandardScaler()),
    ('sgd',SGDClassifier(loss="hinge",learning_rate="constant",eta0=0.001,alpha=alpha,max_iter=1000,tol=1e-3,random_state=rand_num))
])

svc_clf.fit(X_train,y_train)
lin_svc_clf.fit(X_train,y_train)
sgd_clf.fit(X_train,y_train)

print("SVC Classifier Score: ",svc_clf.score(X_test,y_test))
print("Linear SVC Classifier Score: ",lin_svc_clf.score(X_test,y_test))
print("SGD Classifier Score: ",sgd_clf.score(X_test,y_test))



print("-"*20 + "QUESTION 2" + "-" * 20)

digits_data = load_digits(as_frame=True)
digits_frame: pd.DataFrame = digits_data.frame
digits_series = digits_frame['target']
digits_frame = digits_frame.drop(columns=["target"])
X_train, X_test, y_train, y_test = train_test_split(digits_frame,digits_series,random_state=rand_num,test_size=0.2)

base_model = Pipeline([
    ('std',StandardScaler()),
    ('svc',SVC(kernel="linear",C=C,random_state=rand_num))
])
param_grid = {
    "svc__kernel": ["linear","poly","rbf"],
    "svc__C": [5,10,20],
    "svc__gamma": ['scale','auto']
}

digit_clf = GridSearchCV(base_model,param_grid=param_grid,cv=5)
digit_clf.fit(X_train,y_train)
print(digit_clf.best_params_)
print(digit_clf.score(X_test,y_test))
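
Note that load_digits above is the small 8x8 digits dataset (1,797 images), not the full MNIST dataset the exercise refers to. Also, scikit-learn's SVC handles the multiclass case automatically (one-versus-one under the hood), so no explicit one-versus-all wrapper is needed. A sketch of loading the real 70,000-image MNIST via fetch_openml instead (training is much slower, so tuning on a small subset first, as the exercise suggests, helps):

from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X_mnist, y_mnist = mnist['data'], mnist['target']
# Standard MNIST split: first 60,000 images for training, last 10,000 for testing
X_train_m, y_train_m = X_mnist[:60000], y_mnist[:60000]
X_test_m, y_test_m = X_mnist[60000:], y_mnist[60000:]
# Tune hyperparameters on a few thousand images first, then refit the best model on the full training set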


print("-"*20 + "QUESTION 3" + "-" * 20)
housing = fetch_california_housing()
X = housing['data']
y = housing['target']
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=rand_num)
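
The QUESTION 3 code stops before any model is trained; it presumably corresponds to the chapter's exercise of training an SVM regressor on the California housing dataset. A minimal sketch of finishing it with a LinearSVR pipeline (the epsilon and max_iter values are my guesses, not tuned):

from sklearn.svm import LinearSVR
from sklearn.metrics import mean_squared_error

svm_reg = Pipeline([
    ('scaler', StandardScaler()),
    ('svr', LinearSVR(epsilon=0.5, max_iter=10000, random_state=rand_num))
])
svm_reg.fit(X_train, y_train)
y_pred = svm_reg.predict(X_test)
print("SVM Regressor RMSE: ", np.sqrt(mean_squared_error(y_test, y_pred)))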

Chapter 6 - Decision Trees

Like SVMs, Decision Trees are versatile Machine Learning algorithms that can perform both classification and regression tasks, and even multioutput tasks. They are very powerful algorithms, capable of fitting complex datasets.
One of the many qualities of Decision Trees is that they require very little data preparation. In particular, they don’t require feature scaling or centering at all.
As you can see, Decision Trees are fairly intuitive and their decisions are easy to interpret. Such models are often called white box models. In contrast, as we will see, Random Forests or neural networks are generally considered black box models. They make great predictions, and you can easily check the calculations that they performed to make these predictions; nevertheless, it is usually hard to explain in simple terms why the predictions were made.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
import os 
os.system('cls')
iris = load_iris()
X = iris.data[:,2:] # petal length and width 
y = iris.target
# Decision Tree Classifier
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X,y)
print(tree_clf.predict_proba([[5, 1.5]]))
# Decision Tree Regressor (the iris class index is just reused here as a numeric target for demonstration)
tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X,y)
print(tree_reg.predict([[5, 1.5]]))
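
One nice consequence of the white box nature mentioned above is that you can dump the learned rules directly; a small sketch using scikit-learn's export_text on the tree_clf fitted above (my addition, not from the book):

from sklearn.tree import export_text

# Print the tree's splits as readable if/else rules
print(export_text(tree_clf, feature_names=["petal length (cm)", "petal width (cm)"]))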

Chapter 7 - Ensemble Learning and Random Forests

Chapter 8 - Dimensionality Reduction

Many machine learning problems involve thousands or even millions of features for each training instance. Not only does this make training extremely slow, but it can also make it much harder to find a good solution, as we will see. This problem is often referred to as the curse of dimensionality.
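
The chapter's main answer to this is dimensionality reduction, e.g. PCA. A minimal sketch of projecting a dataset down while keeping most of its variance (the 95% threshold is just an illustrative choice):

from sklearn.decomposition import PCA
from sklearn.datasets import load_digits

X_digits = load_digits().data  # 1,797 images with 64 features each
pca = PCA(n_components=0.95)   # keep enough components to preserve 95% of the variance
X_reduced = pca.fit_transform(X_digits)
print(X_digits.shape, "->", X_reduced.shape)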

Chapter 9 - Unsupervised Learning Techniques

Although most of the applications of Machine Learning today are based on supervised learning (and as a result, this is where most of the investments go), the vast majority of available data is actually unlabeled: we have the input features X, but we do not have the labels y.

Chapter 10 - Introduction to Artificial Neural Networks with Keras

Keras is a high-level Deep Learning API that allows you to easily build, train, evaluate, and execute all sorts of neural networks. Its documentation (or specification) is available at Keras.io.
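
A minimal sketch of the build / compile / fit workflow this chapter introduces, using tf.keras on Fashion MNIST (the layer sizes are illustrative; the book's example is similar but not necessarily identical):

import tensorflow as tf
from tensorflow import keras

# Load Fashion MNIST (ships with Keras) and scale pixel values to [0, 1]
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train, y_train = X_train_full[:-5000] / 255.0, y_train_full[:-5000]
X_valid, y_valid = X_train_full[-5000:] / 255.0, y_train_full[-5000:]

# Build a small fully connected classifier
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, validation_data=(X_valid, y_valid))
print(model.evaluate(X_test / 255.0, y_test))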