Rest of Notes for Hands-On Machine Learning with Scikit-Learn and TensorFlow

I decided to skim the rest of the textbook and become more aware of concepts and tools that may be useful. I think I will learn more by training models on Kaggle competitions and other datasets.


Chapter 5 - Support Vector Machines

A Support Vector Machine (SVM) is a very powerful and versatile Machine Learning model, capable of performing linear or nonlinear classification, regression, and even outlier detection. It is one of the most popular models in Machine Learning, and anyone interested in Machine Learning should have it in their toolbox. SVMs are particularly well suited for classification of complex but small- or medium-sized datasets.

  1. What is the fundamental idea behind Support Vector Machines?
  1. What is a support vector?
  1. Why is it important to scale the inputs when using SVMs?
  1. Can an SVM classifier output a confidence score when it classifies an instance? What about a probability?
  1. Should you use the primal or dual form of the SVM problem to train a model on a training set with millions of instances and hundreds of features?
  1. Say you trained an SVM classifier with an RBF kernel. It seems to underfit the training set: should you increase or decrease gamma? What about C?
  1. Train a LinearSVC on a linearly separable dataset. Then train an SVC and a SGDClassifier on the same dataset. See if you can get them to produce roughly the same model.
  2. Train an SVM classifier on the MNIST dataset. Since SVM classifiers are binary classifiers, you will need to use one-versus-all to classify all 10 digits. You may want to tune the hyperparameters using small validation sets to speed up the process. What accuracy can you reach?
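
As a rough illustration of the confidence-score question above: in scikit-learn, `SVC.decision_function` returns a signed score (related to the distance from the decision boundary), while `predict_proba` is only available when `probability=True`, which fits an extra calibration step. The binary Iris subset and the value of `C` below are assumptions made only to keep the sketch small.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Binary subset of Iris (classes 0 and 1), so each instance gets one signed score.
X, y = load_iris(return_X_y=True)
mask = y < 2
X_bin, y_bin = X[mask], y[mask]

# probability=True makes scikit-learn fit an additional calibration step during training.
clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="linear", C=1.0, probability=True, random_state=42),
)
clf.fit(X_bin, y_bin)

sample = X_bin[:3]
scores = clf.decision_function(sample)  # signed confidence scores, not probabilities
probas = clf.predict_proba(sample)      # calibrated class probability estimates
print("decision_function:", scores)
print("predict_proba:", probas)
```

The scaler is part of the pipeline because SVMs are sensitive to feature scales; without it, features with large ranges tend to dominate the margin.
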
import os

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris, load_digits, fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier

np.random.seed(42)  # make the random numbers reproducible
rand_num = np.random.randint(0, 100)  # shared seed for the splits and models below
os.system('cls' if os.name == 'nt' else 'clear')  # clear the console on Windows or Unix

print("-" * 20 + "QUESTION 1" + "-" * 20)
iris_frame: pd.DataFrame = load_iris(as_frame=True).frame
target = iris_frame['target']
data = iris_frame.drop(columns=['target'])
# Pass in random_state for reproducible output across multiple function calls.
# Note: all three Iris classes are kept; versicolor and virginica are not perfectly linearly separable.
X_train, X_test, y_train, y_test = train_test_split(data, target, random_state=rand_num, test_size=0.2)

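C = 5
# SGDClassifier is regularized through alpha; alpha = 1 / (C * m) roughly matches the C used by the SVMs
alpha = 1 / (C * len(X_train))
# GridSearchCV could be used to improve the models by tuning these hyperparameters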

svc_clf = Pipeline([
    ('scaler',StandardScaler()),
    ('svc', SVC(kernel="linear",C=C,random_state=rand_num))
])
lin_svc_clf = Pipeline([
    ('scaler',StandardScaler()),
    ('lin_svc',LinearSVC(loss="hinge",C=C,random_state=rand_num)),
])
sgd_clf = Pipeline([
    ('scaler',StandardScaler()),
    ('sgd',SGDClassifier(loss="hinge",learning_rate="constant",eta0=0.001,alpha=alpha,max_iter=1000,tol=1e-3,random_state=rand_num))
])

svc_clf.fit(X_train,y_train)
lin_svc_clf.fit(X_train,y_train)
sgd_clf.fit(X_train,y_train)

print("SVC Classifier Score: ",svc_clf.score(X_test,y_test))
print("Linear SVC Classifier Score: ",lin_svc_clf.score(X_test,y_test))
print("SGD Classifier Score: ",sgd_clf.score(X_test,y_test))
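
The scores above compare accuracy only. To check whether the three classifiers learn roughly the same model, their learned weights can be compared as well. The following is a minimal sketch under my own assumptions: a binary, linearly separable subset (setosa vs. versicolor on the two petal features), a fixed seed of 42, and default SGD learning-rate settings. It is not meant as the book's reference solution.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Setosa vs. versicolor on petal length/width: a linearly separable binary problem.
iris = load_iris()
mask = iris.target < 2
X = StandardScaler().fit_transform(iris.data[mask][:, 2:])
y = iris.target[mask]

C = 5
alpha = 1 / (C * len(X))  # SGD regularization strength chosen to match C

models = {
    "LinearSVC": LinearSVC(loss="hinge", C=C, max_iter=10_000, random_state=42),
    "SVC": SVC(kernel="linear", C=C),
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=alpha, max_iter=1000, tol=1e-3, random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X, y)
    w, b = model.coef_[0], model.intercept_[0]
    norm = np.linalg.norm(w)  # normalize so only the boundary's direction and offset are compared
    accuracies[name] = model.score(X, y)
    print(f"{name:14s} w/|w| = {w / norm}, b/|w| = {b / norm:.3f}, accuracy = {accuracies[name]:.3f}")
```

Small differences between the printed boundaries are expected, since `LinearSVC` also regularizes the intercept and `SGDClassifier` is a stochastic optimizer.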



print("-"*20 + "QUESTION 2" + "-" * 20)

digits_data = load_digits(as_frame=True)
digits_frame: pd.DataFrame = digits_data.frame
digits_series = digits_frame['target']
digits_frame = digits_frame.drop(columns=["target"])
X_train, X_test, y_train, y_test = train_test_split(digits_frame,digits_series,random_state=rand_num,test_size=0.2)

base_model = Pipeline([
    ('std',StandardScaler()),
    ('svc',SVC(kernel="linear",C=C,random_state=rand_num))
])
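# gamma is ignored by the linear kernel, so it is only searched for "poly" and "rbf"
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [5, 10, 20]},
    {"svc__kernel": ["poly", "rbf"], "svc__C": [5, 10, 20], "svc__gamma": ['scale', 'auto']},
]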

digit_clf = GridSearchCV(base_model,param_grid=param_grid,cv=5)
digit_clf.fit(X_train,y_train)
print(digit_clf.best_params_)
print(digit_clf.score(X_test,y_test))
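
The exercise suggests tuning on small validation sets to speed things up. One possible way to do that is a randomized search on a subset of the training data, followed by a refit on the full training set. The sketch below uses `load_digits` like the code above; the search ranges, the subset size of 500, and `n_iter=10` are illustrative guesses rather than tuned values.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Sample C and gamma on a log scale; the ranges are assumptions, not recommendations.
param_distributions = {
    "svc__C": loguniform(1e-1, 1e2),
    "svc__gamma": loguniform(1e-4, 1e-1),
}

search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=3, random_state=42)
search.fit(X_train[:500], y_train[:500])  # tune on a small subset to keep the search quick

# Refit the best configuration on the full training set before the final evaluation.
best_model = search.best_estimator_.fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```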


print("-"*20 + "QUESTION 3" + "-" * 20)
housing = fetch_california_housing()
X = housing['data']
y = housing['target']
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=rand_num)
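
The code for Question 3 stops after the train/test split. A minimal sketch of how an SVM regressor could be fitted from there is shown below. To keep it runnable without downloading anything, it uses synthetic data from `make_regression` as a stand-in (8 features, like the California housing data); the values of `C` and `epsilon` are untuned assumptions, and swapping in the real split is assumed to work the same way.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the housing data; replace with the real X and y if available.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling matters for SVMs; C and epsilon here are illustrative, not tuned.
svr_reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.1))
svr_reg.fit(X_train, y_train)

predictions = svr_reg.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print("Test RMSE:", rmse)
```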

Chapter 6 - Decision Trees

Like SVMs, Decision Trees are versatile Machine Learning algorithms that can perform both classification and regression tasks, and even multioutput tasks. They are very powerful algorithms, capable of fitting complex datasets.
One of the many qualities of Decision Trees is that they require very little data preparation. In particular, they don’t require feature scaling or centering at all.
Decision Trees are fairly intuitive, and their decisions are easy to interpret. Such models are often called white box models. In contrast, as we will see, Random Forests or neural networks are generally considered black box models. They make great predictions, and you can easily check the calculations that they performed to make these predictions; nevertheless, it is usually hard to explain in simple terms why the predictions were made.
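
To illustrate the claim that Decision Trees do not need feature scaling, the sketch below trains the same tree on raw and on standardized Iris features and compares the training-set predictions. The depth limit and the seed are arbitrary choices of mine; the idea is that tree splits depend only on the ordering of feature values, which standardization preserves.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and seed; only the feature scale differs.
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_scaled, y)

pred_raw = tree_raw.predict(X)
pred_scaled = tree_scaled.predict(X_scaled)
same_predictions = np.array_equal(pred_raw, pred_scaled)
print("Identical training-set predictions:", same_predictions)
```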

import os
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

os.system('cls' if os.name == 'nt' else 'clear')  # clear the console on Windows or Unix
iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target
# Decision Tree Classifier
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)
# Estimated class probabilities for a flower with petals 5 cm long and 1.5 cm wide
print(tree_clf.predict_proba([[5, 1.5]]))
# Decision Tree Regressor (fitted on the class labels here, purely to show the API)
tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X, y)
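
One way to see the "white box" property in practice is to print the learned rules. A small sketch using scikit-learn's `export_text` on the same kind of depth-2 classifier follows; the feature names passed in and the seed are my own labels and choices.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Render the learned splits as indented if/else style rules.
rules = export_text(tree_clf, feature_names=["petal length (cm)", "petal width (cm)"])
print(rules)
```

Each line of the output is a threshold test on one feature, which is what makes a single prediction easy to trace by hand.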

Chapter 7 - Ensemble Learning and Random Forests

Chapter 8 - Dimensionality Reduction

Many machine learning problems involve thousands or even millions of features for each training instance. Not only does this make training extremely slow, it can also make it much harder to find a good solution, as we will see. This problem is often referred to as the curse of dimensionality.
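
One symptom of the curse of dimensionality is that random points drift far apart as the number of dimensions grows, so training sets become sparse. The sketch below estimates the average distance between random pairs of points in a unit hypercube; the helper name `mean_pairwise_distance` and the sample sizes are my own choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def mean_pairwise_distance(n_dims: int, n_pairs: int = 2000) -> float:
    """Average Euclidean distance between random point pairs in the unit hypercube."""
    a = rng.random((n_pairs, n_dims))
    b = rng.random((n_pairs, n_dims))
    return float(np.linalg.norm(a - b, axis=1).mean())

for n_dims in (2, 10, 100, 1000):
    print(f"{n_dims:>5} dims: mean distance ~ {mean_pairwise_distance(n_dims):.2f}")
```

The average distance grows roughly with the square root of the number of dimensions, even though every coordinate still lies between 0 and 1.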

Chapter 9 - Unsupervised Learning Techniques

Although most of the applications of Machine Learning today are based on supervised learning (and as a result, this is where most of the investments go), the vast majority of available data is actually unlabeled: we have the input features X, but do not have the labels y.
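
As a small example of working with only the input features X, the sketch below clusters unlabeled points with K-Means. The synthetic blobs, the choice of three clusters, and the seed are assumptions made for illustration; with real data the number of clusters is usually not known in advance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; the true blob labels are discarded on purpose to mimic unlabeled data.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)  # cluster index assigned to each instance

print("Cluster sizes:", np.bincount(cluster_ids))
print("Cluster centers:\n", kmeans.cluster_centers_)
```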

Chapter 10 - Introduction to Artificial Neural Networks with Keras

Keras is a high-level Deep Learning API that allows you to easily build, train, evaluate, and execute all sorts of neural networks. Its documentation (or specification) is available at keras.io.
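
The build, train, evaluate, and execute steps can be sketched with a tiny classifier. This assumes TensorFlow is installed and that Keras is imported through `tensorflow`; the random toy data, layer sizes, and number of epochs are placeholders chosen only to keep the example fast.

```python
import numpy as np
from tensorflow import keras

# Toy data: 200 instances, 4 features, 3 classes (random, for illustration only).
rng = np.random.default_rng(42)
X = rng.random((200, 4)).astype("float32")
y = rng.integers(0, 3, size=200)

# Build: a small fully connected classifier.
model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

# Train: compile with a loss and an optimizer, then fit.
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=32, verbose=0)

# Evaluate and execute (predict).
loss, accuracy = model.evaluate(X, y, verbose=0)
probabilities = model.predict(X[:5], verbose=0)
print(loss, accuracy, probabilities.shape)
```

Because the labels are random, the accuracy is not meaningful here; the point is only the sequence of API calls.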
