Training Deep Neural Networks Exercises Answers

This chapter covers the main considerations when training deep neural networks with Keras.


Question 1

Is it okay to initialize all the weights to the same value as long as that value is selected randomly using He initialization?

No, all weights should be sampled independently; they should not all have the same initial value. One important goal of sampling weights randomly is to break symmetry: if all the weights have the same initial value, even if that value is not zero, then symmetry is not broken (i.e., all neurons in a given layer are equivalent), and backpropagation will be unable to break it. Concretely, this means that all the neurons in any given layer will always have the same weights. It's like having just one neuron per layer, and much slower. It is virtually impossible for such a configuration to converge to a good solution.
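
As a quick illustration, here is a minimal Keras sketch (the layer size is arbitrary) of sampling the weights independently with He initialization, versus a constant initializer that would leave the symmetry unbroken:

from tensorflow import keras

# He initialization draws every weight independently, so no two neurons
# in the layer start out identical and backpropagation can tell them apart.
good_layer = keras.layers.Dense(100, activation="relu",
                                kernel_initializer="he_normal")

# A constant initializer (even a nonzero one) gives every neuron the same
# weights, so the layer behaves like a single neuron throughout training.
bad_layer = keras.layers.Dense(100, activation="relu",
                               kernel_initializer=keras.initializers.Constant(0.42))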

Question 2

Is it okay to initialize the bias terms to 0?

It is perfectly fine to initialize the bias terms to zero. Some people like to initialize them just like weights, and that's OK too; it does not make much difference.
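
For reference, a minimal sketch of both choices in Keras (zero biases are already the default for Dense layers; the small standard deviation is arbitrary):

from tensorflow import keras

# Biases initialized to zero (the Keras default for Dense layers).
zero_bias_layer = keras.layers.Dense(100, activation="relu",
                                     bias_initializer="zeros")

# Biases initialized randomly, like the weights; this also works fine.
random_bias_layer = keras.layers.Dense(
    100, activation="relu",
    bias_initializer=keras.initializers.RandomNormal(stddev=0.01))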

Question 3

Name three advantages of the SELU activation function over ReLU.

$$\text{ReLU}(z) = \max(0, z) \qquad\qquad \text{SELU}_{\alpha}(z) = \begin{cases} \alpha\,(\exp(z) - 1) & \text{if } z < 0 \\ z & \text{if } z \geq 0 \end{cases}$$
  1. SELU takes on negative values when $z < 0$, which allows the unit to have an average output closer to 0. This helps alleviate the vanishing gradients problem. The hyperparameter $\alpha$ defines the value that the function approaches when $z$ is a large negative number.
  2. It has a nonzero gradient for $z < 0$, which avoids the dead neurons problem.
  3. If $\alpha = 1$, the function is smooth everywhere, which helps speed up Gradient Descent, since it does not bounce as much left and right of $z = 0$ (see the Keras sketch after this list).
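
As mentioned above, a minimal sketch of how these activations are typically selected in Keras (layer sizes are arbitrary; SELU is usually paired with LeCun-normal initialization so a plain stack of Dense layers can self-normalize):

from tensorflow import keras

# SELU: pair with lecun_normal initialization for self-normalization.
selu_layer = keras.layers.Dense(100, activation="selu",
                                kernel_initializer="lecun_normal")

# ELU: commonly paired with He initialization.
elu_layer = keras.layers.Dense(100, activation="elu",
                               kernel_initializer="he_normal")

# Plain ReLU, for comparison.
relu_layer = keras.layers.Dense(100, activation="relu",
                                kernel_initializer="he_normal")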

Question 4

In which cases would you want to use each of the following activation functions: SELU, leaky ReLU (and its variants), ReLU, tanh, logistic, and softmax?

You should use leaky ReLU on large training sets when you are using the ReLU activation function and you notice that, during training, some neurons effectively die, meaning that they stop outputting anything other than 0. In some cases, you may find that half of all your network's neurons are dead, especially if you use a large learning rate. A neuron dies when its weights get tweaked in such a way that the weighted sum of its inputs is negative for all instances in the training set. When this happens, it just keeps outputting 0s, and gradient descent does not affect it anymore since the gradient of the ReLU function is 0 when its input is negative.

The vanishing gradients problem, where gradients get smaller and smaller as the algorithm progresses down to the lower layers, is most common in deep feedforward neural networks. The exploding gradients problem, in which the gradients grow bigger and bigger until many layers get insanely large weights and the algorithm diverges, is mostly encountered in recurrent neural networks. More generally, deep neural networks suffer from unstable gradients; different layers may learn at widely different speeds.

In general, SELU > ELU > leaky ReLU (and its variants) > ReLU > tanh > logistic. If the network's architecture prevents it from self-normalizing, then ELU may perform better than SELU (since SELU is not smooth at $z = 0$). If you care about runtime latency, then you may prefer leaky ReLU. If you do not want to tweak yet another hyperparameter, you may just use the default $\alpha$ values used by Keras. If you have spare time and computing power, you can use cross-validation to evaluate other activation functions, in particular RReLU if your network is overfitting, or PReLU if you have a huge training set.
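
For concreteness, a rough sketch of how a few of these choices are written in Keras (the architecture and sizes are purely illustrative):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    # Leaky ReLU is added as its own layer after a linear Dense layer;
    # its small negative slope keeps "dead" neurons trainable.
    keras.layers.Dense(100, kernel_initializer="he_normal"),
    keras.layers.LeakyReLU(),
    # SELU as a plain activation string, with LeCun initialization.
    keras.layers.Dense(100, activation="selu",
                       kernel_initializer="lecun_normal"),
    # Softmax output for multiclass classification.
    keras.layers.Dense(10, activation="softmax"),
])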

Question 5

What may happen if you set the momentum hyperparameter too close to 1 (e.g., 0.99999) when using an SGD optimizer?

If you set the momentum hyperparameter too close to 1 (e.g., 0.99999) when using an SGD optimizer, then the algorithm will likely pick up a lot of speed, hopefully moving roughly toward the global minimum, but its momentum will carry it right past the minimum. Then it will slow down and come back, accelerate again, overshoot again, and so on. It may oscillate this way many times before converging, so overall it will take much longer to converge than with a smaller momentum value.
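
For reference, the momentum hyperparameter is set directly on the optimizer; a minimal sketch (the 0.9 value is just the usual default choice):

from tensorflow import keras

# A momentum around 0.9 is typical; values very close to 1 (e.g., 0.99999)
# make the optimizer overshoot and oscillate around the minimum.
optimizer = keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)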

Question 6

Name three ways you can produce a sparse model.

  1. Train the model as usual, then get rid of the tiny weights by setting them to 0. This will not typically lead to a very sparse model, and it may degrade the model's performance.
  2. Apply strong $\ell_1$ regularization during training, as it pushes the optimizer to zero out as many weights as it can.
  3. Apply Dual Averaging, often called Follow The Regularized Leader (FTRL), a technique proposed by Yurii Nesterov. When used with $\ell_1$ regularization, this technique often leads to very sparse models. Keras implements a variant of FTRL called FTRL-Proximal in its `Ftrl` optimizer (a sketch of options 2 and 3 follows this list).
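
As noted in the list, a minimal sketch of options 2 and 3 in Keras (the regularization factors are arbitrary):

import tensorflow as tf
from tensorflow import keras

# Option 2: strong l1 regularization drives many weights exactly to zero.
sparse_dense = keras.layers.Dense(
    100,
    activation="relu",
    kernel_initializer="he_normal",
    kernel_regularizer=keras.regularizers.L1(0.01),
)

# Option 3: the FTRL optimizer combined with l1 regularization.
ftrl_optimizer = tf.keras.optimizers.Ftrl(
    learning_rate=1e-3,
    l1_regularization_strength=0.01,
)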

Question 7

Does dropout slow down training? Does it slow down inference (i.e., making predictions on new instances)? What about MC Dropout?

Yes, dropout does slow down training, in general roughly by a factor of two. However, it has no impact on inference speed since it is only turned on during training. MC Dropout is exactly like dropout during training, but it is still active during inference, so each inference is slowed down slightly. More importantly, when using MC Dropout you generally want to run inference 10 times or more to get better predictions. This means that making predictions is slowed down by a factor of 10 or more.
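
A minimal sketch of MC Dropout at inference time, assuming a trained Keras `model` that contains Dropout layers and a test set `X_test` (the 100 forward passes are an arbitrary choice):

import numpy as np

# Keep dropout active at inference by forcing training=True, run many
# stochastic forward passes, and average the predicted probabilities.
y_probas = np.stack([model(X_test, training=True) for _ in range(100)])
y_proba = y_probas.mean(axis=0)   # averaged class probabilities
y_pred = y_proba.argmax(axis=1)   # final class predictions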

Question 8

Deep Learning

  • Build a DNN with five hidden layers of 100 neurons each, He initialization, and the ELU activation function.

  • Using Adam optimization and early stopping, try training it on MNIST but only on digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. You will need a softmax output layer with five neurons, and as always make sure to save checkpoints at regular intervals and save the final model so you can reuse it later.

  • Tune the hyperparameters using cross-validation and see what precision you can achieve.

  • Now try adding Batch Normalization and compare the learning curves: is it converging faster than before? Does it produce a better model?

  • Is the model overfitting the training set? Try adding dropout to every layer and try again. Does it help?

import tensorflow as tf
from tensorflow import keras
import numpy as np 
import pandas as pd 

# Load Data
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

print("X_train Shape:",X_train.shape)
print("y_train Shape:",y_train.shape)
print("X_test Shape:",X_test.shape)
print("y_test Shape:",y_test.shape)

y_train_numeric, y_test_numeric = y_train.astype(np.int64), y_test.astype(np.int64)

zero_to_four_train_indices, zero_to_four_test_indices = np.nonzero(y_train_numeric <= 4)[0], np.nonzero(y_test_numeric <= 4)[0]

five_to_nine_train_indices, five_to_nine_test_indices = np.nonzero(y_train_numeric >= 5)[0], np.nonzero(y_test_numeric >= 5)[0]

print("\nEnsuring Correct Splitting of Data:\n-------------------------------------------------")
print("Train Indices Correct:",zero_to_four_train_indices.shape[0] + five_to_nine_train_indices.shape[0] == y_train.shape[0])
print("Test Indices Correct:",zero_to_four_test_indices.shape[0] + five_to_nine_test_indices.shape[0] == y_test.shape[0])


# Get Data for Question 8
X_train_0_4, X_test_0_4 = X_train[zero_to_four_train_indices,:,:], X_test[zero_to_four_test_indices,:,:]
y_train_0_4, y_test_0_4 = y_train[zero_to_four_train_indices], y_test[zero_to_four_test_indices]

print("y_train_0_4 Unique Values:",pd.Series(np.concatenate((y_train_0_4,y_test_0_4),axis=0)).unique())


# Get Data for Question 9
X_train_5_9, X_test_5_9 = X_train[five_to_nine_train_indices,:,:], X_test[five_to_nine_test_indices,:,:]
y_train_5_9, y_test_5_9 = y_train[five_to_nine_train_indices], y_test[five_to_nine_test_indices]

print("y_train_5_9 Unique Values:",pd.Series(np.concatenate((y_train_5_9,y_test_5_9),axis=0)).unique())

import matplotlib.pyplot as plt 

fig, ax = plt.subplots(1,2,layout="constrained")

ax[0].imshow(X_train_0_4[0,:,:],cmap="gray")
ax[0].set_title("0 to 4 Training Set")
ax[1].imshow(X_train_5_9[0,:,:],cmap="gray")
ax[1].set_title("5 to 9 Training Set")
plt.show()

out[10]

X_train Shape: (60000, 28, 28)
y_train Shape: (60000,)
X_test Shape: (10000, 28, 28)
y_test Shape: (10000,)

Ensuring Correct Splitting of Data:
-------------------------------------------------
Train Indices Correct: True
Test Indices Correct: True
y_train_0_4 Unique Values: [0 4 1 2 3]
y_train_5_9 Unique Values: [5 9 6 7 8]

Jupyter Notebook Image

<Figure size 640x480 with 2 Axes>

import keras_tuner as kt

"""
Since the target values are not one-hot encoded, use sparse categorical crossentropy
"""
loss = tf.keras.losses.SparseCategoricalCrossentropy()
metrics = ['accuracy']

def model_builder(hp):
    """
    Build a model with hyperparameter tuning
    """
    model = tf.keras.Sequential()
    model.add(keras.layers.Input(shape=X_train_0_4.shape[1:])) # Define the input shape
    model.add(keras.layers.Lambda(lambda x: x / 255)) # Scale the input
    model.add(keras.layers.Flatten(name="Flatten")) # Flatten the input
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden1_0_4"))
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden2_0_4"))
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden3_0_4"))
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden4_0_4"))
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden5_0_4"))
    # Classifying Images 0-4
    model.add(keras.layers.Dense(5, activation="softmax",name="Output"))
    """
    Suggested Learning rates
    """
    hp_learning_rate = hp.Choice('learning_rate',values=[1e-2,1e-3,1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),loss=loss,metrics=metrics)
    return model


import os
"""
The `fit()` method accepts a `callbacks` argument that lets you specify a list of objects that Keras will call during training at the start and end of training, at the start and end of each epoch, and even before and after processing each batch. The ModelCheckpoint callback below saves checkpoints of the model at regular intervals during training, by default at the end of each epoch.
"""
sve_model_cb = keras.callbacks.ModelCheckpoint(filepath=os.path.join(os.getcwd(),'mnist_model_0_4.keras'),save_best_only=True,verbose=1)


"""
Visualization Using Tensorboard
--------------------------------------------
> TensorBoard is a great interactive visualization tool that you can use to view the learning curves during training, compare learning curves between multiple runs, visualize the computation graph, analyze training statistics, view images generated by your model, visualize complex multidimensional data projected down to 3D and automatically clustered for you, and more! This tool is installed automatically when you install TensorFlow, so you already have it.
To use it, you must modify your program so that it outputs the data that you want to visualize to special binary log files called event files. Each binary record is called a summary. The TensorBoard server will monitor the log directory, and it will automatically pick up the changes and update the visualizations: this allows you to visualize live data (with a short delay), such as the learning curves during training.
"""
root_logdir = os.path.join(os.getcwd(), "ch11_mnist_0_4_logs")
def get_run_logdir():
    """
    Returns a string that represents a path to the log directory + the run id
    """
    import time
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)
run_logdir = get_run_logdir()

tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

"""
> If you use a validation set during training, you can set save_best_only=True when creating the ModelCheckpoint. In this case, it will only save your model when its performance on the validation set is the best so far. This way, you do not need to worry about training for too long and overfitting the training set: simply restore the last model saved after training, and this will be the best model on the validation set. This is a simple way to implement early stopping.
"""
early_stopping_cb = keras.callbacks.EarlyStopping(patience=10,restore_best_weights=True)

tuner = kt.Hyperband(model_builder,objective="val_accuracy",max_epochs=10)
tuner.search(X_train_0_4,y_train_0_4,epochs=100,validation_split=0.2,callbacks=[early_stopping_cb,tensorboard_cb,sve_model_cb])
out[11]

Trial 3 Complete [00h 00m 06s]
val_accuracy: 0.9851307272911072

Best val_accuracy So Far: 0.9851307272911072
Total elapsed time: 00h 00m 17s

best_hyperparameters = tuner.get_best_hyperparameters(num_trials=1)[0]
out[12]
model = tuner.hypermodel.build(best_hyperparameters)
history = model.fit(X_train_0_4,y_train_0_4,epochs=100,validation_split=0.2,callbacks=[early_stopping_cb,tensorboard_cb,sve_model_cb])
out[13]

Epoch 1/100
759/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9415 - loss: 0.1935
Epoch 1: val_loss improved from inf to 0.05218, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\mnist_model_0_4.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9418 - loss: 0.1928 - val_accuracy: 0.9837 - val_loss: 0.0522
Epoch 2/100
760/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9826 - loss: 0.0572
Epoch 2: val_loss did not improve from 0.05218
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9826 - loss: 0.0572 - val_accuracy: 0.9802 - val_loss: 0.0570
Epoch 3/100
740/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9867 - loss: 0.0383
Epoch 3: val_loss improved from 0.05218 to 0.04466, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\mnist_model_0_4.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9866 - loss: 0.0385 - val_accuracy: 0.9864 - val_loss: 0.0447
Epoch 4/100
755/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9931 - loss: 0.0242
Epoch 4: val_loss improved from 0.04466 to 0.04189, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\mnist_model_0_4.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9931 - loss: 0.0243 - val_accuracy: 0.9876 - val_loss: 0.0419
Epoch 5/100
748/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9899 - loss: 0.0295
Epoch 5: val_loss improved from 0.04189 to 0.04088, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\mnist_model_0_4.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9899 - loss: 0.0295 - val_accuracy: 0.9886 - val_loss: 0.0409
Epoch 6/100
761/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9931 - loss: 0.0216
Epoch 6: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9931 - loss: 0.0216 - val_accuracy: 0.9840 - val_loss: 0.0594
Epoch 7/100
746/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9940 - loss: 0.0196
Epoch 7: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9940 - loss: 0.0196 - val_accuracy: 0.9892 - val_loss: 0.0425
Epoch 8/100
762/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9966 - loss: 0.0139
Epoch 8: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9966 - loss: 0.0140 - val_accuracy: 0.9887 - val_loss: 0.0450
Epoch 9/100
763/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9941 - loss: 0.0178
Epoch 9: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9941 - loss: 0.0178 - val_accuracy: 0.9886 - val_loss: 0.0423
Epoch 10/100
760/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9958 - loss: 0.0133
Epoch 10: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9958 - loss: 0.0132 - val_accuracy: 0.9879 - val_loss: 0.0543
Epoch 11/100
747/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9949 - loss: 0.0201
Epoch 11: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9949 - loss: 0.0200 - val_accuracy: 0.9899 - val_loss: 0.0413
Epoch 12/100
745/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9970 - loss: 0.0102
Epoch 12: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9970 - loss: 0.0103 - val_accuracy: 0.9891 - val_loss: 0.0543
Epoch 13/100
747/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9955 - loss: 0.0148
Epoch 13: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9955 - loss: 0.0148 - val_accuracy: 0.9902 - val_loss: 0.0431
Epoch 14/100
758/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9980 - loss: 0.0055
Epoch 14: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9980 - loss: 0.0055 - val_accuracy: 0.9905 - val_loss: 0.0412
Epoch 15/100
747/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9968 - loss: 0.0119
Epoch 15: val_loss did not improve from 0.04088
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9967 - loss: 0.0120 - val_accuracy: 0.9866 - val_loss: 0.0739

import pandas as pd
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(-0.1, 1.1) # set the vertical range to [0-1]
plt.gca().set_title("Model History wthout Batch Normalization After Hyperparamter Tuning")
plt.show()
out[14]
Jupyter Notebook Image

<Figure size 800x500 with 1 Axes>

print(best_hyperparameters)
out[15]

<keras_tuner.src.engine.hyperparameters.hyperparameters.HyperParameters object at 0x0000022337FFEDD0>

def model_builder_2(hp):
    """
    Build a model with hyperparameter tuning - including dropout and batch normalization
    """
    model = tf.keras.Sequential()
    model.add(keras.layers.Input(shape=X_train_0_4.shape[1:])) # Define the input shape
    model.add(keras.layers.Lambda(lambda x: x / 255)) # Scale the input
    model.add(keras.layers.Flatten(name="Flatten")) # Flatten the input
    if hp.Boolean("batch_norm_1"):
        model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden1_0_4"))
    if hp.Boolean("dropout_1"):
        model.add(keras.layers.Dropout(rate=0.2))
    if hp.Boolean("batch_norm_2"):
        model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden2_0_4"))
    if hp.Boolean("dropout_2"):
        model.add(keras.layers.Dropout(rate=0.2))
    if hp.Boolean("batch_norm_3"):
        model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden3_0_4"))
    if hp.Boolean("dropout_3"):
        model.add(keras.layers.Dropout(rate=0.2))
    if hp.Boolean("batch_norm_4"):
        model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden4_0_4"))
    if hp.Boolean("dropout_4"):
        model.add(keras.layers.Dropout(rate=0.2))
    if hp.Boolean("batch_norm_5"):
        model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Dense(100,kernel_initializer="he_normal",activation="elu",name="Hidden5_0_4"))
    if hp.Boolean("dropout_5"):
        model.add(keras.layers.Dropout(rate=0.2))
    # Classifying Images 0-4
    model.add(keras.layers.Dense(5, activation="softmax",name="Output"))
    """
    Suggested Learning rates
    """
    hp_learning_rate = hp.Choice('learning_rate',values=[1e-2,1e-3,1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),loss=loss,metrics=metrics)
    return model


sve_model_cb_2 = keras.callbacks.ModelCheckpoint(filepath=os.path.join(os.getcwd(),'models','mnist_model_0_4v2.keras'),save_best_only=True,verbose=1)

root_logdir = os.path.join(os.getcwd(), "ch11_mnist_0_4_logs_v2")
def get_run_logdir():
    """
    Returns a string that represents a path to the log directory + the run id
    """
    import time
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()

tensorboard_cb_2 = keras.callbacks.TensorBoard(run_logdir)


tuner = kt.Hyperband(model_builder_2,objective="val_accuracy",max_epochs=10)
tuner.search(X_train_0_4,y_train_0_4,epochs=100,validation_split=0.2,callbacks=[early_stopping_cb,tensorboard_cb_2,sve_model_cb_2])
out[16]

Trial 30 Complete [00h 00m 26s]
val_accuracy: 0.9897058606147766

Best val_accuracy So Far: 0.991830050945282
Total elapsed time: 00h 05m 41s

best_hyperparameters = tuner.get_best_hyperparameters(num_trials=1)[0]
model = tuner.hypermodel.build(best_hyperparameters)
history = model.fit(X_train_0_4,y_train_0_4,epochs=100,validation_split=0.2,callbacks=[early_stopping_cb,tensorboard_cb_2,sve_model_cb_2])
out[17]

Epoch 1/100
755/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9081 - loss: 0.2646
Epoch 1: val_loss improved from inf to 0.05237, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.9087 - loss: 0.2630 - val_accuracy: 0.9840 - val_loss: 0.0524
Epoch 2/100
762/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9779 - loss: 0.0740
Epoch 2: val_loss improved from 0.05237 to 0.04764, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9779 - loss: 0.0740 - val_accuracy: 0.9842 - val_loss: 0.0476
Epoch 3/100
764/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9821 - loss: 0.0564
Epoch 3: val_loss improved from 0.04764 to 0.04191, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9821 - loss: 0.0564 - val_accuracy: 0.9889 - val_loss: 0.0419
Epoch 4/100
756/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9822 - loss: 0.0521
Epoch 4: val_loss did not improve from 0.04191
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9822 - loss: 0.0521 - val_accuracy: 0.9853 - val_loss: 0.0476
Epoch 5/100
753/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9869 - loss: 0.0387
Epoch 5: val_loss did not improve from 0.04191
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9868 - loss: 0.0388 - val_accuracy: 0.9871 - val_loss: 0.0423
Epoch 6/100
744/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9866 - loss: 0.0388
Epoch 6: val_loss improved from 0.04191 to 0.03638, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9866 - loss: 0.0388 - val_accuracy: 0.9904 - val_loss: 0.0364
Epoch 7/100
745/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9880 - loss: 0.0357
Epoch 7: val_loss did not improve from 0.03638
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9880 - loss: 0.0357 - val_accuracy: 0.9891 - val_loss: 0.0390
Epoch 8/100
749/765 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9892 - loss: 0.0315
Epoch 8: val_loss did not improve from 0.03638
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9892 - loss: 0.0316 - val_accuracy: 0.9891 - val_loss: 0.0426
Epoch 9/100
751/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9903 - loss: 0.0336
Epoch 9: val_loss did not improve from 0.03638
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9903 - loss: 0.0336 - val_accuracy: 0.9899 - val_loss: 0.0373
Epoch 10/100
750/765 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9905 - loss: 0.0254
Epoch 10: val_loss did not improve from 0.03638
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9905 - loss: 0.0255 - val_accuracy: 0.9892 - val_loss: 0.0376
Epoch 11/100
765/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9919 - loss: 0.0237
Epoch 11: val_loss did not improve from 0.03638
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9918 - loss: 0.0237 - val_accuracy: 0.9892 - val_loss: 0.0392
Epoch 12/100
763/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9908 - loss: 0.0295
Epoch 12: val_loss improved from 0.03638 to 0.03190, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9908 - loss: 0.0295 - val_accuracy: 0.9918 - val_loss: 0.0319
Epoch 13/100
743/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9916 - loss: 0.0238
Epoch 13: val_loss improved from 0.03190 to 0.03119, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9916 - loss: 0.0238 - val_accuracy: 0.9915 - val_loss: 0.0312
Epoch 14/100
750/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9917 - loss: 0.0231
Epoch 14: val_loss did not improve from 0.03119
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9917 - loss: 0.0231 - val_accuracy: 0.9905 - val_loss: 0.0353
Epoch 15/100
747/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9939 - loss: 0.0171
Epoch 15: val_loss did not improve from 0.03119
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9939 - loss: 0.0172 - val_accuracy: 0.9918 - val_loss: 0.0346
Epoch 16/100
759/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9944 - loss: 0.0177
Epoch 16: val_loss did not improve from 0.03119
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9944 - loss: 0.0177 - val_accuracy: 0.9907 - val_loss: 0.0441
Epoch 17/100
762/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9945 - loss: 0.0162
Epoch 17: val_loss improved from 0.03119 to 0.02794, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9945 - loss: 0.0163 - val_accuracy: 0.9907 - val_loss: 0.0279
Epoch 18/100
748/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9940 - loss: 0.0156
Epoch 18: val_loss improved from 0.02794 to 0.02757, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9941 - loss: 0.0156 - val_accuracy: 0.9928 - val_loss: 0.0276
Epoch 19/100
748/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9949 - loss: 0.0158
Epoch 19: val_loss did not improve from 0.02757
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9949 - loss: 0.0158 - val_accuracy: 0.9913 - val_loss: 0.0404
Epoch 20/100
749/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9940 - loss: 0.0166
Epoch 20: val_loss did not improve from 0.02757
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9940 - loss: 0.0165 - val_accuracy: 0.9912 - val_loss: 0.0335
Epoch 21/100
745/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9950 - loss: 0.0167
Epoch 21: val_loss did not improve from 0.02757
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9950 - loss: 0.0166 - val_accuracy: 0.9931 - val_loss: 0.0278
Epoch 22/100
754/765 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9945 - loss: 0.0164
Epoch 22: val_loss did not improve from 0.02757
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9945 - loss: 0.0164 - val_accuracy: 0.9922 - val_loss: 0.0346
Epoch 23/100
756/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9949 - loss: 0.0152
Epoch 23: val_loss did not improve from 0.02757
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9948 - loss: 0.0152 - val_accuracy: 0.9923 - val_loss: 0.0331
Epoch 24/100
762/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9960 - loss: 0.0115
Epoch 24: val_loss did not improve from 0.02757
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9960 - loss: 0.0115 - val_accuracy: 0.9936 - val_loss: 0.0306
Epoch 25/100
751/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9964 - loss: 0.0117
Epoch 25: val_loss did not improve from 0.02757
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9964 - loss: 0.0117 - val_accuracy: 0.9933 - val_loss: 0.0299
Epoch 26/100
750/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9929 - loss: 0.0209
Epoch 26: val_loss did not improve from 0.02757
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9929 - loss: 0.0208 - val_accuracy: 0.9913 - val_loss: 0.0339
Epoch 27/100
757/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9972 - loss: 0.0092
Epoch 27: val_loss did not improve from 0.02757
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9972 - loss: 0.0092 - val_accuracy: 0.9930 - val_loss: 0.0306
Epoch 28/100
762/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9961 - loss: 0.0116
Epoch 28: val_loss improved from 0.02757 to 0.02610, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9960 - loss: 0.0116 - val_accuracy: 0.9943 - val_loss: 0.0261
Epoch 29/100
743/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9960 - loss: 0.0129
Epoch 29: val_loss improved from 0.02610 to 0.02401, saving model to c:\Users\fmb20\Desktop\CODE\machine-learning\machine_learning\answers\models\mnist_model_0_4v2.keras
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9960 - loss: 0.0129 - val_accuracy: 0.9935 - val_loss: 0.0240
Epoch 30/100
755/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9967 - loss: 0.0104
Epoch 30: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9967 - loss: 0.0105 - val_accuracy: 0.9930 - val_loss: 0.0297
Epoch 31/100
742/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9973 - loss: 0.0091
Epoch 31: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9973 - loss: 0.0092 - val_accuracy: 0.9930 - val_loss: 0.0336
Epoch 32/100
743/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9965 - loss: 0.0103
Epoch 32: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9965 - loss: 0.0104 - val_accuracy: 0.9928 - val_loss: 0.0299
Epoch 33/100
752/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9949 - loss: 0.0131
Epoch 33: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9949 - loss: 0.0130 - val_accuracy: 0.9935 - val_loss: 0.0297
Epoch 34/100
757/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9971 - loss: 0.0083
Epoch 34: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9971 - loss: 0.0083 - val_accuracy: 0.9912 - val_loss: 0.0406
Epoch 35/100
747/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9975 - loss: 0.0089
Epoch 35: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9975 - loss: 0.0089 - val_accuracy: 0.9935 - val_loss: 0.0241
Epoch 36/100
753/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9972 - loss: 0.0096
Epoch 36: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9972 - loss: 0.0097 - val_accuracy: 0.9928 - val_loss: 0.0375
Epoch 37/100
763/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9974 - loss: 0.0103
Epoch 37: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9973 - loss: 0.0103 - val_accuracy: 0.9905 - val_loss: 0.0399
Epoch 38/100
756/765 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9971 - loss: 0.0096
Epoch 38: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9971 - loss: 0.0096 - val_accuracy: 0.9933 - val_loss: 0.0365
Epoch 39/100
760/765 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9971 - loss: 0.0089
Epoch 39: val_loss did not improve from 0.02401
765/765 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9971 - loss: 0.0090 - val_accuracy: 0.9948 - val_loss: 0.0259

Question 9

Transfer learning.

  • Create a new DNN that reuses all the pretrained hidden layers of the previous model, freezes them, and replaces the softmax output layer with a new one.
  • Train this new DNN on digits 5 to 9, using only 100 images per digit, and time how long it takes. Despite this small number of examples, can you achieve high precision?
  • Try caching the frozen layers, and train the model again: how much faster is it now?
  • Try again reusing just four hidden layers instead of five. Can you achieve a higher precision?
  • Now unfreeze the top two hidden layers and continue training: can you get the model to perform even better?
model_0_4_load = keras.models.load_model(os.path.join(os.getcwd(),'models','mnist_model_0_4v2.keras'),safe_mode=False)
model_0_4_load_clone = keras.models.clone_model(model_0_4_load)
"""
Clone Model does not clone the weights
"""
model_0_4_load_clone.set_weights(model_0_4_load.get_weights())
layers = [
    model_0_4_load_clone.get_layer('Hidden1_0_4'),
    model_0_4_load_clone.get_layer('Hidden2_0_4'),
    model_0_4_load_clone.get_layer('Hidden3_0_4'),
    model_0_4_load_clone.get_layer('Hidden4_0_4')
]
model_5_9_on_0_4 = keras.models.Sequential()
model_5_9_on_0_4.add(keras.layers.Lambda(lambda x: x / 255,name="Scale_Inputs")) # Scale the input
model_5_9_on_0_4.add(keras.layers.Flatten(name="Flatten"))

for layer in layers:
    model_5_9_on_0_4.add(layer)
model_5_9_on_0_4.add(keras.layers.Dense(5, activation="softmax",name="Output_5_9"))
for layer in model_5_9_on_0_4.layers[:-1]:
    layer.trainable = False

model_5_9_on_0_4.compile(loss="binary_crossentropy", optimizer="sgd",metrics=["accuracy"])

sve_model_cb_2 = keras.callbacks.ModelCheckpoint(filepath=os.path.join(os.getcwd(),'models','mnist_model_5_9_v1.keras'),save_best_only=True,verbose=1)

root_logdir = os.path.join(os.getcwd(), "ch11_mnist_5_9_logs_v1")
def get_run_logdir():
    """
    Returns a string that represents a path to the log directory + the run id
    """
    import time
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()

tensorboard_cb_2 = keras.callbacks.TensorBoard(run_logdir)

loss = tf.keras.losses.SparseCategoricalCrossentropy()
metrics = ['accuracy']

model_5_9_on_0_4.compile(loss=loss,metrics=metrics,optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))

"""
Generate a training set so that it contains 100 samples for each digit
"""
X_training_data_100_each = None
y_train_data_100_each = None

for i in range(5,10):
    y_train_indices = np.nonzero(y_train_5_9 == i)[0] # indices of digit i in the 5-9 training set
    y_train_indices = y_train_indices[:100] # keep 100 samples per digit
    y_train_add = y_train_5_9[y_train_indices]
    X_train_add = X_train_5_9[y_train_indices,:,:]
    if X_training_data_100_each is None:
       X_training_data_100_each = X_train_add
       y_train_data_100_each = y_train_add 
    else:
       X_training_data_100_each = np.vstack((X_training_data_100_each,X_train_add))
       y_train_data_100_each = np.concatenate((y_train_data_100_each,y_train_add),axis=0)

# history = model_5_9_on_0_4.fit(X_training_data_100_each,y_train_data_100_each,epochs=100,validation_split=0.2,callbacks=[early_stopping_cb,tensorboard_cb_2,sve_model_cb_2])
print("I can't figure out what is going wrong here. I am going to move on.")
out[19]

I can't figure out what is going wrong here. I am going to move on.
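
For what it is worth, the most likely culprit is the label range: the new output layer has five units, so sparse categorical crossentropy expects class labels 0 to 4, while `y_train_data_100_each` still holds the raw digits 5 to 9. A rough sketch of the adjustment, assuming that is indeed the issue:

# Shift the 5-9 digit labels into the 0-4 range expected by the
# five-unit softmax / sparse categorical crossentropy combination.
y_train_data_100_each_shifted = y_train_data_100_each - 5

history = model_5_9_on_0_4.fit(
    X_training_data_100_each,
    y_train_data_100_each_shifted,
    epochs=100,
    validation_split=0.2,
    callbacks=[early_stopping_cb, tensorboard_cb_2, sve_model_cb_2],
)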