Deep Learning with Python - Chapters 7 and 8
The chapters "Working with Keras: A Deep Dive" and "Introduction to Deep Learning for Computer Vision" cover the Keras library in some depth and introduce convolutional neural networks for image classification, respectively.
Working with Keras: A Deep Dive
In this chapter, you'll get a complete overview of the key ways to work with Keras APIs.
The Keras API is guided by the principle of progressive disclosure of complexity: make it easy to get started, yet make it possible to handle high-complexity use cases, requiring only incremental learning at each step. Simple use cases should be easy and approachable, and arbitrarily advanced workflows should be possible: no matter how niche and complex the thing you want to do, there should be a clear path to it.
Three APIs for building models in Keras:
- The Sequential model: the most approachable API. It's limited to simple stacks of layers.
- The Functional API: focuses on graph-like model architectures. It represents a nice midpoint between usability and flexibility, and as such, it's the most commonly used model-building API.
- Model subclassing: a low-level option where you write everything yourself from scratch. This is ideal if you want full control over every little thing.
"""
The Sequential Model
"""
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax")
])
"""
- It's also possible to build the same model incrementally via the add() method
- The layers only get built (that is, create their weights) when they are called for the first time, because the shapes of the layers' weights depend on the shape of their input
"""
model = keras.Sequential()
model.add(layers.Dense(64,activation="relu"))
model.add(layers.Dense(10, activation="softmax"))
model.build(input_shape=(None,3))
print(model.weights) # Retrieving the model's weights
"""
After the model is built, you can display its contents via the summary() method
"""
print(model.summary())
"""
You can give names to everything in Keras - every model, every layer
"""
model = keras.Sequential(name="my_example_model")
model.add(layers.Dense(64,activation="relu",name="my_first_layer"))
model.add(layers.Dense(10, activation="softmax",name="my_second_layer"))
model.build(input_shape=(None,3))
print(model.summary())
"""
There is a way to have the Sequential model built on the fly: declare the shape of its inputs in advance via the Input class
"""
model = keras.Sequential()
# Use the Input class to declare the shape of the inputs. Note that the shape argument must be the shape of each sample, not the shape of one batch
model.add(keras.Input(shape=(3,)))
model.add(layers.Dense(64,activation="relu",name="a"))
model.add(layers.Dense(10, activation="softmax",name="b"))
print(model.summary())
"""
Most Keras models in the wild use the Functional API (the Sequential API
is too simplistic to represent most real-world models).
"""
# Start by declaring an Input
# It holds information about the shape and dtype of the data that the model will process
# Such an object is called a *symbolic tensor*. It doesn't contain any actual data,
# but it encodes the specifications of the actual tensors of data that the model will see
# when you use it. It *stands for* future tensors of data
inputs = keras.Input(shape=(3,),name="my_input")
print("Input Shape:",inputs.shape)
print("Input dtype:",inputs.dtype)
# Create a layer and call it on the input
# All Keras layers can be called on real tensors of data and on symbolic tensors
# In the latter case, they return a new symbolic tensor, with updated shape and dtype information
features = layers.Dense(64,activation="relu")(inputs)
print("Features Shape:",features.shape)
# After obtaining the final outputs, we instantiate the model by specifying
# its inputs and outputs in the `Model` constructor
outputs = layers.Dense(10,activation="softmax")(features)
model = keras.Model(inputs=inputs,outputs=outputs)
print(model.summary())
"""
- Most deep learning models don't look like lists - they look like graphs
- For instance, they may have multiple inputs or multiple outputs
Example of a multi-input, multi-output Functional model:
"""
vocabulary_size = 10_000
num_tags = 100
num_departments = 4
"""
Define the model inputs
"""
title = keras.Input(shape=(vocabulary_size,),name="title")
text_body = keras.Input(shape=(vocabulary_size,),name="text_body")
tags = keras.Input(shape=(num_tags,),name="tags")
# Combine input features into a single tensor, features, by concatenating them
features = layers.Concatenate()([title, text_body, tags])
# Apply an intermediate layer to recombine input features into richer representations
features = layers.Dense(64,activation="relu")(features)
# Define the Model Outputs
priority = layers.Dense(1, activation="sigmoid", name="priority")(features)
department = layers.Dense(num_departments, activation="softmax",name="department")(features)
# Create the model by specifying its inputs and outputs
model = keras.Model(inputs=[title,text_body,tags], outputs=[priority,department])
"""
Training a Multi-Input, Multi-Output model
"""
import numpy as np
num_samples = 1280
# Dummy Input Data
title_data = np.random.randint(0,2, size=(num_samples, vocabulary_size))
text_body_data = np.random.randint(0, 2, size=(num_samples, vocabulary_size))
tags_data = np.random.randint(0, 2, size=(num_samples, num_tags))
# Dummy Target Data
priority_data = np.random.random(size=(num_samples, 1))
department_data = np.random.randint(0,2, size=(num_samples, num_departments))
model.compile(optimizer="rmsprop", loss=["mean_squared_error", "categorical_crossentropy"], metrics=[["mean_absolute_error"], ["accuracy"]])
model.fit([title_data, text_body_data, tags_data],[priority_data, department_data],epochs=1)
model.evaluate([title_data, text_body_data, tags_data],[priority_data, department_data])
priority_preds, department_preds = model.predict([title_data, text_body_data, tags_data])
# Visualize the connectivity of the model just defined - the *topology* of the model
# The None in the tensor shapes represents the batch size: this model
# allows batches of any size
keras.utils.plot_model(model,"ticket_classifier.png",show_shapes=True)
Access to layer connectivity means that you can inspect and reuse individual nodes (layer calls) in the graph. The model.layers property provides the list of layers that make up the model, and for each layer you can query layer.input and layer.output.
This enables you to do feature extraction: creating models that reuse intermediate features of another model.
print(model.layers)
"""
Creating a new model by reusing intermediate layer outputs
"""
features = model.layers[4].output
difficulty = layers.Dense(3, activation="softmax", name="difficulty")(features)
new_model = keras.Model(
inputs=[title, text_body, tags],
outputs=[priority, department, difficulty]
)
# Plotting the model
keras.utils.plot_model(new_model, "updated_ticket_classifier.png",show_shapes=True)
Model subclassing is quite similar to writing a custom layer:
- In the __init__() method, define the layers the model will use
- In the call() method, define the forward pass of the model, reusing the layers previously created
- Instantiate your subclass, and call it on data to create its weights
What's the difference between a Layer subclass and a Model subclass? A "layer" is a building block you use to create models, and a "model" is the top-level object that you will actually train, export for inference, etc. In short, a Model has fit(), evaluate(), and predict() methods (and you can also save a model).
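"""
For contrast, a minimal Layer subclass (an illustrative sketch, not one of the listings in these notes; SimpleDense is a hypothetical name): a layer creates its weights in build() and defines its computation in call(), but has no fit(), evaluate(), or predict()
"""
import tensorflow as tf
class SimpleDense(keras.layers.Layer):
    def __init__(self, units, activation=None):
        super().__init__()
        self.units = units
        self.activation = keras.activations.get(activation)  # None resolves to a linear (identity) activation
    def build(self, input_shape):
        # Weight creation is deferred until the first call, when the input shape is known
        self.W = self.add_weight(shape=(input_shape[-1], self.units), initializer="random_normal")
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")
    def call(self, inputs):
        return self.activation(tf.matmul(inputs, self.W) + self.b)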
Which Model-Building API to Use?
In general, the Functional API provides a good trade-off between ease of use and flexibility. It also gives you direct access to layer connectivity, which is very powerful for use cases such as model plotting or feature extraction.
"""
A Simple subclassed model
"""
class CustomerTicketModel(keras.Model):
    def __init__(self, num_departments):
        super().__init__()  # Don't forget to call super()
        """
        Define the sublayers in the constructor
        """
        self.concat_layer = layers.Concatenate()
        self.mixing_layer = layers.Dense(64, activation="relu")
        self.priority_scorer = layers.Dense(1, activation="sigmoid")
        self.department_classifier = layers.Dense(num_departments, activation="softmax")

    def call(self, inputs):
        """
        Define the forward pass in the call() method
        """
        title = inputs["title"]
        text_body = inputs["text_body"]
        tags = inputs["tags"]
        features = self.concat_layer([title, text_body, tags])
        features = self.mixing_layer(features)
        priority = self.priority_scorer(features)
        department = self.department_classifier(features)
        return priority, department
model = CustomerTicketModel(num_departments=4)
priority, department = model({"title": title_data, "text_body": text_body_data, "tags": tags_data })
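"""
A subclassed model is compiled and trained exactly like a Functional one - a minimal sketch reusing the dummy data from above (the model returns its two outputs as a tuple, so losses and metrics are passed as lists in the same order)
"""
model.compile(optimizer="rmsprop",
              loss=["mean_squared_error", "categorical_crossentropy"],
              metrics=[["mean_absolute_error"], ["accuracy"]])
model.fit({"title": title_data, "text_body": text_body_data, "tags": tags_data},
          [priority_data, department_data], epochs=1)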
Using Built-in Training and Evaluation Loops
There are a couple of ways you can customize this simple workflow:
- Provide your own custom metrics
- Pass callbacks to the fit() method to schedule actions to be taken at specific points during training
A Keras metric is a subclass of the keras.metrics.Metric class. A metric has an internal state stored in TensorFlow variables.
"""
Example custom metric that measures RMSE
"""
import tensorflow as tf
class RootMeanSquaredError(keras.metrics.Metric):  # Subclass the Metric class
    def __init__(self, name="rmse", **kwargs):
        """
        Define the state variables in the constructor. As for layers, you have access to the add_weight() method
        """
        super().__init__(name=name, **kwargs)
        self.mse_sum = self.add_weight(name="mse_sum", initializer="zeros")
        self.total_samples = self.add_weight(name="total_samples", initializer="zeros", dtype="int32")

    def update_state(self, y_true, y_pred, sample_weight=None):
        """
        Implement the state-update logic in update_state(). The y_true argument is the targets (or labels) for one batch, while y_pred represents the corresponding predictions from the model. You can ignore the sample_weight argument - we won't use it here.
        To match our MNIST model, we expect categorical predictions and integer labels
        """
        y_true = tf.one_hot(y_true, depth=tf.shape(y_pred)[1])
        mse = tf.reduce_sum(tf.square(y_true - y_pred))
        self.mse_sum.assign_add(mse)
        num_samples = tf.shape(y_pred)[0]
        self.total_samples.assign_add(num_samples)

    def result(self):
        """
        Use the result() method to return the current value of the metric
        """
        return tf.sqrt(self.mse_sum / tf.cast(self.total_samples, tf.float32))

    def reset_state(self):
        """
        You need to expose a way to reset the metric state without having to re-instantiate it - this enables the same metric objects to be used across different epochs of training, or across both training and evaluation. This is done with the reset_state() method.
        """
        self.mse_sum.assign(0.)
        self.total_samples.assign(0)
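"""
A minimal usage sketch (assuming an MNIST-style classifier with integer labels and softmax outputs, which is what the metric expects): pass an instance of the custom metric to compile()
"""
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy", RootMeanSquaredError()])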
Using Callbacks
A callback is an object (a class instance implementing specific methods) that is passed to the model in the call to fit() and that is called by the model at various points during training. It has access to all the available data about the state of the model and its performance, and it can take action: interrupt training, save a model, load a different weight set, or otherwise alter the state of the model. Examples:
- Model Checkpointing: saving the current state of the model at different points during training
- Early Stopping: Interrupting training when the validation loss is no longer improving (and saving the best model obtained during training)
- Dynamically adjusting the value of certain parameters during training, such as the learning rate of the optimizer
- Logging training and validation metrics during training, or visualizing the representations learned by the model as they're updated: the fit() progress bar is itself implemented as a callback. The fit() method takes a callbacks argument that accepts a list of callbacks.
Built-in callbacks:
- keras.callbacks.ModelCheckpoint
  - Lets you continually save the model during training
- keras.callbacks.EarlyStopping
  - Stops training when the validation loss is no longer improving
  - The callback interrupts training once a target metric being monitored has stopped improving for a fixed number of epochs. Typically used in combination with ModelCheckpoint
- keras.callbacks.LearningRateScheduler
- keras.callbacks.ReduceLROnPlateau
- keras.callbacks.CSVLogger
- keras.callbacks.TensorBoard: use TensorBoard; specify where to write logs
# Example list of callbacks
callbacks_list = [
    keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=2,
    ),
    keras.callbacks.ModelCheckpoint(
        filepath="checkpoint_path.keras",
        monitor="val_loss",
        save_best_only=True,
    ),
    keras.callbacks.TensorBoard(
        log_dir="/full_path_to_your_log_dir",
    )
]
Writing Your Own Callbacks
You can write your own callbacks by subclassing the keras.callbacks.Callback class. You can then implement any of the following methods, which are called at various points during training (a short example follows the list):
- on_epoch_begin(epoch, logs): called at the start of every epoch
- on_epoch_end(epoch, logs): called at the end of every epoch
- on_batch_begin(batch, logs): called right before processing each batch
- on_batch_end(batch, logs): called right after processing each batch
- on_train_begin(logs): called at start of training
- on_train_end(logs): called at end of training
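"""
A minimal custom-callback sketch (an illustrative example, not one of the book's listings): record the per-batch training loss over each epoch
"""
class LossHistory(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self.per_batch_losses = []  # Reset at the start of each epoch
    def on_batch_end(self, batch, logs=None):
        # logs carries the metric values for the batch that was just processed
        self.per_batch_losses.append(logs.get("loss"))
# Pass it to fit() like any other callback: model.fit(..., callbacks=[LossHistory()])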
Add @tf.function before any function that you want to compile. Compiling TensorFlow code into a computation graph lets it be globally optimized in a way that code interpreted line by line cannot be, which typically makes it run faster.
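"""
A minimal @tf.function sketch (a hypothetical standalone function, for illustration): the function is traced into a graph the first time it's called
"""
import tensorflow as tf
@tf.function
def dense_forward(inputs, W, b):
    # Executes as a compiled graph rather than eagerly, op by op
    return tf.nn.relu(tf.matmul(inputs, W) + b)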
Introduction to Deep Learning for Computer Vision
Computer vision is the earliest and biggest success story of deep learning.
"""
Instantiating a small convnet
"""
from tensorflow import keras
from tensorflow.keras import layers
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
print(model.summary())
"""
Training the convnet on MNIST images
"""
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype("float32") / 255
model.compile(optimizer="rmsprop",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=5, batch_size=64)
"""
Evaluating the convnet
"""
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.3f}")
The Convolution Operation
The fundamental difference between a densely connected layer and a convolution layer is this: Dense layers learn global patterns in their input feature space (for example, for an MNIST digit, patterns involving all pixels), whereas convolution layers learn local patterns - in the case of images, patterns found in small 2D windows of the inputs.
Interesting properties of convnets:
- The patterns they learn are translation-invariant: after learning a pattern in the lower-right corner of a picture, a convnet can recognize it anywhere, for example in the upper-right corner. This makes convnets data-efficient when processing images, because the visual world is fundamentally translation-invariant
- They can learn spatial hierarchies of patterns: a first convolution layer will learn small local patterns such as edges, a second convolution layer will learn larger patterns made of the features of the first layer, and so on. This allows convnets to efficiently learn increasingly complex and abstract visual concepts, because the visual world is fundamentally spatially hierarchical
Convolutions operate over rank-3 tensors called feature maps, with two spatial axes (height and width) as well as a depth axis (also called the channels axis). The convolution operation extracts patches from its input feature map and applies the same transformation to all of these patches, producing an output feature map. The output feature map is still a rank-3 tensor, but the different channels in its depth axis no longer stand for specific colors as in RGB input; rather, they stand for filters. Filters encode specific aspects of the input data: a single filter could encode the concept "presence of a face in the input," for instance.
Each channel of the output is a response map of one filter, indicating the response of that filter pattern at different locations in the input.
That is what the term feature map means: every dimension in the depth axis is a feature (or filter), and the rank-2 tensor output[:, :, n] is the 2D spatial map of the response of this filter over the input.
- Convolutions are defined by two key parameters, Conv2D(output_depth, (window_height, window_width)):
  - Size of the patches extracted from the inputs: typically 3×3 or 5×5 (window_height, window_width)
  - Depth of the output feature map: the number of filters computed by the convolution (output_depth)
A convolution works by sliding these windows of size 3×3 or 5×5 over the 3D input feature map, stopping at every possible location, and extracting the 3D patch of surrounding features, of shape (window_height, window_width, input_depth). Each such 3D patch is then transformed into a 1D vector of shape (output_depth,) via a tensor product with a learned weight matrix, called the convolution kernel - the same kernel is reused across every patch. The vectors are then spatially reassembled into a 3D output of shape (height, width, output_depth).
Padding consists of adding an appropriate number of rows and columns on each side of the input feature map so as to make it possible to fit centered convolution windows around every input tile. In Conv2D layers, padding is configured with the padding argument, which can be set to "valid", meaning no padding (only valid window locations will be used; this is the default), or "same", meaning "pad in such a way as to have an output with the same width and height as the input."
The distance between two successive windows is a parameter of the convolution, called its stride, which defaults to 1. It's possible to have a strided convolution: a convolution with a stride higher than 1. Strided convolutions are rarely used in classification problems, but they come in handy in other types of problems. In classification models, a max-pooling operation is typically used instead of strides to downsample feature maps.
The role of max-pooling layers is to aggressively downsample feature maps, much like strided convolutions. Max pooling consists of extracting windows from the input feature maps and outputting the max value of each channel. A big difference from convolution is that max pooling is usually done with 2×2 windows and stride 2, in order to downsample the feature maps by a factor of 2, whereas convolution is typically done with 3×3 windows and stride 1.
The reason to use downsampling is to reduce the number of feature-map coefficients to process, as well as to induce a spatial-filter hierarchy by making successive convolution layers look at increasingly large windows. Features tend to encode the spatial presence of some pattern or concept over the different tiles of the feature map (hence the term feature map), and it's more informative to look at the maximal presence of different features than at their average presence.
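"""
A small shape sketch (illustrative, not one of the book's listings): how padding, strides, and max pooling affect feature-map sizes on a dummy batch
"""
import tensorflow as tf
from tensorflow.keras import layers
x = tf.zeros((1, 28, 28, 1))  # one dummy 28x28, single-channel image
print(layers.Conv2D(32, 3)(x).shape)                  # (1, 26, 26, 32): "valid" padding shrinks height/width by 2
print(layers.Conv2D(32, 3, padding="same")(x).shape)  # (1, 28, 28, 32): "same" padding preserves height/width
print(layers.Conv2D(32, 3, strides=2)(x).shape)       # (1, 13, 13, 32): stride 2 roughly halves height/width
print(layers.MaxPooling2D(pool_size=2)(x).shape)      # (1, 14, 14, 1): 2x2 pooling with stride 2 halves height/width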
Deep learning models are repurposable by nature: you can take an image-classification or speech-to-text model trained on a large-scale dataset and reuse it on a significantly different problem with only minor changes. Specifically, in the case of computer vision, many pretrained models are now publicly available for download and can be used to bootstrap powerful vision models out of very little data.
"""
1. Download the Kaggle API key (kaggle.json) from the Kaggle website (Settings) and upload it here
"""
from google.colab import files
files.upload()
!mkdir ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c dogs-vs-cats
# Uncompress (unzip) the training data, which comes in a zip folder, silently (-qq)
!unzip -qq "/content/dogs-vs-cats.zip"
!unzip -qq "/content/test1.zip"
!unzip -qq "/content/train.zip"
"""
- Dogs vs Cats Dataset (https://www.kaggle.com/competitions/dogs-vs-cats/overview)
- The dataset contains 25_000 images of dogs and cats (12_500 from each class) and is 543 MB (compressed)
"""
"""
Copying images to training, validation, and test directories
"""
import os, shutil, pathlib
original_dir = pathlib.Path("/content/train") # Path to the directory where the original dataset was uncompressed
new_base_dir = pathlib.Path("/content/cats_vs_dogs_small") # Directory where we will store our smaller dataset
def make_subset(subset_name, start_index, end_index):
    """
    Utility function to copy cat (and dog) images from index start_index to index end_index to the subdirectory new_base_dir/{subset_name}/cat|dog
    """
    for category in ("cat", "dog"):
        dir = new_base_dir / subset_name / category
        os.makedirs(dir)
        fnames = [f"{category}.{i}.jpg" for i in range(start_index, end_index)]
        for fname in fnames:
            shutil.copyfile(src=original_dir / fname, dst=dir / fname)
# Create a training subset with the first 1_000 images of each category
make_subset("train", start_index=0, end_index=1000)
# Create the validation subset with the next 500 images of each category
make_subset("validation", start_index=1000, end_index=1500)
# Create the test subset with the next 1_000 images of each category
make_subset("test", start_index=1500, end_index=2500)
from tensorflow import keras
from tensorflow.keras import layers
"""
The convnet will be a stack of alternated Conv2D and MaxPooling2D layers
- The depth of the feature maps progressively increases in the model (from 32 to 256), whereas the size of the feature maps decreases (from 180×180 to 7×7). This is a pattern you will see in almost all convnets
"""
# The model expects RGB images of size 180 x 180
inputs = keras.Input(shape=(180,180,3))
# Rescale inputs to [0,1] range by dividing them by 255
x = layers.Rescaling(1./255)(inputs)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
print(model.summary())
model.compile(loss="binary_crossentropy",optimizer="rmsprop",metrics=["accuracy"])
Data Preprocessing
The steps to get the JPEG files into an appropriate format:
- Read the picture files
- Decode the JPEG content to RGB grids of pixels
- Convert these into floating-point tensors
- Resize them to a shared size (we use 180×180)
- Pack them into batches (we use batches of 32 images)
Keras has utilities to take care of these steps automatically. The utility function image_dataset_from_directory() lets you set up a quick data pipeline that can automatically turn image files on disk into batches of preprocessed tensors.
The Dataset object:
The Dataset object is an iterator: you can pass it directly to the fit() method of a Keras model. It handles many features that would otherwise be cumbersome to implement yourself - in particular, asynchronous data prefetching (preprocessing the next batch of data while the previous one is being handled by the model, which keeps execution flowing without interruptions).
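"""
A minimal prefetching sketch (a toy tf.data pipeline, for illustration; the image Dataset below benefits from the same mechanism): prefetch() prepares upcoming batches in the background while the current one is being consumed
"""
import tensorflow as tf
toy_dataset = tf.data.Dataset.range(10).batch(2)
toy_dataset = toy_dataset.prefetch(tf.data.AUTOTUNE)  # overlap preprocessing with consumption
for batch in toy_dataset:
    print(batch.numpy())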
from tensorflow.keras.utils import image_dataset_from_directory
train_dataset = image_dataset_from_directory(new_base_dir / "train",image_size=(180, 180),batch_size=32)
validation_dataset = image_dataset_from_directory(new_base_dir / "validation",image_size=(180, 180),batch_size=32)
test_dataset = image_dataset_from_directory(new_base_dir / "test",image_size=(180, 180),batch_size=32)
for data_batch, labels_batch in train_dataset:
    print("data batch shape:", data_batch.shape)
    print("labels batch shape:", labels_batch.shape)
    break
# Save a checkpoint of the model during training, keeping only the best one seen so far
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="/content/models/convnet_from_scratch.keras",
        save_best_only=True,
        monitor="val_loss"  # Overwrite the current file only when the current value of val_loss is lower than at any previous time during training
    )
]
history = model.fit(train_dataset,epochs=30,validation_data=validation_dataset,callbacks=callbacks)
"""
Plot the loss and accuracy of the model over the training and validation data
"""
import matplotlib.pyplot as plt
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(accuracy) + 1)
plt.plot(epochs, accuracy, "bo", label="Training accuracy")
plt.plot(epochs, val_accuracy, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()
The plots above are characteristic of overfitting. Because we have few training samples (2,000), overfitting is the number one concern.
Overfitting is caused by having too few samples to learn from, rendering you unable to train a model that can generalize to new data. Data augmentation takes the approach of generating more training data from existing training samples by augmenting the samples via a number of random transformations that yield believable-looking images. In Keras, this can be done by adding a number of data augmentation layers at the start of the model.
Note that data augmentation does not produce new information - it just remixes existing information. As such, it might not completely get rid of overfitting.
curr_model = "/content/models/convnet_from_scratch.keras"
test_model = keras.models.load_model(curr_model)
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")
"""
Define a data augmentation stage to add to an image model
"""
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),  # Applies horizontal flipping to a random 50% of the images that go through it
        layers.RandomRotation(0.1),       # Rotates the input images by a random value in the range `[-10%, +10%]` (fractions of a full circle)
        layers.RandomZoom(0.2),           # Zooms in or out of the image by a random factor in the range `[-20%, +20%]`
    ]
)
plt.figure(figsize=(10, 10))
for images, _ in train_dataset.take(1):  # take(N) samples only N batches from the dataset
    for i in range(9):
        augmented_images = data_augmentation(images)  # Apply the augmentation stage to the batch of images
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))  # Display the first image of the output batch. For each of the nine iterations, this is a different augmentation of the same image
        plt.axis("off")
"""
Defining a new convnet that includes image augmentation and dropout
"""
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)
x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss="binary_crossentropy",
optimizer="rmsprop",
metrics=["accuracy"])
"""
Training the regularized convnet
"""
callbacks = [
keras.callbacks.ModelCheckpoint(
filepath="convnet_from_scratch_with_augmentation.keras",
save_best_only=True,
monitor="val_loss")
]
history = model.fit(
train_dataset,
epochs=100,
validation_data=validation_dataset,
callbacks=callbacks)
"""
Plotting the results again
"""
import matplotlib.pyplot as plt
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(accuracy) + 1)
plt.plot(epochs, accuracy, "bo", label="Training accuracy")
plt.plot(epochs, val_accuracy, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()
"""
Evaluating the model on the test set
"""
test_model = keras.models.load_model(
"convnet_from_scratch_with_augmentation.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")
Leveraging a pretrained model
A common and highly effective approach to deep learning on small image datasets is to use a pretrained model. A pretrained model is a model that was previously trained on a large dataset, typically on a large-scale image-classification task.
If this original dataset is large enough and general enough, the spatial hierarchy of features learned by the pretrained model can effectively act as a generic model of the visual world, and hence, its features can prove useful for many different computer vision problems, even though these new problems may involve completely different classes than those of the original task.
There are two ways to use a pretrained model: feature extraction and fine-tuning.
Feature extraction consists of using the representations learned by a previously trained model to extract interesting features from new samples. These features are then run through a new classifier, which is trained from scratch. Convnets for image classification typically comprise a series of convolution and pooling layers and end with a densely connected classifier. The first part is called the convolutional base of the model. In the case of convnets, feature extraction consists of taking the convolutional base of a previously trained network, running the new data through it, and training a new classifier on top of the output.
Representations learned by the convolutional base are likely to be more generic and therefore more reusable: the feature maps of a convnet are presence maps of generic concepts over a picture, which are likely to be useful regardless of the computer vision problem at hand.
The VGG16 model, the pretrained model that we will use, comes prepackaged with Keras. You can import it from the keras.applications module.
"""
Instantiating the VGG16 convolutional base
"""
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",         # Specifies the weight checkpoint from which to initialize the model
    include_top=False,          # Whether to include the densely connected classifier on top of the network
    input_shape=(180, 180, 3))  # Shape of the image tensors that we feed to the network - an optional argument
print(conv_base.summary())
Two ways to proceed:
- Run the convolutional base over our dataset, record its output to a NumPy array on disk, and then use this data as input to a standalone, densely connected classifier similar to those you saw in chapter 4 of this book. This solution is fast and cheap to run, because it only requires running the convolutional base once for every input image, and the convolutional base is by far the most expensive part of the pipeline. But for the same reason, this technique won't allow us to use data augmentation.
- Extend the model we have (conv_base) by adding Dense layers on top, and run the whole thing from end to end on the input data. This will allow us to use data augmentation, because every input image goes through the convolutional base every time it's seen by the model. But for the same reason, this technique is far more expensive than the first.
"""
Extracting the VGG16 features and corresponding labels
"""
import numpy as np
def get_features_and_labels(dataset):
    all_features = []
    all_labels = []
    for images, labels in dataset:
        preprocessed_images = keras.applications.vgg16.preprocess_input(images)
        features = conv_base.predict(preprocessed_images)
        all_features.append(features)
        all_labels.append(labels)
    return np.concatenate(all_features), np.concatenate(all_labels)
train_features, train_labels = get_features_and_labels(train_dataset)
val_features, val_labels = get_features_and_labels(validation_dataset)
test_features, test_labels = get_features_and_labels(test_dataset)
print(train_features.shape)
"""
Defining and training the densely connected classifier
"""
inputs = keras.Input(shape=(5, 5, 512))
# Note the use of the Flatten layer before passing the features to a Dense layer
x = layers.Flatten()(inputs)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy",
optimizer="rmsprop",
metrics=["accuracy"])
callbacks = [
keras.callbacks.ModelCheckpoint(
filepath="feature_extraction.keras",
save_best_only=True,
monitor="val_loss")
]
history = model.fit(
train_features, train_labels,
epochs=20,
validation_data=(val_features, val_labels),
callbacks=callbacks)
# Note that training is very fast because we only have to deal with two Dense layers - an epoch takes less than one second even on CPU
"""
Plotting the results
"""
import matplotlib.pyplot as plt
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()
The technique above gets us to an accuracy of 97% - much better than when we trained a small model from scratch. However, the plots indicate that we're overfitting from the start - that's because this technique doesn't use data augmentation, which is essential for preventing overfitting with small image datasets.
Feature Extraction Together with Data Augmentation
Create a model that chains the conv_base with a new dense classifier, and train it end to end on the inputs. To do this, we first freeze the convolutional base. Freezing a layer or set of layers means preventing their weights from being updated during training. If we don't do this, the representations previously learned by the convolutional base will be modified during training, which would destroy the point of using a pretrained model.
Note that feature extraction together with data augmentation is often intractable on CPUs.
"""
Instantiating and freezing the VGG16 convolutional base
"""
conv_base = keras.applications.vgg16.VGG16(weights="imagenet", include_top=False)
# Setting trainable to `False` empties the list of trainable weights of the layer or model
conv_base.trainable = False
"""
Printing the list of trainable weights before and after freezing
"""
conv_base.trainable = True
print("This is the number of trainable weights before freezing the conv base:", len(conv_base.trainable_weights))
conv_base.trainable = False
print("This is the number of trainable weights after freezing the conv base:", len(conv_base.trainable_weights))
"""
Adding a data augmentation stage and a classifier to a convolutional base
"""
data_augmentation = keras.Sequential(
[
layers.RandomFlip("horizontal"),
layers.RandomRotation(0.1),
layers.RandomZoom(0.2),
]
)
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs) # Apply data augmentation
x = keras.applications.vgg16.preprocess_input(x) # Apply input value scaling
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy",
optimizer="rmsprop",
metrics=["accuracy"])
callbacks = [
keras.callbacks.ModelCheckpoint(
filepath="feature_extraction_with_data_augmentation.keras",
save_best_only=True,
monitor="val_loss")
]
history = model.fit(
train_dataset,
epochs=50,
validation_data=validation_dataset,
callbacks=callbacks)
import matplotlib.pyplot as plt
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()
test_model = keras.models.load_model(
"feature_extraction_with_data_augmentation.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")
Another widely used technique for model reuse, complementary to feature extraction, is fine-tuning. Fine-tuning consists of unfreezing a few of the top layers of a frozen model base used for feature extraction, and jointly training both the newly added part of the model and these top layers. This is called fine-tuning because it slightly adjusts the more abstract representations of the model being reused in order to make them more relevant for the problem at hand.
Steps for fine-tuning a network:
- Add your custom network on top of an already-trained base network.
- Freeze the base network.
- Train the part we added.
- Unfreeze some layers in the base network.
- Jointly train both these layers and the part we just added.
Why not fine-tune the entire convolutional base?
- Earlier layers in the convolutional base encode more generic, reusable features, whereas layers higher up encode more specialized features. It's more useful to fine-tune the more specialized features, because these are the ones that need to be repurposed on the new problem
- The more parameters you're training, the more you're at risk of overfitting.
"""
Freezing all layers until the fourth from the last
"""
conv_base.trainable = True
for layer in conv_base.layers[:-4]:
    layer.trainable = False
"""
Fine-tuning the model
"""
model.compile(loss="binary_crossentropy",
optimizer=keras.optimizers.RMSprop(learning_rate=1e-5), # Note that the reason for using a low learning rate is that we want to limit the magnitude of the modifications we make to the representations of the three layers we're fine-tuning
metrics=["accuracy"])
callbacks = [
keras.callbacks.ModelCheckpoint(
filepath="fine_tuning.keras",
save_best_only=True,
monitor="val_loss")
]
history = model.fit(
train_dataset,
epochs=30,
validation_data=validation_dataset,
callbacks=callbacks)
"""
Evaluate the model on the test data
"""
model = keras.models.load_model("fine_tuning.keras")
test_loss, test_acc = model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")