Neural Style Transfer

Going through this TensorFlow tutorial on style transfer to prepare to deploy it on a newly created Flask backend. I might also include the two other tutorials mentioned below to see the differences between them and the possible different applications.

This tutorial uses deep learning to compose one image in the style of another image [...]. This is known as neural style transfer and the technique is outlined in A Neural Algorithm of Artistic Style.

Other Style Transfer Examples

Neural style transfer is an optimization technique to take two images - a content image and a style reference image (such as an artwork by a famous painter) - and blend them together so the output image looks like the content image, but "painted" in the style of the style reference image.

Neural Style Transfer Example

"""
Import and Configure Modules
"""
import os
import tensorflow as tf
# Load compressed models from tensorflow_hub
os.environ['TFHUB_MODEL_LOAD_FORMAT'] = 'COMPRESSED'

import IPython.display as display

import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12, 12)
mpl.rcParams['axes.grid'] = False

import numpy as np
import PIL.Image
import time
import functools

def tensor_to_image(tensor):
  """
  Takes tensor, turns floating point numbers in range (0-1) into ints
  in the range 0-255. Makes sure the image is either a black and white
  or a color image with three channels. Returns
  """
  tensor = tensor*255
  tensor = np.array(tensor, dtype=np.uint8)
  if np.ndim(tensor)>3:
    assert tensor.shape[0] == 1
    tensor = tensor[0]
  """
  https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html#IPython.display.Image
  Creates a PNG/JPEG.GIF image object given raw data
  """
  return PIL.Image.fromarray(tensor)

"""
Download Images and choose a style image and a content image
"""
content_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
style_path = tf.keras.utils.get_file('kandinsky5.jpg','https://storage.googleapis.com/download.tensorflow.org/example_images/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg')

"""
Visualize the input

Define a function to load an image and limit its maximum dimension to 512 pixels
"""
def load_img(path_to_img):
  max_dim = 512
  # Load Image and prepare it
  img = tf.io.read_file(path_to_img)
  img = tf.image.decode_image(img, channels=3)
  img = tf.image.convert_image_dtype(img, tf.float32)
  # Get the longer of width/height and divide max_dim by it to get the
  # scale factor
  shape = tf.cast(tf.shape(img)[:-1], tf.float32)
  long_dim = max(shape)
  scale = max_dim / long_dim

  new_shape = tf.cast(shape * scale, tf.int32)
  # Return newly scaled image
  img = tf.image.resize(img, new_shape)
  img = img[tf.newaxis, :]
  return img

def imshow(image, title=None):
  """
  Function to display an image
  """
  if len(image.shape) > 3:
    image = tf.squeeze(image, axis=0)

  plt.imshow(image)
  if title:
    plt.title(title)

content_image = load_img(content_path)
style_image = load_img(style_path)

plt.subplot(1, 2, 1)
imshow(content_image, 'Content Image')

plt.subplot(1, 2, 2)
imshow(style_image, 'Style Image')
out[2]
Jupyter Notebook Image

<Figure size 1200x1200 with 2 Axes>

Fast Style Transfer Using TF-Hub

This tutorial demonstrates the original style-transfer algorithm, which optimizes the image content to a particular style. Before getting into the details, let's see how the TensorFlow Hub model does this:

import tensorflow_hub as hub
hub_model = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')
stylized_image = hub_model(tf.constant(content_image), tf.constant(style_image))[0]
tensor_to_image(stylized_image)
out[4]
Jupyter Notebook Image

<PIL.Image.Image image mode=RGB size=512x424>

Define Content and Style Representations

Use the intermediate layers of the model to get the content and style representations of the image. The first few layers of a CNN represent low-level features like edges and textures. The final few layers represent higher-level features - object parts like wheels or eyes. In this case, you are using the VGG19 network architecture, a pretrained image classification network.

"""
Load a VGG19 and test-run it on our image to make sure it's being used correctly.
"""
x = tf.keras.applications.vgg19.preprocess_input(content_image*255)
x = tf.image.resize(x, (224, 224))
vgg = tf.keras.applications.VGG19(include_top=True, weights='imagenet')
prediction_probabilities = vgg(x)
prediction_probabilities.shape
out[6]

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels.h5
574710816/574710816 ━━━━━━━━━━━━━━━━━━━━ 8s 0us/step

TensorShape([1, 1000])

predicted_top_5 = tf.keras.applications.vgg19.decode_predictions(prediction_probabilities.numpy())[0]
[(class_name, prob) for (number, class_name, prob) in predicted_top_5]
out[7]

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
35363/35363 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

[('Labrador_retriever', 0.49317262),
 ('golden_retriever', 0.23665187),
 ('kuvasz', 0.036357313),
 ('Chesapeake_Bay_retriever', 0.024182774),
 ('Greater_Swiss_Mountain_dog', 0.018646035)]

"""
Now load a VGG19 without the classification head, and list the layer names
"""
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
for layer in vgg.layers:
  print(layer.name)
out[8]

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
80134624/80134624 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
input_layer_1
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_conv4
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_conv4
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_conv4
block5_pool

"""
Choose intermediate layers from the network to represent the stule and content
 of the image:
"""
content_layers = ['block5_conv2']

style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)
out[9]

Intermediate Layers for Style and Content

At a high level, in order for a network to perform image classification (which this network has been trained to do), it must understand the image. This requires taking the raw image as input pixels and building an internal representation that converts the raw image pixels into a complex understanding of the features present within the image. Convolutional neural networks are able to generalize well because they capture the invariances and defining features within classes (e.g. dogs vs. cats) that are agnostic to background noise and other nuisances. Thus, somewhere between where the raw image is fed into the model and the output classification label, the model serves as a complex feature extractor.
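
As an aside (not part of the tutorial), one way to build intuition for this is to plot a few channels from an early layer of the headless VGG19 loaded above; early-layer activations tend to look like edge and texture detectors. A minimal sketch, assuming the vgg and content_image defined in the cells above:

# Side sketch: visualize a few channels of an early VGG19 layer to see the
# low-level, edge/texture-like features it responds to.
feature_model = tf.keras.Model([vgg.input], vgg.get_layer('block1_conv1').output)
features = feature_model(tf.keras.applications.vgg19.preprocess_input(content_image * 255))

plt.figure(figsize=(16, 4))
for i in range(4):
  plt.subplot(1, 4, i + 1)
  plt.imshow(features[0, :, :, i])  # one channel of the activation map
  plt.axis('off')
plt.show()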

Build the Model

def vgg_layers(layer_names):
  """ Creates a VGG model that returns a list of intermediate output values."""
  # Load our model. Load pretrained VGG, trained on ImageNet data
  vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
  vgg.trainable = False

  outputs = [vgg.get_layer(name).output for name in layer_names]

  model = tf.keras.Model([vgg.input], outputs)
  return model

style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)

# To create the model:
# Look at the statistics of each layer's output
for name, output in zip(style_layers, style_outputs):
  print(name)
  print("  shape: ", output.numpy().shape)
  print("  min: ", output.numpy().min())
  print("  max: ", output.numpy().max())
  print("  mean: ", output.numpy().mean())
out[11]

block1_conv1
shape: (1, 336, 512, 64)
min: 0.0
max: 835.5255
mean: 33.97525
block2_conv1
shape: (1, 168, 256, 128)
min: 0.0
max: 4625.8867
mean: 199.82687
block3_conv1
shape: (1, 84, 128, 256)
min: 0.0
max: 8789.24
mean: 230.78099
block4_conv1
shape: (1, 42, 64, 512)
min: 0.0
max: 21566.133
mean: 791.24005
block5_conv1
shape: (1, 21, 32, 512)
min: 0.0
max: 3189.2532
mean: 59.179478

Calculate Style

The content of an image is represented by the values of the intermediate feature maps. The style of an image can be described by the means and correlations across the different feature maps. Calculate a Gram matrix that includes this information by taking the outer product of the feature vector with itself at each location, and averaging that outer product over all the locations. The Gram matrix can be calculated for a particular layer as:

$$G_{cd}^l = \frac{\sum_{ij} F_{ijc}^l(x)\, F_{ijd}^l(x)}{IJ}$$

This can be implemented concisely using the tf.linalg.einsum function:

def gram_matrix(input_tensor):
  result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
  input_shape = tf.shape(input_tensor)
  num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
  return result/(num_locations)
out[13]
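
As a sanity check (my own sketch, not from the tutorial), the same Gram matrix can be written with an explicit reshape and matmul; both versions should agree on the style outputs computed above:

# Sketch: equivalent Gram matrix via reshape + matmul, for comparison with the
# einsum version above. Input is assumed to be shaped [batch, height, width, channels].
def gram_matrix_reference(input_tensor):
  b, h, w, c = tf.unstack(tf.shape(input_tensor))
  num_locations = tf.cast(h * w, tf.float32)
  flat = tf.reshape(input_tensor, tf.stack([b, h * w, c]))        # [batch, H*W, C]
  return tf.matmul(flat, flat, transpose_a=True) / num_locations  # [batch, C, C]

For example, gram_matrix(style_outputs[0]) and gram_matrix_reference(style_outputs[0]) should match to floating point precision.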

Extract Style and Content

Build a model that returns the style and content tensors

class StyleContentModel(tf.keras.models.Model):
  def __init__(self, style_layers, content_layers):
    super(StyleContentModel, self).__init__()
    self.vgg = vgg_layers(style_layers + content_layers)
    self.style_layers = style_layers
    self.content_layers = content_layers
    self.num_style_layers = len(style_layers)
    self.vgg.trainable = False

  def call(self, inputs):
    "Expects float input in [0,1]"
    inputs = inputs*255.0
    preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
    outputs = self.vgg(preprocessed_input)
    style_outputs, content_outputs = (outputs[:self.num_style_layers],
                                      outputs[self.num_style_layers:])

    style_outputs = [gram_matrix(style_output)
                     for style_output in style_outputs]

    content_dict = {content_name: value
                    for content_name, value
                    in zip(self.content_layers, content_outputs)}

    style_dict = {style_name: value
                  for style_name, value
                  in zip(self.style_layers, style_outputs)}

    return {'content': content_dict, 'style': style_dict}

"""
When called on an image, this model returns the gram matrix (style) of the style_layers and content of the content_layers
"""
extractor = StyleContentModel(style_layers, content_layers)

results = extractor(tf.constant(content_image))

print('Styles:')
for name, output in sorted(results['style'].items()):
  print("  ", name)
  print("    shape: ", output.numpy().shape)
  print("    min: ", output.numpy().min())
  print("    max: ", output.numpy().max())
  print("    mean: ", output.numpy().mean())
  print()

print("Contents:")
for name, output in sorted(results['content'].items()):
  print("  ", name)
  print("    shape: ", output.numpy().shape)
  print("    min: ", output.numpy().min())
  print("    max: ", output.numpy().max())
  print("    mean: ", output.numpy().mean())
out[15]

Styles:
block1_conv1
shape: (1, 64, 64)
min: 0.005522848
max: 28014.564
mean: 263.79025

block2_conv1
shape: (1, 128, 128)
min: 0.0
max: 61479.504
mean: 9100.95

block3_conv1
shape: (1, 256, 256)
min: 0.0
max: 545623.44
mean: 7660.9766

block4_conv1
shape: (1, 512, 512)
min: 0.0
max: 4320501.0
mean: 134288.86

block5_conv1
shape: (1, 512, 512)
min: 0.0
max: 110005.38
mean: 1487.0381

Contents:
block5_conv2
shape: (1, 26, 32, 512)
min: 0.0
max: 2410.8796
mean: 13.764152

Run Gradient Descent

With this style and content extractor, you can now implement the style transfer algorithm. Do this by calculating the mean square error of your image's outputs relative to each target, then taking the weighted sum of these losses.

"""
Set your style and content target values
"""
style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']
"""
Define a tf.Variable to contain the image to optimize
"""
image = tf.Variable(content_image)
"""
Since this is a float image, define a function to keep the pixel values between
0 and 1
"""
def clip_0_1(image):
  return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
"""
Create an Optimizer
"""
opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
# Weighted combination of two losses to get the total loss
style_weight=1e-2
content_weight=1e4
def style_content_loss(outputs):
    style_outputs = outputs['style']
    content_outputs = outputs['content']
    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2)
                           for name in style_outputs.keys()])
    style_loss *= style_weight / num_style_layers

    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2)
                             for name in content_outputs.keys()])
    content_loss *= content_weight / num_content_layers
    loss = style_loss + content_loss
    return loss

@tf.function()
def train_step(image):
  """
  Use tf.GradientTape to update the image
  """
  with tf.GradientTape() as tape:
    outputs = extractor(image)
    loss = style_content_loss(outputs)

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))

"""
Train a few steps to test:
"""
train_step(image)
train_step(image)
train_step(image)
tensor_to_image(image)
out[17]
Jupyter Notebook Image

<PIL.Image.Image image mode=RGB size=512x422>

"""
Since it appears to be working, run a longer optimization
"""
import time
start = time.time()

epochs = 10
steps_per_epoch = 100

step = 0
for n in range(epochs):
  for m in range(steps_per_epoch):
    step += 1
    train_step(image)
    print(".", end='', flush=True)
  display.clear_output(wait=True)
  display.display(tensor_to_image(image))
  print("Train step: {}".format(step))

end = time.time()
print("Total time: {:.1f}".format(end-start))
out[18]
Jupyter Notebook Image

<PIL.Image.Image image mode=RGB size=512x422>

Train step: 600
........................................................................................

Total Variation Loss

One downside to this basic implementation is that it produces a lot of high frequency artifacts. Decrease these using an explicit regularization term on the high frequency components of the image. In style transfer, this is often called the total variation loss:

def high_pass_x_y(image):
  x_var = image[:, :, 1:, :] - image[:, :, :-1, :]
  y_var = image[:, 1:, :, :] - image[:, :-1, :, :]

  return x_var, y_var
x_deltas, y_deltas = high_pass_x_y(content_image)

plt.figure(figsize=(14, 10))
plt.subplot(2, 2, 1)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Original")

plt.subplot(2, 2, 2)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Original")

x_deltas, y_deltas = high_pass_x_y(image)

plt.subplot(2, 2, 3)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Styled")

plt.subplot(2, 2, 4)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Styled")
out[20]

This shows how the high frequency components have increased. This high frequency component is basically an edge detector. You can get similar output from the Sobel edge detector, for example:

plt.figure(figsize=(14, 10))

sobel = tf.image.sobel_edges(content_image)
plt.subplot(1, 2, 1)
imshow(clip_0_1(sobel[..., 0]/4+0.5), "Horizontal Sobel-edges")
plt.subplot(1, 2, 2)
imshow(clip_0_1(sobel[..., 1]/4+0.5), "Vertical Sobel-edges")
out[22]
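
The regularization term itself is just the sum of the absolute values of these differences. A minimal sketch built on high_pass_x_y (the training loop below uses tf.image.total_variation, which computes the same quantity per image):

def total_variation_loss(image):
  # L1 total variation: sum of absolute differences between neighboring pixels.
  x_deltas, y_deltas = high_pass_x_y(image)
  return tf.reduce_sum(tf.abs(x_deltas)) + tf.reduce_sum(tf.abs(y_deltas))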

Rerun the optimization

total_variation_weight=30

@tf.function()
def train_step(image):
  with tf.GradientTape() as tape:
    outputs = extractor(image)
    loss = style_content_loss(outputs)
    loss += total_variation_weight*tf.image.total_variation(image)

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))

# Reinitialize the image-variable and the optimizer
opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
image = tf.Variable(content_image)

# Run the optimization
import time
start = time.time()

epochs = 10
steps_per_epoch = 100

step = 0
for n in range(epochs):
  for m in range(steps_per_epoch):
    step += 1
    train_step(image)
    print(".", end='', flush=True)
  display.clear_output(wait=True)
  display.display(tensor_to_image(image))
  print("Train step: {}".format(step))

end = time.time()
print("Total time: {:.1f}".format(end-start))

# Save the result
file_name = 'stylized-image.png'
tensor_to_image(image).save(file_name)

try:
  from google.colab import files
  files.download(file_name)
except (ImportError, AttributeError):
  pass
out[24]

Fast Style Transfer for Arbitrary Styles

Based on the model code in magenta and the publication Exploring the Structure of a Real-Time, Arbitrary Neural Artistic Stylization Network (Ghiasi et al., BMVC 2017).

Setup

Import TF2 and all relevant dependencies.

import functools
import os

from matplotlib import gridspec
import matplotlib.pylab as plt
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

print("TF Version: ", tf.__version__)
print("TF Hub version: ", hub.__version__)
print("Eager mode enabled: ", tf.executing_eagerly())
print("GPU available: ", tf.config.list_physical_devices('GPU'))

# @title Define image loading and visualization functions  { display-mode: "form" }

def crop_center(image):
  """Returns a cropped square image."""
  shape = image.shape
  new_shape = min(shape[1], shape[2])
  offset_y = max(shape[1] - shape[2], 0) // 2
  offset_x = max(shape[2] - shape[1], 0) // 2
  image = tf.image.crop_to_bounding_box(
      image, offset_y, offset_x, new_shape, new_shape)
  return image

@functools.lru_cache(maxsize=None)
def load_image(image_url, image_size=(256, 256), preserve_aspect_ratio=True):
  """Loads and preprocesses images."""
  # Cache image file locally.
  image_path = tf.keras.utils.get_file(os.path.basename(image_url)[-128:], image_url)
  # Load and convert to float32 numpy array, add batch dimension, and normalize to range [0, 1].
  img = tf.io.decode_image(
      tf.io.read_file(image_path),
      channels=3, dtype=tf.float32)[tf.newaxis, ...]
  img = crop_center(img)
  img = tf.image.resize(img, image_size, preserve_aspect_ratio=True)
  return img

def show_n(images, titles=('',)):
  n = len(images)
  image_sizes = [image.shape[1] for image in images]
  w = (image_sizes[0] * 6) // 320
  plt.figure(figsize=(w * n, w))
  gs = gridspec.GridSpec(1, n, width_ratios=image_sizes)
  for i in range(n):
    plt.subplot(gs[i])
    plt.imshow(images[i][0], aspect='equal')
    plt.axis('off')
    plt.title(titles[i] if len(titles) > i else '')
  plt.show()

# @title Load example images  { display-mode: "form" }

content_image_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/Golden_Gate_Bridge_from_Battery_Spencer.jpg/640px-Golden_Gate_Bridge_from_Battery_Spencer.jpg'  # @param {type:"string"}
style_image_url = 'https://upload.wikimedia.org/wikipedia/commons/0/0a/The_Great_Wave_off_Kanagawa.jpg'  # @param {type:"string"}
output_image_size = 384  # @param {type:"integer"}

# The content image size can be arbitrary.
content_img_size = (output_image_size, output_image_size)
# The style prediction model was trained with image size 256 and it's the
# recommended image size for the style image (though, other sizes work as
# well but will lead to different results).
style_img_size = (256, 256)  # Recommended to keep it at 256.

content_image = load_image(content_image_url, content_img_size)
style_image = load_image(style_image_url, style_img_size)
style_image = tf.nn.avg_pool(style_image, ksize=[3,3], strides=[1,1], padding='SAME')
show_n([content_image, style_image], ['Content image', 'Style image'])

# Load TF Hub module.

hub_handle = 'https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2'
hub_module = hub.load(hub_handle)
out[26]

TF Version: 2.17.0
TF Hub version: 0.16.1
Eager mode enabled: True
GPU available: []
Downloading data from https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/Golden_Gate_Bridge_from_Battery_Spencer.jpg/640px-Golden_Gate_Bridge_from_Battery_Spencer.jpg
71918/71918 ━━━━━━━━━━━━━━━━━━━━ 0s 1us/step
Downloading data from https://upload.wikimedia.org/wikipedia/commons/0/0a/The_Great_Wave_off_Kanagawa.jpg
2684586/2684586 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

Jupyter Notebook Image

<Figure size 1400x700 with 2 Axes>

Import TF Hub Module

The signature of this hub module for image stylization is:

outputs = hub_module(content_image, style_image)
stylized_image = outputs[0]
out[28]

Where content_image, style_image, and stylized_image are expected to be 4-D Tensors with shapes [batch_size, image_height, image_width, 3]. In the current example we provide only single images, and therefore the batch dimension is 1, but one can use the same module to process more images at the same time. The input and output values of the images should be in the range [0, 1]. The shapes of the content and style images don't have to match; the output image shape is the same as the content image shape.
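
For example, to stylize two content images against the same style in a single call, the images can be stacked along the batch dimension. This is a hypothetical sketch (it reuses the images loaded above; the content images in a batch must share the same spatial size):

# Hypothetical sketch: batch two content images (here, the same image twice)
# and two copies of the style image, then run the module once.
content_batch = tf.concat([content_image, content_image], axis=0)  # [2, H, W, 3]
style_batch = tf.concat([style_image, style_image], axis=0)        # [2, 256, 256, 3]
stylized_batch = hub_module(tf.constant(content_batch), tf.constant(style_batch))[0]
print(stylized_batch.shape)  # (2, H, W, 3): same spatial size as the content images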

Demonstrate Image Stylization

# Stylize content image with given style image.
# This is pretty fast within a few milliseconds on a GPU.

outputs = hub_module(tf.constant(content_image), tf.constant(style_image))
stylized_image = outputs[0]
# Visualize input images and the generated stylized image.

show_n([content_image, style_image, stylized_image], titles=['Original content image', 'Style image', 'Stylized image'])
out[30]
Jupyter Notebook Image

<Figure size 2100x700 with 3 Axes>

Artistic Style Transfer with TensorFlow Lite

One of the most exciting developments in deep learning to come out recently is artistic style transfer, or the ability to create a new image, known as a pastiche, based on two input images: one representing the artistic style and one representing the content.

Image Stylization

Using this technique, you can generate beautiful new artworks in a range of styles.

Image Stylization 2

The TensorFlow Lite model is open-sourced on GitHub. You can retrain the model with different parameters (e.g. increase content layers' weights to make the output image look more like the content image). The Artistic Style Transfer model consists of two submodels (a rough sketch of chaining them follows the list):

  1. Style Prediction Model: A MobileNetV2-based neural network that takes an input style image to a 100-dimension style bottleneck vector.
  2. Style Transform Model: A neural network that applies a style bottleneck vector to a content image and creates a stylized image.
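
These submodels aren't exercised in these notes, but a rough, hypothetical sketch of chaining them with tf.lite.Interpreter might look like the following. The .tflite paths, image arrays, and input sizes are placeholders, and the input ordering of the transform model should be verified against get_input_details() before relying on it:

import numpy as np
import tensorflow as tf

def run_tflite_model(model_path, inputs):
  # Generic helper: load a TFLite model, feed the inputs in order, return the first output.
  interpreter = tf.lite.Interpreter(model_path=model_path)
  interpreter.allocate_tensors()
  for detail, value in zip(interpreter.get_input_details(), inputs):
    interpreter.set_tensor(detail['index'], value)
  interpreter.invoke()
  return interpreter.get_tensor(interpreter.get_output_details()[0]['index'])

# Placeholder float32 images in [0, 1] with a batch dimension; substitute real
# preprocessed images and the paths of the downloaded .tflite files.
style_img = np.random.rand(1, 256, 256, 3).astype(np.float32)    # stand-in style image
content_img = np.random.rand(1, 384, 384, 3).astype(np.float32)  # stand-in content image

# 1. Style prediction model: style image -> 100-dimension style bottleneck vector.
style_bottleneck = run_tflite_model('style_predict.tflite', [style_img])
# 2. Style transform model: content image + bottleneck -> stylized image.
stylized = run_tflite_model('style_transform.tflite', [content_img, style_bottleneck])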