TensorFlow Review

One of my goals for 2025 was to complete a machine learning / deep learning project every day. I missed the first three days because I was a little wary about getting back into Python after not having used it much for data analytics in a few months (one of the reasons I created this goal was to keep up with my Python / data analytics skills). I am creating this Jupyter Notebook to review TensorFlow before getting back into doing deep learning problems consistently.


TensorFlow Basics

Overview

TensorFlow is an end-to-end platform for machine learning. It supports the following:

  • Multidimensional-array based numeric computation (similar to NumPy)
  • GPU and distributed processing
  • Automatic differentiation
  • Model construction, training, and export

Tensors

TensorFlow operates on multidimensional arrays or tensors represented as tf.Tensor objects.

import tensorflow as tf 
import numpy as np

x = tf.constant([np.arange(1,4),np.arange(4,7)],dtype=np.float64)
out[2]
print(x)
print(x.shape)
print(x.dtype)
out[3]

tf.Tensor(
[[1. 2. 3.]
[4. 5. 6.]], shape=(2, 3), dtype=float64)
(2, 3)
<dtype: 'float64'>

The most important attributes of a tf.Tensor are its shape and dtype:

  • Tensor.shape: tells you the size of the tensor along each of its axes
  • Tensor.dtype: tells you the type of all the elements in the tensor

TensorFlow implements standard mathematical operations on tensors, as well as many operations specialized for machine learning.

print(x+x)
print(5*x)
print(x@tf.transpose(x))
print(tf.concat([x,x,x],axis=0))
print(tf.nn.softmax(x,axis=-1))
print(tf.reduce_sum(x))
print(tf.convert_to_tensor([1,2,3]))
print(tf.reduce_sum([1,2,3]))
if tf.config.list_physical_devices('GPU'):
  print("TensorFlow **IS** using the GPU")
else:
  print("TensorFlow **IS NOT** using the GPU")
out[5]

tf.Tensor(
[[ 2. 4. 6.]
[ 8. 10. 12.]], shape=(2, 3), dtype=float64)
tf.Tensor(
[[ 5. 10. 15.]
[20. 25. 30.]], shape=(2, 3), dtype=float64)
tf.Tensor(
[[14. 32.]
[32. 77.]], shape=(2, 2), dtype=float64)
tf.Tensor(
[[1. 2. 3.]
[4. 5. 6.]
[1. 2. 3.]
[4. 5. 6.]
[1. 2. 3.]
[4. 5. 6.]], shape=(6, 3), dtype=float64)
tf.Tensor(
[[0.09003057 0.24472847 0.66524096]
[0.09003057 0.24472847 0.66524096]], shape=(2, 3), dtype=float64)
tf.Tensor(21.0, shape=(), dtype=float64)
tf.Tensor([1 2 3], shape=(3,), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
TensorFlow **IS NOT** using the GPU

Variables

Normal tf.Tensor objects are immutable. To store model weights (or other mutable state) in TensorFlow, use a tf.Variable.

var = tf.Variable([0]*3,dtype=np.float64)
print(var)
var.assign([1,2,3])
print(var)
var.assign_add([1,1,1])
print(var)
out[7]

<tf.Variable 'Variable:0' shape=(3,) dtype=float64, numpy=array([0., 0., 0.])>
<tf.Variable 'Variable:0' shape=(3,) dtype=float64, numpy=array([1., 2., 3.])>
<tf.Variable 'Variable:0' shape=(3,) dtype=float64, numpy=array([2., 3., 4.])>

Automatic Differentiation

Gradient descent and related algorithms are a cornerstone of modern machine learning. To enable this, TensorFlow implements automatic differentiation (autodiff), which uses calculus to compute gradients. Typically you'll use this to calculate the gradient of a model's error or loss with respect to its weights.

x = tf.Variable(1.0)

def f(x):
    y = x**2 +2*x - 5
    return y

f(x)
out[9]

<tf.Tensor: shape=(), dtype=float32, numpy=-2.0>

At x = 1.0, y = f(x) = 1**2 + 2*1 - 5 = -2. The derivative is y' = f'(x) = 2*x + 2, which equals 4 at x = 1.0. TensorFlow can calculate this automatically:

with tf.GradientTape() as tape:
    y = f(x)

g_x = tape.gradient(y,x) # g(x) = dy/dx 
g_x
out[11]

<tf.Tensor: shape=(), dtype=float32, numpy=4.0>

This simplified example only takes the derivative with respect to a single scalar (x), but TensorFlow can compute the gradient with respect to any number of non-scalar tensors simultaneously.
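For example, here is a minimal sketch of computing gradients with respect to two non-scalar variables at once (the variable names and values are just for illustration):

W = tf.Variable(tf.ones((2, 2)))
b = tf.Variable(tf.zeros(2))

with tf.GradientTape() as tape:
    z = tf.constant([[1.0, 2.0]]) @ W + b
    loss = tf.reduce_sum(z**2)

# The gradients come back in the same structure as the sources
dW, db = tape.gradient(loss, [W, b])
print(dW.shape, db.shape)  # (2, 2) (2,)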

Graphs and tf.function

While you can use TensorFlow interactively like any Python library, TensorFlow also provides tools for:

  • performance optimization, to speed up training and inference
  • export, so you can save your model when it's done training

These require that you use tf.function to separate your pure-TensorFlow code from Python.

@tf.function
def my_func(x):
  print('Tracing.\n')
  return tf.reduce_sum(x)
out[13]

The first time you run the tf.function, although it executes in Python, it captures a complete, optimized graph representing the TensorFlow computations done within the function.

x = tf.constant([1,2,3])
my_func(x)
out[15]

Tracing.

<tf.Tensor: shape=(), dtype=int32, numpy=6>

On subsequent calls TensorFlow only executes the optimized graph, skipping any non-TensorFlow steps. Below, note that my_func doesn't print "Tracing." since print is a Python function, not a TensorFlow function.

x = tf.constant([10, 9, 8])
my_func(x)
out[17]

<tf.Tensor: shape=(), dtype=int32, numpy=27>

A graph may not be reusable for inputs with a different signature (shape and dtype), so a new graph is generated instead:

x = tf.constant([10.0, 9.1, 8.2], dtype=tf.float32)
my_func(x)

These captured graphs provide two benefits:

  • In many cases, they provide a significant speedup in execution (though not in this trivial example)
  • You can export these graphs, using tf.saved_model, to run on other systems like a server or a mobile device, no Python installation required

Modules, Layers, and Models

tf.Module is a class for managing your tf.Variable objects, and the tf.function objects that operate on them. The tf.Module class is necessary to support two significant features:

  1. You can save and restore the values of your variables using tf.train.Checkpoint. This is useful during training as it is quick to save and restore a model's state.
  2. You can import and export the tf.Variable values and the tf.function graphs using tf.saved_model. This allows you to run your model independently of the Python program that created it.

Example of exporting a simple tf.Module object:

class MyModule(tf.Module):
    def __init__(self,value):
        self.weight = tf.Variable(value)
    
    @tf.function
    def multiply(self,x):
        return x * self.weight 

mod = MyModule(3)
mod.multiply(tf.constant([1,2,3]))
save_path = './saved'
tf.saved_model.save(mod,save_path)
out[21]

INFO:tensorflow:Assets written to: ./saved\assets
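The checkpointing feature from the list above can be sketched in a similar way. This is a minimal example assuming the standard tf.train.Checkpoint API; the './ckpt' prefix is just an illustrative path:

ckpt = tf.train.Checkpoint(model=mod)
ckpt_path = ckpt.save('./ckpt')  # writes checkpoint files with this prefix

# Restore the saved variable values into a fresh module
restored = MyModule(0)
tf.train.Checkpoint(model=restored).restore(ckpt_path)
print(restored.weight.numpy())  # 3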

The tf.keras.layers.Layer and tf.keras.Model classes build on tf.Module providing additional functionality and convenience methods for building, training and saving models. Some of these are demonstrated in the next section.

Training Loops

Now put this all together to build a basic model and train it from scratch. First, create some example data. This generates a cloud of points that loosely follows a quadratic curve:

import matplotlib
from matplotlib import pyplot as plt 
matplotlib.rcParams['figure.figsize'] = [9,6]
x = tf.linspace(-2,2,201)
x = tf.cast(x,tf.float32)
def f(x):
    y = x**2 + 2*x - 5 
    return y 

y = f(x) + tf.random.normal(shape=[201])

plt.plot(x.numpy(),y.numpy(),'.',label='Data')
plt.plot(x,f(x),label='Ground truth')
plt.legend()
plt.show()
out[23]
[Figure: noisy data points with the ground-truth quadratic curve]

Create a quadratic model with randomly initialized weights and a bias:

class Model(tf.Module):
    def __init__(self):
        # Randomly generate weight and bias terms 
        rand_init = tf.random.uniform(shape=[3],minval=0.,maxval=5.,seed=22)
        # Initialize model parameters 
        self.w_q = tf.Variable(rand_init[0])
        self.w_l = tf.Variable(rand_init[1])
        self.b = tf.Variable(rand_init[2])
    @tf.function 
    def __call__(self,x):
        # Quadratic Model: quadratic_weight * x^2 + linear_weight * x + bias 
        return self.w_q * (x**2) + self.w_l * x + self.b
    
quad_model = Model()
def plot_preds(x, y, f, model, title):
  plt.figure()
  plt.plot(x, y, '.', label='Data')
  plt.plot(x, f(x), label='Ground truth')
  plt.plot(x, model(x), label='Predictions')
  plt.title(title)
  plt.legend()
plot_preds(x, y, f, quad_model, 'Before training')

out[25]
[Figure: data, ground truth, and the untrained model's predictions]

Define a loss for the model. Given that the model is intended to predict continuous values, the mean squared error (MSE) is a good choice for the loss function. Given a vector of predictions, $\hat{y}$, and a vector of true targets, $y$, the MSE is defined as the mean of the squared differences between the predicted values and the ground truth.

$$\text{MSE} = \frac{1}{m}\sum_{i=1}^{m} (\hat{y}_i - y_i)^2$$

Write a basic training loop for the model. The loop will make use of the MSE loss function and its gradients with respect to the model's parameters in order to iteratively update them. Using mini-batches for training provides both memory efficiency and faster convergence. The tf.data.Dataset API has useful functions for batching and shuffling.

def mse_loss(y_pred, y):
  return tf.reduce_mean(tf.square(y_pred - y))

batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices((x, y))
dataset = dataset.shuffle(buffer_size=x.shape[0]).batch(batch_size)

# Set training parameters
epochs = 100
learning_rate = 0.01
losses = []

# Format training loop
for epoch in range(epochs):
  for x_batch, y_batch in dataset:
    with tf.GradientTape() as tape:
      batch_loss = mse_loss(quad_model(x_batch), y_batch)
    # Update parameters with respect to the gradient calculations
    grads = tape.gradient(batch_loss, quad_model.variables)
    for g,v in zip(grads, quad_model.variables):
        v.assign_sub(learning_rate*g)
  # Keep track of model loss per epoch
  loss = mse_loss(quad_model(x), y)
  losses.append(loss)
  if epoch % 10 == 0:
    print(f'Mean squared error for step {epoch}: {loss.numpy():0.3f}')

# Plot model results
print("\n")
plt.plot(range(epochs), losses)
plt.xlabel("Epoch")
plt.ylabel("Mean Squared Error (MSE)")
plt.title('MSE loss vs training iterations')
plt.show()

plot_preds(x, y, f, quad_model, 'After training')
out[27]

Mean squared error for step 0: 55.909
Mean squared error for step 10: 9.697
Mean squared error for step 20: 3.922
Mean squared error for step 30: 1.939
Mean squared error for step 40: 1.250
Mean squared error for step 50: 1.012
Mean squared error for step 60: 0.936
Mean squared error for step 70: 0.904
Mean squared error for step 80: 0.892
Mean squared error for step 90: 0.888


[Figure: MSE loss vs. training epochs]

[Figure: data, ground truth, and model predictions after training]

That's working, but remember that implementations of common training utilities are available in the tf.keras module. So, consider using those before writing your own. To start with, the Model.compile and Model.fit methods implement a training loop for you:

Begin by creating a sequential model in Keras using tf.keras.Sequential. One of the simplest Keras layers is the dense layer, which can be instantiated with tf.keras.layers.Dense. The dense layer is able to learn multidimensional linear relationships of the form $\mathrm{Y} = \mathrm{W}\mathrm{X} + \vec{b}$. In order to learn a nonlinear equation of the form $w_1 x^2 + w_2 x + b$, the dense layer's input should be a data matrix with $x^2$ and $x$ as features. The lambda layer, tf.keras.layers.Lambda, can be used to perform this stacking transformation.

new_model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: tf.stack([x, x**2], axis=1)),
    tf.keras.layers.Dense(units=1, kernel_initializer=tf.random.normal)])

new_model.compile(
    loss=tf.keras.losses.MSE,
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01))

history = new_model.fit(x, y,
                        epochs=100,
                        batch_size=32,
                        verbose=0)

new_model.save('./my_new_model.keras')

plot_preds(x, y, f, new_model, 'After Training: Keras')
out[29]
[Figure: data, ground truth, and the Keras model's predictions after training]
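The saved .keras file can later be reloaded for inference without rebuilding the model. A minimal sketch, with the caveat that, depending on the Keras version, models containing Lambda layers may need safe_mode=False to deserialize:

# safe_mode=False is an assumption for the Lambda layer; newer Keras versions
# refuse to deserialize lambdas in safe mode
reloaded = tf.keras.models.load_model('./my_new_model.keras', safe_mode=False)
print(reloaded.predict(x[:5]))  # predictions for the first five x values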

Tensors

Tensors are multi-dimensional arrays with a uniform type (called a dtype). You can see all supported dtypes at tf.dtypes. All tensors are immutable: you can never update the contents of a tensor, only create a new one.

  • A scalar is a rank-0 tensor. A scalar contains a single value and no axes
  • A vector is a rank-1 tensor and is like a list of values. It has one axis
  • A matrix is a rank-2 tensor and has two axes

Tensors may have more than two axes. You can convert a tensor to a NumPy array using the tensor.numpy() method.

Tensors often contain floats and ints, but they can have many other types, including:

  • complex numbers
  • strings

The base tf.Tensor class requires tensors to be "rectangular" - that is, along each axis, every element is the same size. However, there are specialized types of tensors that can handle different shapes:

  • Ragged tensors
  • Sparse tensors
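A minimal sketch of both specialized types, using the standard tf.ragged and tf.sparse APIs (the values are just for illustration):

ragged = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
print(ragged.shape)  # (3, None) -- rows may have different lengths

sparse = tf.sparse.SparseTensor(indices=[[0, 0], [1, 2]],
                                values=[10, 20],
                                dense_shape=[3, 4])
print(tf.sparse.to_dense(sparse))  # mostly-zero dense matrix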

You can do basic math on tensors, including addition, element-wise multiplication, and matrix multiplication.

rank_0_tensor = tf.constant(4) # Scalar
print(rank_0_tensor,rank_0_tensor.shape)
rank_1_tensor = tf.constant([2.0, 3.0, 4.0]) # Vector
print(rank_1_tensor,rank_1_tensor.shape)
rank_2_tensor = tf.constant([[1, 2],[3, 4],[5, 6]], dtype=tf.float16) # Matrix
print(rank_2_tensor,rank_2_tensor.shape)
print(type(rank_2_tensor.numpy()))
a = tf.constant([[1, 2],[3, 4]])
b = tf.constant([[1, 1],[1, 1]]) # Could have also said `tf.ones([2,2], dtype=tf.int32)`

print(tf.add(a, b), "\n")
print(tf.multiply(a, b), "\n")
print(tf.matmul(a, b), "\n")
print(a + b, "\n") # element-wise addition
print(a * b, "\n") # element-wise multiplication
print(a @ b, "\n") # matrix multiplication

# Tensors used in all kinds of operations 
c = tf.constant([[4.0, 5.0], [10.0, 1.0]])

# Find the largest value
print(tf.reduce_max(c))
# Find the index of the largest value
print(tf.math.argmax(c))
# Compute the softmax
print(tf.nn.softmax(c))
out[31]

tf.Tensor(4, shape=(), dtype=int32) ()
tf.Tensor([2. 3. 4.], shape=(3,), dtype=float32) (3,)
tf.Tensor(
[[1. 2.]
[3. 4.]
[5. 6.]], shape=(3, 2), dtype=float16) (3, 2)
<class 'numpy.ndarray'>
tf.Tensor(
[[2 3]
[4 5]], shape=(2, 2), dtype=int32)

tf.Tensor(
[[1 2]
[3 4]], shape=(2, 2), dtype=int32)

tf.Tensor(
[[3 3]
[7 7]], shape=(2, 2), dtype=int32)

tf.Tensor(
[[2 3]
[4 5]], shape=(2, 2), dtype=int32)

tf.Tensor(
[[1 2]
[3 4]], shape=(2, 2), dtype=int32)

tf.Tensor(
[[3 3]
[7 7]], shape=(2, 2), dtype=int32)

tf.Tensor(10.0, shape=(), dtype=float32)
tf.Tensor([1 0], shape=(2,), dtype=int64)
tf.Tensor(
[[2.6894143e-01 7.3105854e-01]
[9.9987662e-01 1.2339458e-04]], shape=(2, 2), dtype=float32)

print(tf.convert_to_tensor([1,2,3]))
print(tf.reduce_max([1,2,3]))
print(tf.reduce_max(np.array([1,2,3])))
out[32]

tf.Tensor([1 2 3], shape=(3,), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)

Tensors have shapes. Some vocabulary:

  • Shape: The length (number of elements) of each of the axes of a tensor
  • Rank: Number of tensor axes. A scalar has rank 0, a vector has rank 1, and a matrix has rank 2
  • Axis or Dimension: A particular dimension of a tensor
  • Size: The total number of items in the tensor, i.e. the product of the shape vector's elements

Tensors and tf.TensorShape objects have convenient properties for accessing these. While axes are often referred to by their indices, you should always keep track of the meaning of each. Often axes are ordered from global to local: the batch axis first, followed by spatial dimensions, and features for each location last. This way feature vectors are contiguous regions of memory.

Typical Axis order

rank_4_tensor = tf.zeros([3, 2, 4, 5])
print("Type of every element:", rank_4_tensor.dtype)
print("Number of axes:", rank_4_tensor.ndim)
print("Shape of tensor:", rank_4_tensor.shape)
print("Elements along axis 0 of tensor:", rank_4_tensor.shape[0])
print("Elements along the last axis of tensor:", rank_4_tensor.shape[-1])
print("Total number of elements (3*2*4*5): ", tf.size(rank_4_tensor).numpy())
print("Rank:",tf.rank(rank_4_tensor))
print("Rank 4 Shape:",tf.shape(rank_4_tensor))
out[34]

Type of every element: <dtype: 'float32'>
Number of axes: 4
Shape of tensor: (3, 2, 4, 5)
Elements along axis 0 of tensor: 3
Elements along the last axis of tensor: 5
Total number of elements (3*2*4*5): 120
Rank: tf.Tensor(4, shape=(), dtype=int32)
Rank 4 Shape: tf.Tensor([3 2 4 5], shape=(4,), dtype=int32)

TensorFlow follows standard Python indexing rules, similar to indexing a list or string in Python and the basic rules for NumPy indexing:

  • indices start at 0
  • negative indices count backward from the end
  • colons, :, are used for slices: start:stop:step
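A quick sketch of these rules on a rank-1 tensor (the values are just for illustration):

t = tf.constant([0, 1, 1, 2, 3, 5, 8, 13])
print(t[0].numpy())    # 0, the first element
print(t[-1].numpy())   # 13, the last element
print(t[2:5].numpy())  # [1 2 3]
print(t[::2].numpy())  # [ 0  1  3  8], every other element
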
# Shape returns a `TensorShape` object that shows the size along each axis
x = tf.constant([[1], [2], [3]])
print(x.shape)
# You can reshape a tensor to a new shape.
# Note that you're passing in a list
reshaped = tf.reshape(x, [1, 3])
print(x.shape)
print(reshaped.shape)
out[36]

(3, 1)
(3, 1)
(1, 3)

Variables

A TensorFlow variable is the recommended way to represent shared, persistent state your program manipulates. Variables are created and tracked via the tf.Variable class. A tf.Variable represents a tensor whose value can be changed by running ops on it; specific ops allow you to read and modify the values of this tensor. Keras uses tf.Variable to store model parameters.

Like tensors, variables have a single dtype shared by all of their elements. Most tensor operations work on variables as expected, although variables cannot be reshaped. Variables are backed by tensors; if you use a variable like a tensor in operations, you will usually operate on the backing tensor. Creating a new variable from an existing variable duplicates the backing tensor, so two variables will not share the same memory.

In Python-based TensorFlow, tf.Variable instances have the same lifecycle as other Python objects: when there are no references to a variable, it is automatically deallocated. Variable names are preserved when saving and loading models. By default, variables in models acquire unique names automatically, so you don't need to assign them yourself unless you want to.

Although variables are important for differentiation, some variables will not need to be differentiated. You can turn off gradients for a variable by setting trainable to False at creation. For better performance, TensorFlow will attempt to place tensors and variables on the fastest device compatible with their dtype, which means most variables are placed on a GPU if one is available.

my_tensor = tf.constant([[1.0, 2.0], [3.0, 4.0]])
my_variable = tf.Variable(my_tensor)

# Variables can be all kinds of types, just like tensors
bool_variable = tf.Variable([False, False, False, True])
complex_variable = tf.Variable([5 + 4j, 6 + 1j])

print("Shape: ", my_variable.shape)
print("DType: ", my_variable.dtype)
print("As NumPy: ", my_variable.numpy())

print("A variable:", my_variable)
print("\nViewed as a tensor:", tf.convert_to_tensor(my_variable))
print("\nIndex of highest value:", tf.math.argmax(my_variable))

# This creates a new tensor; it does not reshape the variable.
print("\nCopying and reshaping: ", tf.reshape(my_variable, [1,4]))

a = tf.Variable([2.0, 3.0])
# This will keep the same dtype, float32
a.assign([1, 2]) 
# Not allowed as it resizes the variable: 
try:
  a.assign([1.0, 2.0, 3.0])
except Exception as e:
  print(f"{type(e).__name__}: {e}")

a = tf.Variable([2.0, 3.0])
# Create b based on the value of a
b = tf.Variable(a)
a.assign([5, 6])

# a and b are different
print(a.numpy())
print(b.numpy())

# There are other versions of assign
print(a.assign_add([2,3]).numpy())  # [7. 9.]
print(a.assign_sub([7,9]).numpy())  # [0. 0.]

# Create a and b; they will have the same name but will be backed by
# different tensors.
a = tf.Variable(my_tensor, name="Mark")
# A new variable with the same name, but different value
# Note that the scalar add is broadcast
b = tf.Variable(my_tensor + 1, name="Mark")

# These are elementwise-unequal, despite having the same name
print(a == b)

step_counter = tf.Variable(1, trainable=False)
out[38]

Shape: (2, 2)
DType: <dtype: 'float32'>
As NumPy: [[1. 2.]
[3. 4.]]
A variable: <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[1., 2.],
[3., 4.]], dtype=float32)>

Viewed as a tensor: tf.Tensor(
[[1. 2.]
[3. 4.]], shape=(2, 2), dtype=float32)

Index of highest value: tf.Tensor([1 1], shape=(2,), dtype=int64)

Copying and reshaping: tf.Tensor([[1. 2. 3. 4.]], shape=(1, 4), dtype=float32)
ValueError: Cannot assign value to variable ' Variable:0': Shape mismatch.The variable shape (2,), and the assigned value shape (3,) are incompatible.
[5. 6.]
[2. 3.]
[7. 9.]
[0. 0.]
tf.Tensor(
[[False False]
[False False]], shape=(2, 2), dtype=bool)
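Device placement, mentioned above, can also be requested explicitly. A minimal sketch using the tf.device context manager (the printed device string depends on the hardware available):

with tf.device('CPU:0'):
  on_cpu = tf.Variable([[1.0, 2.0], [3.0, 4.0]])

print(on_cpu.device)  # e.g. /job:localhost/replica:0/task:0/device:CPU:0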

Automatic Differentiation

Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the forward pass. Then, during the backward pass, TensorFlow traverses this list of operations in reverse order to compute gradients.

TensorFlow provides the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables. TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse mode differentiation. Once you've recorded some operations, use GradientTape.gradient(target, sources) to calculate the gradient of some target (often a loss) relative to some source (often the model's variables):

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
  y = x**2

# dy = 2x * dx
dy_dx = tape.gradient(y, x)
dy_dx.numpy()

# tf.GradientTape works easily on any tensor
w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]

with tf.GradientTape(persistent=True) as tape:
  y = x @ w + b
  loss = tf.reduce_mean(y**2)
out[40]

To get the gradient of loss with respect to both variables, you can pass both as sources to the gradient method. The tape is flexible about how sources are passed and will accept any nested combination of lists and dictionaries and return the gradient structured the same way. The gradient with respect to each source has the same shape as the source.

[dl_dw, dl_db] = tape.gradient(loss, [w, b])
print(w.shape)
print(dl_dw.shape)

# Here is the gradient calculation again, this time passing a dictionary of variables
my_vars = {
    'w': w,
    'b': b
}

grad = tape.gradient(loss, my_vars)
grad['b']
out[42]

(3, 2)
(3, 2)

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-2.944363 , 1.9557912], dtype=float32)>

It is common to collect tf.Variables into a tf.Module or one of its subclasses (layers.Layer, keras.Model) for checkpointing and exporting. In most cases, you will want to calculate gradients with respect to a model's trainable variables. Since all subclasses of tf.Module aggregate their variables in the Module.trainable_variables property, you can calculate these gradients in a few lines of code:

layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
  # Forward pass
  y = layer(x)
  loss = tf.reduce_mean(y**2)

# Calculate gradients with respect to every trainable variable
grad = tape.gradient(loss, layer.trainable_variables)

for var, g in zip(layer.trainable_variables, grad):
  print(f'{var.name}, shape: {g.shape}')
out[44]

kernel, shape: (3, 2)
bias, shape: (2,)

The default behavior is to record all operations after accessing a trainable tf.Variable. The reasons for this are:

  • The tape needs to know which operations to record in the forward pass to calculate the gradients in the backwards pass.
  • The tape holds references to intermediate outputs, so you don't want to record unnecessary operations
  • The most common use case involves calculating the gradient of a loss with respect to all a model's trainable parameters

You can list the variables being watched by the tape using the GradientTape.watched_variables method:

# A trainable variable
x0 = tf.Variable(3.0, name='x0')
# Not trainable
x1 = tf.Variable(3.0, name='x1', trainable=False)
# Not a Variable: A variable + tensor returns a tensor.
x2 = tf.Variable(2.0, name='x2') + 1.0
# Not a variable
x3 = tf.constant(3.0, name='x3')

with tf.GradientTape() as tape:
  y = (x0**2) + (x1**2) + (x2**2)

grad = tape.gradient(y, [x0, x1, x2, x3])

for g in grad:
  print(g)

print([var.name for var in tape.watched_variables()])
out[46]

tf.Tensor(6.0, shape=(), dtype=float32)
None
None
None
['x0:0']
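The tape can also be told to watch a plain tf.Tensor, which it does not record by default. A minimal sketch using GradientTape.watch:

x4 = tf.constant(3.0)

with tf.GradientTape() as tape:
  tape.watch(x4)  # explicitly watch a non-variable tensor
  y = x4**2

print(tape.gradient(y, x4).numpy())  # 6.0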

Graphs and Functions

Graph execution enables portability outside Python and tends to offer better performance. Graph execution means that tensor computations are executed as a TensorFlow graph, sometimes referred to as a tf.Graph or simply a "graph". Graphs are data structures that contain a set of tf.Operation objects, which represent units of computation, and tf.Tensor objects, which represent the units of data that flow between operations. They are defined in a tf.Graph context. Since these graphs are data structures, they can be saved, run, and restored all without the original Python code. This is what a TensorFlow graph representing a two-layer neural network looks like when visualized in TensorBoard:

TensorFlow Graph Visualized

With a graph, you have a great deal of flexibility. You can use your TensorFlow graph in environments that don't have a Python interpreter, like mobile applications, embedded devices, and backend servers. TensorFlow uses graphs as the format for saved models when it exports them from Python.

Graphs can be easily optimized, allowing the compiler to do transformations like:

  • Statically infer the value of tensors by folding constant nodes in your computation ("constant folding")
  • Separate sub-parts of a computation that are independent and split them between threads or devices
  • Simplify arithmetic operations by eliminating common subexpressions

In short, graphs are extremely useful and let your TensorFlow run fast, run in parallel, and run efficiently on multiple devices.

You can create and run a graph in TensorFlow by using tf.function, either as a direct call or a decorator. A PolymorphicFunction is a Python callable that builds TensorFlow graphs from the Python function. You use a tf.function in the same way as its Python equivalent.
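For example, wrapping an existing Python function with a direct call to tf.function produces the same kind of graph-backed callable as the decorator. A minimal sketch:

def plain_add(a, b):
  return tf.add(a, b)

graph_add = tf.function(plain_add)  # direct call instead of the @tf.function decorator
print(graph_add(tf.constant(1), tf.constant(2)).numpy())  # 3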
