The Unreasonable Effectiveness of Recurrent Neural Networks

I am reading this post because it is one of the roughly 30 papers Ilya Sutskever recommended to John Carmack to learn what really matters for machine learning / AI today. In this blog post, Karpathy shares the "magic" of Recurrent Neural Networks (RNNs).

Reference: Andrej Karpathy Blog Post


0.1 References

  • Code on GitHub: Allows you to train character-level language models based on multi-layer LSTMs.

0.2 Notes


A glaring limitation of Vanilla Neural Networks is that their API is too constrained: they accept a fixed-size vector as input and produce a fixed-size vector as output. Not only that, these models perform this mapping using a fixed number of computational steps. RNNs are more exciting because they allow us to operate over sequences of vectors: sequences in the input, the output, or, in the most general case, both.

[Figure: diagram of the five modes of RNN input/output processing, described in the caption below.]

Each rectangle in the image above represents a function (a matrix multiply). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN’s state. From left to right: (1) the vanilla mode of processing without an RNN, from fixed-size input to fixed-size output (image classification); (2) sequence output (image captioning takes an image and outputs a sentence of words); (3) sequence input (sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment); (4) sequence input and sequence output (machine translation); (5) synced sequence input and output (video classification, where we wish to label each frame of the video).

RNNs combine the input vector with their state vector using a fixed (but learned) function to produce a new state vector. Even if your data is not in the form of sequences, you can still formulate and train powerful models that learn to process it sequentially. At the core, RNNs have a deceptively simple API: they accept an input vector x and give you an output vector y. However, this output vector’s contents are influenced not only by the input you just fed in, but also by the entire history of inputs you’ve fed in in the past.

rnn = RNN() 
y = rnn.step(x)

The RNN class has some internal state that it gets to update every time step() is called. In the simplest case, this state consists of a single hidden vector h. Here is the implementation of the step function in a vanilla RNN:

import numpy as np

class RNN:
    # ...
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y
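
In equations, the update above is h_t = tanh(W_hh · h_{t-1} + W_xh · x_t), and the output is y_t = W_hy · h_t. To make this concrete, here is a complete, runnable sketch of the class above that fills in the initialization the snippet elides; the layer sizes and the 0.01 initialization scale are illustrative assumptions, not values from the post:

import numpy as np

class RNN:
    def __init__(self, input_size, hidden_size, output_size):
        # random weights; the 0.01 scale is an arbitrary choice for this sketch
        self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01  # state -> state
        self.W_xh = np.random.randn(hidden_size, input_size) * 0.01   # input -> state
        self.W_hy = np.random.randn(output_size, hidden_size) * 0.01  # state -> output
        self.h = np.zeros(hidden_size)  # hidden state starts as the zero vector

    def step(self, x):
        # same forward pass as above: fold x into the state, then read out y
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        return np.dot(self.W_hy, self.h)

rnn = RNN(input_size=10, hidden_size=32, output_size=10)
xs = [np.random.randn(10) for _ in range(5)]  # a toy sequence of 5 input vectors
ys = [rnn.step(x) for x in xs]                # each output depends on all inputs so far

Calling step repeatedly is what threads the history through: the hidden state h carries a summary of everything the network has seen so far.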

The above specifies the forward pass of a vanilla RNN. This RNN’s parameters are the three matrices W_hh, W_xh, W_hy. The hidden state self.h is initialized with a zero vector. The tanh function implements a non-linearity that squashes the activations to the range [-1, 1]. We initialize the matrices of the RNN with random numbers, and the bulk of the work during training goes into finding the matrices that give rise to desirable behavior, as measured with some loss function that expresses your preference for what kinds of outputs y you’d like to see in response to your input sequences x.

RNNs are neural networks, and everything works monotonically better if you start stacking models: the output of one RNN becomes the input of another RNN. The Long Short-Term Memory (LSTM) network is a particular type of recurrent network that works slightly better in practice, owing to its more powerful update equation and backpropagation dynamics.
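
As a small illustration of stacking, two RNNs can be composed so that the second receives the first’s output at every time step. Reusing the sketch class above (the sizes are again illustrative; the first layer’s output size must match the second layer’s input size):

# a 2-layer RNN: the second RNN consumes the first RNN's output each step
rnn1 = RNN(input_size=10, hidden_size=32, output_size=32)
rnn2 = RNN(input_size=32, hidden_size=32, output_size=10)

def two_layer_step(x):
    y1 = rnn1.step(x)     # first RNN receives the raw input vector
    return rnn2.step(y1)  # second RNN receives the first RNN's output

This is just function composition over the step API; each layer keeps its own hidden state.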
