Pointer Networks
I am reading this paper because it is on the list of roughly 30 papers that Ilya Sutskever recommended to John Carmack for learning what really matters in machine learning / AI today. It introduces a new neural architecture that learns the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in an input sequence.
Reference Link to PDF of Research Paper
The paper introduces a neural architecture that learns the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in an input sequence. Existing approaches such as sequence-to-sequence models and Neural Turing Machines cannot trivially address such problems, because the number of target classes at each output step depends on the length of the input, which is variable. The model handles variable-size output dictionaries using the recently proposed mechanism of neural attention. It differs from previous uses of attention in that, instead of using attention to blend the encoder's hidden units into a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. This architecture is called a Pointer Net (Ptr-Net).
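To make the difference between blending and pointing concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the dot-product scoring, the toy 2-dimensional states, and the helper names `attention_weights`, `blend`, and `point` are all assumptions made for illustration. Both uses start from the same attention weights; only what is done with them differs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(encoder_states, decoder_state):
    # Assumption: dot-product scoring, chosen only to keep the sketch short.
    scores = [sum(e * d for e, d in zip(state, decoder_state))
              for state in encoder_states]
    return softmax(scores)


def blend(encoder_states, weights):
    """Conventional attention: average the encoder states into a context vector."""
    dim = len(encoder_states[0])
    return [sum(w * state[k] for w, state in zip(weights, encoder_states))
            for k in range(dim)]


def point(weights):
    """Pointer use: the weights are the output distribution; pick an input index."""
    return max(range(len(weights)), key=lambda j: weights[j])


# Toy encoder states for three input elements (hypothetical values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
decoder_state = [1.0, 1.0]

weights = attention_weights(encoder_states, decoder_state)
context = blend(encoder_states, weights)  # a vector with the hidden dimension
chosen = point(weights)                   # an index into the input sequence
print(len(context), chosen)  # prints "2 2"
```

The blended context vector always has the hidden dimension, whatever the input length, whereas the pointer output is an index whose valid range is exactly the set of input positions.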
RNNs have long been used for learning functions over sequences, but they were constrained in the input and output sequence lengths they could handle. The recently introduced sequence-to-sequence paradigm removed these constraints by using one RNN to map an input sequence to an embedding and another RNN to map the embedding to an output sequence. The decoder was later augmented by propagating extra contextual information from the input through a content-based attention mechanism. These methods still require the size of the output dictionary to be fixed in advance. This paper removes that limitation by repurposing the attention mechanism to create pointers to input elements.
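The sketch below illustrates why a pointer removes the fixed-dictionary constraint: the same parameters produce a distribution with one entry per input element, so the number of output classes follows the input length. It uses additive scoring, u_j = v . tanh(W1 e_j + W2 d), followed by a softmax over j, which is how I understand the paper's pointer scores. The hidden size, the random weights, and all function names are assumptions for illustration; nothing here is trained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def pointer_distribution(encoder_states, decoder_state, W1, W2, v):
    """Return one probability per input position (additive scoring)."""
    projected_decoder = matvec(W2, decoder_state)
    scores = []
    for e_j in encoder_states:
        projected_encoder = matvec(W1, e_j)
        hidden = [math.tanh(a + b)
                  for a, b in zip(projected_encoder, projected_decoder)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return softmax(scores)


def random_matrix(rng, rows, cols):
    """Untrained placeholder weights, for illustration only."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
HIDDEN = 4  # hypothetical hidden size
W1 = random_matrix(rng, HIDDEN, HIDDEN)
W2 = random_matrix(rng, HIDDEN, HIDDEN)
v = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
decoder_state = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

# The same parameters handle inputs of different lengths.
for n in (3, 7):
    encoder_states = random_matrix(rng, n, HIDDEN)
    probs = pointer_distribution(encoder_states, decoder_state, W1, W2, v)
    print(n, len(probs))
```

A conventional softmax output layer would need a weight matrix with a fixed number of rows, one per class. Here the parameter shapes depend only on the hidden size, so nothing has to change when the input grows from 3 to 7 elements.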
The main contributions of this work are the following:
- Proposal of a new architecture, Pointer Net, which is simple and effective. It deals with the fundamental problem of representing variable-length dictionaries by using a softmax probability distribution as a "pointer".
- Application of the Pointer Net model to three distinct non-trivial algorithmic problems involving geometry: finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem.
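As an illustration of what "output tokens that are positions in the input" looks like for a geometric task, the sketch below builds a target sequence for the planar convex hull problem, one of the tasks studied in the paper. The hull is computed with Andrew's monotone chain algorithm, which is my choice for generating the example and not something taken from the paper. The 0-based indexing, the starting vertex, and the function names are also assumptions; the paper's exact target encoding may differ.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points):
    """Return hull vertices as indices into `points`, counter-clockwise.

    Andrew's monotone chain; assumes at least 3 distinct points,
    not all collinear.
    """
    # Sort indices by (x, y) so the answer stays in terms of input positions.
    order = sorted(range(len(points)), key=lambda i: points[i])
    lower, upper = [], []
    for i in order:
        while (len(lower) >= 2 and
               cross(points[lower[-2]], points[lower[-1]], points[i]) <= 0):
            lower.pop()
        lower.append(i)
    for i in reversed(order):
        while (len(upper) >= 2 and
               cross(points[upper[-2]], points[upper[-1]], points[i]) <= 0):
            upper.pop()
        upper.append(i)
    # The last index of each chain is the first index of the other.
    return lower[:-1] + upper[:-1]


# Four corners of the unit square plus one interior point (index 2).
points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (1.0, 1.0), (0.0, 1.0)]
target = convex_hull_indices(points)
print(target)  # prints [0, 1, 3, 4]
```

Every element of `target` is an index into `points`, so the set of valid output tokens grows and shrinks with the number of input points. That is the variable-size output dictionary a pointer distribution is meant to cover.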