Order Matters: Sequence to Sequence for Sets

I am reading this paper because it is on the list of roughly 30 papers that Ilya Sutskever recommended to John Carmack to learn what really matters in machine learning / AI today. The paper shows that the order in which input/output data are organized matters significantly when learning an underlying model.

Reference: Link to PDF of Paper


Sequences have become first-class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can now be formulated with the sequence-to-sequence (seq2seq) framework, which uses the chain rule to efficiently represent the joint probability of a sequence. In many cases, however, variable-sized inputs and/or outputs are not naturally expressed as sequences. This paper shows that the order in which input and/or output data are organized matters significantly when learning an underlying model. It then discusses an extension of the seq2seq framework that goes beyond sequences and handles input sets in a principled way, and it proposes a loss which, by searching over possible orders during training, deals with the lack of structure of output sets.
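To make that last idea concrete, here is a minimal sketch of an order-search loss, not the paper's actual implementation: for each example, score every permutation of the target set under the chain-rule factorization and backpropagate only through the best one. The `decoder` callable and its signature are assumptions for illustration, and the brute-force enumeration is only feasible for small sets; for larger sets the search over orders has to be approximated during training.

```python
# Hypothetical sketch of an order-search loss for output sets.
# `decoder` is an assumed callable: decoder(encoder_state, target_seq)
# returning per-step logits of shape (seq_len, vocab_size).
import itertools
import torch
import torch.nn.functional as F

def set_loss(decoder, encoder_state, target_set):
    """Negative log-likelihood of the target set, minimized over orderings.

    Gradients flow only through the best-scoring permutation, which
    mirrors training against the currently most likely order.
    """
    best = None
    for perm in itertools.permutations(target_set):
        target_seq = torch.tensor(perm)
        logits = decoder(encoder_state, target_seq)
        nll = F.cross_entropy(logits, target_seq)
        best = nll if best is None else torch.minimum(best, nll)
    return best  # brute force: factorial cost, small sets only
```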

Approaches to seq2seq modeling read the input completely with an encoder, either an LSTM when the input is a sequence or a convolutional neural network when the input is an image. The final state of the encoder is then fed to a decoder LSTM whose purpose is to produce the target sequence, one token at a time. But how should we represent data, inputs or outputs, for problems where no obvious order can be determined? This paper shows that order matters, that there may be a better ordering than the natural ordering of a sequence, and it proposes two approaches for treating sets as inputs and/or outputs, evaluating how they perform on various artificial and real datasets.
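For reference, here is a minimal PyTorch sketch of that baseline encoder-decoder setup, with illustrative sizes and teacher forcing; it is a reading-notes sketch, not the paper's code.

```python
# Minimal seq2seq sketch: an encoder LSTM reads the input, and its final
# state conditions a decoder LSTM that emits the output one token at a time.
# All dimensions and the vocabulary size are illustrative assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src, tgt):
        # Read the input completely; keep only the final (h, c) state.
        _, state = self.encoder(self.embed(src))
        # Initialize the decoder with the encoder's final state and
        # produce the target sequence step by step (teacher forcing).
        dec_out, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec_out)  # per-step logits over the vocabulary

model = Seq2Seq()
src = torch.randint(0, 100, (2, 7))  # batch of 2 input sequences
tgt = torch.randint(0, 100, (2, 5))  # batch of 2 target sequences
logits = model(src, tgt)             # shape (2, 5, 100)
```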

