Jacob Devlin - RobustFill: Neural Program Learning under Noisy I/O (2017)

Created: April 22, 2017 / Updated: November 2, 2024 / Status: finished / 4 min read (~617 words)
Machine learning

  • What is so special about GetSpan() that it is used to solve about 20% of the test instances?

  • 2 competing approaches for automatic program learning have received significant attention
    • Neural program synthesis, where a neural network is conditioned on input/output examples and learns to generate a program
    • Neural program induction, where a neural network generates new outputs directly using a latent program representation

  • The primary task evaluated for this work is a Programming By Example (PBE) system for string transformations similar to FlashFill
  • We develop novel variants of the attentional RNN architecture to encode a variable-length, unordered set of input-output examples
  • For program representation, we have developed a domain-specific language (DSL) that defines an expressive class of regular expression-based string transformations
  • The neural network is then used to generate a program in the DSL (for synthesis) or an output string (for induction)
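
To make the synthesis/induction distinction concrete, here is a toy illustration of mine (not the paper's DSL implementation): a synthesized program is an executable composition of string primitives that can be run on new inputs, while induction emits the output string directly.

```python
# Toy FlashFill-style primitives, for illustration only; the paper's DSL
# (Concat, ConstStr, SubStr, GetSpan, ...) is considerably richer.
def const_str(s):
    return lambda _inp: s

def first_word(inp):
    return inp.split()[0]

def concat(*exprs):
    return lambda inp: "".join(e(inp) for e in exprs)

# Synthesis: the network emits a program, which is then executed on new inputs
program = concat(first_word, const_str("!"))
print(program("hello world"))  # -> "hello!"

# Induction: the network would instead emit the characters of "hello!" directly,
# with no explicit program to inspect or reuse
```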

  • A set of observed input-output string examples $(I_1, O_1), \ldots, (I_n, O_n)$
  • A set of assessment input strings $I^y_1, \ldots, I^y_m$
  • The goal is to generate the corresponding output strings $O^y_1, \ldots, O^y_m$
  • For each example set, we assume there exists at least one program P that will correctly transform all of these examples
    • $P(I_1) = O_1, \ldots, P(I_n) = O_n, \; P(I^y_1) = O^y_1, \ldots, P(I^y_m) = O^y_m$
  • The success metric is whether a generated program generalizes to the corresponding assessment examples, i.e. $P(I^y_j) = O^y_j$
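
A minimal sketch of this success criterion (the helper names and toy examples are mine): a program consistent with the observed pairs counts as a success only if it also reproduces the held-out assessment outputs.

```python
def is_consistent(program, examples):
    """True if program(I) == O for every observed (I, O) pair."""
    return all(program(i) == o for i, o in examples)

def generalizes(program, assessment_examples):
    """Success metric: correctness on the held-out assessment examples."""
    return all(program(i) == o for i, o in assessment_examples)

observed = [("john smith", "J. Smith"), ("ada lovelace", "A. Lovelace")]
assessment = [("alan turing", "A. Turing")]

def candidate(inp):
    first, last = inp.split()
    return f"{first[0].upper()}. {last.capitalize()}"

assert is_consistent(candidate, observed)
assert generalizes(candidate, assessment)
```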

  • GetSpan(r1, i1, y1, r2, i2, y2): returns the substring between the $i_1$-th occurrence of regex $r_1$ and the $i_2$-th occurrence of regex $r_2$, where $y_1$ and $y_2$ denote either the start or the end of the corresponding regex matches
    • GetSpan() looks like a generalization of SubStr(): instead of fixed position indices, it composes two regex searches, each with an occurrence index and a start/end choice, to delimit the substring
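
A rough Python sketch of how I read the GetSpan() semantics; the boundary conventions (1-based occurrence indices, "Start"/"End" labels) are my assumptions, not the paper's formal DSL definition.

```python
import re

def get_span(s, r1, i1, y1, r2, i2, y2):
    """Substring between the i1-th match of r1 and the i2-th match of r2,
    where y1/y2 select the 'Start' or 'End' boundary of those matches."""
    m1 = list(re.finditer(r1, s))[i1 - 1]
    m2 = list(re.finditer(r2, s))[i2 - 1]
    left = m1.start() if y1 == "Start" else m1.end()
    right = m2.start() if y2 == "Start" else m2.end()
    return s[left:right]

# e.g. the text between the first ',' and the second ','
print(get_span("a, b, c", ",", 1, "End", ",", 2, "Start"))  # -> " b"
```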

  • Using the DSL, sample programs are generated and verified to ensure they are executable, do not throw exceptions, etc.
  • Given these programs, a set of input strings is generated, and their associated output strings are computed using the sampled program
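
The generate-and-reject loop might look roughly like this; the sampling helpers are toy stand-ins, not the authors' actual DSL sampler.

```python
import random

def sample_program():
    # toy stand-in: "take the first k characters", for a random k
    k = random.randint(1, 4)
    return lambda s: s[:k]

def sample_input():
    alphabet = "abcdefgh "
    return "".join(random.choice(alphabet) for _ in range(random.randint(5, 10)))

def generate_example_set(n_examples=4, max_tries=100):
    for _ in range(max_tries):
        program = sample_program()
        try:
            inputs = [sample_input() for _ in range(n_examples)]
            outputs = [program(i) for i in inputs]  # must execute without raising
        except Exception:
            continue                                # reject the sample and retry
        return program, list(zip(inputs, outputs))
    raise RuntimeError("no valid program sampled")

program, io_pairs = generate_example_set()
```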

  • We model program synthesis as a sequence-to-sequence generation task
  • The observed input-output examples are encoded using a series of recurrent neural networks, and P is generated one token at a time by another RNN

  • Late pooling allows us to effectively incorporate powerful attention mechanisms into the model
  • The previous SotA (Parisotto et al. 2017) performed pooling at the I/O encoding level and as such, it could not exploit the attention mechanisms developed here
  • The DSL used here is more expressive, especially the GetSpan() function, which was required to solve approximately 20% of the test instances
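
Under my reading, late pooling means each I/O example keeps its own attentional decoder state at every program step, and the combination across examples happens only right before the token softmax; a minimal sketch (the shapes, the tanh layer, and the max-pool choice are assumptions):

```python
import torch

n_examples, hidden, vocab = 4, 256, 128
h = torch.randn(n_examples, hidden)   # per-example decoder states at one time step

w_pool = torch.nn.Linear(hidden, hidden)
w_out = torch.nn.Linear(hidden, vocab)

pooled = torch.tanh(w_pool(h)).max(dim=0).values  # late pooling: max over examples
logits = w_out(pooled)                            # scores for the next program token
```

Pooling at the I/O encoding level instead would collapse the examples into a single vector before decoding, leaving nothing per-example for the decoder to attend over.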

  • It is possible to model both approaches (synthesis and induction) using nearly-identical network architectures
  • The induction model evaluated is identical to synthesis Attention-A with late pooling, except for the following two modifications:
    • Instead of generating P, the system generates the new output string $O^y$ character by character
    • There is an additional LSTM to encode $I^y$; the decoder for $O^y$ uses double attention over $O_j$ and $I^y$
  • Induction is comparable to synthesis with a beam size of 1
  • The induction model uses a beam of 3 and does not improve with a larger beam because there is no way to evaluate candidate outputs after decoding
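
The asymmetry is easy to see in code (hypothetical helpers): synthesized beam candidates can be executed on the observed examples and filtered for consistency, whereas an induced output string has nothing it can be checked against.

```python
def pick_program(candidates, observed_examples):
    """candidates: beam of candidate programs, most probable first."""
    for program in candidates:
        if all(program(i) == o for i, o in observed_examples):
            return program           # first candidate consistent with all observed I/O
    return candidates[0]             # fall back to the most probable candidate

def pick_output(candidate_strings):
    """Induction: no way to validate candidates, so a wider beam does not help."""
    return candidate_strings[0]

observed = [("ada lovelace", "Lovelace"), ("alan turing", "Turing")]
beam = [lambda s: s.split()[0].capitalize(),   # wrong: takes the first word
        lambda s: s.split()[1].capitalize()]   # right: takes the last word
print(pick_program(beam, observed)("grace hopper"))  # -> "Hopper"
```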