Review for NeurIPS paper: Learning Compositional Rules via Neural Program Synthesis

Neural Information Processing Systems 

The paper claims that the model "learn[s] entire rule systems from a small set of examples". I am not convinced that this is the case, either in this work or in the previous work that it extends. Both methods rely heavily on the support set and on the specific neural attention architecture of the encoder and decoder, which allows individual tokens to be replaced. This lets the model exploit a pattern in the support set, e.g. "a b c - a c a", by replacing the "a" and "b" on the fly and executing the abstract rule given by the support set.
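To make this concern concrete, here is a minimal sketch of the kind of shortcut I have in mind. It is not the paper's model or the earlier method. It is a hand-written illustration under two assumptions of mine: that "a b c - a c a" denotes an input/output pair, and that every output token is copied from some input position. The function name `apply_support_pattern` is hypothetical.

```python
def apply_support_pattern(support_src: str, support_tgt: str, query: str) -> str:
    """Reuse the positional pattern of a single support pair on a new query.

    Illustrative assumption: each output token of the support pair is a copy
    of one of its input tokens, so the "rule" reduces to a list of positions.
    """
    src = support_src.split()
    tgt = support_tgt.split()
    qry = query.split()
    if len(qry) != len(src):
        raise ValueError("query must have as many tokens as the support input")
    # For every output token, find the input position it was copied from.
    # list.index raises ValueError if an output token never occurs in the input.
    positions = [src.index(tok) for tok in tgt]
    # Read the query tokens at the same positions: pure token substitution.
    return " ".join(qry[i] for i in positions)


print(apply_support_pattern("a b c", "a c a", "x y z"))  # x z x
```

Nothing in this sketch infers a rule system. It reads a positional pattern off one support example and substitutes tokens, which is the behaviour I suspect the attention mechanism makes easy.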