Reviews: Can Active Memory Replace Attention?

Neural Information Processing Systems

The contributions of this paper come from the proposed Extended Neural GPU model and from the empirical results demonstrating that it performs on par with an attention mechanism. Extending the model to capture dependencies within the output sequence has not been applied to the Neural GPU specifically, but the idea itself is well established in the literature (e.g. …). On the other hand, the experimental contribution of making the Extended Neural GPU work effectively on a machine translation task is useful, and it is especially interesting to see that such an architecture may yield the same advantages as an attention mechanism.

The need for a variable-sized memory is partly supported by Cho et al. (2014), who demonstrate that the performance of an encoder-decoder translation model whose encoder is a convolutional neural network also degrades with sentence length. This adds evidence to the paper's argument that the memory should not be restricted to a fixed-size vector but should instead be allowed to grow with the input sequence length.
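To make the fixed-size versus growing-memory distinction concrete, here is a minimal toy sketch. It is not the paper's Extended Neural GPU and uses no learned parameters. The helper names (`embed`, `fixed_size_encode`, `active_memory_encode`) are hypothetical. The token embedding is a deterministic pseudo-random vector, and the "convolution" is a fixed-weight width-3 stencil standing in for a learned convolutional update. The only point it illustrates is that an RNN-style encoder compresses any input into one vector of constant size, while an active-memory encoder keeps one vector per input position, so its memory grows with the sequence.

```python
import math
import random


def embed(token: str, dim: int) -> list[float]:
    """Hypothetical toy embedding: a deterministic pseudo-random vector per token."""
    rng = random.Random(token)  # seeding with a str is deterministic in Python 3
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def fixed_size_encode(tokens: list[str], dim: int) -> list[float]:
    """Fold the whole sequence into a single dim-sized state (RNN-style bottleneck)."""
    state = [0.0] * dim
    for tok in tokens:
        x = embed(tok, dim)
        state = [math.tanh(0.5 * s + xi) for s, xi in zip(state, x)]
    return state  # always length dim, however long the input is


def active_memory_encode(tokens: list[str], dim: int, steps: int = 2) -> list[list[float]]:
    """Keep one dim-sized vector per input position, so memory grows with the input.

    Each step updates every position in parallel from itself and its two
    neighbours using fixed weights (a toy stand-in for a learned convolution).
    """
    memory = [embed(tok, dim) for tok in tokens]
    n = len(memory)
    zeros = [0.0] * dim
    for _ in range(steps):
        new_memory = []
        for i in range(n):
            left = memory[i - 1] if i > 0 else zeros
            right = memory[i + 1] if i < n - 1 else zeros
            new_memory.append([
                math.tanh(0.25 * l + 0.5 * c + 0.25 * r)
                for l, c, r in zip(left, memory[i], right)
            ])
        memory = new_memory
    return memory


short_sent = "the cat sat".split()
long_sent = "the cat sat on the mat near the door".split()
print(len(fixed_size_encode(short_sent, 8)), len(fixed_size_encode(long_sent, 8)))  # 8 8
print(len(active_memory_encode(short_sent, 8)), len(active_memory_encode(long_sent, 8)))  # 3 9
```

The fixed-size encoder returns 8 numbers for both sentences. The active-memory encoder returns 3 and 9 position vectors, which mirrors the argument that memory should scale with input length rather than being squeezed into one vector.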