Reviews: Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation

Neural Information Processing Systems 

Original Review: This work builds directly on Transformer networks and makes two contributions to that architecture. The first is to run the encoder and decoder stacks layer by layer, instead of running the full encoder stack and only then passing its output to the decoder stack. The second is to tie the weights of the encoder and decoder. Running each decoder layer immediately after its corresponding encoder layer, rather than proceeding to the next encoder layer, is an interesting augmentation to Transformer networks.
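A minimal sketch of the scheduling difference described above follows. It is not the authors' implementation: `ToyLayer`, `standard`, and `layerwise` are hypothetical names, each "layer" is a toy scalar-weight mixing function standing in for real attention and feed-forward sublayers, and it assumes that decoder layer i reads the output of encoder layer i and shares that layer's weights.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyLayer:
    """Hypothetical stand-in for a Transformer layer: one scalar weight."""
    weight: float

    def encode(self, x: List[float]) -> List[float]:
        # Toy "self-attention": mix each source position with the sequence mean.
        m = sum(x) / len(x)
        return [self.weight * (xi + m) for xi in x]

    def decode(self, y: List[float], memory: List[float]) -> List[float]:
        # Toy "cross-attention": mix each target position with the mean of
        # whatever encoder representation it is given as memory.
        m = sum(memory) / len(memory)
        return [self.weight * (yi + m) for yi in y]


def standard(enc_layers: List[ToyLayer], dec_layers: List[ToyLayer],
             src: List[float], tgt: List[float], trace: List[str]) -> List[float]:
    # Conventional schedule: run the whole encoder stack first, then every
    # decoder layer attends to the FINAL encoder output.
    x = src
    for i, layer in enumerate(enc_layers):
        x = layer.encode(x)
        trace.append(f"enc{i}")
    y = tgt
    for i, layer in enumerate(dec_layers):
        y = layer.decode(y, x)
        trace.append(f"dec{i}")
    return y


def layerwise(shared_layers: List[ToyLayer], src: List[float],
              tgt: List[float], trace: List[str]) -> List[float]:
    # Layer-wise schedule (assumed): decoder layer i runs right after encoder
    # layer i, before encoder layer i+1, and attends to encoder layer i's
    # output. The same ToyLayer object serves both roles, so weights are tied.
    x, y = src, tgt
    for i, layer in enumerate(shared_layers):
        x = layer.encode(x)
        trace.append(f"enc{i}")
        y = layer.decode(y, x)
        trace.append(f"dec{i}")
    return y


if __name__ == "__main__":
    layers = [ToyLayer(0.5), ToyLayer(2.0)]
    t_std, t_lw = [], []
    print(standard(layers, layers, [1.0, 3.0], [2.0], t_std), t_std)
    # [26.0] ['enc0', 'enc1', 'dec0', 'dec1']
    print(layerwise(layers, [1.0, 3.0], [2.0], t_lw), t_lw)
    # [20.0] ['enc0', 'dec0', 'enc1', 'dec1']
```

The demo passes the same layers to both functions so that only the execution order differs; the traces show the interleaving, and the different outputs come from decoder layer 0 seeing the shallow encoder representation instead of the final one.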