Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation
He, Tianyu; Tan, Xu; Xia, Yingce; He, Di; Qin, Tao; Chen, Zhibo; Liu, Tie-Yan
Neural Information Processing Systems
Neural Machine Translation (NMT) has made remarkable progress thanks to the rapid evolution of model architectures. In this paper, we propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of the encoder's and decoder's hidden representations layer by layer, gradually from low level to high level. Specifically, we design a layer-wise attention and a mixed attention mechanism, and we further share the parameters of each layer between the encoder and decoder to regularize and coordinate the learning. Experiments show that, combined with the state-of-the-art Transformer model, layer-wise coordination achieves improvements on three IWSLT and two WMT translation tasks. In particular, our method achieves BLEU scores of 34.43 on WMT16 English-Romanian and 29.01 on WMT14 English-German, outperforming the Transformer baseline.
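The abstract only names the mixed attention mechanism without defining it. Below is a minimal sketch under one stated assumption: that mixed attention lets each decoder position attend, in a single softmax, over the encoder's hidden states at the same layer together with the decoder's own states at positions up to and including the current one (a causal mask on the target part). It is a simplified illustration, not the authors' implementation: it uses a single head, identity query/key/value projections, and omits the encoder-decoder parameter sharing. The names `mixed_attention`, `softmax`, and `dot` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mixed_attention(src: List[Vector], tgt: List[Vector]) -> List[Vector]:
    """Hypothetical single-head mixed attention (identity projections).

    For each target position t, the query tgt[t] attends jointly over all
    source states (assumed to come from the same layer index) and the target
    states at positions 0..t, so no future target position is visible.
    """
    d = len(tgt[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product, as in the Transformer
    outputs: List[Vector] = []
    for t, query in enumerate(tgt):
        # Keys and values: every source position plus the causal target prefix.
        memory = src + tgt[: t + 1]
        weights = softmax([dot(query, key) * scale for key in memory])
        # Output is the attention-weighted sum of the memory vectors.
        out = [sum(w * v[i] for w, v in zip(weights, memory)) for i in range(d)]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    src_states = [[1.0, 0.0], [0.0, 1.0]]
    tgt_states = [[1.0, 0.0], [0.5, 0.5]]
    for row in mixed_attention(src_states, tgt_states):
        print([round(x, 3) for x in row])
```

Merging the two attentions into one softmax means source and target positions compete for the same attention mass, in contrast to the standard Transformer decoder, which applies self-attention and encoder-decoder attention as two separate sublayers.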
Dec-31-2018