 Machine Translation




A The Architecture of Decoder Adapters We mainly follow [ 34

Neural Information Processing Systems

In the main text, we also report the inference latency of different models in Table 1. We list the statistics of the datasets used in the neural machine translation tasks in Table 5. The underlined words indicate the words to be masked in the next iteration. During preprocessing, we use the same vocabulary as the BERT models to decode the dataset.






Appendix for Data Diversification: A Simple Strategy For Neural Machine Translation
Xuan-Phi Nguyen

Neural Information Processing Systems

Finally, we describe the training setup for our back-translation experiments. We continue to differentiate our method from other existing works. Our method does not train multiple peer models with EM training either. In each round, a forward (or backward) model takes its turn playing the "back-translation" role to train the other model. The roles are switched in the next round. In other words, source and target are identical.