Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning

Bei Li

Neural Information Processing Systems 

On the WMT'14 English-German and English-French tasks, our model achieved BLEU scores of 30.95 and 44.27, respectively.