BLEU score
When does label smoothing help?
Rafael Müller, Simon Kornblith, Geoffrey E. Hinton
To explain these observations, we visualize how label smoothing changes the representations learned by the penultimate layer of the network. We show that label smoothing encourages the representations of training examples from the same class to group in tight clusters. This results in loss of information in the logits about resemblances between instances of different classes, which is necessary for distillation, but does not hurt generalization or calibration of the model's predictions.
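For readers new to the technique, a minimal sketch of the usual label-smoothing formulation may help. It assumes the one-hot target is mixed with a uniform distribution over K classes with weight `alpha`, i.e. y_k = (1 - alpha) * onehot_k + alpha / K. The function names are hypothetical and this is not the authors' code. It shows one property relevant to the excerpt above: with `alpha > 0` the loss is minimized at a finite logit gap, so growing the correct logit without bound is penalized. The paper's clustering claim is not proven by this sketch.

```python
import math
from typing import List


def smooth_labels(target: int, num_classes: int, alpha: float) -> List[float]:
    """Mix the one-hot target with a uniform distribution: (1 - alpha) * onehot + alpha / K."""
    uniform = alpha / num_classes
    return [(1.0 - alpha) * (1.0 if k == target else 0.0) + uniform
            for k in range(num_classes)]


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothed_cross_entropy(logits: List[float], target: int, alpha: float = 0.1) -> float:
    """Cross-entropy between the smoothed target distribution and softmax(logits)."""
    targets = smooth_labels(target, len(logits), alpha)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    # With K = 3 and alpha = 0.1 the smoothed target is (28/30, 1/30, 1/30),
    # so the loss-minimizing logits [d, 0, 0] have d = log(28), about 3.33.
    for d in (math.log(28), 10.0):
        print(d, label_smoothed_cross_entropy([d, 0.0, 0.0], 0, 0.1))
```

With `alpha = 0` the same function reduces to ordinary cross-entropy, which keeps decreasing as the correct logit grows.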
Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation
Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, Tie-Yan Liu
Neural Machine Translation (NMT) has achieved remarkable progress with the quick evolvement of model structures. In this paper, we propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of hidden representations of the encoder and decoder together layer by layer, gradually from low level to high level.
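One way to picture "layer by layer" coordination is a decoder whose i-th layer attends to the hidden states of the encoder's i-th layer, rather than only to the encoder's top layer. The sketch below assumes exactly that reading; it is a toy illustration, not the authors' architecture. `toy_layer` is a hypothetical stand-in for a real Transformer layer, decoder self-attention is omitted, and attention uses the encoder states as both keys and values.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention in which the keys double as the values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(d)]


def toy_layer(x: Vector, scale: float) -> Vector:
    """Hypothetical stand-in for a real layer: an elementwise tanh transform."""
    return [math.tanh(scale * v) for v in x]


def encode_all_layers(src: List[Vector], num_layers: int) -> List[List[Vector]]:
    """Run the toy encoder and keep the hidden states of every layer, not just the top one."""
    states = []
    h = src
    for i in range(num_layers):
        h = [toy_layer(tok, 1.0 + i) for tok in h]
        states.append(h)
    return states


def decode_layerwise(tgt: List[Vector], enc_states: List[List[Vector]]) -> List[Vector]:
    """Decoder layer i attends to encoder layer i (the assumed layer-wise coordination)."""
    h = tgt
    for i, enc_layer in enumerate(enc_states):
        h = [toy_layer([a + b for a, b in zip(tok, attend(tok, enc_layer))], 1.0 + i)
             for tok in h]
    return h


if __name__ == "__main__":
    src = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
    tgt = [[0.2, 0.2], [-0.1, 0.4]]
    enc = encode_all_layers(src, num_layers=3)
    print(decode_layerwise(tgt, enc))
```

The design choice being illustrated is only the pairing of layers: a low decoder layer sees low-level encoder features and a high decoder layer sees high-level ones, matching the "from low level to high level" phrasing above.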