When does label smoothing help?

Rafael Müller, Simon Kornblith, Geoffrey E. Hinton

Neural Information Processing Systems

It is widely known that neural network training is sensitive to the loss that is minimized.