MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies
