Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

Recent work on neural machine translation and other text generation tasks has trained models directly to minimize the perplexity/negative log-likelihood of observed sequences. While this has shown very promising results, the setup ignores the fact that in practice, at inference time, the model conditions on generated symbols rather than gold symbols, and may therefore be conditioning on contexts that are quite different from those seen in the gold data. This paper attempts to remedy this problem by utilizing generated sequences at training time: instead of conditioning on the gold context, it conditions on the generated context. Unfortunately, in early rounds of the algorithm this produces junk, so the authors introduce a "scheduled sampling" approach that alternates between the two training methods based on a predefined decay schedule inspired by curriculum learning.
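
To make the summary above concrete, here is a minimal sketch of how I understand the idea, not the authors' implementation. It assumes a linear decay schedule and an independent coin flip per token; both are illustrative choices on my part, and the names `gold_prob` and `mix_contexts` are hypothetical.

```python
import random


def gold_prob(step, total_steps, floor=0.0):
    """Probability of feeding the gold token at a given training step.

    Assumption: a linear decay from 1.0 down to `floor`. The paper's
    actual schedule may differ; this is only an illustration.
    """
    frac = min(step / total_steps, 1.0)
    return max(floor, 1.0 - frac)


def mix_contexts(gold, generated, eps, rng):
    """Build the conditioning context for one training sequence.

    At each position, keep the gold token with probability `eps`,
    otherwise use the token the model generated itself.
    """
    return [g if rng.random() < eps else m for g, m in zip(gold, generated)]


if __name__ == "__main__":
    rng = random.Random(0)
    gold = ["the", "cat", "sat", "down"]
    generated = ["a", "dog", "sat", "up"]  # stand-in for model samples
    for step in (0, 50, 100):
        eps = gold_prob(step, total_steps=100)
        print(step, eps, mix_contexts(gold, generated, eps, rng))
```

Early in training `eps` is close to 1, so the context is almost entirely gold; as `eps` decays, the model is increasingly trained on its own predictions, which is the regime it faces at inference time.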