Predicting the Order of Upcoming Tokens Improves Language Modeling
