Efficient Training of Language Models with Compact and Consistent Next Token Distributions