Efficient Training of Language Models with Compact and Consistent Next Token Distributions