Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
