Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
