Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
