Optimizing Pretraining Data Mixtures with LLM-Estimated Utility
