Optimizing Pretraining Data Mixtures with LLM-Estimated Utility