Scaling Laws for Optimal Data Mixtures