Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models