DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie