DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie

Neural Information Processing Systems 

We then resample a dataset with these domain weights and train a larger, full-sized model.
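The resampling step can be illustrated with a minimal sketch. This is not the paper's implementation. The function name `resample_by_domain`, the domain names, and the weight values are all hypothetical, and sampling with replacement is an assumption made for simplicity. Each draw first picks a domain in proportion to its weight, then picks an example uniformly from that domain.

```python
import random


def resample_by_domain(domains, weights, num_examples, seed=0):
    """Draw examples so that each domain appears in proportion to its weight.

    domains: dict mapping domain name -> list of examples (hypothetical layout)
    weights: dict mapping domain name -> non-negative weight
    Sampling is with replacement, which is an assumption of this sketch.
    """
    if set(domains) != set(weights):
        raise ValueError("domains and weights must have the same keys")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    rng = random.Random(seed)
    names = sorted(domains)
    probs = [weights[name] / total for name in names]

    # Pick a domain for every draw, then a uniform example within that domain.
    picked = rng.choices(names, weights=probs, k=num_examples)
    return [(name, rng.choice(domains[name])) for name in picked]


if __name__ == "__main__":
    # Illustrative data and weights only; not values from the paper.
    toy_domains = {
        "web": ["w1", "w2", "w3"],
        "code": ["c1", "c2"],
        "books": ["b1"],
    }
    toy_weights = {"web": 0.6, "code": 0.3, "books": 0.1}
    sample = resample_by_domain(toy_domains, toy_weights, num_examples=10)
    for domain, example in sample:
        print(domain, example)
```

The resampled list would then serve as the training set for the larger model. In practice a corpus of this kind is usually streamed rather than held in memory, so a real pipeline would likely apply the same weights at the data-loader level.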
