Unsupervised Topic Models are Data Mixers for Pre-training Language Models