Balanced Data Sampling for Language Model Training with Clustering