On Pretraining Data Diversity for Self-Supervised Learning