On the Diversity of Synthetic Data and its Impact on Training Large Language Models