Towards Understanding Bias in Synthetic Data for Evaluation