Appendix: Can You Rely on Your Model? A Case for Synthetic Data-Based Model Evaluation