Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach