Learning Evaluation Models from Large Language Models for Sequence Generation