SESCORE2: Learning Text Generation Evaluation via Synthesizing Realistic Mistakes