On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
