Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
