Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
