Quantifying Label-Induced Bias in Large Language Model Self- and Cross-Evaluations
