Benchmarking Cognitive Biases in Large Language Models as Evaluators
