An Empirical Analysis on Large Language Models in Debate Evaluation
