A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
