Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation
