On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation
