Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
