Finding Blind Spots in Evaluator LLMs with Interpretable Checklists