Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
