Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?