Test Set Quality in Multilingual LLM Evaluation