Average Is Not Enough: Caveats of Multilingual Evaluation
