Mind the Gap... or Not? How Translation Errors and Evaluation Details Skew Multilingual Results