Average Is Not Enough: Caveats of Multilingual Evaluation
Pikuliak, Matúš, Šimko, Marián
arXiv.org Artificial Intelligence
We believe that this is an often overlooked tool in our research toolkit that should be used more to ensure that we are able to properly interpret results from multilingual evaluation and detect various linguistic biases and problems. In addition to this discussion, which we consider a contribution in itself, we also propose a visualization based on the URIEL typological database (Littell et al., 2017) as an example of such qualitative analysis, and we show that it is able to discover linguistic biases in published results.

... to improvements of various multilingual technologies, such as machine translation (Arivazhagan et al., 2019), multilingual language models (Devlin et al., 2019; Conneau and Lample, 2019), crosslingual transfer learning (Pikuliak et al., 2021) or language independent representations (Ruder et al., 2019). It is now possible to create well-performing multilingual methods for many tasks. When dealing with multilingual methods, we need to be able to evaluate how good they really are, i.e. how effective ...
Jan-3-2023