The Geometry of Multilingual Language Models: An Equality Lens
Shah, Cheril; Chandak, Yashashree; Suri, Manan
arXiv.org Artificial Intelligence
Understanding the representations of different languages in multilingual language models is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying any biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that all languages are represented by unique geometries. Using a geometric separability index, we find that although languages tend to lie closer to others in the same linguistic family, they are almost separable from languages in other families. We also introduce a Cross-Lingual Similarity Index to measure the distance between languages in the semantic space. We use the XNLI-15way dataset (Conneau et al., 2018) and sample 300 parallel sentences across the 15 languages for our analysis.
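The abstract does not define its geometric separability index. As a rough illustration only, the sketch below assumes one common nearest-neighbour formulation: the fraction of points whose nearest neighbour in Euclidean space carries the same label. The function name `separability_index`, the toy clusters, and the "family" labels are hypothetical and are not taken from the paper.

```python
import math
import random


def separability_index(points, labels):
    """Fraction of points whose nearest neighbour (Euclidean) shares their label.

    Assumed formulation for illustration; the paper's exact index may differ.
    """
    if len(points) != len(labels) or len(points) < 2:
        raise ValueError("need at least two points and one label per point")
    hits = 0
    for i, p in enumerate(points):
        # Nearest other point by Euclidean distance (ties go to the lowest index).
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(points)


# Toy stand-in for sentence embeddings: two tight, well-separated clusters.
rng = random.Random(0)
family_a = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(20)]
family_b = [[rng.gauss(5.0, 0.1) for _ in range(8)] for _ in range(20)]
points = family_a + family_b
labels = ["family_a"] * 20 + ["family_b"] * 20

gsi = separability_index(points, labels)
```

Under this assumed definition, a value near 1 means each point's closest neighbour almost always belongs to the same group, and a value near 0 means the groups are thoroughly interleaved.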
May-13-2023