The Geometry of Multilingual Language Models: An Equality Lens

Shah, Cheril; Chandak, Yashashree; Suri, Manan

arXiv.org Artificial Intelligence 

Understanding how different languages are represented in multilingual language models is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that each language is represented by a unique geometry. Using a geometric separability index, we find that although languages tend to lie closer to languages of the same linguistic family, they are almost separable from languages of other families. We also introduce a Cross-Lingual Similarity Index to measure the distance between languages in the semantic space. For our analysis, we use the XNLI-15way dataset (Conneau et al., 2018) and sample 300 parallel sentences across its 15 languages.
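The abstract does not define the separability index it uses. One common formulation is Thornton's geometric separability index (GSI): the fraction of points whose nearest neighbour carries the same label. The sketch below assumes that formulation, Euclidean distance, and sentence embeddings that have already been extracted from a model (one row per sentence, labelled by language). The function name `geometric_separability_index` and the toy data are hypothetical, and the paper's exact variant may differ. A value near 1.0 indicates that the labelled languages occupy separate regions; a value near chance indicates that they are interleaved.

```python
import numpy as np


def geometric_separability_index(embeddings, labels) -> float:
    """Fraction of points whose Euclidean nearest neighbour shares their label.

    Assumes Thornton-style GSI; this is an illustrative sketch, not
    necessarily the index used in the paper.
    """
    x = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    if x.shape[0] != y.shape[0]:
        raise ValueError("embeddings and labels must have the same length")
    if x.shape[0] < 2:
        raise ValueError("need at least two points")

    # Pairwise squared Euclidean distances, shape (n, n).
    sq_norms = np.sum(x * x, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)

    # A point must not count as its own nearest neighbour.
    np.fill_diagonal(dists, np.inf)

    # Index of each point's nearest neighbour (first index wins on ties).
    nearest = np.argmin(dists, axis=1)
    return float(np.mean(y[nearest] == y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy "sentence embeddings": two languages, 300 sentences each.
    lang_a = rng.normal(loc=0.0, scale=1.0, size=(300, 8))
    lang_b = rng.normal(loc=5.0, scale=1.0, size=(300, 8))
    emb = np.vstack([lang_a, lang_b])
    labels = np.array(["a"] * 300 + ["b"] * 300)
    print(geometric_separability_index(emb, labels))
```

With well-separated toy clusters, as in the example, the index is close to 1.0. Applied to real embeddings, computing it over pairs of languages from the same family and from different families would give one way to probe the family-level pattern the abstract describes.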
