The Geometry of Multilingual Language Models: An Equality Lens

Shah, Cheril, Chandak, Yashashree, Suri, Manan

May-13-2023–arXiv.org Artificial Intelligence

Understanding the representations of different languages in multilingual language models is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying any biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that all languages are represented by unique geometries. Using a geometric separability index, we find that although languages tend to be closer according to their linguistic family, they are almost separable from languages of other families. We also introduce a Cross-Lingual Similarity Index to measure the distance between languages in the semantic space. We use the XNLI-15way dataset (Conneau et al., 2018) and sample 300 parallel sentences across the 15 languages for our analysis.
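
The abstract does not define its geometric separability index. As a rough illustration only, the sketch below assumes a nearest-neighbour formulation: the fraction of sentence embeddings whose closest neighbour in Euclidean space carries the same language label. The function name, the toy data, and the choice of this particular index are assumptions, not the authors' method.

```python
import numpy as np


def geometric_separability_index(embeddings, labels):
    """Fraction of points whose nearest Euclidean neighbour shares their label.

    Assumed nearest-neighbour formulation; the paper may define it differently.
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    # Pairwise squared Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # A point must not count as its own nearest neighbour.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    return float(np.mean(y[nearest] == y))


# Toy stand-in for sentence embeddings of two languages (hypothetical data).
rng = np.random.default_rng(0)
lang_a = rng.normal(loc=0.0, scale=0.1, size=(20, 8))
lang_b = rng.normal(loc=5.0, scale=0.1, size=(20, 8))
X = np.vstack([lang_a, lang_b])
y = np.array(["a"] * 20 + ["b"] * 20)
gsi = geometric_separability_index(X, y)
```

Under this assumed definition, a value near 1 means each language's embeddings form their own cluster, while a value near the chance level means languages are interleaved in the space.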

artificial intelligence, natural language, principal component

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence > Natural Language (1.00)
