Less is More: The Effectiveness of Compact Typological Language Representations
Ng, York Hay; Hoang, Phuong Hanh; Lee, En-Shiun Annie
arXiv.org Artificial Intelligence
Linguistic feature datasets such as URIEL+ are valuable for modelling cross-lingual relationships, but their high dimensionality and sparsity, especially for low-resource languages, limit the effectiveness of distance metrics. We propose a pipeline to optimize the URIEL+ typological feature space by combining feature selection and imputation, producing compact yet interpretable typological representations. We evaluate these feature subsets on linguistic distance alignment and downstream tasks, demonstrating that reduced-size representations of language typology can yield more informative distance metrics and improve performance in multilingual NLP applications.
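As a rough illustration of the kind of pipeline the abstract describes, the sketch below selects features, imputes missing values, and then computes a distance between languages. It is not the authors' method: the missing-rate filter, the column-mean imputation, the cosine distance, the function names, and the toy feature values are all assumptions chosen for brevity, and the real URIEL+ data and API are not used here.

```python
import math

# Toy binary typological features; None marks a missing value.
# Language names and values are hypothetical, for illustration only.
features = {
    "lang_a": [1, 0, None, 1, None],
    "lang_b": [1, 1, None, 0, 1],
    "lang_c": [0, 1, None, None, 0],
}


def select_features(data, max_missing=0.5):
    """Return column indices whose share of missing values is at most max_missing.

    Columns with no observed value at all are always dropped.
    """
    rows = list(data.values())
    keep = []
    for j in range(len(rows[0])):
        missing = sum(1 for row in rows if row[j] is None)
        if missing < len(rows) and missing / len(rows) <= max_missing:
            keep.append(j)
    return keep


def impute_mean(data, keep):
    """Restrict each language to the kept columns and fill gaps with the column mean."""
    rows = list(data.values())
    means = {}
    for j in keep:
        observed = [row[j] for row in rows if row[j] is not None]
        means[j] = sum(observed) / len(observed)
    return {
        name: [row[j] if row[j] is not None else means[j] for j in keep]
        for name, row in data.items()
    }


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


keep = select_features(features)       # the all-missing column is dropped
compact = impute_mean(features, keep)  # smaller, fully observed vectors
dist_ab = cosine_distance(compact["lang_a"], compact["lang_b"])
```

The point of the sketch is only the ordering of the steps: dropping sparse columns first means the imputation step has less to fill in, and the distance is computed on a smaller, fully observed vector.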
Sep-25-2025
- Country:
- North America
- United States (0.68)
- Canada > Ontario (0.28)
- Genre:
- Research Report > New Finding (0.46)
- Technology: