AITopics | uriel

Collaborating Authors

uriel

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations

Irawan, Patrick Amadeus, Diandaru, Ryandito, Syuhada, Belati Jagad Bintang, Suchrady, Randy Zakya, Aji, Alham Fikri, Winata, Genta Indra, Koto, Fajri, Cahyawijaya, Samuel

arXiv.org Artificial IntelligenceSep-8-2025

We introduce Entropy2Vec, a novel framework for deriving cross-lingual language representations by leveraging the entropy of monolingual language models. Unlike traditional typological inventories that suffer from feature sparsity and static snapshots, Entropy2Vec uses the inherent uncertainty in language models to capture typological relationships between languages. By training a language model on a single language, we hypothesize that the entropy of its predictions reflects its structural similarity to other languages: Low entropy indicates high similarity, while high entropy suggests greater divergence. This approach yields dense, non-sparse language embeddings that are adaptable to different timeframes and free from missing values. Empirical evaluations demonstrate that Entropy2Vec embeddings align with established typological categories and achieved competitive performance in downstream multilingual NLP tasks, such as those addressed by the LinguAlchemy framework.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.0506

Country:

Europe (1.00)
North America (0.68)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.64)

Add feedback

URIEL+: Enhancing Linguistic Inclusion and Usability in a Typological and Multilingual Knowledge Base

Khan, Aditya, Shipton, Mason, Anugraha, David, Duan, Kaiyao, Hoang, Phuong H., Khiu, Eric, Doğruöz, A. Seza, Lee, En-Shiun Annie

arXiv.org Artificial IntelligenceDec-19-2024

URIEL is a knowledge base offering geographical, phylogenetic, and typological vector representations for 7970 languages. It includes distance measures between these vectors for 4005 languages, which are accessible via the lang2vec tool. Despite being frequently cited, URIEL is limited in terms of linguistic inclusion and overall usability. To tackle these challenges, we introduce URIEL+, an enhanced version of URIEL and lang2vec that addresses these limitations. In addition to expanding typological feature coverage for 2898 languages, URIEL+ improves the user experience with robust, customizable distance calculations to better suit the needs of users. These upgrades also offer competitive performance on downstream tasks and provide distances that better align with linguistic distance studies.

computational linguistic, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2409.18472

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Indonesia > Bali (0.04)
(14 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.61)

Add feedback

A Reproducibility Study on Quantifying Language Similarity: The Impact of Missing Values in the URIEL Knowledge Base

Toossi, Hasti, Huai, Guo Qing, Liu, Jinyu, Khiu, Eric, Doğruöz, A. Seza, Lee, En-Shiun Annie

arXiv.org Artificial IntelligenceMay-17-2024

In the pursuit of supporting more languages around the world, tools that characterize properties of languages play a key role in expanding the existing multilingual NLP research. In this study, we focus on a widely used typological knowledge base, URIEL, which aggregates linguistic information into numeric vectors. Specifically, we delve into the soundness and reproducibility of the approach taken by URIEL in quantifying language similarity. Our analysis reveals URIEL's ambiguity in calculating language distances and in handling missing values. Moreover, we find that URIEL does not provide any information about typological features for 31\% of the languages it represents, undermining the reliabilility of the database, particularly on low-resource languages. Our literature review suggests URIEL and lang2vec are used in papers on diverse NLP tasks, which motivates us to rigorously verify the database as the effectiveness of these works depends on the reliability of the information the tool provides.

feature vector, uriel, vector, (14 more...)

arXiv.org Artificial Intelligence

2405.11125

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Indonesia > Bali (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.61)

Add feedback