AITopics | typological feature

Collaborating Authors

typological feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Less is More: The Effectiveness of Compact Typological Language Representations

Ng, York Hay, Hoang, Phuong Hanh, Lee, En-Shiun Annie

arXiv.org Artificial IntelligenceSep-25-2025

Linguistic feature datasets such as URIEL+ are valuable for modelling cross-lingual relationships, but their high dimensionality and sparsity, especially for low-resource languages, limit the effectiveness of distance metrics. We propose a pipeline to optimize the URIEL+ typological feature space by combining feature selection and imputation, producing compact yet interpretable typological representations. We evaluate these feature subsets on linguistic distance alignment and downstream tasks, demonstrating that reduced-size representations of language typology can yield more informative distance metrics and improve performance in multilingual NLP applications.

computational linguistic, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.20129

Country:

North America > United States (0.68)
North America > Canada > Ontario (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Beyond cognacy

Jäger, Gerhard

arXiv.org Artificial IntelligenceJul-8-2025

Computational phylogenetics has become an established tool in historical linguistics, with many language families now analyzed using likelihood-based inference. However, standard approaches rely on expert-annotated cognate sets, which are sparse, labor-intensive to produce, and limited to individual language families. This paper explores alternatives by comparing the established method to two fully automated methods that extract phylogenetic signal directly from lexical data. One uses automatic cognate clustering with unigram/concept features; the other applies multiple sequence alignment (MSA) derived from a pair-hidden Markov model. Both are evaluated against expert classifications from Glottolog and typological data from Grambank. Also, the intrinsic strengths of the phylogenetic signal in the characters are compared. Results show that MSA-based inference yields trees more consistent with linguistic classifications, better predicts typological variation, and provides a clearer phylogenetic signal, suggesting it as a promising, scalable alternative to traditional cognate-based methods. This opens new avenues for global-scale language phylogenies beyond expert annotation bottlenecks.

artificial intelligence, language family, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.03005

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.40)
North America > United States (0.14)
North America > Mexico > Mexico City > Mexico City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Add feedback

Untangling the Influence of Typology, Data and Model Architecture on Ranking Transfer Languages for Cross-Lingual POS Tagging

Rice, Enora, Marashian, Ali, Haynie, Hannah, von der Wense, Katharina, Palmer, Alexis

arXiv.org Artificial IntelligenceMar-25-2025

Cross-lingual transfer learning is an invaluable tool for overcoming data scarcity, yet selecting a suitable transfer language remains a challenge. The precise roles of linguistic typology, training data, and model architecture in transfer language choice are not fully understood. We take a holistic approach, examining how both dataset-specific and fine-grained typological features influence transfer language selection for part-of-speech tagging, considering two different sources for morphosyntactic features. While previous work examines these dynamics in the context of bilingual biLSTMS, we extend our analysis to a more modern transfer learning pipeline: zero-shot prediction with pretrained multilingual models. We train a series of transfer language ranking systems and examine how different feature inputs influence ranker performance across architectures. Word overlap, type-token ratio, and genealogical distance emerge as top features across all architectures. Our findings reveal that a combination of typological and dataset-dependent features leads to the best rankings, and that good performance can be obtained with either feature group on its own.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.19979

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.04)
(7 more...)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

A Principled Framework for Evaluating on Typologically Diverse Languages

Ploeger, Esther, Poelman, Wessel, Høeg-Petersen, Andreas Holck, Schlichtkrull, Anders, de Lhoneux, Miryam, Bjerva, Johannes

arXiv.org Artificial IntelligenceJul-6-2024

Beyond individual languages, multilingual natural language processing (NLP) research increasingly aims to develop models that perform well across languages generally. However, evaluating these systems on all the world's languages is practically infeasible. To attain generalizability, representative language sampling is essential. Previous work argues that generalizable multilingual evaluation sets should contain languages with diverse typological properties. However, 'typologically diverse' language samples have been found to vary considerably in this regard, and popular sampling methods are flawed and inconsistent. We present a language sampling framework for selecting highly typologically diverse languages given a sampling frame, informed by language typology. We compare sampling methods with a range of metrics and find that our systematic methods consistently retrieve more typologically diverse language selections than previous methods in NLP. Moreover, we provide evidence that this affects generalizability in multilingual model evaluation, emphasizing the importance of diverse language sampling in NLP evaluation.

computational linguistic, diversity, typology, (14 more...)

arXiv.org Artificial Intelligence

2407.05022

Country:

Europe > Denmark > North Jutland > Aalborg (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(22 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

Hlavnova, Ester, Ruder, Sebastian

arXiv.org Artificial IntelligenceJul-11-2023

We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight Figure 1: Top: Comparison of state-of-the-art models generalization failures to specific typological on M2C tests in a selected set of languages. Models characteristics such as temporal expressions perform well on English but poorly on certain tests in in Swahili and compounding possessives other languages. Bottom: Even the largest models fail in Finish. Our findings motivate the development on tests probing language-specific features, e.g., the distinction of models that address these blind spots.

artificial intelligence, computational linguistic, natural language, (18 more...)

arXiv.org Artificial Intelligence

2307.05454

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(6 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

A study of conceptual language similarity: comparison and evaluation

Ye, Haotian, Liu, Yihong, Schütze, Hinrich

arXiv.org Artificial IntelligenceMay-22-2023

An interesting line of research in natural language processing (NLP) aims to incorporate linguistic typology to bridge linguistic diversity and assist the research of low-resource languages. While most works construct linguistic similarity measures based on lexical or typological features, such as word order and verbal inflection, recent work has introduced a novel approach to defining language similarity based on how they represent basic concepts, which is complementary to existing similarity measures. In this work, we study the conceptual similarity in detail and evaluate it extensively on a binary classification task.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.13401

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
South America (0.04)
(10 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.31)

Add feedback

Language Embeddings Sometimes Contain Typological Generalizations

Östling, Robert, Kurfalı, Murathan

arXiv.org Artificial IntelligenceJan-19-2023

To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most of our models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations. Careful attention to details in the evaluation turns out to be essential to avoid false positives. Furthermore, to encourage continued work in this field, we release several resources covering most or all of the languages in our data: (i) multiple sets of language representations, (ii) multilingual word embeddings, (iii) projected and predicted syntactic and morphological features, (iv) software to provide linguistically sound evaluations of language representations.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2301.08115

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Saxony > Leipzig (0.05)
South America > Paraguay (0.04)
(17 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

Does Typological Blinding Impede Cross-Lingual Sharing?

Bjerva, Johannes, Augenstein, Isabelle

arXiv.org Artificial IntelligenceJan-28-2021

Bridging the performance gap between high- and low-resource languages has been the focus of much previous work. Typological features from databases such as the World Atlas of Language Structures (WALS) are a prime candidate for this, as such data exists even for very low-resource languages. However, previous work has only found minor benefits from using typological information. Our hypothesis is that a model trained in a cross-lingual setting will pick up on typological cues from the input data, thus overshadowing the utility of explicitly using such features. We verify this hypothesis by blinding a model to typological information, and investigate how cross-lingual sharing and performance is impacted. Our model is based on a cross-lingual architecture in which the latent weights governing the sharing between languages is learnt during training. We show that (i) preventing this model from exploiting typology severely reduces performance, while a control experiment reaffirms that (ii) encouraging sharing according to typology somewhat improves performance.

augenstein, computational linguistic, typological feature, (15 more...)

arXiv.org Artificial Intelligence

2101.11888

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task

Gutkin, Alexander, Sproat, Richard

arXiv.org Artificial IntelligenceOct-12-2020

This paper describes the NEMO submission to SIGTYP 2020 shared task which deals with prediction of linguistic typological features for multiple languages using the data derived from World Atlas of Language Structures (WALS). We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features. We describe two submitted ridge regression-based configurations which ranked second and third overall in the constrained task. Our best configuration achieved the micro-averaged accuracy score of 0.66 on 149 test languages.

classifier, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2010.05985

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
(18 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

Why You Should Do NLP Beyond English

#artificialintelligenceAug-2-2020, 19:58:07 GMT

Natural language processing (NLP) research predominantly focuses on developing methods that work well for English despite the many positive benefits of working on other languages. These benefits range from an outsized societal impact to modelling a wealth of linguistic features to avoiding overfitting as well as interesting challenges for machine learning (ML). There are around 7,000 languages spoken around the world. The map above (see the interactive version at Langscape) gives an overview of languages spoken around the world, with each green circle representing a native language. Most of the world's languages are spoken in Asia, Africa, the Pacific region and the Americas.

information, machine learning, natural language, (19 more...)

#artificialintelligence

Country:

Asia (0.25)
Africa (0.25)
Europe > Italy (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback