
Collaborating Authors

 Steinert-Threlkeld, Shane


Minimization of Boolean Complexity in In-Context Concept Learning

arXiv.org Artificial Intelligence

What factors contribute to the relative success and corresponding difficulties of in-context learning for Large Language Models (LLMs)? Drawing on insights from the literature on human concept learning, we test LLMs on carefully designed concept learning tasks and show that task performance correlates strongly with the Boolean complexity of the concept. This suggests that in-context learning exhibits a learning bias for simplicity, much as human learning does.
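As a rough illustration of the complexity measure involved, the sketch below computes the Boolean complexity of a concept as the number of literals in a minimal DNF description (in the spirit of Feldman, 2000), using sympy's Quine-McCluskey minimizer; the paper's exact measure and task construction may differ.

```python
# Sketch: Boolean complexity as the literal count of a minimal DNF covering
# the positive examples. Illustrative only; not necessarily the paper's
# exact operationalization.
from sympy import Not, Symbol, symbols
from sympy.logic import SOPform

def count_literals(expr):
    """Count literal occurrences in a Boolean expression."""
    if isinstance(expr, (Symbol, Not)):
        return 1
    return sum(count_literals(arg) for arg in expr.args)

def boolean_complexity(positive_examples, n_features):
    """positive_examples: iterable of 0/1 tuples of length n_features."""
    feats = symbols(f"x0:{n_features}")
    minimal = SOPform(feats, [list(ex) for ex in positive_examples])
    return count_literals(minimal)

# XOR over two features minimizes to (x0 & ~x1) | (~x0 & x1): complexity 4.
print(boolean_complexity([(0, 1), (1, 0)], 2))
# A concept that depends on a single feature out of three: complexity 1.
print(boolean_complexity([(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)], 3))
```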


Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence

arXiv.org Artificial Intelligence

This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out from the training data, and uses it to measure the ability of LMs to perform linguistic generalization on the basis of indirect evidence. We apply the method to both LSTM and Transformer LMs (of roughly comparable size), developing filtered corpora that target a wide range of linguistic phenomena. Our results show that while transformers are better qua LMs (as measured by perplexity), both models perform equally and surprisingly well on linguistic generalization measures, suggesting that they are capable of generalizing from indirect evidence.
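To make the filtering idea concrete, here is a minimal sketch of the corpus-filtering step: drop every sentence containing a target construction before LM training, then probe generalization on held-out examples of that construction. The regex-based filter and the reflexive-pronoun example are illustrative stand-ins, not the paper's actual linguistically informed filters.

```python
# Sketch of the Filtered Corpus Training idea: remove every sentence that
# contains a target construction, train an LM on the remainder, and then
# evaluate the LM on minimal pairs involving that construction.
import re

# Illustrative target: sentences containing the reflexive "themselves".
TARGET = re.compile(r"\bthemselves\b", re.IGNORECASE)

def filter_corpus(sentences):
    """Yield only sentences that do NOT contain the target construction."""
    for sent in sentences:
        if not TARGET.search(sent):
            yield sent

corpus = [
    "The senators praised themselves.",
    "The senator praised herself.",
    "The author that the critics admired left.",
]
print(list(filter_corpus(corpus)))  # keeps the 2nd and 3rd sentences
```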


Targeted Multilingual Adaptation for Low-resource Language Families

arXiv.org Artificial Intelligence

The "massively-multilingual" training of multilingual models is known to limit their utility in any one language, and they perform particularly poorly on low-resource languages. However, there is evidence that low-resource languages can benefit from targeted multilinguality, where the model is trained on closely related languages. To test this approach more rigorously, we systematically study best practices for adapting a pre-trained model to a language family. Focusing on the Uralic family as a test case, we adapt XLM-R under various configurations to model 15 languages; we then evaluate the performance of each experimental setting on two downstream tasks and 11 evaluation languages. Our adapted models significantly outperform mono- and multilingual baselines. Furthermore, a regression analysis of hyperparameter effects reveals that adapted vocabulary size is relatively unimportant for low-resource languages, and that low-resource languages can be aggressively up-sampled during training at little detriment to performance in high-resource languages. These results introduce new best practices for performing language adaptation in a targeted setting.


The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation

arXiv.org Artificial Intelligence

Unsupervised on-the-fly back-translation, in conjunction with multilingual pretraining, is the dominant method for unsupervised neural machine translation. Theoretically, however, the method should not work in general. We therefore conduct controlled experiments with artificial languages to determine what properties of languages make back-translation an effective training method, covering lexical, syntactic, and semantic properties. We find, contrary to popular belief, that (i) parallel word frequency distributions, (ii) partially shared vocabulary, and (iii) similar syntactic structure across languages are not sufficient to explain the success of back-translation. We show, however, that even a crude semantic signal (similar lexical fields across languages) does improve the alignment of two languages through back-translation. We conjecture that rich semantic dependencies, parallel across languages, are at the root of the success of unsupervised methods based on back-translation. Overall, the success of unsupervised machine translation was far from analytically guaranteed. Instead, it is further evidence that the world's languages share deep similarities, and we hope to show how to identify which of these similarities can serve the development of unsupervised, cross-linguistic tools.
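For readers unfamiliar with the training procedure, the following sketch shows one round of on-the-fly back-translation: each monolingual batch is translated with the current model, and the synthetic pair is used to train the reverse direction. The model interface (translate / train_step) and the toy model are assumptions made for illustration, not an existing library API.

```python
# Sketch of one round of on-the-fly back-translation. `model` is assumed to
# expose translate(batch, src, tgt) and train_step(src_batch, tgt_batch,
# src, tgt); these names are illustrative.

def backtranslation_round(model, mono_l1, mono_l2, l1="L1", l2="L2"):
    # Translate L1 monolingual data into L2, train the L2 -> L1 direction.
    synthetic_l2 = model.translate(mono_l1, src=l1, tgt=l2)
    model.train_step(synthetic_l2, mono_l1, src=l2, tgt=l1)
    # Symmetrically for the other language.
    synthetic_l1 = model.translate(mono_l2, src=l2, tgt=l1)
    model.train_step(synthetic_l1, mono_l2, src=l1, tgt=l2)

class IdentityToyModel:
    """Stand-in model so the sketch runs end to end."""
    def translate(self, batch, src, tgt):
        return [f"<{tgt}> {s}" for s in batch]
    def train_step(self, src_batch, tgt_batch, src, tgt):
        print(f"training {src}->{tgt} on {len(src_batch)} synthetic pairs")

backtranslation_round(IdentityToyModel(), ["aab", "ab"], ["ba", "bba"])
```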


Embedding structure matters: Comparing methods to adapt multilingual vocabularies to new languages

arXiv.org Artificial Intelligence

For languages other than English and a handful of other very high-resource languages, multilingual language models form the backbone of most current NLP systems. These models address the relative data scarcity in most non-English languages by pooling text data across many languages to train a single model that (in theory) covers all training languages (Devlin, 2019; Conneau and Lample, 2019; Conneau et al., 2020; Liu et al., 2020; Scao et al., 2023, i.a.). ... Additionally, the information-theoretic tokenization modules for cross-lingual pre-trained models are usually under-optimized for any given language, and especially low-resource languages (Ács, 2019; Conneau and Lample, 2019, i.a.). For this reason, we propose several simple techniques to replace the large cross-lingual vocabulary of a pre-trained model with a compact, language-specific one during model specialization. Training a new SentencePiece or BPE tokenizer poses no ...
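One of the simpler (re-)initialization schemes in this space can be sketched as follows: given a compact target-language vocabulary (in practice obtained by training a new SentencePiece or BPE tokenizer), copy the embedding rows of tokens shared with the old multilingual vocabulary and back off to the mean old embedding for the rest. The paper compares several such schemes; the token lists and dimensions below are toy placeholders rather than its actual setup.

```python
# Sketch of one simple embedding re-initialization scheme for a replaced,
# compact vocabulary. Toy data throughout; in practice old_embeddings and
# old_vocab would come from the pre-trained model (e.g. XLM-R).
import torch

def reinit_embeddings(old_embeddings, old_vocab, new_tokens):
    """old_embeddings: (V_old, d) tensor; old_vocab: token -> row index."""
    mean_vec = old_embeddings.mean(dim=0)            # fallback for new tokens
    rows = [old_embeddings[old_vocab[t]] if t in old_vocab else mean_vec
            for t in new_tokens]
    return torch.stack(rows)

old_vocab = {"▁kala": 0, "▁on": 1, "▁järv": 2, "▁ja": 3}
old_embeddings = torch.randn(len(old_vocab), 8)      # random, illustration only

new_tokens = ["▁kala", "▁järv", "▁järves", "▁on"]    # compact new vocabulary
new_embeddings = reinit_embeddings(old_embeddings, old_vocab, new_tokens)
print(new_embeddings.shape)                          # torch.Size([4, 8])
```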


Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages

arXiv.org Artificial Intelligence

Although Transformers perform well on NLP tasks, recent studies suggest that self-attention is theoretically limited in learning even some regular and context-free languages. These findings motivated us to think about their implications in modeling natural language, which is hypothesized to be mildly context-sensitive. We test Transformers' ability to learn mildly context-sensitive languages of varying complexity, and find that they generalize well to unseen in-distribution data, but their ability to extrapolate to longer strings is worse than that of LSTMs. Our analyses show that the learned self-attention patterns and representations modeled dependency relations and demonstrated counting behavior, which may have helped the models solve the languages.
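For context, the kinds of formal languages used in such probes can be generated in a few lines; the sketch below produces a^n b^n c^n strings and a copy language ww, split into in-distribution and longer extrapolation lengths. The paper's exact language suite and sampling scheme may differ.

```python
# Sketch: generators for two classic test languages used in probing studies,
# a^n b^n c^n and the copy language ww. Illustrative data only.
import random

def anbncn(n):
    return "a" * n + "b" * n + "c" * n

def copy_language(length, alphabet="ab"):
    w = "".join(random.choice(alphabet) for _ in range(length))
    return w + w

train = [anbncn(n) for n in range(1, 51)]             # in-distribution lengths
extrapolation = [anbncn(n) for n in range(51, 101)]   # longer, unseen lengths
print(train[:3], extrapolation[0][:12], copy_language(4))
```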


Learning to translate by learning to communicate

arXiv.org Artificial Intelligence

We formulate and test a technique to use Emergent Communication (EC) with a pre-trained multilingual model to improve on modern Unsupervised NMT systems, especially for low-resource languages. It has been argued that the current dominant paradigm in NLP of pre-training on text-only corpora will not yield robust natural language understanding systems, and the need for grounded, goal-oriented, and interactive language learning has been highlighted. In our approach, we embed a multilingual model (mBART, Liu et al., 2020) into an EC image-reference game, in which the model is incentivized to use multilingual generations to accomplish a vision-grounded task. The hypothesis is that this will align multiple languages to a shared task space. We present two variants of EC Fine-Tuning (Steinert-Threlkeld et al., 2022), one of which outperforms a backtranslation-only baseline in all four languages investigated, including the low-resource language Nepali.
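The referential-game objective can be illustrated with a stripped-down sketch: a sender encodes the target image into a message, a receiver matches the message against the candidate images, and both are trained so that the target scores highest. In the paper the sender and receiver are built around mBART and the messages are discrete multilingual text; the tiny linear encoders and continuous messages below are simplifications for illustration only.

```python
# Minimal referential-game sketch in torch. Toy random "images" and linear
# encoders stand in for real image features and an mBART-based agent.
import torch
import torch.nn as nn

img_dim, msg_dim, n_candidates, batch = 128, 64, 4, 32
sender = nn.Linear(img_dim, msg_dim)     # stand-in for a text generator
receiver = nn.Linear(img_dim, msg_dim)   # stand-in for a text encoder
opt = torch.optim.Adam([*sender.parameters(), *receiver.parameters()], lr=1e-3)

for step in range(100):
    candidates = torch.randn(batch, n_candidates, img_dim)   # toy "images"
    target_idx = torch.randint(n_candidates, (batch,))
    target = candidates[torch.arange(batch), target_idx]

    message = sender(target)                                  # (batch, msg_dim)
    scores = torch.einsum("bm,bcm->bc", message, receiver(candidates))
    loss = nn.functional.cross_entropy(scores, target_idx)

    opt.zero_grad()
    loss.backward()
    opt.step()

print("final game loss:", loss.item())
```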


The Weighted Möbius Score: A Unified Framework for Feature Attribution

arXiv.org Artificial Intelligence

Feature attribution aims to explain the reasoning behind a black-box model's prediction by identifying the impact of each feature on the prediction. Recent work has extended feature attribution to interactions between multiple features. However, the lack of a unified framework has led to a proliferation of methods that are often not directly comparable. This paper introduces a parameterized attribution framework -- the Weighted Möbius Score -- and (i) shows that many different attribution methods for both individual features and feature interactions are special cases and (ii) identifies some new methods. By studying the vector space of attribution methods, our framework utilizes standard linear algebra tools and provides interpretations in various fields, including cooperative game theory and causal mediation analysis. We empirically demonstrate the framework's versatility and effectiveness by applying these attribution methods to feature interactions in sentiment analysis and chain-of-thought prompting.
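At the heart of the framework is the Möbius transform of a set function over feature subsets, m(S) = sum over T ⊆ S of (-1)^{|S| - |T|} v(T); different weightings of these coefficients yield different attribution methods. The sketch below computes the transform by brute force for a toy value function with a single pairwise interaction (exponential in the number of features, so only feasible for small n); the paper's specific weighting schemes are not reproduced here.

```python
# Sketch: brute-force Möbius transform of a set function v over subsets,
# m(S) = sum_{T ⊆ S} (-1)^{|S| - |T|} v(T). Toy value function only.
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def mobius_transform(v, features):
    """v: dict mapping frozenset of features -> value of that subset."""
    m = {}
    for S in map(frozenset, subsets(features)):
        m[S] = sum((-1) ** (len(S) - len(T)) * v[frozenset(T)]
                   for T in subsets(S))
    return m

# Toy value function: |S| plus an interaction bonus when 0 and 1 co-occur.
features = [0, 1, 2]
v = {frozenset(S): len(S) + (2.0 if {0, 1} <= set(S) else 0.0)
     for S in subsets(features)}
m = mobius_transform(v, features)
print(m[frozenset({0, 1})])   # 2.0: the pairwise interaction term
print(m[frozenset({0})])      # 1.0: the main effect of feature 0
```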


Paying Attention to Function Words

arXiv.org Artificial Intelligence

All natural languages exhibit a distinction between content words (like nouns and adjectives) and function words (like determiners, auxiliaries, prepositions). Yet surprisingly little has been said about the emergence of this universal architectural feature of natural languages. Why have human languages evolved to exhibit this division of labor between content and function words? How could such a distinction have emerged in the first place? This paper takes steps towards answering these questions by showing how the distinction can emerge through reinforcement learning in agents playing a signaling game across contexts which contain multiple objects that possess multiple perceptually salient gradable properties.
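A minimal version of the learning dynamics can be sketched with urn-style (Roth-Erev) reinforcement in a Lewis signaling game; the paper's setting is richer (multi-object contexts with gradable properties, from which function-word-like signals emerge), so the two-state toy below only illustrates the reinforcement rule.

```python
# Sketch: urn-style (Roth-Erev) reinforcement in a 2-state/2-signal/2-action
# Lewis signaling game. Toy setting, not the paper's multi-object contexts.
import random

n_states, n_signals, n_acts = 2, 2, 2
sender_urns = [[1.0] * n_signals for _ in range(n_states)]
receiver_urns = [[1.0] * n_acts for _ in range(n_signals)]

def draw(weights):
    return random.choices(range(len(weights)), weights=weights)[0]

for _ in range(5000):
    state = random.randrange(n_states)
    signal = draw(sender_urns[state])
    act = draw(receiver_urns[signal])
    if act == state:                       # success: reinforce both choices
        sender_urns[state][signal] += 1.0
        receiver_urns[signal][act] += 1.0

print("sender urns:", sender_urns)         # one signal dominates per state
```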


Some of Them Can be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers

arXiv.org Artificial Intelligence

We study the role of linguistic context in predicting quantifiers (`few', `all'). We collect crowdsourced data from human participants and test various models in a local (single-sentence) and a global (multi-sentence) context condition. Models significantly outperform humans in the former setting and are only slightly better in the latter. While human performance improves with more linguistic context (especially on proportional quantifiers), model performance suffers. Models are very effective in exploiting lexical and morpho-syntactic patterns; humans are better at genuinely understanding the meaning of the (global) context.
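As a rough modern analogue of the task, one can probe a masked language model for quantifier predictions under a local (single-sentence) versus a global (multi-sentence) context; the sketch below does this with a generic pretrained model and made-up example text, and is not the paper's human-plus-model setup or its full quantifier inventory.

```python
# Sketch: cloze-style quantifier prediction in a local vs. global context
# condition, using a generic masked LM. Example sentences and the quantifier
# list are illustrative only.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")
local = "In the picture, <mask> of the children are wearing hats."
global_ctx = ("It is a cold winter day and the playground is full. "
              "In the picture, <mask> of the children are wearing hats.")

quantifiers = ["none", "few", "some", "most", "all"]
for name, text in [("local", local), ("global", global_ctx)]:
    preds = fill(text, targets=quantifiers)
    print(name, [(p["token_str"].strip(), round(p["score"], 3)) for p in preds])
```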