AITopics | Cotterell, Ryan

Collaborating Authors

Cotterell, Ryan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Speakers Fill Lexical Semantic Gaps with Context

Pimentel, Tiago, Maudslay, Rowan Hall, Blasi, Damián, Cotterell, Ryan

arXiv.org Artificial IntelligenceMay-28-2024

Lexical ambiguity is widespread in language, allowing for the reuse of economical word forms and therefore making language more efficient. If ambiguous words cannot be disambiguated from context, however, this gain in efficiency might make language less clear -- resulting in frequent miscommunication. For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average. To investigate whether this is the case, we operationalise the lexical ambiguity of a word as the entropy of meanings it can take, and provide two ways to estimate this -- one which requires human annotation (using WordNet), and one which does not (using BERT), making it readily applicable to a large number of languages. We validate these measures by showing that, on six high-resource languages, there are significant Pearson correlations between our BERT-based estimate of ambiguity and the number of synonyms a word has in WordNet (e.g. $\rho = 0.40$ in English). We then test our main hypothesis -- that a word's lexical ambiguity should negatively correlate with its contextual uncertainty -- and find significant correlations on all 18 typologically diverse languages we analyse. This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2010.02172

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

A Simple Joint Model for Improved Contextual Neural Lemmatization

Malaviya, Chaitanya, Wu, Shijie, Cotterell, Ryan

arXiv.org Artificial IntelligenceMay-28-2024

English verbs have multiple forms. For instance, talk may also appear as talks, talked or talking, depending on the context. The NLP task of lemmatization seeks to map these diverse forms back to a canonical one, known as the lemma. We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages from the Universal Dependencies corpora. Our paper describes the model in addition to training and decoding procedures. Error analysis indicates that joint morphological tagging and lemmatization is especially Figure 1: Our structured neural model shown as a hybrid helpful in low-resource lemmatization and languages (directed-undirected) graphical model (Koller and that display a larger degree of morphological Friedman, 2009).

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

1904.02306

Country:

Europe > Germany (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A Transformer with Stack Attention

Li, Jiaoda, White, Jennifer C., Sachan, Mrinmaya, Cotterell, Ryan

arXiv.org Artificial IntelligenceMay-13-2024

Natural languages are believed to be (mildly) context-sensitive. Despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks. In an attempt to address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism. Our stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability to the model. We show that the addition of our stack-based attention mechanism enables the transformer to model some, but not all, deterministic context-free languages.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.04515

Country:

North America > United States > Texas (0.14)
North America > United States > Rhode Island (0.14)
North America > United States > California (0.14)
North America > Mexico > Mexico City (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Natural Language Counterfactuals through Representation Surgery

Avitan, Matan, Cotterell, Ryan, Goldberg, Yoav, Ravfogel, Shauli

arXiv.org Artificial IntelligenceMay-7-2024

Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior. Such methods are employed, for example, to eliminate or alter the encoding of demographic information such as gender within the model's representations and, in so doing, create a counterfactual representation. However, because the intervention operates within the representation space, understanding precisely what aspects of the text it modifies poses a challenge. In this paper, we give a method to convert representation counterfactuals into string counterfactuals. We demonstrate that this approach enables us to analyze the linguistic alterations corresponding to a given representation space intervention and to interpret the features utilized to encode a specific concept. Moreover, the resulting counterfactuals can be used to mitigate bias in classification through data augmentation.

intervention, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2402.11355

Country:

North America > United States (1.00)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

The Role of $n$-gram Smoothing in the Age of Neural Networks

Malagutti, Luca, Buinovskij, Andrius, Svete, Anej, Meister, Clara, Amini, Afra, Cotterell, Ryan

arXiv.org Artificial IntelligenceApr-30-2024

For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task. The key to their success lay in the application of various smoothing techniques that served to combat overfitting. However, when neural language models toppled $n$-gram models as the best performers, $n$-gram smoothing techniques became less relevant. Indeed, it would hardly be an understatement to suggest that the line of inquiry into $n$-gram smoothing techniques became dormant. This paper re-opens the role classical $n$-gram smoothing techniques may play in the age of neural language models. First, we draw a formal equivalence between label smoothing, a popular regularization technique for neural language models, and add-$\lambda$ smoothing. Second, we derive a generalized framework for converting any $n$-gram smoothing technique into a regularizer compatible with neural language models. Our empirical results find that our novel regularizers are comparable to and, indeed, sometimes outperform label smoothing on language modeling and machine translation.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2403.1724

Country:

Europe (1.00)
North America > Canada (0.67)
North America > United States > California (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Towards Explainability in Legal Outcome Prediction Models

Valvoda, Josef, Cotterell, Ryan

arXiv.org Artificial IntelligenceApr-15-2024

Current legal outcome prediction models - a staple of legal NLP - do not explain their reasoning. However, to employ these models in the real world, human legal actors need to be able to understand the model's decisions. In the case of common law, legal practitioners reason towards the outcome of a case by referring to past case law, known as precedent. We contend that precedent is, therefore, a natural way of facilitating explainability for legal NLP models. In this paper, we contribute a novel method for identifying the precedent employed by legal outcome prediction models. Furthermore, by developing a taxonomy of legal precedent, we are able to compare human judges and neural models with respect to the different types of precedent they rely on. We find that while the models learn to predict outcomes reasonably well, their use of precedent is unlike that of human judges.

machine learning, natural language, precedent, (18 more...)

arXiv.org Artificial Intelligence

2403.16852

Country:

Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.66)

Add feedback

Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields

Cotterell, Ryan, Duh, Kevin

arXiv.org Artificial IntelligenceApr-14-2024

Low-resource named entity recognition is still an open problem in NLP. Most state-of-the-art systems require tens of thousands of annotated sentences in order to obtain high performance. However, for most of the world's languages, it is unfeasible to obtain such annotation. In this paper, we present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource languages and low resource languages jointly. Learning character representations for multiple related languages allows transfer among the languages, improving F1 by up to 9.8 points over a loglinear CRF baseline.

computational linguistic, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2404.09383

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Labeled Morphological Segmentation with Semi-Markov Models

Cotterell, Ryan, Müller, Thomas, Fraser, Alexander, Schütze, Hinrich

arXiv.org Artificial IntelligenceApr-13-2024

We present labeled morphological segmentation, an alternative view of morphological processing that unifies several tasks. From an annotation standpoint, we additionally introduce a new hierarchy of morphotactic tagsets. Finally, we develop \modelname, a discriminative morphological segmentation system that, contrary to previous work, explicitly models morphotactics. We show that \textsc{chipmunk} yields improved performance on three tasks for all six languages: (i) morphological segmentation, (ii) stemming and (iii) morphological tag classification. On morphological segmentation, our method shows absolute improvements of 2--6 points $F_1$ over the baseline.

machine learning, natural language, segmentation, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/K15-1017

2404.08997

Country:

Europe (1.00)
North America > United States > Ohio (0.14)
North America > United States > New York (0.14)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Choshen, Leshem, Cotterell, Ryan, Hu, Michael Y., Linzen, Tal, Mueller, Aaron, Ross, Candace, Warstadt, Alex, Wilcox, Ethan, Williams, Adina, Zhuang, Chengxu

arXiv.org Artificial IntelligenceApr-9-2024

After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques. Second, we are relaxing the rules around pretraining data, and will now allow participants to construct their own datasets provided they stay within the 100M-word or 10M-word budget. Third, we introduce a multimodal vision-and-language track, and will release a corpus of 50% text-only and 50% image-text multimodal data as a starting point for LM model training. The purpose of this CfP is to provide rules for this year's challenge, explain these rule changes and their rationale in greater detail, give a timeline of this year's competition, and provide answers to frequently asked questions from last year's challenge.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.06214

Country: Europe > Switzerland (0.15)

Genre:

Research Report (0.50)
Frequently Asked Questions (FAQ) (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)

Add feedback

NARRATE: Versatile Language Architecture for Optimal Control in Robotics

Ismail, Seif, Arbues, Antonio, Cotterell, Ryan, Zurbrügg, René, Alonso, Carmen Amo

arXiv.org Artificial IntelligenceMar-15-2024

The impressive capabilities of Large Language Models (LLMs) have led to various efforts to enable robots to be controlled through natural language instructions, opening exciting possibilities for human-robot interaction The goal is for the motor-control task to be performed accurately, efficiently and safely while also enjoying the flexibility imparted by LLMs to specify and adjust the task through natural language. In this work, we demonstrate how a careful layering of an LLM in combination with a Model Predictive Control (MPC) formulation allows for accurate and flexible robotic control via natural language while taking into consideration safety constraints. In particular, we rely on the LLM to effectively frame constraints and objective functions as mathematical expressions, which are later used in the motor-control module via MPC. The transparency of the optimization formulation allows for interpretability of the task and enables adjustments through human feedback. We demonstrate the validity of our method through extensive experiments on long-horizon reasoning, contact-rich, and multi-object interaction tasks. Our evaluations show that NARRATE outperforms current existing methods on these benchmarks and effectively transfers to the real world on two different embodiments. Videos, Code and Prompts at narrate-mpc.github.io

constraint, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2403.10762

Country: Europe (0.14)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback