Svete, Anej
Unique Hard Attention: A Tale of Two Sides
Jerad, Selim, Svete, Anej, Li, Jiaoda, Cotterell, Ryan
Understanding the expressive power of transformers has recently attracted attention, as it offers insights into their abilities and limitations. Many studies analyze unique hard attention transformers, where attention selects a single position that maximizes the attention scores. When multiple positions achieve the maximum score, either the rightmost or the leftmost of those is chosen. In this paper, we highlight the importance of this seemingly trivial choice. Recently, finite-precision transformers with both leftmost- and rightmost-hard attention were shown to be equivalent to Linear Temporal Logic (LTL). We show that this no longer holds with only leftmost-hard attention -- in that case, they correspond to a \emph{strictly weaker} fragment of LTL. Furthermore, we show that models with leftmost-hard attention are equivalent to \emph{soft} attention, suggesting they may better approximate real-world transformers than rightmost-hard attention models. These findings refine the landscape of transformer expressivity and underscore the role of attention directionality.
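As a toy illustration of the tie-breaking rule this abstract turns on (not the paper's construction), the sketch below shows how a unique-hard-attention head with identical scores can select different positions depending on whether ties are broken toward the leftmost or the rightmost maximal position; the function name and toy arrays are invented for illustration.

```python
# Minimal sketch of leftmost- vs. rightmost-hard attention tie-breaking.
# Illustrative only; not the construction analyzed in the paper.
import numpy as np

def unique_hard_attention(scores: np.ndarray, values: np.ndarray, side: str = "left"):
    """Return the value at the left- or right-most position with maximal score."""
    if side == "left":
        idx = int(np.argmax(scores))                          # first maximal position
    else:
        idx = len(scores) - 1 - int(np.argmax(scores[::-1]))  # last maximal position
    return values[idx]

scores = np.array([0.3, 0.9, 0.9, 0.1])  # two positions tie for the maximum
values = np.arange(4)
print(unique_hard_attention(scores, values, "left"))   # -> 1
print(unique_hard_attention(scores, values, "right"))  # -> 2
```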
An $\mathbf{L^*}$ Algorithm for Deterministic Weighted Regular Languages
Pasti, Clemente, Karagöz, Talu, Svete, Anej, Nowak, Franz, Boumasmoud, Reda, Cotterell, Ryan
Extracting finite state automata (FSAs) from black-box models offers a powerful approach to gaining interpretable insights into complex model behaviors. To support this pursuit, we present a weighted variant of Angluin's (1987) $\mathbf{L^*}$ algorithm for learning FSAs. We stay faithful to the original algorithm, devising a way to exactly learn deterministic weighted FSAs whose weights support division. Furthermore, we formulate the learning process in a manner that highlights the connection with FSA minimization, showing how $\mathbf{L^*}$ directly learns a minimal automaton for the target language.
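The following is a minimal sketch, not the paper's algorithm, of one ingredient the abstract alludes to: in a weighted observation-table view, two prefixes can be identified when their rows agree up to a nonzero scaling factor, which is where the requirement that weights support division enters. The target weighted language f, the suffix set, and all names below are illustrative assumptions.

```python
# Sketch of row equivalence up to a nonzero scale in a weighted observation table.
# Assumes black-box access to f: string -> weight over a field (division defined).
from fractions import Fraction
from typing import Callable, Sequence

def row(f: Callable[[str], Fraction], prefix: str, suffixes: Sequence[str]):
    return [f(prefix + s) for s in suffixes]

def rows_equivalent(r1, r2) -> bool:
    """True if the rows agree up to a nonzero scaling factor."""
    pairs = [(a, b) for a, b in zip(r1, r2) if a != 0 or b != 0]
    if not pairs:
        return True
    a0, b0 = pairs[0]
    if a0 == 0 or b0 == 0:
        return False
    scale = a0 / b0                      # division on the weights
    return all(a == scale * b for a, b in pairs)

# Toy weighted language over {a}: f(w) = (1/2)^(|w|+1).
f = lambda w: Fraction(1, 2) ** (len(w) + 1)
suffixes = ["", "a", "aa"]
print(rows_equivalent(row(f, "a", suffixes), row(f, "aa", suffixes)))  # True: same state
```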
Gumbel Counterfactual Generation From Language Models
Ravfogel, Shauli, Svete, Anej, Snæbjarnarson, Vésteinn, Cotterell, Ryan
Understanding and manipulating the causal generation mechanisms in language models is essential for controlling their behavior. Previous work has primarily relied on techniques such as representation surgery -- e.g., model ablations or manipulation of linear subspaces tied to specific concepts -- to \emph{intervene} on these models. To understand the impact of interventions precisely, it is useful to examine counterfactuals -- e.g., how a given sentence would have appeared had it been generated by the model following a specific intervention. We highlight that counterfactual reasoning is conceptually distinct from interventions, as articulated in Pearl's causal hierarchy. Based on this observation, we propose a framework for generating true string counterfactuals by reformulating the language model as a structural equation model using the Gumbel-max trick, which we call Gumbel counterfactual generation. This reformulation allows us to model the joint distribution over original strings and their counterfactuals resulting from the same instantiation of the sampling noise. We develop an algorithm based on hindsight Gumbel sampling that allows us to infer the latent noise variables and generate counterfactuals of observed strings. Our experiments demonstrate that the approach produces meaningful counterfactuals while also showing that commonly used intervention techniques have considerable undesired side effects.
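A toy, single-token sketch of the coupling idea described above (illustrative only; the paper works with full strings and additionally infers the noise for observed text via hindsight Gumbel sampling, which this sketch omits): sampling is rewritten as an argmax over logits plus Gumbel noise, and reusing the same noise under an intervened model yields a counterfactual sample.

```python
# Gumbel-max coupling of a factual and a counterfactual sample.
# Toy categorical distributions; not the paper's LM setup.
import numpy as np

rng = np.random.default_rng(0)

def gumbel_max_sample(logits: np.ndarray, gumbel_noise: np.ndarray) -> int:
    return int(np.argmax(logits + gumbel_noise))

vocab = ["cat", "dog", "fish"]
logits_original   = np.log(np.array([0.7, 0.2, 0.1]))  # original next-token distribution
logits_intervened = np.log(np.array([0.1, 0.2, 0.7]))  # distribution after an intervention

g = rng.gumbel(size=len(vocab))                 # shared exogenous noise (structural part)
factual        = gumbel_max_sample(logits_original,   g)
counterfactual = gumbel_max_sample(logits_intervened, g)
print(vocab[factual], "->", vocab[counterfactual])
```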
Training Neural Networks as Recognizers of Formal Languages
Butoi, Alexandra, Khalighinejad, Ghazal, Svete, Anej, Valvoda, Josef, Cotterell, Ryan, DuSell, Brian
Characterizing the computational power of neural network architectures in terms of formal language theory remains a crucial line of research, as it describes lower and upper bounds on the reasoning capabilities of modern AI. However, when empirically testing these bounds, existing work often leaves a discrepancy between experiments and the formal claims they are meant to support. The problem is that formal language theory pertains specifically to recognizers: machines that receive a string as input and classify whether it belongs to a language. On the other hand, it is common to instead use proxy tasks that are similar in only an informal sense, such as language modeling or sequence-to-sequence transduction. We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings, using a general method that can be applied to a wide variety of languages. As part of this, we extend an algorithm recently proposed by Snæbjarnarson et al. (2024) to do length-controlled sampling of strings from regular languages, with much better asymptotic time complexity than previous methods. We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures: a simple RNN, an LSTM, and a causally-masked transformer. We find that the RNN and LSTM often outperform the transformer, and that auxiliary training objectives such as language modeling can help, although no single objective uniformly improves performance across languages and architectures. Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work. We have released our datasets as a benchmark called FLaRe (Formal Language Recognition), along with our code.
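A minimal sketch of the task framing (not the paper's length-controlled sampling algorithm): a recognizer is a binary classifier of strings, so the training data consists of balanced (string, label) pairs for membership in a formal language. The toy language and the brute-force enumeration below are illustrative assumptions only.

```python
# Building balanced (string, label) data for training a binary language recognizer.
# Toy regular language and brute-force enumeration; illustrative only.
import itertools, random

def in_language(s: str) -> bool:          # toy regular language: even number of 'a's
    return s.count("a") % 2 == 0

def make_dataset(max_len: int, per_label: int, alphabet="ab", seed=0):
    rng = random.Random(seed)
    all_strings = ["".join(t) for n in range(max_len + 1)
                   for t in itertools.product(alphabet, repeat=n)]
    rng.shuffle(all_strings)
    pos, neg = [], []
    for s in all_strings:
        bucket = pos if in_language(s) else neg
        if len(bucket) < per_label:
            bucket.append((s, in_language(s)))
    return pos + neg                       # balanced (string, label) pairs

print(make_dataset(max_len=3, per_label=3))
```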
Can Transformers Learn $n$-gram Language Models?
Svete, Anej, Borenstein, Nadav, Zhou, Mike, Augenstein, Isabelle, Cotterell, Ryan
Much theoretical work has described the ability of transformers to represent formal languages. However, linking theoretical results to empirical performance is not straightforward due to the complex interplay between the architecture, the learning algorithm, and training data. To test whether theoretical lower bounds imply \emph{learnability} of formal languages, we turn to recent work relating transformers to $n$-gram language models (LMs). We study transformers' ability to learn random $n$-gram LMs of two kinds: ones with arbitrary next-symbol probabilities and ones where the next-symbol probabilities are defined with shared parameters. We find that classic estimation techniques for $n$-gram LMs such as add-$\lambda$ smoothing outperform transformers on the former, while transformers perform better on the latter, outperforming methods specifically designed to learn $n$-gram LMs.
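For concreteness, here is a minimal sketch of the classic add-$\lambda$ baseline mentioned above, estimating next-symbol probabilities from padded $n$-gram counts; the toy corpus and the exact vocabulary handling are illustrative assumptions, not the paper's experimental setup.

```python
# Add-lambda smoothed n-gram estimation from counts (toy example).
from collections import Counter, defaultdict

def add_lambda_ngram(corpus, n=2, lam=0.5, vocab=None):
    vocab = vocab or sorted({sym for seq in corpus for sym in seq})
    counts = defaultdict(Counter)
    for seq in corpus:
        padded = ["<bos>"] * (n - 1) + list(seq) + ["<eos>"]
        for i in range(n - 1, len(padded)):
            counts[tuple(padded[i - n + 1:i])][padded[i]] += 1

    def prob(context, symbol):
        c = counts[tuple(context)]
        # Smooth over the vocabulary plus <eos>.
        return (c[symbol] + lam) / (sum(c.values()) + lam * (len(vocab) + 1))

    return prob

p = add_lambda_ngram([["a", "b", "a"], ["a", "a"]], n=2, lam=0.5)
print(p(["a"], "b"))   # smoothed bigram probability of "b" after "a"
```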
Transformers Can Represent $n$-gram Language Models
Svete, Anej, Cotterell, Ryan
Existing work has analyzed the representational capacity of the transformer architecture by means of formal models of computation. However, the focus so far has been on analyzing the architecture in terms of language \emph{acceptance}. We contend that this is an ill-suited problem in the study of \emph{language models} (LMs), which are definitionally \emph{probability distributions} over strings. In this paper, we focus on the relationship between transformer LMs and $n$-gram LMs, a simple and historically relevant class of language models. We show that transformer LMs using the hard or sparse attention mechanisms can exactly represent any $n$-gram LM, giving us a concrete lower bound on their probabilistic representational capacity. This provides a first step towards understanding the mechanisms that transformer LMs can use to represent probability distributions over strings.
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Nowak, Franz, Svete, Anej, Butoi, Alexandra, Cotterell, Ryan
The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement is that CoT reasoning extends an LM's computational power, as RNNs and transformers with additional scratch space are known to be Turing complete. Comparing LMs to Turing machines, however, introduces a category error -- Turing machines decide language membership, whereas LMs define distributions over strings. To bridge this gap, we formalize CoT reasoning in a probabilistic setting. We present several results on the representational capacity of recurrent and transformer LMs with CoT reasoning, showing that they can represent the same family of distributions over strings as probabilistic Turing machines.
On Efficiently Representing Regular Languages as RNNs
Svete, Anej, Chan, Robin Shing Moon, Cotterell, Ryan
Recent work by Hewitt et al. (2020) provides an interpretation of the empirical success of recurrent neural networks (RNNs) as language models (LMs). It shows that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language. This suggests that RNNs' success might be linked to their ability to model hierarchy. However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not inherently limited to hierarchical structures. This poses a natural question: What other classes of LMs can RNNs efficiently represent? To this end, we generalize Hewitt et al.'s (2020) construction and show that RNNs can efficiently represent a larger class of LMs than previously claimed -- specifically, those that can be represented by a pushdown automaton with a bounded stack and a specific stack update function. Altogether, the efficiency of representing this diverse class of LMs with RNN LMs suggests novel interpretations of their inductive bias.
Lower Bounds on the Expressivity of Recurrent Neural Language Models
Svete, Anej, Nowak, Franz, Sahabdeen, Anisha Mohamed, Cotterell, Ryan
The recent successes and spread of large neural language models (LMs) call for a thorough understanding of their computational abilities. Describing these abilities through LMs' \emph{representational capacity} is a lively area of research. However, investigation into the representational capacity of neural LMs has predominantly focused on their ability to \emph{recognize} formal languages. For example, recurrent neural networks (RNNs) with Heaviside activations are tightly linked to regular languages, i.e., languages defined by finite-state automata (FSAs). Such results, however, fall short of describing the capabilities of RNN \emph{language models}, which are definitionally \emph{distributions} over strings. We take a fresh look at the representational capacity of RNN LMs by connecting them to \emph{probabilistic} FSAs and demonstrate that RNN LMs with linearly bounded precision can express arbitrary regular LMs.
A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors
Tan, Naaman, Valvoda, Josef, Svete, Anej, Liu, Tianyu, Qin, Yanxia, Kan, Min-Yen, Cotterell, Ryan
The relationship between the quality of a string and its probability $p(\boldsymbol{y})$ under a language model has been influential in the development of techniques to build good text generation systems. For example, several decoding algorithms have been motivated to manipulate $p(\boldsymbol{y})$ to produce higher-quality text. In this work, we examine the probability--quality relationship in language models explicitly aligned to human preferences, e.g., through Reinforcement Learning from Human Feedback (RLHF). We find that, given a general language model and its aligned version, for corpora sampled from the aligned language model, there exists a trade-off between the average reward and the average log-likelihood of the strings under the general language model. We provide a formal treatment of this issue and demonstrate how the choice of sampling adaptor allows us to select how much likelihood we exchange for reward.
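A toy numerical sketch of the trade-off described above (all distributions, the reward, and the temperature adaptor are invented for illustration): strings sampled from an RLHF-style tilted model are scored by the reward and by their log-likelihood under the general model, and a sampling adaptor shifts the balance between the two averages.

```python
# Toy illustration of the reward vs. general-model log-likelihood trade-off
# under a temperature sampling adaptor.  Not the paper's models or data.
import numpy as np

rng = np.random.default_rng(0)
strings   = ["y1", "y2", "y3"]
p_general = np.array([0.6, 0.3, 0.1])    # general LM
reward    = np.array([0.0, 1.0, 2.0])    # reward prefers strings the general LM finds unlikely
p_aligned = p_general * np.exp(reward)   # RLHF-style exponential tilt
p_aligned /= p_aligned.sum()

def adapt(p, temperature):
    q = p ** (1.0 / temperature)         # temperature as a simple sampling adaptor
    return q / q.sum()

for t in (0.5, 1.0, 2.0):
    q = adapt(p_aligned, t)
    idx = rng.choice(len(strings), size=10_000, p=q)
    print(f"T={t}: avg reward={reward[idx].mean():.2f}, "
          f"avg log p_general={np.log(p_general[idx]).mean():.2f}")
```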