LEACE: Perfect linear concept erasure in closed form
Belrose, Nora, Schneider-Joseph, David, Ravfogel, Shauli, Cotterell, Ryan, Raff, Edward, Biderman, Stella
Concept erasure aims to remove specified features from a representation. It can improve fairness (e.g. preventing a classifier from using gender or race) and interpretability (e.g. removing a concept to observe changes in model behavior). We introduce LEAst-squares Concept Erasure (LEACE), a closed-form method which provably prevents all linear classifiers from detecting a concept while changing the representation as little as possible, as measured by a broad class of norms. We apply LEACE to large language models with a novel procedure called "concept scrubbing," which erases target concept information from every layer in the network. We demonstrate our method on two tasks: measuring the reliance of language models on part-of-speech information, and reducing gender bias in BERT embeddings. Code is available at https://github.com/EleutherAI/concept-erasure.
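As a reading aid, here is a minimal numpy sketch of the closed-form eraser the abstract describes: whiten the representations, orthogonally project out the directions correlated with the concept, and unwhiten. Variable names and the synthetic check are ours; the linked repository has the reference implementation.

```python
import numpy as np
from scipy.linalg import sqrtm, orth

def fit_leace(X, Z):
    """X: (n, d) representations; Z: (n, k) one-hot concept labels."""
    mu = X.mean(axis=0)
    Xc, Zc = X - mu, Z - Z.mean(axis=0)
    sigma = Xc.T @ Xc / len(X)               # covariance of X, (d, d)
    W = np.linalg.pinv(sqrtm(sigma).real)    # whitening transform
    U = orth(W @ (Xc.T @ Zc / len(X)))       # basis of colspace(W @ Sigma_xz)
    A = np.linalg.pinv(W) @ (U @ U.T) @ W    # unwhiten . project . whiten
    return lambda x: x - (x - mu) @ A.T      # the affine eraser r(x)

# Synthetic check: after erasure, X and Z are linearly unrelated.
rng = np.random.default_rng(0)
Z = np.eye(2)[rng.integers(0, 2, size=500)]
X = rng.normal(size=(500, 16)) + Z @ rng.normal(size=(2, 16))  # X leaks Z linearly
X_clean = fit_leace(X, Z)(X)
print(np.abs((X_clean - X_clean.mean(0)).T @ (Z - Z.mean(0))).max())  # ~0
```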
Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages
Butoi, Alexandra, Vieira, Tim, Cotterell, Ryan, Chiang, David
The class of tree-adjoining languages can be characterized by various two-level formalisms, consisting of a context-free grammar (CFG) or pushdown automaton (PDA) controlling another CFG or PDA. These four formalisms are equivalent to tree-adjoining grammars (TAG), linear indexed grammars (LIG), pushdown-adjoining automata (PAA), and embedded pushdown automata (EPDA). We define semiring-weighted versions of the above two-level formalisms, and we design new algorithms for computing their stringsums (the weight of all derivations of a string) and allsums (the weight of all derivations). From these, we also immediately obtain stringsum and allsum algorithms for TAG, LIG, PAA, and EPDA. For LIG, our algorithm is more time-efficient by a factor of $\mathcal{O}(n|\mathcal{N}|)$ (where $n$ is the string length and $|\mathcal{N}|$ is the size of the nonterminal set) and more space-efficient by a factor of $\mathcal{O}(|\Gamma|)$ (where $|\Gamma|$ is the size of the stack alphabet) than the algorithm of Vijay-Shanker and Weir (1989). For EPDA, our algorithm is both more space-efficient and time-efficient than the algorithm of Alonso et al. (2001) by factors of $\mathcal{O}(|\Gamma|^2)$ and $\mathcal{O}(|\Gamma|^3)$, respectively. Finally, we give the first PAA stringsum and allsum algorithms.
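For readers unfamiliar with the terminology, a stringsum is easiest to see in the one-level case: the toy below sums the weights of all derivations of a string under a real-weighted CFG in Chomsky normal form via weighted CKY. It is purely illustrative and is not the paper's two-level TAG/LIG/PAA/EPDA algorithms; the grammar is ours.

```python
from collections import defaultdict

# Rules as (lhs, rhs) -> weight; rhs is a pair of nonterminals or a terminal.
binary = {("S", ("A", "B")): 1.0, ("A", ("A", "A")): 0.5}
lexical = {("A", "a"): 0.5, ("B", "b"): 1.0}

def stringsum(s, start="S"):
    n = len(s)
    chart = defaultdict(float)  # chart[i, k, X] = weight of deriving s[i:k] from X
    for i, ch in enumerate(s):
        for (lhs, term), w in lexical.items():
            if term == ch:
                chart[i, i + 1, lhs] += w
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):  # split point
                for (lhs, (Y, Z)), w in binary.items():
                    chart[i, k, lhs] += w * chart[i, j, Y] * chart[j, k, Z]
    return chart[0, n, start]

print(stringsum("aaab"))  # 0.0625: sums the weights of both bracketings of "aaa"
```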
Higher-order Derivatives of Weighted Finite-state Machines
Zmigrod, Ran, Vieira, Tim, Cotterell, Ryan
Weighted finite-state machines are a fundamental building block of NLP systems. They have withstood the test of time -- from their early use in noisy channel models in the 1990s up to modern-day neurally parameterized conditional random fields. This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines. We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature. In the case of second-order derivatives, our scheme runs in the optimal $\mathcal{O}(A^2 N^4)$ time where $A$ is the alphabet size and $N$ is the number of states. Our algorithm is significantly faster than prior algorithms. Additionally, our approach leads to a significantly faster algorithm for computing second-order expectations, such as covariance matrices and gradients of first-order expectations.
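As a point of reference for the quantities involved, the following toy computes the normalization constant of a small weighted FSA with the forward algorithm and obtains second-order derivatives of $\log Z$ by automatic differentiation in JAX; with arc log-weights as parameters, the Hessian of $\log Z$ is the covariance matrix of arc counts. The topology and sizes are arbitrary, and this is not the paper's $\mathcal{O}(A^2 N^4)$ algorithm.

```python
import jax
import jax.numpy as jnp

N, A, T = 3, 2, 4  # number of states, alphabet size, path length

def log_Z(theta):
    # theta[a, q, r] is the log-weight of arc q --a--> r; Z sums the weights of
    # all length-T paths from state 0 back to state 0 (the forward algorithm).
    M = jnp.exp(theta).sum(axis=0)       # (N, N) one-step total-weight matrix
    v = jnp.zeros(N).at[0].set(1.0)
    for _ in range(T):
        v = v @ M
    return jnp.log(v[0])

theta = jax.random.normal(jax.random.PRNGKey(0), (A, N, N))
H = jax.hessian(log_Z)(theta)  # (A, N, N, A, N, N): covariances of arc counts
print(H.shape)
```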
Discriminative Class Tokens for Text-to-Image Diffusion Models
Schwartz, Idan, Snæbjarnarson, Vésteinn, Chefer, Hila, Cotterell, Ryan, Belongie, Serge, Wolf, Lior, Benaim, Sagie
Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This approach has two disadvantages: (i) supervised datasets are generally small compared to the large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, and (ii) the input is a hard-coded label, as opposed to free-form text, limiting control over the generated images. In this work, we propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text while achieving high accuracy through discriminative signals from a pretrained classifier. This is done by iteratively modifying the embedding of an added input token of a text-to-image diffusion model, steering generated images toward a given target class according to a classifier. Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images or retraining of a noise-tolerant classifier. We evaluate our method extensively, showing that the generated images (i) are more accurate and of higher quality than those of standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier. The code is available at \url{https://github.com/idansc/discriminative_class_tokens}.
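Schematically, the procedure is a small optimization loop in which only the added token's embedding receives gradients. The sketch below uses random linear maps as stand-ins for the frozen diffusion model and pretrained classifier, so it shows only the shape of the method, not the real pipeline.

```python
import torch
import torch.nn.functional as F

d_embed, d_image, n_classes, target = 32, 64, 10, 3
generator = torch.nn.Linear(d_embed, d_image)     # stand-in: diffusion model
classifier = torch.nn.Linear(d_image, n_classes)  # stand-in: pretrained classifier
for p in list(generator.parameters()) + list(classifier.parameters()):
    p.requires_grad_(False)                       # both networks stay frozen

token = torch.nn.Parameter(0.01 * torch.randn(d_embed))  # the added token's embedding
opt = torch.optim.Adam([token], lr=1e-2)
for _ in range(200):
    image = generator(token)                      # "generate" conditioned on the token
    loss = F.cross_entropy(classifier(image)[None], torch.tensor([target]))
    opt.zero_grad(); loss.backward(); opt.step()  # gradients flow only into the token
print(classifier(generator(token)).argmax().item())  # the target class typically wins
```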
Kernelized Concept Erasure
Ravfogel, Shauli, Vargas, Francisco, Goldberg, Yoav, Cotterell, Ryan
The representation space of neural models for textual data emerges in an unsupervised manner during training. Understanding how those representations encode human-interpretable concepts is a fundamental problem. One prominent approach for the identification of concepts in neural representations is searching for a linear subspace whose erasure prevents the prediction of the concept from the representations. However, while many linear erasure algorithms are tractable and interpretable, neural networks do not necessarily represent concepts in a linear manner. To identify non-linearly encoded concepts, we propose a kernelization of a linear minimax game for concept erasure. We demonstrate that it is possible to prevent specific non-linear adversaries from predicting the concept. However, the protection does not transfer to different non-linear adversaries. Therefore, exhaustively erasing a non-linearly encoded concept remains an open problem.
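The premise is easy to reproduce: a concept can be statistically invisible to every linear adversary yet trivially recoverable by a kernelized one. In the synthetic sketch below (ours, not the paper's minimax method), the concept is the XOR of two coordinate signs, so its cross-covariance with the inputs is zero and a linear probe sits at chance, while an RBF-kernel SVM recovers it almost perfectly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
z = (X[:, 0] > 0) ^ (X[:, 1] > 0)  # non-linearly encoded concept

linear, kernel = LogisticRegression(), SVC(kernel="rbf")
print(linear.fit(X[:1000], z[:1000]).score(X[1000:], z[1000:]))  # ~0.5, chance
print(kernel.fit(X[:1000], z[:1000]).score(X[1000:], z[1000:]))  # ~1.0
```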
An Analysis of On-the-fly Determinization of Finite-state Automata
Baburin, Ivan, Cotterell, Ryan
In this paper we establish an abstraction of on-the-fly determinization of finite-state automata using transition monoids and demonstrate how it can be applied to bound the asymptotic size of the resulting deterministic automaton. We present algebraic and combinatorial properties that are sufficient for a polynomial state complexity of the deterministic automaton constructed on-the-fly. A special case of our findings is that automata with many non-deterministic transitions almost always admit a determinization of polynomial complexity. Furthermore, we extend our ideas to weighted finite-state automata.
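For concreteness, the construction being abstracted is the lazy (on-the-fly) subset construction, sketched below: deterministic states are subsets of nondeterministic states, materialized only as they become reachable. The example automaton is ours.

```python
from collections import deque

def determinize(start, accept, delta):
    """delta: dict (state, symbol) -> set of states. Returns the reachable DFA."""
    symbols = {a for (_, a) in delta}
    q0 = frozenset([start])
    dfa, queue, seen = {}, deque([q0]), {q0}
    while queue:
        S = queue.popleft()
        for a in symbols:
            T = frozenset(r for q in S for r in delta.get((q, a), ()))
            if not T:
                continue
            dfa[S, a] = T
            if T not in seen:
                seen.add(T); queue.append(T)
    finals = {S for S in seen if S & accept}
    return q0, finals, dfa

# Example NFA: state 0 loops on 'a' and may also jump to the accepting state 1.
q0, finals, dfa = determinize(0, {1}, {(0, "a"): {0, 1}})
states = {S for (S, _) in dfa} | set(dfa.values())
print(len(states))  # 2 subset-states materialized, far fewer than 2**2
```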
A Structured Span Selector
Liu, Tianyu, Jiang, Yuchen Eleanor, Cotterell, Ryan, Sachan, Mrinmaya
Many natural language processing tasks, e.g., coreference resolution and semantic role labeling, require selecting text spans and making decisions about them. A typical approach to such tasks is to score all possible spans and greedily select a subset for task-specific downstream processing. This approach, however, does not incorporate any inductive bias about what sort of spans ought to be selected, e.g., that selected spans tend to be syntactic constituents. In this paper, we propose a novel grammar-based structured span selection model which learns to make use of the partial span-level annotation provided for such problems. Compared to previous approaches, ours dispenses with the heuristic greedy span selection scheme, allowing us to model the downstream task on an optimal set of spans. We evaluate our model on two popular span prediction tasks: coreference resolution and semantic role labeling. We show empirical improvements on both.
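For context, here is a sketch of the greedy baseline described above (ours, with a stand-in scorer): score every $\mathcal{O}(n^2)$ span, then greedily keep the highest-scoring pairwise non-crossing ones.

```python
def greedy_spans(n, score, k):
    """score(i, j) -> float for span [i, j); keep up to k non-crossing spans."""
    def crosses(a, b):
        (i, j), (p, q) = a, b
        return i < p < j < q or p < i < q < j
    ranked = sorted(((score(i, j), (i, j)) for i in range(n)
                     for j in range(i + 1, n + 1)), reverse=True)
    picked = []
    for _, span in ranked:
        if len(picked) == k:
            break
        if all(not crosses(span, other) for other in picked):
            picked.append(span)
    return picked

print(greedy_spans(5, lambda i, j: -(j - i - 2) ** 2, 3))  # three non-crossing spans
```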
A Measure-Theoretic Characterization of Tight Language Models
Du, Li, Hennigen, Lucas Torroba, Pimentel, Tiago, Meister, Clara, Eisner, Jason, Cotterell, Ryan
Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings. In most cases, the estimated distribution sums to 1 over all finite strings. However, in some pathological cases, probability mass can "leak" onto the set of infinite sequences. In order to characterize the notion of leakage more precisely, this paper offers a measure-theoretic treatment of language modeling. We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense. We also generalize characterizations of tightness proposed in previous works.
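Leakage has a simple numeric illustration (the example is ours): for an autoregressive model over a one-symbol alphabet that emits EOS at step $t$ with probability $p_t$, the total mass on finite strings is $\sum_t p_t \prod_{s<t}(1-p_s)$, which falls short of 1 whenever $\prod_t (1-p_t) > 0$; the remainder sits on the single infinite sequence.

```python
def mass_on_finite_strings(p, steps=10_000):
    total, alive = 0.0, 1.0  # alive = probability of not having stopped yet
    for t in range(steps):
        total += alive * p(t)
        alive *= 1 - p(t)
    return total

print(mass_on_finite_strings(lambda t: 0.5))             # -> 1.0   (tight)
print(mass_on_finite_strings(lambda t: 0.5 ** (t + 2)))  # -> ~0.42 (leaky)
```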
Benchmarking Compositionality with Formal Languages
Valvoda, Josef, Saphra, Naomi, Rawski, Jonathan, Williams, Adina, Cotterell, Ryan
Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability. Whether large neural models in NLP can acquire this ability while learning from data is an open question. In this paper, we investigate this problem from the perspective of formal languages. We use deterministic finite-state transducers to make an unbounded number of datasets with controllable properties governing compositionality. By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network. We find that the models either learn a relation completely or not at all. The key factor is transition coverage, which sets a soft learnability limit at 400 examples per transition.
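A sketch of the data-generating recipe (sizes and sampling choices are ours): sample a random deterministic finite-state transducer, then transduce random input strings to obtain (input, output) pairs whose compositional structure is fully determined by the transducer.

```python
import random

random.seed(0)
states, sigma = range(4), "ab"
# For each (state, input symbol): a next state and an output symbol.
delta = {(q, a): (random.choice(states), random.choice(sigma))
         for q in states for a in sigma}

def transduce(s, q=0):
    out = []
    for a in s:
        q, o = delta[q, a]
        out.append(o)
    return "".join(out)

dataset = [(x, transduce(x)) for x in
           ("".join(random.choices(sigma, k=random.randint(3, 8))) for _ in range(5))]
print(dataset)  # (input, output) pairs for a seq2seq learner
```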
Probing for the Usage of Grammatical Number
Lasri, Karim, Pimentel, Tiago, Lenci, Alessandro, Poibeau, Thierry, Cotterell, Ryan
A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious; i.e., the model might not rely on it when making predictions. In this paper, we try to find encodings that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without using the linguistic property. Then, we attempt to remove the property by intervening on the model's representations. We contend that, if an encoding is used by the model, its removal should harm the performance on the chosen behavioral task. As a case study, we focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task. Experimentally, we find that BERT relies on a linear encoding of grammatical number to produce the correct behavioral output. We also find that BERT uses a separate encoding of grammatical number for nouns and verbs. Finally, we identify in which layers information about grammatical number is transferred from a noun to its head verb.
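The intervention step can be sketched as follows (synthetic vectors stand in for BERT's representations, and this single-direction removal simplifies the paper's setup): fit a linear probe for the property, project out the direction it uses, and check whether the property is still linearly recoverable. In the paper, the intervened representations are then pushed back through the model to see whether agreement behavior degrades.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)                    # singular vs. plural
X = rng.normal(size=(1000, 32))
X[:, 0] += 2 * y - 1                                 # number lives in one direction

probe = LogisticRegression().fit(X, y)
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # unit probe direction
X_erased = X - np.outer(X @ w, w)                    # remove that component

print(probe.score(X, y))                             # ~0.84: number is recoverable
print(LogisticRegression().fit(X_erased, y).score(X_erased, y))  # near chance
```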