AITopics | Chiang, David

Transformers as Transducers

Strobl, Lena, Angluin, Dana, Chiang, David, Rawski, Jonathan, Sabharwal, Ashish

arXiv.org Artificial Intelligence, Apr-2-2024

We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of transductions. We do so using variants of RASP, a programming language designed to help people "think like transformers," as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence functions and show that it computes exactly the first-order rational functions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular functions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular functions. Finally, we show that masked average-hard attention transformers can simulate S-RASP. A corollary of our results is a new proof that transformer decoders are Turing-complete.
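The prefix-sum operation mentioned in the abstract can be pictured with a small sketch. This is only an illustration in Python, not S-RASP itself: the function names `prefix_sum` and `count_symbol_so_far` are hypothetical, and treating the sum as inclusive of the current position is an assumption.

```python
from itertools import accumulate
from typing import List


def prefix_sum(values: List[int]) -> List[int]:
    """Inclusive running total: out[i] = values[0] + ... + values[i].

    Inclusiveness is an assumption made for this illustration.
    """
    return list(accumulate(values))


def count_symbol_so_far(text: str, symbol: str) -> List[int]:
    """At each position, count occurrences of `symbol` up to and including it."""
    # Build a 0/1 indicator vector, then sum it position by position.
    indicator = [1 if ch == symbol else 0 for ch in text]
    return prefix_sum(indicator)


if __name__ == "__main__":
    print(count_symbol_so_far("abaab", "a"))
```

Summing an indicator vector this way gives every position a count it can do arithmetic with, which is the kind of capability the abstract attributes to prefix sum.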

artificial intelligence, machine learning, natural language, (18 more...)

2404.0204

Country:

Europe > Germany (0.14)
North America > United States (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns

DuSell, Brian, Chiang, David

arXiv.org Artificial Intelligence, Jan-24-2024

Attention, specifically scaled dot-product attention, has proven effective for natural language, but it does not have a mechanism for handling hierarchical patterns of arbitrary nesting depth, which limits its ability to recognize certain syntactic structures. To address this shortcoming, we propose stack attention: an attention operator that incorporates stacks, inspired by their theoretical connections to context-free languages (CFLs). We show that stack attention is analogous to standard attention, but with a latent model of syntax that requires no syntactic supervision. We propose two variants: one related to deterministic pushdown automata (PDAs) and one based on nondeterministic PDAs, which allows transformers to recognize arbitrary CFLs. We show that transformers with stack attention are very effective at learning CFLs that standard transformers struggle on, achieving strong results on a CFL with theoretically maximal parsing difficulty. We also show that stack attention is more effective at natural language modeling under a constrained parameter budget, and we include results on machine translation.

artificial intelligence, machine learning, natural language, (20 more...)

2310.01749

Country:

Europe (1.00)
Asia (0.68)
North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages

Angluin, Dana, Chiang, David, Yang, Andy

arXiv.org Artificial Intelligence, Jan-17-2024

We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that the class of languages recognized by these networks is exactly the star-free languages. Adding position embeddings increases the class of recognized languages to other well-studied classes. A key technique in these proofs is Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the star-free languages, we relate transformers to first-order logic, temporal logic, and algebraic automata theory.
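The two restrictions named in the abstract, hard attention and strict future masking, can be sketched as follows. This is a minimal illustration in Python, not the construction from the paper: the function name `strict_masked_hard_attention`, the explicit score matrix, the leftmost tie-breaking rule, and the `default` returned when no position is visible are all assumptions.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def strict_masked_hard_attention(
    scores: Sequence[Sequence[float]],
    values: Sequence[T],
    default: Optional[T] = None,
) -> List[Optional[T]]:
    """For each position i, return the value at the single best-scoring j < i.

    scores[i][j] is the attention score of query position i for key position j.
    """
    out: List[Optional[T]] = []
    for i in range(len(values)):
        best_j: Optional[int] = None
        for j in range(i):  # strict future masking: only positions j < i are visible
            if best_j is None or scores[i][j] > scores[i][best_j]:
                best_j = j  # strict '>' keeps the leftmost maximum (assumed tie rule)
        # Position 0 sees nothing to its left, so it falls back to the default.
        out.append(default if best_j is None else values[best_j])
    return out


if __name__ == "__main__":
    scores = [[0, 0, 0], [5, 9, 9], [1, 2, 7]]
    print(strict_masked_hard_attention(scores, ["a", "b", "c"]))
```

Note that position 2 ignores its own high score of 7, because strict masking hides the current position as well as everything to its right.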

logic & formal reasoning, machine learning, natural language, (20 more...)

2310.13897

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines

Bothwell, Stephen, DeBenedetto, Justin, Crnkovich, Theresa, Müller, Hildegund, Chiang, David

arXiv.org Artificial Intelligence, Nov-30-2023

Rhetoric, both spoken and written, involves not only content but also style. One common stylistic tool is parallelism: the juxtaposition of phrases which have the same sequence of linguistic (e.g., phonological, syntactic, semantic) features. Despite the ubiquity of parallelism, the field of natural language processing has seldom investigated it, missing a chance to better understand the nature of the structure, meaning, and intent that humans convey. To address this, we introduce the task of rhetorical parallelism detection. We construct a formal definition of it; we provide one new Latin dataset and one adapted Chinese dataset for it; we establish a family of metrics to evaluate performance on it; and, lastly, we create baseline systems and novel sequence labeling schemes to capture it. On our strictest metric, we attain $F_{1}$ scores of 0.40 and 0.43 on our Latin and Chinese datasets, respectively.

artificial intelligence, machine learning, natural language, (22 more...)

2312.001

Country:

Europe (1.00)
Asia (1.00)
North America > Canada (0.67)
(3 more...)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)

Tighter Bounds on the Expressivity of Transformer Encoders

Chiang, David, Cholak, Peter, Pillay, Anand

arXiv.org Artificial Intelligence, Nov-13-2023

Characterizing neural networks in terms of better-understood formal systems has the potential to yield new insights into the power and limitations of these networks. Doing so for transformers remains an active area of research. Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $TC^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. This brings us much closer than before to an exact characterization of the languages that transformer encoders recognize.

artificial intelligence, machine learning, transformer encoder, (13 more...)

2301.10743

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)

BERTwich: Extending BERT's Capabilities to Model Dialectal and Noisy Text

Srivastava, Aarohi, Chiang, David

arXiv.org Artificial Intelligence, Oct-31-2023

Real-world NLP applications often deal with nonstandard text (e.g., dialectal, informal, or misspelled text). However, language models like BERT deteriorate in the face of dialect variation or noise. How do we push BERT's modeling capabilities to encompass nonstandard text? Fine-tuning helps, but it is designed for specializing a model to a task and does not seem to bring about the deeper, more pervasive changes needed to adapt a model to nonstandard language. In this paper, we introduce the novel idea of sandwiching BERT's encoder stack between additional encoder layers trained to perform masked language modeling on noisy text. We find that our approach, paired with recent work on including character-level noise in fine-tuning data, can promote zero-shot transfer to dialectal text, as well as reduce the distance in the embedding space between words and their noisy counterparts.

artificial intelligence, model dialectal and noisy text, natural language, (3 more...)

2311.00116

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Transformers as Recognizers of Formal Languages: A Survey on Expressivity

Strobl, Lena, Merrill, William, Weiss, Gail, Chiang, David, Angluin, Dana

arXiv.org Artificial Intelligence, Oct-31-2023

As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring questions such as this will help to compare transformers with other models, and transformer variants with one another, for various tasks. Work in this subarea has made considerable progress in recent years. Here, we undertake a comprehensive survey of this work, documenting the diverse assumptions that underlie different results and providing a unified framework for harmonizing seemingly contradictory findings.

artificial intelligence, formal language, natural language, (4 more...)

2311.00208

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.87)
Information Technology > Software > Programming Languages (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.60)

Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages

Butoi, Alexandra, Vieira, Tim, Cotterell, Ryan, Chiang, David

arXiv.org Artificial Intelligence, Oct-23-2023

The class of tree-adjoining languages can be characterized by various two-level formalisms, consisting of a context-free grammar (CFG) or pushdown automaton (PDA) controlling another CFG or PDA. These four formalisms are equivalent to tree-adjoining grammars (TAG), linear indexed grammars (LIG), pushdown-adjoining automata (PAA), and embedded pushdown automata (EPDA). We define semiring-weighted versions of the above two-level formalisms, and we design new algorithms for computing their stringsums (the weight of all derivations of a string) and allsums (the weight of all derivations). From these, we also immediately obtain stringsum and allsum algorithms for TAG, LIG, PAA, and EPDA. For LIG, our algorithm is more time-efficient by a factor of $\mathcal{O}(n|\mathcal{N}|)$ (where $n$ is the string length and $|\mathcal{N}|$ is the size of the nonterminal set) and more space-efficient by a factor of $\mathcal{O}(|\Gamma|)$ (where $|\Gamma|$ is the size of the stack alphabet) than the algorithm of Vijay-Shanker and Weir (1989). For EPDA, our algorithm is both more space-efficient and time-efficient than the algorithm of Alonso et al. (2001) by factors of $\mathcal{O}(|\Gamma|^2)$ and $\mathcal{O}(|\Gamma|^3)$, respectively. Finally, we give the first PAA stringsum and allsum algorithms.
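To make the notion of a stringsum concrete, here is a minimal sketch for a much simpler setting than the one in the abstract: a weighted context-free grammar in Chomsky normal form over the real semiring, computed with the standard inside (CKY-style) recurrence. It is not the paper's algorithm for the two-level formalisms; the function name `stringsum`, the rule encodings, and the toy grammar are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

# Hypothetical rule encodings for a grammar in Chomsky normal form.
BinaryRules = Dict[Tuple[str, str, str], float]  # (A, B, C) -> weight of A -> B C
LexicalRules = Dict[Tuple[str, str], float]      # (A, a)    -> weight of A -> a


def stringsum(
    words: Sequence[str],
    binary: BinaryRules,
    lexical: LexicalRules,
    start: str = "S",
) -> float:
    """Total weight of all derivations of `words` from `start` (real semiring)."""
    n = len(words)
    # chart[i, k, A] = total weight of derivations of words[i:k] from A
    chart: Dict[Tuple[int, int, str], float] = defaultdict(float)
    for i, word in enumerate(words):
        for (lhs, terminal), weight in lexical.items():
            if terminal == word:
                chart[i, i + 1, lhs] += weight
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):  # split point between the two children
                for (lhs, left, right), weight in binary.items():
                    chart[i, k, lhs] += weight * chart[i, j, left] * chart[j, k, right]
    return chart[0, n, start]


if __name__ == "__main__":
    # Toy grammar (an assumption): S -> S S with weight 0.5, S -> a with weight 0.5.
    binary = {("S", "S", "S"): 0.5}
    lexical = {("S", "a"): 0.5}
    print(stringsum(["a", "a", "a"], binary, lexical))
```

In the toy grammar, the string "a a a" has two derivation trees, each using two binary rules and three lexical rules, so the stringsum adds two products of five weights. Swapping the real-number sum and product for another semiring's operations would give the semiring-weighted generalization the abstract refers to.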

artificial intelligence, natural language, pop computation, (18 more...)

2310.15276

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Universal Automatic Phonetic Transcription into the International Phonetic Alphabet

Taguchi, Chihiro, Sakai, Yusuke, Haghani, Parisa, Chiang, David

arXiv.org Artificial Intelligence, Aug-7-2023

This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language documentation, and even partially automating this process has the potential to drastically speed up the documentation of endangered languages. Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use training data from seven languages from CommonVoice 11.0, transcribed into IPA semi-automatically. Although this training dataset is much smaller than Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or better results. Furthermore, we show that the quality of our universal speech-to-IPA models is close to that of human annotators.

artificial intelligence, machine learning, natural language, (18 more...)

2308.03917

Country:

Europe (0.68)
Asia > Myanmar > Chin State (0.14)

Genre: Research Report > New Finding (0.47)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.48)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.47)

Convergence and Diversity in the Control Hierarchy

Butoi, Alexandra, Cotterell, Ryan, Chiang, David

arXiv.org Artificial Intelligence, Jun-6-2023

Weir has defined a hierarchy of language classes whose second member ($\mathcal{L}_2$) is generated by tree-adjoining grammars (TAG), linear indexed grammars (LIG), combinatory categorial grammars, and head grammars. The hierarchy is obtained using the mechanism of control, and $\mathcal{L}_2$ is obtained using a context-free grammar (CFG) whose derivations are controlled by another CFG. We adapt Weir's definition of a controllable CFG to give a definition of controllable pushdown automata (PDAs). This yields three new characterizations of $\mathcal{L}_2$ as the class of languages generated by PDAs controlling PDAs, PDAs controlling CFGs, and CFGs controlling PDAs. We show that these four formalisms are not only weakly equivalent but equivalent in a stricter sense that we call d-weak equivalence. Furthermore, using an even stricter notion of equivalence called d-strong equivalence, we make precise the intuition that a CFG controlling a CFG is a TAG, a PDA controlling a PDA is an embedded PDA, and a PDA controlling a CFG is a LIG. The fourth member of this family, a CFG controlling a PDA, does not correspond to any formalism we know of, so we invent one and call it a Pushdown Adjoining Automaton.

artificial intelligence, natural language, transition, (17 more...)

2306.03628

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
