Wattenberg, Martin
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Li, Kenneth, Patel, Oam, Viégas, Fernanda, Pfister, Hanspeter, Wattenberg, Martin
We introduce Inference-Time Intervention (ITI), a technique designed to enhance the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from 32.5% to 65.1%. We identify a trade-off between truthfulness and helpfulness and demonstrate how to balance it by tuning the intervention strength. ITI is minimally invasive and computationally inexpensive. Moreover, the technique is data efficient: while approaches like RLHF require extensive annotations, ITI locates truthful directions using only a few hundred examples. Our findings suggest that LLMs may have an internal representation of the likelihood of something being true, even as they produce falsehoods on the surface.
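The mechanism lends itself to a compact sketch. The following is a minimal illustration under assumed shapes, not the authors' released code: forward hooks shift the output of a handful of attention-head modules along probe-derived directions, with `head_modules`, `directions`, and the scale `alpha` all placeholders.

```python
# Minimal sketch of an inference-time activation shift (illustrative, not the
# authors' implementation). Each selected head's output is nudged along a
# probe-derived "truthful" direction, scaled by alpha.
import torch

def make_shift_hook(direction: torch.Tensor, alpha: float):
    """Return a forward hook that adds alpha * unit(direction) to a module's output."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        # output: (batch, seq_len, hidden_dim); shift every position.
        return output + alpha * unit.to(output.device, output.dtype)

    return hook

def apply_iti(head_modules, directions, alpha=15.0):
    """Register one hook per intervened head.

    head_modules and directions are parallel lists (placeholders here): the
    module producing each chosen head's output, and the direction found by a
    linear probe on that head's activations.
    """
    handles = [m.register_forward_hook(make_shift_hook(d, alpha))
               for m, d in zip(head_modules, directions)]
    return handles  # call h.remove() on each handle to undo the intervention
```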
Emergent Linear Representations in World Models of Self-Supervised Sequence Models
Nanda, Neel, Lee, Andrew, Wattenberg, Martin
How do sequence models represent their decision-making process? Prior work suggests that an Othello-playing neural network learned nonlinear models of the board state (Li et al., 2023). In this work, we provide evidence of a closely related linear representation of the board. In particular, we show that probing for "my colour" vs. "opponent's colour" may be a simple yet powerful way to interpret the model's internal state. This precise understanding of the internal representations allows us to control the model's behaviour with simple vector arithmetic. Linear representations enable significant interpretability progress, which we demonstrate with further exploration of how the world model is computed.
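As a concrete picture of what such a probe looks like, here is a minimal sketch (not the paper's code) that fits a linear "my colour vs. opponent's colour" probe for one square from saved activations, then uses the probe weights as a direction for a vector-arithmetic edit; the .npy file names are hypothetical.

```python
# Minimal sketch: linear probe for one Othello square ("empty" / "mine" /
# "opponent's") over residual-stream activations, plus a vector-arithmetic edit.
import numpy as np
from sklearn.linear_model import LogisticRegression

acts = np.load("activations.npy")      # hypothetical file, shape (N, d_model)
labels = np.load("square_labels.npy")  # hypothetical file, values in {0, 1, 2}

probe = LogisticRegression(max_iter=1000)
probe.fit(acts, labels)
print("probe accuracy:", probe.score(acts, labels))

# "Mine minus opponent's" direction: the difference between the two class
# weight vectors acts as a flip axis for this square in activation space.
mine, opp = list(probe.classes_).index(1), list(probe.classes_).index(2)
mine_dir = probe.coef_[mine] - probe.coef_[opp]
mine_dir /= np.linalg.norm(mine_dir)

def flip_square(activation: np.ndarray, strength: float = 4.0) -> np.ndarray:
    """Push an activation from 'mine' toward 'opponent's' for this square."""
    return activation - strength * mine_dir
```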
Linearity of Relation Decoding in Transformer Language Models
Hernandez, Evan, Sharma, Arnab Sen, Haklay, Tal, Meng, Kevin, Wattenberg, Martin, Andreas, Jacob, Belinkov, Yonatan, Bau, David
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations: relations between words and their synonyms, entities and their attributes, etc. We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation. Linear relation representations may be obtained by constructing a first-order approximation to the LM from a single prompt, and they exist for a variety of factual, commonsense, and linguistic relations. However, we also identify many cases in which LM predictions capture relational knowledge accurately, but this knowledge is not linearly encoded in their representations. Our results thus reveal a simple, interpretable, but heterogeneously deployed knowledge representation strategy in transformer LMs.
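To make "first-order approximation from a single prompt" concrete, here is a toy sketch (not the authors' implementation) in which a small MLP stands in for the LM's mapping from a subject representation to an output representation; the affine map is built from the Jacobian at one anchor point.

```python
# Toy sketch of a linear relation approximation: F(s) ~ W s + b, where W is
# the Jacobian of F at an anchor subject representation s0.
import torch

d = 16  # toy hidden size standing in for the LM's hidden dimension

# Stand-in for F: in the paper this would be the transformer's computation
# from the subject token's hidden state to the final output representation.
mlp = torch.nn.Sequential(
    torch.nn.Linear(d, d), torch.nn.GELU(), torch.nn.Linear(d, d)
)

def F(s: torch.Tensor) -> torch.Tensor:
    return mlp(s)

s0 = torch.randn(d)                            # subject representation at the anchor prompt
W = torch.autograd.functional.jacobian(F, s0)  # (d, d) first-order term
b = F(s0).detach() - W @ s0                    # bias so that F(s0) = W s0 + b

def linear_relation(s: torch.Tensor) -> torch.Tensor:
    """Approximate F(s) for new subjects with the single affine map W s + b."""
    return W @ s + b

# Exact at s0; for relations that are "linearly decoded", it also stays close
# for other subject representations.
print(torch.allclose(linear_relation(s0), F(s0), atol=1e-5))
```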
AttentionViz: A Global View of Transformer Attention
Yeh, Catherine, Chen, Yida, Wu, Aoyu, Chen, Cynthia, Viégas, Fernanda, Wattenberg, Martin
[Figure 1 caption] AttentionViz, our interactive visualization tool, allows users to explore transformer self-attention at scale by creating a joint embedding space for queries and keys. Each point in the scatterplot represents the query or key version of a word, as denoted by point color. Users can explore individual attention heads (left) or zoom out for a "global" view of attention (right).
Transformer models are revolutionizing machine learning, but their inner workings remain mysterious. In this work, we present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers that allows these models to learn rich, contextual relationships between elements of a sequence. The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention. Unlike previous attention visualization techniques, our approach enables the analysis of global patterns across multiple input sequences. We create an interactive visualization tool, AttentionViz (demo: http://attentionviz.com), based on these joint query-key embeddings, and use it to study attention mechanisms in both language and vision transformers. We demonstrate the utility of our approach in improving model understanding and offering new insights about query-key interactions through several application scenarios and expert feedback.
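The core construction can be sketched in a few lines (assumed, precomputed query and key arrays for one head; an illustration of the idea, not the tool's implementation): stack queries and keys into one matrix and project them into a shared 2-D space.

```python
# Sketch: joint 2-D embedding of one attention head's query and key vectors,
# so query-key patterns across many inputs appear in a single scatterplot.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

Q = np.load("head_queries.npy")  # hypothetical file, shape (n_tokens, d_head)
K = np.load("head_keys.npy")     # hypothetical file, shape (n_tokens, d_head)

joint = np.vstack([Q, K])        # queries and keys share one embedding space
xy = TSNE(n_components=2, init="pca").fit_transform(joint)

n = len(Q)
plt.scatter(xy[:n, 0], xy[:n, 1], s=5, label="queries")
plt.scatter(xy[n:, 0], xy[n:, 1], s=5, label="keys")
plt.legend()
plt.title("Joint query-key embedding for one attention head")
plt.show()
```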
The System Model and the User Model: Exploring AI Dashboard Design
Viégas, Fernanda, Wattenberg, Martin
This is a speculative essay on interface design and artificial intelligence. Recently there has been a surge of attention to chatbots based on large language models, including widely reported unsavory interactions. We contend that part of the problem is that text is not all you need: sophisticated AI systems should have dashboards, just like all other complicated devices. Assuming the hypothesis that AI systems based on neural networks will contain interpretable models of aspects of the world around them, we discuss what data such dashboards might display. We conjecture that, for many systems, the two most important models will be of the user and of the system itself. We call these the System Model and User Model. We argue that, for usability and safety, interfaces to dialogue-based AI systems should have a parallel display based on the state of the System Model and the User Model. Finding ways to identify, interpret, and display these two models should be a core part of interface research for AI.
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
Li, Kenneth, Hopkins, Aspen K., Bau, David, Viégas, Fernanda, Pfister, Hanspeter, Wattenberg, Martin
Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question in a synthetic setting by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello. Although the network has no a priori knowledge of the game or its rules, we uncover evidence of an emergent nonlinear internal representation of the board state. Interventional experiments indicate this representation can be used to control the output of the network. By leveraging these intervention techniques, we produce "latent saliency maps" that help explain predictions.
Recent language models have shown an intriguing range of capabilities. Networks trained on a simple "next-word" prediction task are apparently capable of many other things, such as solving logic puzzles or writing basic code. Yet how this type of performance emerges from sequence predictions remains a subject of current debate. Some have suggested that training on a sequence modeling task is inherently limiting. The arguments range from philosophical (Bender & Koller, 2020) to mathematical (Merrill et al., 2021). A common theme is that seemingly good performance might result from memorizing "surface statistics," i.e., a long list of correlations that do not reflect a causal model of the process generating the sequence. This issue is of practical concern, since relying on spurious correlations may lead to problems on out-of-distribution data (Bender et al., 2021; Floridi & Chiriatti, 2020). On the other hand, some tantalizing clues suggest language models may do more than collect spurious correlations, instead building interpretable world models--that is, understandable models of the process producing the sequences they are trained on.
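One way to picture the interventional experiments (toy dimensions and a randomly initialized stand-in probe, not the paper's code): nudge an internal activation by gradient descent until a board-state probe reports an edited board, then let the network finish its forward pass from the edited activation.

```python
# Sketch of a probe-guided intervention on an internal activation.
import torch

d_model, n_squares, n_states = 512, 64, 3  # toy Othello-GPT dimensions

# Stand-in for a trained board-state probe (random weights here): maps an
# activation to per-square logits over {empty, mine, opponent's}.
probe = torch.nn.Sequential(
    torch.nn.Linear(d_model, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, n_squares * n_states),
)

def intervene(activation: torch.Tensor, target_board: torch.Tensor,
              steps: int = 50, lr: float = 1e-2) -> torch.Tensor:
    """Return an edited activation whose probe readout matches target_board."""
    x = activation.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = probe(x).view(n_squares, n_states)
        loss = torch.nn.functional.cross_entropy(logits, target_board)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()

# Toy usage: edit the probe's belief about the board; in the real experiment,
# the edited activation is fed back into the remaining transformer layers.
act = torch.randn(d_model)
target = torch.randint(0, n_states, (n_squares,))
edited = intervene(act, target)
```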
The What-If Tool: Interactive Probing of Machine Learning Models
Wexler, James, Pushkarna, Mahima, Bolukbasi, Tolga, Wattenberg, Martin, Viegas, Fernanda, Wilson, Jimbo
A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs. To address this challenge, we created the What-If Tool, an open-source application that allows practitioners to probe, visualize, and analyze ML systems, with minimal coding. The What-If Tool lets practitioners test performance in hypothetical situations, analyze the importance of different data features, and visualize model behavior across multiple models and subsets of input data. It also lets practitioners measure systems according to multiple ML fairness metrics. We describe the design of the tool, and report on real-life usage at different organizations.
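A rough notebook usage sketch, assuming the open-source `witwidget` package; the toy data and the placeholder prediction function are invented for illustration, and exact builder options may vary by version.

```python
# Sketch: load a handful of examples into the What-If Tool inside a notebook.
import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

rows = [  # toy numeric examples, for illustration only
    {"age": 39.0, "hours_per_week": 40.0},
    {"age": 52.0, "hours_per_week": 30.0},
]

def row_to_example(row):
    feats = {k: tf.train.Feature(float_list=tf.train.FloatList(value=[v]))
             for k, v in row.items()}
    return tf.train.Example(features=tf.train.Features(feature=feats))

examples = [row_to_example(r) for r in rows]

def predict_fn(examples_to_score):
    # Placeholder scorer: one [p_negative, p_positive] pair per example;
    # a real model's predict call would go here.
    return [[0.5, 0.5] for _ in examples_to_score]

config = WitConfigBuilder(examples).set_custom_predict_fn(predict_fn)
WitWidget(config, height=800)
```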
Visualizing and Measuring the Geometry of BERT
Coenen, Andy, Reif, Emily, Yuan, Ann, Kim, Been, Pearce, Adam, Viégas, Fernanda, Wattenberg, Martin
Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
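A small probe of the word-sense geometry described above, using the Hugging Face `transformers` BERT checkpoint (a sketch, not the paper's method): embed an ambiguous word in different contexts and compare the resulting vectors; same-sense uses should sit closer together.

```python
# Sketch: contextual embeddings of "bank" in river vs. finance contexts.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "She sat by the bank of the river.",         # river sense
    "The boat drifted toward the grassy bank.",  # river sense
    "He deposited the check at the bank.",       # finance sense
]

def word_vector(sentence: str, word: str = "bank") -> torch.Tensor:
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

vecs = [word_vector(s) for s in sentences]
cos = torch.nn.functional.cosine_similarity
print("river vs. river:  ", cos(vecs[0], vecs[1], dim=0).item())
print("river vs. finance:", cos(vecs[0], vecs[2], dim=0).item())
```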
Do Neural Networks Show Gestalt Phenomena? An Exploration of the Law of Closure
Kim, Been, Reif, Emily, Wattenberg, Martin, Bengio, Samy
One characteristic of human visual perception is the presence of `Gestalt phenomena,' that is, that the whole is something other than the sum of its parts. A natural question is whether image-recognition networks show similar effects. Our paper investigates one particular type of Gestalt phenomenon, the law of closure, in the context of a feedforward image classification neural network (NN). This is a robust effect in human perception, but experiments typically rely on measurements (e.g., reaction time) that are not available for artificial neural nets. We describe a protocol for identifying the closure effect in NNs, and report on the results of experiments with simple visual stimuli. Our findings suggest that NNs trained with natural images do exhibit closure, in contrast to networks with randomized weights or networks that have been trained on visually random data. Furthermore, the closure effect reflects something beyond good feature extraction; it is correlated with the network's higher-layer features and its ability to generalize.
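One way such a protocol might look in code (hypothetical stimulus files and an off-the-shelf ResNet-18, not the paper's exact setup): compare higher-layer features for a complete shape, an illusory version built from disconnected corners, and a scrambled control; a closure effect would show the illusory stimulus sitting closer to the complete shape than the scrambled one does.

```python
# Sketch: feature-similarity comparison of complete, illusory, and scrambled
# triangle stimuli using a pretrained CNN's penultimate features.
import torch
from PIL import Image
from torchvision import models, transforms

prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
features = torch.nn.Sequential(*list(net.children())[:-1])  # drop classifier head

def embed(path: str) -> torch.Tensor:
    img = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return features(img).flatten()

complete  = embed("triangle_complete.png")   # hypothetical stimulus files
illusory  = embed("triangle_illusory.png")
scrambled = embed("triangle_scrambled.png")

cos = torch.nn.functional.cosine_similarity
print("complete vs. illusory: ", cos(complete, illusory, dim=0).item())
print("complete vs. scrambled:", cos(complete, scrambled, dim=0).item())
```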