Collaborating Authors

 Wingate, David


Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning

arXiv.org Artificial Intelligence

Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations by learning a sparse and overcomplete decomposition of the network's internal activations. However, SAEs are traditionally trained considering only activation values and not the effect those activations have on downstream computations. This limits the information available to learn features, and biases the autoencoder towards neglecting features which are represented with small activation values but strongly influence model outputs. To address this, we introduce Gradient SAEs (g-SAEs), which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function to rely on the gradients of the input activation when selecting the $k$ elements. For a given sparsity level, g-SAEs produce reconstructions that are more faithful to original network performance when propagated through the network. Additionally, we find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts. By considering the downstream effects of activations, our approach leverages the dual nature of neural network features as both $\textit{representations}$, retrospectively, and $\textit{actions}$, prospectively. While previous methods have approached the problem of feature discovery primarily focused on the former aspect, g-SAEs represent a step towards accounting for the latter as well.
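A minimal sketch of the selection idea described above, not the authors' implementation. It contrasts a plain TopK, which ranks latents by activation value alone, with a gradient-aware TopK that also weighs each latent by a first-order estimate of its downstream effect. The scoring rule (activation times the absolute dot product of the latent's decoder direction with the gradient) and the names `plain_topk`, `gradient_topk`, `decoder`, and `grad` are assumptions made for illustration.

```python
import numpy as np

def topk_mask(scores: np.ndarray, k: int) -> np.ndarray:
    """Boolean mask that keeps the k largest entries of `scores`."""
    mask = np.zeros(scores.shape, dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask

def plain_topk(pre_acts: np.ndarray, k: int) -> np.ndarray:
    """Standard k-sparse selection: rank latents by activation value only."""
    return np.where(topk_mask(pre_acts, k), pre_acts, 0.0)

def gradient_topk(pre_acts: np.ndarray, decoder: np.ndarray,
                  grad: np.ndarray, k: int) -> np.ndarray:
    """Gradient-aware selection (assumed scoring rule, for illustration).

    decoder: (n_latents, d_model) matrix of decoder directions.
    grad:    (d_model,) gradient of a downstream loss w.r.t. the input activation.
    """
    # First-order estimate of how strongly each latent direction moves the loss.
    influence = np.abs(decoder @ grad)
    scores = pre_acts * influence
    # Selection uses the scores, but the kept values are the activations themselves.
    return np.where(topk_mask(scores, k), pre_acts, 0.0)

pre_acts = np.array([3.0, 2.0, 0.5])   # latent 2 has a small activation...
decoder = np.eye(3)
grad = np.array([0.1, 0.1, 10.0])      # ...but lies along a high-gradient direction
print(plain_topk(pre_acts, k=1))
print(gradient_topk(pre_acts, decoder, grad, k=1))
```

In this toy case the plain TopK keeps the largest activation, while the gradient-aware variant keeps the small but influential latent, which is the bias the abstract describes.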


AI Chat Assistants can Improve Conversations about Divisive Topics

arXiv.org Artificial Intelligence

A rapidly increasing amount of human conversation occurs online. But divisiveness and conflict can fester in text-based interactions on social media platforms, in messaging apps, and on other digital forums. Such toxicity increases polarization and, importantly, corrodes the capacity of diverse societies to develop efficient solutions to complex social problems that impact everyone. Scholars and civil society groups promote interventions that can make interpersonal conversations less divisive or more productive in offline settings, but scaling these efforts to the amount of discourse that occurs online is extremely challenging. We present results of a large-scale experiment that demonstrates how online conversations about divisive topics can be improved with artificial intelligence tools. Specifically, we employ a large language model to make real-time, evidence-based recommendations intended to improve participants' perception of feeling understood in conversations. We find that these interventions improve the reported quality of the conversation, reduce political divisiveness, and improve the tone, without systematically changing the content of the conversation or moving people's policy attitudes. These findings have important implications for future research on social media, political deliberation, and the growing community of scholars interested in the place of artificial intelligence within computational social science.


Towards Coding Social Science Datasets with Language Models

arXiv.org Artificial Intelligence

Researchers often rely on humans to code (label, annotate, etc.) large sets of texts. This kind of human coding forms an important part of social science research, yet the coding process is both resource intensive and highly variable from application to application. In some cases, efforts to automate this process have achieved human-level accuracies, but to achieve this, these attempts frequently rely on thousands of hand-labeled training examples, which makes them inapplicable to small-scale research studies and costly for large ones. Recent advances in a specific kind of artificial intelligence tool - language models (LMs) - provide a solution to this problem. Work in computer science makes it clear that LMs are able to classify text, without the cost (in financial terms and human effort) of alternative methods. To demonstrate the possibilities of LMs in this area of political science, we use GPT-3, one of the most advanced LMs, as a synthetic coder and compare it to human coders. We find that GPT-3 can match the performance of typical human coders and offers benefits over other machine learning methods of coding text. We find this across a variety of domains using very different coding procedures. This provides exciting evidence that language models can serve as a critical advance in the coding of open-ended texts in a variety of applications.


Leveraging Large Language Models for Multiple Choice Question Answering

arXiv.org Artificial Intelligence

While large language models (LLMs) like GPT-3 have achieved impressive results on multiple choice question answering (MCQA) tasks in the zero, one, and few-shot settings, they generally lag behind the MCQA state of the art (SOTA). MCQA tasks have traditionally been presented to LLMs like cloze tasks. An LLM is conditioned on a question (without the associated answer options) and its chosen option is the one assigned the highest probability after normalization (for length, etc.). A more natural prompting approach is to present the question and answer options to the LLM jointly and have it output the symbol (e.g., "A") associated with its chosen answer option. This approach allows the model to explicitly compare answer options, reduces computational costs, and mitigates the effects of tokenization scheme and answer option representations on answer selection. For the natural approach to be effective, the LLM it is used with must be able to associate answer options with the symbols that represent them. The LLM needs what we term multiple choice symbol binding (MCSB) ability. This ability varies greatly by model. We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach across 20 diverse datasets and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated.

Current state of the art (SOTA) methods on many multiple choice question answering (MCQA) tasks involve specialized models, extensive per-task engineering, and individualized tuning in general. What if one model could do just as well as each of these models does individually? This is part of a general vision for so-called foundation models (Bommasani et al., 2021). Foundation models include large pre-trained language models (LLMs) that have derived enough broad knowledge (spanning, for example, linguistic, factual, and commonsense (Liu et al., 2019; Amrami & Goldberg, 2018; Petroni et al., 2020; Bosselut et al.; Bouraoui et al.; Zuo et al., 2018; Bhagavatula et al., 2019)) to transfer from a simple language modelling objective to a huge array of natural language tasks. Interestingly, while LLMs have achieved SOTA results on many tasks, they generally fall short on MCQA. Why is this the case, given their general language modelling prowess as suggested by the low cross-entropy loss they attain with all their parameters, data, and compute (Kaplan et al., 2020; Henighan et al., 2020; Hernandez et al., 2021)?
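The two prompting styles contrasted in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: `logprob(prompt, continuation)` is a hypothetical callable standing in for a language model's log-probability of a continuation, the prompt template is invented, and normalizing by word count is only one possible form of length normalization.

```python
from typing import Callable, Sequence

# Hypothetical scorer: log-probability of `continuation` given `prompt`.
LogProb = Callable[[str, str], float]

def cloze_choice(question: str, options: Sequence[str], logprob: LogProb) -> int:
    """Cloze style: score each option on its own as a continuation of the question."""
    scores = [
        # Assumed normalization: divide by the option's word count.
        logprob(question, opt) / max(len(opt.split()), 1)
        for opt in options
    ]
    return max(range(len(options)), key=scores.__getitem__)

def natural_prompt(question: str, options: Sequence[str]) -> str:
    """Natural style: show the question and all labelled options together."""
    lines = [f"Question: {question}"]
    lines += [f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)

def natural_choice(question: str, options: Sequence[str], logprob: LogProb) -> int:
    """Pick the option whose symbol is the most likely next token."""
    prompt = natural_prompt(question, options)
    symbols = [chr(ord("A") + i) for i in range(len(options))]
    # With a real model, all symbol scores could be read from a single forward pass.
    scores = [logprob(prompt, " " + s) for s in symbols]
    return max(range(len(options)), key=scores.__getitem__)

print(natural_prompt("What is 2 + 2?", ["3", "4", "5"]))
```

The natural style only works if the model reliably maps a symbol such as "B" back to the option it labels, which is the symbol binding ability the abstract refers to.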


Towards Neural Programming Interfaces

arXiv.org Artificial Intelligence

It is notoriously difficult to control the behavior of artificial neural networks such as generative neural language models. We recast the problem of controlling natural language generation as that of learning to interface with a pretrained language model, just as Application Programming Interfaces (APIs) control the behavior of programs by altering hyperparameters. In this new paradigm, a specialized neural network (called a Neural Programming Interface or NPI) learns to interface with a pretrained language model by manipulating the hidden activations of the pretrained model to produce desired outputs. Importantly, no permanent changes are made to the weights of the original model, allowing us to re-purpose pretrained models for new tasks without overwriting any aspect of the language model. We also contribute a new data set construction algorithm and GAN-inspired loss function that allows us to train NPI models to control outputs of autoregressive transformers. In experiments against other state-of-the-art approaches, we demonstrate the efficacy of our methods using OpenAI's GPT-2 model, successfully controlling noun selection, topic aversion, offensive speech filtering, and other aspects of language while largely maintaining the controlled model's fluency under deterministic settings.
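As a rough illustration of steering through hidden activations while leaving weights untouched, the sketch below runs a tiny frozen two-layer network and lets a separate function perturb its hidden layer. Everything here is assumed for illustration: the toy network, the `forward` and `controller` names, and the fixed damping rule. In the paper the controller is a trained neural network acting on a pretrained language model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" weights of a toy two-layer network (never modified below).
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

def forward(x, perturb=None):
    """Run the frozen network, optionally letting `perturb` shift the hidden layer."""
    h = np.tanh(x @ W1)
    if perturb is not None:
        # The controller reads the hidden activations and returns an additive shift.
        h = h + perturb(h)
    return h @ W2

def controller(h):
    """Stand-in for a learned interface network: dampen every hidden unit by half."""
    return -0.5 * h

x = np.ones((1, 4))
print(forward(x))              # baseline output
print(forward(x, controller))  # steered output, same weights
```

Because the intervention lives entirely in the forward pass, removing the controller restores the original model exactly.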


Human-robot co-manipulation of extended objects: Data-driven models and control from analysis of human-human dyads

arXiv.org Artificial Intelligence

Human teams are able to easily perform collaborative manipulation tasks. However, simultaneous manipulation of an extended object by a robot and a human remains difficult with existing methods from the literature. Our approach in this paper is to use data from human-human dyad experiments to determine motion intent, which we then use for a physical human-robot co-manipulation task. We first present and analyze data from human-human dyads performing co-manipulation tasks. We show that our human-human dyad data has interesting trends, including that interaction forces are non-negligible compared to the force required to accelerate an object and that the beginning of a lateral movement is characterized by distinct torque triggers from the leader of the dyad. We also examine different metrics to quantify the performance of different dyads, and we develop a deep neural network based on motion data from human-human trials to predict human intent from past motion. We then show how force and motion data can be used as a basis for robot control in a human-robot dyad. Finally, we compare the performance of two controllers for human-robot co-manipulation to human-human dyad performance.


Using Logical Specifications of Objectives in Multi-Objective Reinforcement Learning

arXiv.org Artificial Intelligence

In the multi-objective reinforcement learning (MORL) paradigm, the relative importance of each environment objective is often unknown prior to training, so agents must learn to specialize their behavior to optimize different combinations of environment objectives that are specified post-training. These are typically linear combinations, so the agent is effectively parameterized by a weight vector that describes how to balance competing environment objectives. However, many real world behaviors require nonlinear combinations of objectives. Additionally, the conversion between desired behavior and weightings is often unclear. In this work, we explore the use of a language based on propositional logic with quantitative semantics, in place of weight vectors, for specifying nonlinear behaviors in an interpretable way. We use a recurrent encoder to encode logical combinations of objectives, and train a MORL agent to generalize over these encodings. We test our agent in several grid worlds with various objectives and show that our agent can generalize to many never-before-seen specifications with performance comparable to single policy baseline agents. We also demonstrate our agent's ability to generate meaningful policies when presented with novel specifications and quickly specialize to novel specifications.

1 Introduction

Reinforcement Learning (RL) is a method for learning behavior policies by maximizing expected reward through interactions with an environment. RL has grown in popularity as RL agents have excelled at increasingly complex tasks, including board games (Silver et al., 2016), video games (Mnih et al., 2015), robotic control (Haarnoja et al., 2018), and other high dimensional, complex tasks.
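To make the contrast in the abstract concrete, the sketch below compares a linear weight vector with a small propositional specification evaluated under quantitative semantics. The particular semantics (min for "and", max for "or", one minus the value for "not", with objective values assumed to lie in [0, 1]) is a common choice used here as an assumption, not necessarily the paper's definition. The tuple encoding and the `evaluate` function are likewise hypothetical.

```python
import numpy as np

def linear_scalarization(values, weights) -> float:
    """Conventional MORL objective: a weighted sum of per-objective values."""
    return float(np.dot(weights, values))

# A specification is a nested tuple:
#   ("obj", i) | ("not", spec) | ("and", spec, spec) | ("or", spec, spec)
def evaluate(spec, values) -> float:
    """Score a logical specification with assumed min/max quantitative semantics."""
    op = spec[0]
    if op == "obj":
        return float(values[spec[1]])
    if op == "not":
        return 1.0 - evaluate(spec[1], values)
    if op == "and":
        return min(evaluate(spec[1], values), evaluate(spec[2], values))
    if op == "or":
        return max(evaluate(spec[1], values), evaluate(spec[2], values))
    raise ValueError(f"unknown operator: {op!r}")

values = [0.9, 0.2, 0.6]  # satisfaction of objectives 0, 1, 2, assumed in [0, 1]

# "Achieve objective 0 and at least one of objectives 1 or 2."
spec = ("and", ("obj", 0), ("or", ("obj", 1), ("obj", 2)))
print(evaluate(spec, values))
print(linear_scalarization(values, [0.5, 0.5, 0.0]))
```

The min and max operators make the score nonlinear in the objective values, which is something a single weight vector cannot express.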


Wasserstein Neural Processes

arXiv.org Machine Learning

Neural Processes (NPs) are a class of models that learn a mapping from a context set of input-output pairs to a distribution over functions. They are traditionally trained using maximum likelihood with a KL divergence regularization term. We show that there are desirable classes of problems where NPs, with this loss, fail to learn any reasonable distribution. We also show that this drawback is solved by using approximations of the Wasserstein distance, which calculates optimal transport distances even for distributions of disjoint support. We give experimental justification for our method and demonstrate performance. These Wasserstein Neural Processes (WNPs) maintain all of the benefits of traditional NPs while being able to approximate a new class of function mappings.
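The point about disjoint support can be illustrated with a small numerical sketch. It is not the paper's training objective: it uses the exact one-dimensional Wasserstein-1 distance between equal-size samples (the mean absolute difference of sorted values) rather than the approximations the abstract mentions, and the function names are chosen here for illustration.

```python
import numpy as np

def kl_divergence(p, q) -> float:
    """KL(p || q) for discrete distributions; infinite if q misses any mass of p."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    if np.any(q[support] == 0):
        return float("inf")
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

def wasserstein_1d(x, y) -> float:
    """Exact Wasserstein-1 distance between two equal-size 1-D samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))

# Two histograms with no overlapping bins: KL gives no usable signal.
print(kl_divergence([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]))

# Wasserstein still reports how far apart the samples are, and shrinks as they approach.
print(wasserstein_1d([0, 1, 2], [10, 11, 12]))
print(wasserstein_1d([0, 1, 2], [5, 6, 7]))
```

The KL value is infinite however far apart the two supports are, whereas the Wasserstein distance decreases as the samples move closer, so it can still guide learning.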


Graph Neural Processes: Towards Bayesian Graph Neural Networks

arXiv.org Machine Learning

We introduce Graph Neural Processes (GNP), inspired by the recent work in conditional and latent neural processes. A Graph Neural Process is defined as a Conditional Neural Process that operates on arbitrary graph data. It takes features of sparsely observed context points as input, and outputs a distribution over target points. We demonstrate graph neural processes in edge imputation and discuss benefits and drawbacks of the method for other application areas. One major benefit of GNPs is the ability to quantify uncertainty in deep learning on graph structures. An additional benefit of this method is the ability to extend graph neural networks to inputs of dynamic sized graphs.


Modeling Theory of Mind for Autonomous Agents with Probabilistic Programs

arXiv.org Artificial Intelligence

As autonomous agents become more ubiquitous, they will eventually have to reason about the mental state of other agents, including those agents' beliefs, desires and goals - so-called theory of mind reasoning. We introduce a collection of increasingly complex theory of mind models of a "chaser" pursuing a "runner", known as the Chaser-Runner model. We show that our implementation is a relatively straightforward theory of mind model that can capture a variety of rich behaviors, which, in turn, increase runner detection rates relative to basic (non-theory-of-mind) models. In addition, our paper demonstrates that (1) using a planning-as-inference formulation based on nested importance sampling results in agents simultaneously reasoning about other agents' plans and crafting counter-plans, (2) probabilistic programming is a natural way to describe models in which each agent uses complex primitives such as path planners to make decisions, and (3) allocating additional computation to perform nested reasoning about agents results in lower-variance estimates of expected utility.