AITopics | Jensen, David

Collaborating Authors

Jensen, David

Information about AI from the News, Publications, and Conferences


If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability

Nainani, Jatin, Vaidyanathan, Sankaran, Yeung, AJ, Gupta, Kartik, Jensen, David

arXiv.org Artificial Intelligence, Dec-5-2024

Mechanistic interpretability aims to understand the inner workings of large neural networks by identifying circuits, or minimal subgraphs within the model that implement algorithms responsible for performing specific tasks. These circuits are typically discovered and analyzed using a narrowly defined prompt format. However, given the abilities of large language models (LLMs) to generalize across various prompt formats for the same task, it remains unclear how well these circuits generalize. For instance, it is unclear whether the model's generalization results from reusing the same circuit components, the components behaving differently, or the use of entirely different components. In this paper, we investigate the generality of the indirect object identification (IOI) circuit in GPT-2 small, which is well-studied and believed to implement a simple, interpretable algorithm. We evaluate its performance on prompt variants that challenge the assumptions of this algorithm. Our findings reveal that the circuit generalizes surprisingly well, reusing all of its components and mechanisms while only adding input edges. Notably, the circuit generalizes even to prompt variants where the original algorithm should fail; we discover a mechanism, which we term S2 Hacking, that explains this. Our findings indicate that circuits within LLMs may be more flexible and general than previously recognized, underscoring the importance of studying circuit generalization to better understand the broader capabilities of these models.
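The IOI task is usually summarized by a simple symbolic rule: find the names in the context, discard the one that is repeated (the subject, whose two mentions are S1 and S2), and predict the other (the indirect object, IO). Below is a minimal sketch of that rule, assuming whitespace tokenization and a small hypothetical name list; it is not the paper's code and says nothing about how GPT-2 small computes the answer. The second prompt is a hypothetical variant in which the IO name also repeats, the kind of case where the symbolic rule has no answer.

```python
from collections import Counter
from typing import Optional

# Hypothetical name vocabulary for building IOI-style prompts.
NAMES = {"Mary", "John", "Alice", "Bob"}

def ioi_symbolic(prompt: str) -> Optional[str]:
    """Textbook IOI rule: among the names in the prompt, drop any name that
    repeats (the subject, mentioned as S1 and S2) and predict the name that
    occurs exactly once (the indirect object, IO)."""
    tokens = prompt.replace(",", " ").split()
    counts = Counter(tok for tok in tokens if tok in NAMES)
    singles = [name for name, n in counts.items() if n == 1]
    # The rule only yields an answer when exactly one name is unrepeated.
    return singles[0] if len(singles) == 1 else None

print(ioi_symbolic("When Mary and John went to the store, John gave a drink to"))
# Mary
print(ioi_symbolic("When Mary and John went to the store, "
                   "Mary and John talked, and John gave a drink to"))
# None
```

The abstract reports that the circuit still performs well on some variants where this rule should fail, and proposes S2 Hacking as the mechanism behind that behavior.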

base ioi circuit, large language model, natural language


2411.16105

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Automated Discovery of Functional Actual Causes in Complex Environments

Chuck, Caleb, Vaidyanathan, Sankaran, Giguere, Stephen, Zhang, Amy, Jensen, David, Niekum, Scott

arXiv.org Artificial Intelligence, Apr-16-2024

Reinforcement learning (RL) algorithms often struggle to learn policies that generalize to novel situations due to issues such as causal confusion, overfitting to irrelevant factors, and failure to isolate control of state factors. These issues stem from a common source: a failure to accurately identify and exploit state-specific causal relationships in the environment. While some prior works in RL aim to identify these relationships explicitly, they rely on informal domain-specific heuristics such as spatial and temporal proximity. Actual causality offers a principled and general framework for determining the causes of particular events. However, existing definitions of actual cause often attribute causality to a large number of events, even if many of them rarely influence the outcome. Prior work on actual causality proposes normality as a solution to this problem, but its existing implementations are challenging to scale to complex and continuous-valued RL environments. This paper introduces functional actual cause (FAC), a framework that uses context-specific independencies in the environment to restrict the set of actual causes. We additionally introduce Joint Optimization for Actual Cause Inference (JACI), an algorithm that learns from observational data to infer functional actual causes. We demonstrate empirically that FAC agrees with known results on a suite of examples from the actual causality literature, and JACI identifies actual causes with significantly higher accuracy than existing heuristic methods in a set of complex, continuous-valued environments.

artificial intelligence, machine learning, reinforcement learning


2404.10883

Country:

Europe > United Kingdom > England (0.28)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Genre: Research Report (1.00)

Industry:

Law (0.67)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)


Brittle AI, Causal Confusion, and Bad Mental Models: Challenges and Successes in the XAI Program

Druce, Jeff, Niehaus, James, Moody, Vanessa, Jensen, David, Littman, Michael L.

arXiv.org Artificial Intelligence, Jun-10-2021

The advances in artificial intelligence enabled by deep learning architectures are undeniable. In several cases, models driven by deep neural networks have surpassed human-level performance in benchmark autonomy tasks. The underlying policies for these agents, however, are not easily interpretable. In fact, given their underlying deep models, it is impossible to directly understand the mapping from observations to actions for any reasonably complex agent. Producing this supporting technology to "open the black box" of these AI systems, while not sacrificing performance, was the fundamental goal of the DARPA XAI program. In our journey through this program, we have several "big picture" takeaways: 1) explanations need to be highly tailored to their scenario; 2) many seemingly high-performing RL agents are extremely brittle and are not amenable to explanation; 3) causal models allow for rich explanations, but how to present them isn't always straightforward; and 4) human subjects conjure fantastically wrong mental models for AIs, and these models are often hard to break. This paper discusses the origins of these takeaways, provides amplifying information, and offers suggestions for future work.

computer game, deep learning, explanation


2106.05506

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Government > Regional Government > North America Government > United States Government (0.67)
Government > Military (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)


A Simulation-Based Test of Identifiability for Bayesian Causal Inference

Witty, Sam, Jensen, David, Mansinghka, Vikash

arXiv.org Artificial Intelligence, Feb-23-2021

This paper introduces a procedure for testing the identifiability of Bayesian models for causal inference. Although the do-calculus is sound and complete given a causal graph, many practical assumptions cannot be expressed in terms of graph structure alone, such as the assumptions required by instrumental variable designs, regression discontinuity designs, and within-subjects designs. We present simulation-based identifiability (SBI), a fully automated identification test based on a particle optimization scheme with simulated observations. This approach expresses causal assumptions as priors over functions in a structural causal model, including flexible priors using Gaussian processes. We prove that SBI is asymptotically sound and complete, and produces practical finite-sample bounds. We also show empirically that SBI agrees with known results in graph-based identification as well as with widely-held intuitions for designs in which graph-based methods are inconclusive.

bayesian inference, health & medicine, identifiability


2102.11761

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)


Causal Inference using Gaussian Processes with Structured Latent Confounders

Witty, Sam, Takatsu, Kenta, Jensen, David, Mansinghka, Vikash

arXiv.org Machine Learning, Jul-14-2020

Latent confounders (unobserved variables that influence both treatment and outcome) can bias estimates of causal effects. In some cases, these confounders are shared across observations, e.g., all students taking a course are influenced by the course's difficulty in addition to any educational interventions they receive individually. This paper shows how to semiparametrically model latent confounders that have this structure and thereby improve estimates of causal effects. The key innovations are a hierarchical Bayesian model, Gaussian processes with structured latent confounders (GP-SLC), and a Monte Carlo inference algorithm for this model based on elliptical slice sampling. GP-SLC provides principled Bayesian uncertainty estimates of individual treatment effect with minimal assumptions about the functional forms relating confounders, covariates, treatment, and outcome. Finally, this paper shows GP-SLC is competitive with or more accurate than widely used causal inference techniques on three benchmark datasets, including the Infant Health and Development Program and a dataset showing the effect of changing temperatures on state-wide energy consumption across New England.
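The course-difficulty example can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: a linear outcome model, a made-up true effect of +1, and treatment that is more likely in harder courses. It contrasts a naive difference in means with a simple within-course (fixed-effects) estimator, swapped in here purely for illustration; it is not GP-SLC and does not model uncertainty.

```python
import random
import statistics

def simulate(n_courses=200, students=30, tau=1.0, seed=0):
    """Hypothetical data: each course has a latent difficulty u shared by all
    of its students. Harder courses are more likely to assign treatment, and
    difficulty lowers the outcome, so u confounds treatment and outcome."""
    rng = random.Random(seed)
    rows = []  # (course, treatment, outcome)
    for c in range(n_courses):
        u = rng.gauss(0.0, 1.0)              # latent course difficulty
        p_treat = 0.8 if u > 0 else 0.2      # confounded treatment assignment
        for _ in range(students):
            t = 1 if rng.random() < p_treat else 0
            y = tau * t - 2.0 * u + rng.gauss(0.0, 0.5)
            rows.append((c, t, y))
    return rows

def naive_estimate(rows):
    """Difference in mean outcome between treated and control, ignoring courses."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

def within_course_estimate(rows):
    """Demean treatment and outcome within each course, then regress one on the
    other. The shared difficulty is constant within a course, so it cancels."""
    by_course = {}
    for c, t, y in rows:
        by_course.setdefault(c, []).append((t, y))
    num = den = 0.0
    for obs in by_course.values():
        t_bar = statistics.fmean(t for t, _ in obs)
        y_bar = statistics.fmean(y for _, y in obs)
        for t, y in obs:
            num += (t - t_bar) * (y - y_bar)
            den += (t - t_bar) ** 2
    return num / den

rows = simulate()
print(naive_estimate(rows), within_course_estimate(rows))
```

With these settings the naive estimate should come out strongly negative even though the simulated effect is +1, while the within-course estimate should land close to 1. The demeaning trick only works because the confounder is exactly constant within each group; GP-SLC, per the abstract, instead models the shared confounders explicitly.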

bayesian inference, confounder, health & medicine


2007.07127

Country:

Europe (0.67)
North America > United States > New York (0.46)
North America > United States > Massachusetts (0.46)

Genre: Research Report (1.00)

Industry:

Education (0.93)
Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)


The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data

Gentzel, Amanda, Garant, Dan, Jensen, David

arXiv.org Artificial Intelligence, Oct-11-2019

Causal inference is central to many areas of artificial intelligence, including complex reasoning, planning, knowledge-base construction, robotics, explanation, and fairness. An active community of researchers develops and enhances algorithms that learn causal models from data, and this work has produced a series of impressive technical advances. However, evaluation techniques for causal modeling algorithms have remained somewhat primitive, limiting what we can learn from experimental studies of algorithm performance, constraining the types of algorithms and model representations that researchers consider, and creating a gap between theory and practice. We argue for more frequent use of evaluation techniques that examine interventional measures rather than structural or observational measures, and that evaluate those measures on empirical data rather than synthetic data. We survey current evaluation practice and show that these techniques are rarely used. We show that they are feasible and that data sets are available to conduct such evaluations. We also show that these techniques produce substantially different results than evaluations using structural measures and synthetic data.
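The contrast between structural and interventional measures can be shown with a toy comparison. Everything in it is a hypothetical assumption: a linear structural causal model with zero-mean noise, two made-up learned models, and a simplified structural Hamming distance (SHD). It only illustrates how the two kinds of measure can rank models differently; it uses synthetic structure rather than the empirical data the authors advocate, and it is not their evaluation code.

```python
def edges(model):
    """Directed edges (parent, child) of a linear SCM given as
    {child: {parent: coefficient}}."""
    return {(p, c) for c, parents in model.items() for p in parents}

def shd(true_model, learned_model):
    """Simplified structural Hamming distance: size of the symmetric difference
    of the edge sets (a reversed edge therefore counts as two errors here)."""
    return len(edges(true_model) ^ edges(learned_model))

def mean_under_do(model, order, do_var, do_value):
    """Expected value of each variable under do(do_var = do_value), assuming a
    linear SCM with zero-mean noise; `order` must be a topological order."""
    mean = {}
    for v in order:
        if v == do_var:
            mean[v] = do_value
        else:
            mean[v] = sum(c * mean[p] for p, c in model.get(v, {}).items())
    return mean

def ace(model, order, cause, effect):
    """Average causal effect on `effect` of setting `cause` to 1 versus 0."""
    high = mean_under_do(model, order, cause, 1.0)[effect]
    low = mean_under_do(model, order, cause, 0.0)[effect]
    return high - low

ORDER = ["X", "Z", "Y"]
TRUE = {"Y": {"X": 3.0}}               # hypothetical ground truth: Y = 3X + noise
MODEL_A = {"Y": {"X": 3.0, "Z": 0.5}}  # one spurious edge, correct effect of X
MODEL_B = {"Y": {"X": 1.0}}            # correct graph, wrong effect size

for name, m in [("A", MODEL_A), ("B", MODEL_B)]:
    err = abs(ace(m, ORDER, "X", "Y") - ace(TRUE, ORDER, "X", "Y"))
    print(name, "SHD =", shd(TRUE, m), "interventional error =", err)
# A SHD = 1 interventional error = 0.0
# B SHD = 0 interventional error = 2.0
```

The structural measure prefers model B, while the interventional measure prefers model A, which gets the effect of intervening on X exactly right.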

artificial intelligence, expert system, interventional measure and empirical data


1910.05387

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.53)


Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning

Tosch, Emma, Clary, Kaleigh, Foley, John, Jensen, David

arXiv.org Machine Learning, May-7-2019

Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high-quality environments for experimental evaluation of agent behavior.

While ALE has enabled demonstration and evaluation of much more complex behaviors of deep RL agents, it presents challenges as a suite of evaluation environments for topics on the frontier of deep RL. Challenge: Limited variation within games. Very little about individual games can be systematically altered, so ALE is poorly suited to testing how changes in the environment affect training and performance. New benchmarks such as OpenAI's Sonic the Hedgehog emulator and CoinRun inject environmental variation into the training schedule, while …

artificial intelligence, computer game, toybox


1905.02825

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)


Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments

Clary, Kaleigh, Tosch, Emma, Foley, John, Jensen, David

arXiv.org Artificial Intelligence, Apr-12-2019

Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself, has led researchers to report the performance of learned agents using aggregate metrics over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variability in reinforcement learning agents that make common summary statistics an unsound measure of performance. Our experiments demonstrate the variability of common agents used in the popular OpenAI Baselines repository. We make the case for reporting post-training agent performance as a distribution, rather than a point estimate.
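As a minimal sketch of reporting performance as a distribution rather than a single number, the function below pools per-episode returns across seeds and reports quartiles and per-seed means next to the overall mean. It assumes the returns have already been collected; the numbers in the usage example are made up for illustration and are not results from the paper.

```python
import statistics

def summarize_returns(returns_by_seed):
    """Pool per-episode returns across seeds and report quantiles alongside
    the mean, so spread and skew between seeds stay visible."""
    pooled = sorted(r for runs in returns_by_seed.values() for r in runs)
    q1, median, q3 = statistics.quantiles(pooled, n=4)  # quartile cut points
    return {
        "mean": statistics.fmean(pooled),
        "min": pooled[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": pooled[-1],
        "per_seed_mean": {s: statistics.fmean(r) for s, r in returns_by_seed.items()},
    }

# Hypothetical episode returns for three training seeds.
summary = summarize_returns({0: [10, 12, 11], 1: [300, 280, 310], 2: [15, 9, 14]})
print(summary["mean"], summary["median"])
```

In this made-up example the mean (about 107) sits far from the median (14) because one seed dominates, which a single point estimate would hide.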

artificial intelligence, reinforcement learning, variability


1904.06312

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.16)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Measuring and Characterizing Generalization in Deep Reinforcement Learning

Witty, Sam, Lee, Jun Ki, Tosch, Emma, Atrey, Akanksha, Littman, Michael, Jensen, David

arXiv.org Artificial Intelligence, Dec-11-2018

Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. Taken together, these results call into question the extent to which deep Q-networks learn generalized representations, and suggest that more experimentation and analysis are necessary before claims of representation learning can be supported.
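The three kinds of state can be illustrated on a tiny, fully known environment. This sketch assumes a hypothetical deterministic transition map and a deterministic policy, and uses one plausible reading of the distinctions: on-policy states are visited by the agent's own rollout, off-policy states are reachable by some action sequence but not visited, and unreachable states cannot be reached from the start state at all. It is not the paper's formal definition or code.

```python
from collections import deque

def reachable(transitions, start):
    """Breadth-first search over a deterministic transition map
    {state: {action: next_state}}; returns every state reachable from start."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        for nxt in transitions.get(s, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def rollout(transitions, policy, start, max_steps=100):
    """States visited by following a deterministic policy {state: action}."""
    visited = {start}
    s = start
    for _ in range(max_steps):
        a = policy.get(s)
        if a is None or a not in transitions.get(s, {}):
            break  # policy undefined here or action unavailable: episode ends
        s = transitions[s][a]
        visited.add(s)
    return visited

def classify(state, transitions, policy, start):
    if state in rollout(transitions, policy, start):
        return "on-policy"
    if state in reachable(transitions, start):
        return "off-policy"
    return "unreachable"

# Hypothetical toy environment and policy.
T = {"s0": {"left": "s1", "right": "s2"}, "s1": {"left": "s3"}, "s2": {"right": "s4"}}
P = {"s0": "left", "s1": "left"}
for s in ["s3", "s4", "s5"]:
    print(s, classify(s, T, P, "s0"))
```

In a real benchmark such as an Atari game the reachable set cannot be enumerated like this, which is one reason the paper proposes practical evaluation methods instead.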

agent, artificial intelligence, computer game


1812.02868

Country: North America > United States > Massachusetts (0.14)

Genre:

Research Report > Experimental Study (0.35)
Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


ToyBox: Better Atari Environments for Testing Reinforcement Learning Agents

Foley, John, Tosch, Emma, Clary, Kaleigh, Jensen, David

arXiv.org Artificial Intelligence, Dec-10-2018

It is a widely accepted principle that software without tests has bugs. Testing reinforcement learning agents is especially difficult because of the stochastic nature of both agents and environments, the complexity of state-of-the-art models, and the sequential nature of their predictions. Recently, the Arcade Learning Environment (ALE) has become one of the most widely used benchmark suites for deep learning research, and state-of-the-art reinforcement learning (RL) agents have been shown to routinely equal or exceed human performance on many ALE tasks. Since ALE is based on emulation of original Atari games, the environment does not provide semantically meaningful representations of internal game state. This means that ALE has limited utility as an environment for supporting testing or model introspection. We propose TOYBOX, a collection of reimplementations of these games that solves this critical problem and enables robust testing of RL agents.
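To illustrate why semantically meaningful state helps testing, here is a toy sketch. It assumes a hypothetical Breakout-like state exposed as a plain dataclass and a hand-written stand-in agent; none of these names come from ToyBox, and this is not its API. Because the state is explicit, a test can intervene on one variable at a time and assert on the agent's response, which is awkward when the only handle is emulator memory or pixels.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PaddleState:
    """Hypothetical, semantically meaningful game state (not ToyBox's API)."""
    paddle_x: int
    ball_x: int
    ball_y: int

def tracking_agent(state: PaddleState) -> str:
    """Stand-in agent for illustration: move the paddle toward the ball."""
    if state.ball_x < state.paddle_x:
        return "LEFT"
    if state.ball_x > state.paddle_x:
        return "RIGHT"
    return "NOOP"

def test_agent_moves_toward_ball():
    base = PaddleState(paddle_x=50, ball_x=50, ball_y=10)
    # Intervene on a single semantic variable and check the agent's response.
    assert tracking_agent(replace(base, ball_x=20)) == "LEFT"
    assert tracking_agent(replace(base, ball_x=80)) == "RIGHT"
    assert tracking_agent(base) == "NOOP"

if __name__ == "__main__":
    test_agent_moves_toward_ball()
    print("ok")
```

With a learned agent in place of `tracking_agent`, the same pattern becomes a behavioral test whose expected outcome is stated in terms of game semantics rather than raw frames.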

agent, computer game, deep learning


1812.0285

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.15)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
