AITopics | Milani, Stephanie

Milani, Stephanie

Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

Jucys, Karolis, Adamopoulos, George, Hamidi, Mehrab, Milani, Stephanie, Samsami, Mohammad Reza, Zholus, Artem, Joseph, Sonia, Richards, Blake, Rish, Irina, Şimşek, Özgür

arXiv.org Artificial Intelligence, Jul-16-2024

Understanding the mechanisms behind decisions taken by large foundation models in sequential decision making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task - crafting a diamond pickaxe. The agent pays attention to the last four frames and several key-frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Secondly, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk when the villager is positioned stationary under green tree leaves, and punches it to death.

large language model, machine learning, reinforcement learning, (19 more...)

2407.12161

Country:

North America > United States (0.28)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Games > Computer Games (0.74)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
(2 more...)

PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals

Wang, Ruiyi, Milani, Stephanie, Chiu, Jamie C., Zhi, Jiayin, Eack, Shaun M., Labrum, Travis, Murphy, Samuel M., Jones, Nev, Hardy, Kate, Shen, Hong, Fang, Fei, Chen, Zhiyu Zoey

arXiv.org Artificial Intelligence, Jun-18-2024

Mental illness remains one of the most critical public health issues. Despite its importance, many mental health professionals highlight a disconnect between their training and actual real-world patient practice. To help bridge this gap, we propose PATIENT-Ψ, a novel patient simulation framework for cognitive behavior therapy (CBT) training. To build PATIENT-Ψ, we construct diverse patient cognitive models based on CBT principles and use large language models (LLMs) programmed with these cognitive models to act as a simulated therapy patient. We propose an interactive training scheme, PATIENT-Ψ-TRAINER, for mental health trainees to practice a key skill in CBT -- formulating the cognitive model of the patient -- through role-playing a therapy session with PATIENT-Ψ. To evaluate PATIENT-Ψ, we conducted a comprehensive user study of 13 mental health trainees and 20 experts. The results demonstrate that practice using PATIENT-Ψ-TRAINER enhances the perceived skill acquisition and confidence of the trainees beyond existing forms of training such as textbooks, videos, and role-play with non-patients. Based on the experts' perceptions, PATIENT-Ψ is perceived to be closer to real patient interactions than GPT-4, and PATIENT-Ψ-TRAINER holds strong promise to improve trainee competencies. Our code and data are released at https://github.com/ruiyiw/patient-psi.

large language model, machine learning, natural language, (23 more...)

2405.19660

Country:

Europe (0.67)
North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Unifying Interpretability and Explainability for Alzheimer's Disease Progression Prediction

Ali, Raja Farrukh, Milani, Stephanie, Woods, John, Adeniji, Emmanuel, Farooq, Ayesha, Mansel, Clayton, Burns, Jeffrey, Hsu, William

arXiv.org Artificial Intelligence, Jun-11-2024

Reinforcement learning (RL) has recently shown promise in predicting Alzheimer's disease (AD) progression due to its unique ability to model domain knowledge. However, it is not clear which RL algorithms are well-suited for this task. Furthermore, these methods are not inherently explainable, limiting their applicability in real-world clinical scenarios. Our work addresses these two important questions. Using a causal, interpretable model of AD, we first compare the performance of four contemporary RL algorithms in predicting brain cognition over 10 years using only baseline (year 0) data. We then apply SHAP (SHapley Additive exPlanations) to explain the decisions made by each algorithm in the model. Our approach combines interpretability with explainability to provide insights into the key factors influencing AD progression, offering both global and individual, patient-level analysis. Our findings show that only one of the RL methods is able to satisfactorily model disease progression, but the post-hoc explanations indicate that all methods fail to properly capture the importance of amyloid accumulation, one of the pathological hallmarks of Alzheimer's disease. Our work aims to merge predictive accuracy with transparency, assisting clinicians and researchers in enhancing disease progression modeling for informed healthcare decisions. Code is available at https://github.com/rfali/xrlad.

machine learning, prediction, reinforcement learning, (18 more...)

2406.07777

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks

Milani, Stephanie, Kanervisto, Anssi, Ramanauskas, Karolis, Schulhoff, Sander, Houghton, Brandon, Shah, Rohin

arXiv.org Artificial Intelligence, Dec-4-2023

The MineRL BASALT competition has served to catalyze advances in learning from human feedback through four hard-to-specify tasks in Minecraft, such as create and photograph a waterfall. Given the completion of two years of BASALT competitions, we offer to the community a formalized benchmark through the BASALT Evaluation and Demonstrations Dataset (BEDD), which serves as a resource for algorithm development and performance assessment. BEDD consists of a collection of 26 million image-action pairs from nearly 14,000 videos of human players completing the BASALT tasks in Minecraft. It also includes over 3,000 dense pairwise human evaluations of human and algorithmic agents. These comparisons serve as a fixed, preliminary leaderboard for evaluating newly-developed algorithms. To enable this comparison, we present a streamlined codebase for benchmarking new algorithms against the leaderboard. In addition to presenting these datasets, we conduct a detailed analysis of the data from both datasets to guide algorithm development and evaluation.

artificial intelligence, machine learning, natural language, (18 more...)

2312.02405

Country:

North America > United States (0.14)
Europe (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Games (0.90)

Bi-level Latent Variable Model for Sample-Efficient Multi-Agent Reinforcement Learning

Venugopal, Aravind, Milani, Stephanie, Fang, Fei, Ravindran, Balaraman

arXiv.org Artificial Intelligence, Apr-12-2023

Despite their potential in real-world applications, multi-agent reinforcement learning (MARL) algorithms often suffer from high sample complexity. To address this issue, we present a novel model-based MARL algorithm, BiLL (Bi-Level Latent Variable Model-based Learning), that learns a bi-level latent variable model from high-dimensional inputs. At the top level, the model learns latent representations of the global state, which encode global information relevant to behavior learning. At the bottom level, it learns latent representations for each agent, given the global latent representations from the top level. The model generates latent trajectories to use for policy learning. We evaluate our algorithm on complex multi-agent tasks in the challenging SMAC and Flatland environments. Our algorithm outperforms state-of-the-art model-free and model-based baselines in sample efficiency, including on two extremely challenging Super Hard SMAC maps.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2304.06011

Country: North America > United States (0.28)

Genre:

Research Report > Promising Solution (0.54)
Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

Milani, Stephanie, Kanervisto, Anssi, Ramanauskas, Karolis, Schulhoff, Sander, Houghton, Brandon, Mohanty, Sharada, Galbraith, Byron, Chen, Ke, Song, Yan, Zhou, Tianze, Yu, Bingquan, Liu, He, Guan, Kai, Hu, Yujing, Lv, Tangjie, Malato, Federico, Leopold, Florian, Raut, Amogh, Hautamäki, Ville, Melnik, Andrew, Ishida, Shu, Henriques, João F., Klassert, Robert, Laurito, Walter, Novoseller, Ellen, Goecks, Vinicius G., Waytowich, Nicholas, Watkins, David, Miller, Josh, Shah, Rohin

arXiv.org Artificial Intelligence, Mar-23-2023

To facilitate research in the direction of fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to compete to develop algorithms to solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use human feedback as channels to learn the desired behavior. We describe the competition and provide an overview of the top solutions. We conclude by discussing the impact of the competition and future directions for improvement.

machine learning, natural language, reinforcement learning, (17 more...)

2303.13512

Country: Europe (0.46)

Genre:

Contests & Prizes (0.52)
Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Navigates Like Me: Understanding How People Evaluate Human-Like AI in Video Games

Milani, Stephanie, Juliani, Arthur, Momennejad, Ida, Georgescu, Raluca, Rzepecki, Jaroslaw, Shaw, Alison, Costello, Gavin, Fang, Fei, Devlin, Sam, Hofmann, Katja

arXiv.org Artificial Intelligence, Mar-2-2023

We aim to understand how people assess human likeness in navigation produced by people and artificially intelligent (AI) agents in a video game. To this end, we propose a novel AI agent with the goal of generating more human-like behavior. We collect hundreds of crowd-sourced assessments comparing the human-likeness of navigation behavior generated by our agent and baseline AI agents with human-generated behavior. Our proposed agent passes a Turing Test, while the baseline agents do not. By passing a Turing Test, we mean that human judges could not quantitatively distinguish between videos of a person and an AI agent navigating. To understand what people believe constitutes human-like navigation, we extensively analyze the justifications of these assessments. This work provides insights into the characteristics that people consider human-like in the context of goal-directed video game navigation, which is a key step for further improving human interactions with AI agents.

agent, artificial intelligence, participant, (16 more...)

doi: 10.1145/3544548.3581348

2303.02160

Country:

Europe (0.95)
North America > United States > New York > New York County > New York City (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)
Questionnaire & Opinion Survey (0.92)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Games (1.00)

Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Carroll, Micah, Lin, Jessy, Paradise, Orr, Georgescu, Raluca, Sun, Mingfei, Bignell, David, Milani, Stephanie, Hofmann, Katja, Hausknecht, Matthew, Dragan, Anca, Devlin, Sam

arXiv.org Artificial Intelligence, Dec-9-2022

Note: This paper is superseded by the full version (Carroll et al., 2022). Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest. Masked language modeling (Devlin et al., 2018) is a key technique in natural language processing (NLP). Under this paradigm, models are trained to predict randomly-masked subsets of tokens in a sequence.

machine learning, natural language, reinforcement learning, (14 more...)

2204.13326

Country: Asia (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

UniMASK: Unified Inference in Sequential Decision Problems

Carroll, Micah, Paradise, Orr, Lin, Jessy, Georgescu, Raluca, Sun, Mingfei, Bignell, David, Milani, Stephanie, Hofmann, Katja, Hausknecht, Matthew, Dragan, Anca, Devlin, Sam

arXiv.org Artificial Intelligence, Nov-19-2022

Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline reinforcement learning, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the Uni[MASK] framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single Uni[MASK] model is often capable of carrying out many tasks with performance similar to or better than single-task models. Additionally, after fine-tuning, our Uni[MASK] models consistently outperform comparable single-task models. Our code is publicly available here.
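
The idea that different tasks correspond to different maskings over a sequence of states, actions, and returns can be illustrated with a small sketch. This is not the authors' code: the function names, the mask encoding (True = hidden, False = visible), and the exact choice of which positions each task reveals are assumptions made here for illustration, based only on the general description in the abstract.

```python
from typing import Dict, List, Set

# Hypothetical encoding: True = hidden (the model must predict it),
# False = visible (given to the model as input).
Mask = Dict[str, List[bool]]


def make_mask(T: int, states: Set[int], actions: Set[int], returns: Set[int]) -> Mask:
    """Build a mask over T timesteps from the sets of visible indices."""
    return {
        "state": [t not in states for t in range(T)],
        "action": [t not in actions for t in range(T)],
        "return": [t not in returns for t in range(T)],
    }


def behavior_cloning(T: int, t: int) -> Mask:
    # Assumed reading: states 0..t and actions 0..t-1 are visible; action t is the target.
    return make_mask(T, set(range(t + 1)), set(range(t)), set())


def return_conditioned(T: int, t: int) -> Mask:
    # Assumed reading of offline RL: as behavior cloning, plus the initial return is visible.
    return make_mask(T, set(range(t + 1)), set(range(t)), {0})


def inverse_dynamics(T: int, t: int) -> Mask:
    # Assumed reading: only states t and t+1 are visible; action t is the target.
    return make_mask(T, {t, t + 1}, set(), set())


def waypoint_conditioning(T: int, t: int, goal: int) -> Mask:
    # Assumed reading: the current state and a later waypoint state are visible.
    return make_mask(T, {t, goal}, set(), set())


if __name__ == "__main__":
    tasks = {
        "behavior_cloning": behavior_cloning(4, 2),
        "return_conditioned": return_conditioned(4, 2),
        "inverse_dynamics": inverse_dynamics(4, 1),
        "waypoint_conditioning": waypoint_conditioning(4, 0, 3),
    }
    for name, mask in tasks.items():
        print(name, mask)
```

Under this encoding a single model trained to fill in hidden positions could, in principle, be queried for any of these tasks by changing only the mask, which is the unification the abstract describes.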

machine learning, natural language, reinforcement learning, (19 more...)

2211.10869

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

The MineRL BASALT Competition on Learning from Human Feedback

Shah, Rohin, Wild, Cody, Wang, Steven H., Alex, Neel, Houghton, Brandon, Guss, William, Mohanty, Sharada, Kanervisto, Anssi, Milani, Stephanie, Topin, Nicholay, Abbeel, Pieter, Russell, Stuart, Dragan, Anca

arXiv.org Artificial Intelligence, Jul-5-2021

The last decade has seen a significant increase of interest in deep learning research, with many public successes that have demonstrated its potential. As such, these systems are now being incorporated into commercial products. With this comes an additional challenge: how can we build AI systems that solve tasks where there is not a crisp, well-defined specification? While multiple solutions have been proposed, in this competition we focus on one in particular: learning from human feedback. Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve. The MineRL BASALT competition aims to spur forward research on this important class of techniques. We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions. These tasks are defined by a paragraph of natural language: for example, "create a waterfall and take a scenic picture of it", with additional clarifying details. Participants must train a separate agent for each task, using any method they want. Agents are then evaluated by humans who have read the task description. To help participants get started, we provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline that leverages these demonstrations. Our hope is that this competition will improve our ability to build AI systems that do what their designers intend them to do, even when the intent cannot be easily formalized. Besides allowing AI to solve more tasks, this can also enable more effective regulation of AI systems, as well as making progress on the value alignment problem.

competition, computer game, deep learning, (20 more...)

2107.01969

Country: North America > United States > Maryland (0.28)

Genre:

Research Report (0.64)
Personal > Honors (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)
Government > Military (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
