AITopics | Ammanabrolu, Prithviraj

Collaborating Authors

Ammanabrolu, Prithviraj

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Lin, Bill Yuchen, Fu, Yicheng, Yang, Karina, Brahman, Faeze, Huang, Shiyu, Bhagavatula, Chandra, Ammanabrolu, Prithviraj, Choi, Yejin, Ren, Xiang

arXiv.org Artificial IntelligenceDec-6-2023

We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage integrates the strengths of behavior cloning and prompting large language models (LLMs) to enhance task completion performance. The framework comprises two primary modules: the Swift module, representing fast and intuitive thinking, and the Sage module, emulating deliberate thought processes. The Swift module is a small encoder-decoder LM fine-tuned on the oracle agent's action trajectories, while the Sage module employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a heuristic method to harmoniously integrate the two modules, resulting in a more efficient and robust problem-solving process. In 30 tasks from the ScienceWorld benchmark, SwiftSage significantly outperforms other methods such as SayCan, ReAct, and Reflexion, demonstrating its effectiveness in solving complex interactive tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.1739

Country: North America > United States > California (0.46)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.67)
Education (0.67)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

Lu, Ximing, Brahman, Faeze, West, Peter, Jang, Jaehun, Chandu, Khyathi, Ravichander, Abhilasha, Qin, Lianhui, Ammanabrolu, Prithviraj, Jiang, Liwei, Ramnath, Sahana, Dziri, Nouha, Fisher, Jillian, Lin, Bill Yuchen, Hallinan, Skyler, Ren, Xiang, Welleck, Sean, Choi, Yejin

arXiv.org Artificial IntelligenceDec-6-2023

While extreme-scale language models have demonstrated exceptional performance on a variety of language tasks, the degree of control over these language models through pure prompting can often be limited. Directly fine-tuning such language models can be effective for tailoring them, but it can be either extremely costly (e.g., GPT-3) or not even feasible for the broader community (e.g., GPT-4). We propose Inference-time Policy Adapters (IPA), which efficiently tailors a language model such as GPT-3 without fine-tuning it. IPA guides a large base model during decoding time through a lightweight policy adapter trained to optimize an arbitrary user objective with reinforcement learning. On five challenging text generation tasks, such as toxicity reduction and lexically constrained generation, IPA consistently brings significant improvements over off-the-shelf language models. It outperforms competitive baseline methods, sometimes even including expensive fine-tuning. In particular, tailoring GPT-2 with IPA can outperform GPT-3, while tailoring GPT-3 with IPA brings a major performance boost over GPT-3 (and sometimes even over GPT-4). Our promising results highlight the potential of IPA as a lightweight alternative to tailoring extreme-scale language models.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2305.15065

Country:

Europe (1.00)
Asia (0.68)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Wu, Zeqiu, Hu, Yushi, Shi, Weijia, Dziri, Nouha, Suhr, Alane, Ammanabrolu, Prithviraj, Smith, Noah A., Ostendorf, Mari, Hajishirzi, Hannaneh

arXiv.org Artificial IntelligenceOct-30-2023

Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human preference judgments on LM outputs are transformed into a learning signal - has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information on long text outputs; it does not indicate which aspects of the outputs influenced user preference; e.g., which parts contain what type(s) of errors. In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal. We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness). We conduct experiments on detoxification and long-form question answering to illustrate how learning with such reward functions leads to improved performance, supported by both automatic and human evaluation. Additionally, we show that LM behaviors can be customized using different combinations of fine-grained reward models. We release all data, collected human feedback, and codes at https://FineGrainedRLHF.github.io.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2306.01693

Country:

Europe (1.00)
North America > United States > California (0.67)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Marine (1.00)
Leisure & Entertainment > Sports > Football (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

Jang, Joel, Kim, Seungone, Lin, Bill Yuchen, Wang, Yizhong, Hessel, Jack, Zettlemoyer, Luke, Hajishirzi, Hannaneh, Choi, Yejin, Ammanabrolu, Prithviraj

arXiv.org Artificial IntelligenceOct-17-2023

While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning diverse, individual perspectives. In this work, we study Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong single-objective baselines, we show that we can achieve personalized alignment by decomposing preferences into multiple dimensions. These dimensions are defined based on personalizations that are declared as desirable by the user. In this work, we show that they can be efficiently trained independently in a distributed manner and combined effectively post-hoc through parameter merging. The code is available at https://github.com/joeljang/RLPHF.

large language model, machine learning, reinforcement learning, (6 more...)

arXiv.org Artificial Intelligence

2310.11564

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions

Wu, Zeqiu, Parish, Ryu, Cheng, Hao, Min, Sewon, Ammanabrolu, Prithviraj, Ostendorf, Mari, Hajishirzi, Hannaneh

arXiv.org Artificial IntelligenceJun-22-2023

In an information-seeking conversation, a user may ask questions that are under-specified or unanswerable. An ideal agent would interact by initiating different response types according to the available knowledge sources. However, most current studies either fail to or artificially incorporate such agent-side initiative. This work presents InSCIt, a dataset for Information-Seeking Conversations with mixed-initiative Interactions. It contains 4.7K user-agent turns from 805 human-human conversations where the agent searches over Wikipedia and either directly answers, asks for clarification, or provides relevant information to address user queries. The data supports two subtasks, evidence passage identification and response generation, as well as a human evaluation protocol to assess model performance. We report results of two systems based on state-of-the-art models of conversational knowledge identification and open-domain question answering. Both systems significantly underperform humans, suggesting ample room for improvement in future studies.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2207.00746

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.14)
North America > United States > Maryland (0.14)

Genre: Research Report (1.00)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.93)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Add feedback

I Cast Detect Thoughts: Learning to Converse and Guide with Intents and Theory-of-Mind in Dungeons and Dragons

Zhou, Pei, Zhu, Andrew, Hu, Jennifer, Pujara, Jay, Ren, Xiang, Callison-Burch, Chris, Choi, Yejin, Ammanabrolu, Prithviraj

arXiv.org Artificial IntelligenceMay-30-2023

We propose a novel task, G4C, to study teacher-student natural language interactions in a goal-driven and grounded environment. Dungeons and Dragons (D&D), a role-playing game, provides an ideal setting to investigate such interactions. Here, the Dungeon Master (DM), i.e., the teacher, guides the actions of several players -- students, each with their own personas and abilities -- to achieve shared goals grounded in a fantasy world. Our approach is to decompose and model these interactions into (1) the DM's intent to guide players toward a given goal; (2) the DM's guidance utterance to the players expressing this intent; and (3) a theory-of-mind (ToM) model that anticipates the players' reaction to the guidance one turn into the future. We develop a novel reinforcement learning (RL) method for training a DM that generates guidance for players by rewarding utterances where the intent matches the ToM-anticipated player actions. Human and automated evaluations show that a DM trained to explicitly model intents and incorporate ToM of the players using RL generates better-quality guidance that is 3x more likely to fulfill the DM's intent than a vanilla natural language generation (NLG) approach.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.1006

Country:

North America > United States > California (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling

Nottingham, Kolby, Ammanabrolu, Prithviraj, Suhr, Alane, Choi, Yejin, Hajishirzi, Hannaneh, Singh, Sameer, Fox, Roy

arXiv.org Artificial IntelligenceApr-27-2023

Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world. However, if initialized with knowledge of high-level subgoals and transitions between subgoals, RL agents could utilize this Abstract World Model (AWM) for planning and exploration. We propose using few-shot large language models (LLMs) to hypothesize an AWM, that will be verified through world experience, to improve sample efficiency of RL agents. Our DECKARD agent applies LLM-guided exploration to item crafting in Minecraft in two phases: (1) the Dream phase where the agent uses an LLM to decompose a task into a sequence of subgoals, the hypothesized AWM; and (2) the Wake phase where the agent learns a modular policy for each subgoal and verifies or corrects the hypothesized AWM. Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience not only increases sample efficiency over contemporary methods by an order of magnitude but is also robust to and corrects errors in the LLM, successfully blending noisy internet-scale information from LLMs with knowledge grounded in environment dynamics.

deckard, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2301.1205

Country:

North America > United States > Hawaii (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry: Materials > Metals & Mining (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

Ramamurthy, Rajkumar, Ammanabrolu, Prithviraj, Brantley, Kianté, Hessel, Jack, Sifa, Rafet, Bauckhage, Christian, Hajishirzi, Hannaneh, Choi, Yejin

arXiv.org Artificial IntelligenceFeb-28-2023

We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If we view text generation as a sequential decision-making problem, reinforcement learning (RL) appears to be a natural conceptual framework. However, using RL for LM-based generation faces empirical challenges, including training instability due to the combinatorial action space, as well as a lack of open-source libraries and benchmarks customized for LM alignment. Thus, a question rises in the research community: is RL a practical paradigm for NLP? To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL. The library consists of on-policy RL algorithms that can be used to train any encoder or encoder-decoder LM in the HuggingFace library (Wolf et al. 2020) with an arbitrary reward function. Next, we present the GRUE (General Reinforced-language Understanding Evaluation) benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions which capture automated measures of human preference. GRUE is the first leaderboard-style evaluation of RL algorithms for NLP tasks. Finally, we introduce an easy-to-use, performant RL algorithm, NLPO (Natural Language Policy Optimization) that learns to effectively reduce the combinatorial action space in language generation. We show 1) that RL techniques are generally better than supervised methods at aligning LMs to human preferences; and 2) that NLPO exhibits greater stability and performance than previous policy gradient methods (e.g., PPO (Schulman et al. 2017)), based on both automatic and human evaluations.

machine learning, natural language, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2210.01241

Country:

Europe (1.00)
Asia (0.92)
North America > United States > New York (0.14)
(3 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.92)

Industry:

Transportation > Air (1.00)
Media > Film (1.00)
Leisure & Entertainment > Sports > Soccer (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Behavior Cloned Transformers are Neurosymbolic Reasoners

Wang, Ruoyao, Jansen, Peter, Côté, Marc-Alexandre, Ammanabrolu, Prithviraj

arXiv.org Artificial IntelligenceFeb-11-2023

In this work, we explore techniques for augmenting interactive agents with information from symbolic modules, much like humans use tools like calculators and GPS systems to assist with arithmetic and navigation. We test our agent's abilities in text games -- challenging benchmarks for evaluating the multi-step reasoning abilities of game agents in grounded, language-based environments. Our experimental study indicates that injecting the actions from these symbolic modules into the action space of a behavior cloned transformer agent increases performance on four text game benchmarks that test arithmetic, navigation, sorting, and common sense reasoning by an average of 22%, allowing an agent to reach the highest possible performance on unseen games. This action injection technique is easily extended to new agents, environments, and symbolic modules.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2210.07382

Country:

Europe (0.68)
North America > United States > Arizona (0.28)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.87)

Add feedback

Inherently Explainable Reinforcement Learning in Natural Language

Peng, Xiangyu, Riedl, Mark O., Ammanabrolu, Prithviraj

arXiv.org Artificial IntelligenceDec-16-2021

We focus on the task of creating a reinforcement learning agent that is inherently explainable -- with the ability to produce immediate local explanations by thinking out loud while performing a task and analyzing entire trajectories post-hoc to produce causal explanations. This Hierarchically Explainable Reinforcement Learning agent (HEX-RL), operates in Interactive Fictions, text-based game environments in which an agent perceives and acts upon the world using textual natural language. These games are usually structured as puzzles or quests with long-term dependencies in which an agent must complete a sequence of actions to succeed -- providing ideal environments in which to test an agent's ability to explain its actions. Our agent is designed to treat explainability as a first-class citizen, using an extracted symbolic knowledge graph-based state representation coupled with a Hierarchical Graph Attention mechanism that points to the facts in the internal graph representation that most influenced the choice of actions. Experiments show that this agent provides significantly improved explanations over strong baselines, as rated by human participants generally unfamiliar with the environment, while also matching state-of-the-art task performance.

explanation, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2112.08907

Country: Oceania > Australia (0.14)

Genre: Research Report (0.85)

Industry: Leisure & Entertainment > Games > Computer Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback