Soni, Utkarsh
Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments
Thai, Tung, Shen, Ming, Garg, Mayank, Kalani, Ayush, Vaidya, Nakul, Soni, Utkarsh, Verma, Mudit, Gopalakrishnan, Sriram, Varshney, Neeraj, Baral, Chitta, Kambhampati, Subbarao, Sinapov, Jivko, Scheutz, Matthias
Learning to detect, characterize and accommodate novelties is a challenge that agents operating in open-world domains need to address to be able to guarantee satisfactory task performance. Certain novelties (e.g., changes in environment dynamics) can interfere with the performance or prevent agents from accomplishing task goals altogether. In this paper, we introduce general methods and architectural mechanisms for detecting and characterizing different types of novelties, and for building an appropriate adaptive model to accommodate them utilizing logical representations and reasoning methods.
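The abstract describes a detect-and-accommodate loop over a logical model of the environment's rules. As a minimal sketch of that loop (our illustration, not the paper's architecture: RuleModel, Transition, and monitor are hypothetical names), a monitor can flag any observed transition the known rules cannot explain, then extend the model so the agent keeps operating:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    state: str
    action: str
    next_state: str

@dataclass
class RuleModel:
    """Logical model of known dynamics: maps (state, action) to the
    set of successor states the known rules permit."""
    rules: dict = field(default_factory=dict)

    def permits(self, t: Transition) -> bool:
        allowed = self.rules.get((t.state, t.action))
        return allowed is not None and t.next_state in allowed

    def accommodate(self, t: Transition) -> None:
        # Naive accommodation: extend the rules so the observed
        # transition is no longer anomalous.
        self.rules.setdefault((t.state, t.action), set()).add(t.next_state)

def monitor(model: RuleModel, observations):
    """Flag transitions the rules cannot explain (candidate novelties),
    then fold each one into the model."""
    novelties = []
    for t in observations:
        if not model.permits(t):
            novelties.append(t)
            model.accommodate(t)
    return novelties

if __name__ == "__main__":
    model = RuleModel({("jail", "roll_doubles"): {"free"}})
    obs = [Transition("jail", "roll_doubles", "free"),
           Transition("jail", "roll_doubles", "jail")]  # rule change
    print(monitor(model, obs))  # only the second transition is flagged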
Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion
Soni, Utkarsh, Thakur, Nupur, Sreedharan, Sarath, Guan, Lin, Verma, Mudit, Marquez, Matthew, Kambhampati, Subbarao
There is a growing interest in developing automated agents that can work alongside humans. In addition to completing the assigned task, such an agent will undoubtedly be expected to behave in a manner that is preferred by the human. This requires the human to communicate their preferences to the agent. Current approaches either require the users to specify the reward function or learn the preference interactively from queries that ask the user to compare behaviors. The former approach can be challenging if the internal representation used by the agent is inscrutable to the human, while the latter is unnecessarily cumbersome for the user if their preference can be specified more easily in symbolic terms. In this work, we propose PRESCA (PREference Specification through Concept Acquisition), a system that allows users to specify their preferences in terms of concepts that they understand. PRESCA maintains a set of such concepts in a shared vocabulary. If the relevant concept is not in the shared vocabulary, it is learned. To make learning a new concept more feedback-efficient, PRESCA leverages causal associations between the target concept and concepts that are already known. In addition, we use a novel data augmentation approach to further reduce the required feedback. We evaluate PRESCA in a Minecraft environment and show that it can effectively align the agent with the user's preference.
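Read as a pipeline, the abstract suggests: look up the concept behind the user's preference in a shared vocabulary, learn it from user feedback if it is missing, and then use it to steer the agent. The Python sketch below shows only that control flow under our own naming (SharedVocabulary, learn_concept, and specify_preference are illustrative, and the stand-in learner omits the causal-association and data-augmentation machinery the paper relies on):

class SharedVocabulary:
    """Concepts the agent already understands, as state classifiers."""
    def __init__(self):
        self.concepts = {}  # name -> callable(state) -> bool

    def knows(self, name):
        return name in self.concepts

    def add(self, name, classifier):
        self.concepts[name] = classifier

def learn_concept(labeled_states):
    """Stand-in for PRESCA's feedback-efficient concept learning."""
    positives = {s for s, y in labeled_states if y}
    return lambda state: state in positives

def specify_preference(vocab, concept_name, user_feedback, penalty=-1.0):
    """If the concept is unknown, learn it from feedback, then return a
    reward-shaping term penalizing states where the concept holds."""
    if not vocab.knows(concept_name):
        vocab.add(concept_name, learn_concept(user_feedback))
    classifier = vocab.concepts[concept_name]
    return lambda state: penalty if classifier(state) else 0.0

# Usage: the user dislikes states where the agent stands in lava.
vocab = SharedVocabulary()
shaping = specify_preference(vocab, "in_lava", [("cell_7", True), ("cell_2", False)])
print(shaping("cell_7"), shaping("cell_2"))  # -1.0 0.0

In the real system the classifier would presumably be a learned detector over agent states; the point here is only that preferences are grounded in symbols the human and the agent share.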
Integrating Planning, Execution and Monitoring in the Presence of Open World Novelties: Case Study of an Open World Monopoly Solver
Gopalakrishnan, Sriram, Soni, Utkarsh, Thai, Tung, Lymperopoulos, Panagiotis, Scheutz, Matthias, Kambhampati, Subbarao
The game of Monopoly is an adversarial multi-agent domain with no fixed goal other than to be the last player solvent. There are useful subgoals, such as monopolizing sets of properties and developing them. There is also considerable randomness from dice rolls, card draws, and adversaries' strategies. This unpredictability is made worse when unknown novelties are added during gameplay. Given these challenges, Monopoly was one of the test beds chosen for the DARPA SAIL-ON program, which aims to create agents that can detect and accommodate novelties. To handle the game's complexities, we developed an agent that eschews complete plans and adapts its policy online as the game evolves. In the most recent independent evaluation in the SAIL-ON program, our agent was the best-performing agent on most measures. We herein present our approach and results.
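As a rough illustration of eschewing complete plans in favor of online adaptation (our sketch, not the evaluated agent: rollout_value, act, and the toy simulator are hypothetical, and net worth is only a proxy for staying solvent), the agent can re-score its legal moves at every decision point using short randomized rollouts:

import random

def rollout_value(state, action, simulate, horizon=20, n=30):
    """Monte-Carlo estimate of an action's value: simulate a few
    randomized continuations and average the outcomes."""
    total = 0.0
    for _ in range(n):
        s = simulate(state, action)
        for _ in range(horizon):
            s = simulate(s, random.choice(s["legal_actions"]))
        total += s["net_worth"]
    return total / n

def act(state, simulate):
    """Greedy one-step policy, recomputed online as the game evolves."""
    return max(state["legal_actions"],
               key=lambda a: rollout_value(state, a, simulate))

if __name__ == "__main__":
    def toy_simulate(s, a):
        # Toy stand-in for a Monopoly engine: each action nudges net worth.
        delta = {"buy": random.uniform(-50, 80), "pass": random.uniform(-20, 20)}[a]
        return {"net_worth": s["net_worth"] + delta,
                "legal_actions": s["legal_actions"]}

    print(act({"net_worth": 1500.0, "legal_actions": ["buy", "pass"]}, toy_simulate))

Because the policy is recomputed from the current state rather than committed to in advance, a detected novelty only needs to be reflected in the simulator for the agent's behavior to adapt.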