Agents
Guiding Policies with Language via Meta-Learning
Co-Reyes, John D., Gupta, Abhishek, Sanjeev, Suvansh, Altieri, Nick, DeNero, John, Abbeel, Pieter, Levine, Sergey
Behavioral skills or policies for autonomous agents are conventionally learned from reward functions, via reinforcement learning, or from demonstrations, via imitation learning. However, both modes of task specification have their disadvantages: reward functions require manual engineering, while demonstrations require a human expert to be able to actually perform the task in order to generate the demonstration. Instruction following from natural language instructions provides an appealing alternative: in the same way that we can specify goals to other humans simply by speaking or writing, we would like to be able to specify tasks for our machines. However, a single instruction may be insufficient to fully communicate our intent or, even if it is, may be insufficient for an autonomous agent to actually understand how to perform the desired task. In this work, we propose an interactive formulation of the task specification problem, where iterative language corrections are provided to an autonomous agent, guiding it in acquiring the desired skill. Our proposed language-guided policy learning algorithm can integrate an instruction and a sequence of corrections to acquire new skills very quickly. In our experiments, we show that this method can enable a policy to follow instructions and corrections for simulated navigation and manipulation tasks, substantially outperforming direct, non-interactive instruction following.
Self-Organizing Maps for Storage and Transfer of Knowledge in Reinforcement Learning
Karimpanal, Thommen George, Bouffanais, Roland
The idea of reusing or transferring information from previously learned tasks (source tasks) for the learning of new tasks (target tasks) has the potential to significantly improve the sample efficiency of a reinforcement learning agent. In this work, we describe a novel approach for reusing previously acquired knowledge by using it to guide the exploration of an agent while it learns new tasks. In order to do so, we employ a variant of the growing self-organizing map algorithm, which is trained using a measure of similarity that is defined directly in the space of the vectorized representations of the value functions. In addition to enabling transfer across tasks, the resulting map is simultaneously used to enable the efficient storage of previously acquired task knowledge in an adaptive and scalable manner. We empirically validate our approach in a simulated navigation environment, and also demonstrate its utility through simple experiments using a mobile micro-robotics platform. In addition, we demonstrate the scalability of this approach, and analytically examine its relation to the proposed network growth mechanism. Further, we briefly discuss some of the possible improvements and extensions to this approach, as well as its relevance to real world scenarios in the context of continual learning.
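The storage mechanism described in this abstract can be illustrated with a heavily simplified sketch. It is not the authors' algorithm: the choice of cosine similarity, the names `GrowingMap`, `grow_threshold`, and `learning_rate`, and the omission of neighborhood updates are all assumptions made for illustration. Each node stores one vectorized value function; a new vector either refines its most similar node or, if nothing is similar enough, becomes a new node.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


class GrowingMap:
    """Hypothetical, simplified growing map over vectorized value functions."""

    def __init__(self, grow_threshold=0.9, learning_rate=0.5):
        self.nodes = []  # one stored weight vector per node
        self.grow_threshold = grow_threshold
        self.learning_rate = learning_rate

    def best_match(self, x):
        """Return (index, similarity) of the most similar node, or (None, -1.0)."""
        if not self.nodes:
            return None, -1.0
        sims = [cosine_similarity(x, w) for w in self.nodes]
        idx = max(range(len(sims)), key=sims.__getitem__)
        return idx, sims[idx]

    def update(self, x):
        """Store x: grow a new node if nothing is similar enough, else adapt."""
        idx, sim = self.best_match(x)
        if idx is None or sim < self.grow_threshold:
            self.nodes.append(list(x))
            return len(self.nodes) - 1
        w = self.nodes[idx]
        # Move the best-matching node part of the way toward the new vector.
        self.nodes[idx] = [wi + self.learning_rate * (xi - wi)
                           for wi, xi in zip(w, x)]
        return idx
```

In this sketch the number of nodes grows only when a dissimilar task arrives, which is one plausible reading of storage in an "adaptive and scalable manner"; a full self-organizing map would also update the neighbors of the best-matching node.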
Economics of Human-AI Ecosystem: Value Bias and Lost Utility in Multi-Dimensional Gaps
In recent years, artificial intelligence (AI) decision-making and autonomous systems have become an integral part of the economy, industry, and society. The evolving economy of the human-AI ecosystem is raising concerns regarding the risks and values inherited by AI systems. This paper investigates the dynamics of the creation and exchange of values and points out gaps in the perception of the cost-value, knowledge, space, and time dimensions. It shows aspects of value bias in the human perception of achievements and costs that are encoded in AI systems. It also proposes rethinking hard goal definitions and cost-optimal problem-solving principles through the lens of effectiveness and efficiency in the development of trusted machines. The paper suggests a value-driven, cost-aware strategy and principles for problem-solving and for planning effective research progress to address real-world problems that involve diverse forms of achievements, investments, and survival scenarios.
Parameter Sharing Reinforcement Learning Architecture for Multi Agent Driving Behaviors
Kaushik, Meha, S, Phaniteja, Krishna, K. Madhava
Multi-agent learning provides a potential framework for learning and simulating traffic behaviors. This paper proposes a novel architecture to learn multiple driving behaviors in a traffic scenario. The proposed architecture can learn multiple behaviors independently as well as simultaneously. We take advantage of the homogeneity of agents and learn in a parameter-sharing paradigm. To further speed up training, asynchronous updates are incorporated into the architecture. While learning different behaviors simultaneously, the framework was also able to learn cooperation between the agents without any explicit communication. We applied this framework to learn two important driving behaviors: 1) Lane-Keeping and 2) Over-Taking. Results indicate faster convergence and the learning of a more generic behavior that scales to any number of agents. Compared with existing approaches, our results indicate equal and, in some cases, better performance.
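The parameter-sharing idea can be sketched in a few lines. This is an illustration under stated assumptions, not the architecture from the paper: the linear model, the squared-error update, and the names `SharedLinearPolicy` and `Agent` are all hypothetical. The point is only that homogeneous agents hold a reference to one parameter set, so an update made from any agent's experience immediately benefits every other agent, and updates can be applied as they arrive without synchronization.

```python
class SharedLinearPolicy:
    """One set of weights used by every agent (hypothetical linear model)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def score(self, obs):
        """Linear score of an observation under the shared weights."""
        return sum(w * x for w, x in zip(self.weights, obs))

    def apply_gradient(self, grad):
        """Apply one agent's update step directly to the shared weights."""
        self.weights = [w + self.learning_rate * g
                        for w, g in zip(self.weights, grad)]


class Agent:
    """An agent that keeps a reference to the shared policy, not a copy."""

    def __init__(self, name, policy):
        self.name = name
        self.policy = policy

    def learn(self, obs, target):
        """Regress the shared score toward a target from this agent's data."""
        error = target - self.policy.score(obs)
        grad = [error * x for x in obs]  # step direction for squared error
        self.policy.apply_gradient(grad)
        return error


# Three homogeneous agents share one parameter set.
policy = SharedLinearPolicy(n_features=2)
agents = [Agent(f"car{i}", policy) for i in range(3)]
agents[0].learn([1.0, 0.0], target=1.0)
# The other agents now score the same observation with the updated weights.
shared_score = agents[1].policy.score([1.0, 0.0])
```

Because the parameter count does not depend on the number of agents, adding more agents in this sketch requires no new weights, which is one reason a shared policy can scale to any number of agents.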
On Human Robot Interaction using Multiple Modes
Humanoid robots have a body structure similar to that of human beings. Owing to their design, they share the same workspace with humans. They are deployed to clean, to assist elderly people, to entertain us, and, most importantly, to serve us. To be acceptable in the household, they must have a higher level of intelligence than industrial robots, and they must be social and capable of interacting with the people around them, who are not expected to be robot specialists. All of this falls under the field of human-robot interaction (HRI). There are various modes, such as speech, gesture, and behavior, through which humans can interact with robots. To address these challenges, a multimodal technique has been introduced in which gesture as well as speech is used as a mode of interaction.
The Barbados 2018 List of Open Issues in Continual Learning
Schaul, Tom, van Hasselt, Hado, Modayil, Joseph, White, Martha, White, Adam, Bacon, Pierre-Luc, Harb, Jean, Mourad, Shibl, Bellemare, Marc, Precup, Doina
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most important open issues we are facing, and stimulate further discussion in the community. The content is based on some of our discussions during a weeklong workshop held in Barbados in February 2018. We adopt the reinforcement learning (RL) formulation, where an agent interacts sequentially with an environment, and the agent is provided a reward signal that unambiguously defines success. We want to explicitly consider some of the most challenging dimensions for a developing intelligence.
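The reinforcement learning formulation mentioned in this abstract, an agent interacting sequentially with an environment and receiving a reward signal, can be made concrete with a toy loop. The `Corridor` environment, its reward of 1.0 at the goal, and the `run_episode` helper are hypothetical and chosen only for illustration.

```python
class Corridor:
    """Toy environment: walk from position 0 to `length` on a line."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply action (+1 or -1); return (observation, reward, done)."""
        self.position = max(0, min(self.length, self.position + action))
        done = self.position == self.length
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode; return (total reward, number of steps taken)."""
    obs = env.reset()
    total = 0.0
    for t in range(max_steps):
        action = policy(obs)  # the agent picks an action from its observation
        obs, reward, done = env.step(action)  # the environment responds
        total += reward
        if done:
            return total, t + 1
    return total, max_steps
```

A policy that always moves right, `lambda obs: 1`, reaches the goal of a length-5 corridor in five steps, while one that always moves left never receives a reward; here the reward signal alone defines success.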
Exploring Gameplay With AI Agents
Silva, Fernando de Mesentier, Borovikov, Igor, Kolen, John, Aghdaie, Navid, Zaman, Kazi
The process of playtesting a game is subjective, expensive, and incomplete. In this paper, we present a playtesting approach that explores the game space with automated agents and collects data to answer questions posed by the designers. Rather than have agents interact with an actual game client, this approach recreates the bare-bones mechanics of the game as a separate system. Our agent is able to play in minutes what would take testers days of organic gameplay. The analysis of thousands of game simulations exposed imbalances in game actions, identified inconsequential rewards, and evaluated the effectiveness of optional strategic choices. Our test case game, The Sims Mobile, was recently released, and the findings shown here influenced design changes that resulted in improved player experience.
Optimizing Photonic Nanostructures via Multi-fidelity Gaussian Processes
Song, Jialin, Tokpanov, Yury S., Chen, Yuxin, Fleischman, Dagny, Fountaine, Kate T., Atwater, Harry A., Yue, Yisong
We apply numerical methods in combination with finite-difference time-domain (FDTD) simulations to optimize the transmission properties of plasmonic mirror color filters, using a multi-objective figure of merit over a five-dimensional parameter space and a novel multi-fidelity Gaussian process approach. We compare these results with conventional derivative-free global search algorithms, such as a (single-fidelity) Gaussian process optimization scheme and Particle Swarm Optimization, a method commonly used in the nanophotonics community that is implemented in Lumerical commercial photonics software. We demonstrate the performance of various numerical optimization approaches on several pre-collected real-world datasets and show that by properly trading off expensive information sources with cheap simulations, one can more effectively optimize the transmission properties with a fixed budget.
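For readers unfamiliar with the Particle Swarm Optimization baseline named above, a minimal textbook-style sketch follows. It is not the Lumerical implementation and not the authors' code: the function name `particle_swarm_minimize`, the coefficient values, and the quadratic test objective are assumptions chosen for illustration. In the setting of the abstract, the objective would instead be a figure of merit computed from a simulation.

```python
import random


def particle_swarm_minimize(objective, bounds, n_particles=20, n_iters=100,
                            inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize `objective` over box `bounds`; return (best_position, best_value)."""
    rng = random.Random(seed)  # local generator for reproducibility
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # each particle's best position
    best_val = [objective(p) for p in pos]     # and the value found there
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best, and pull toward the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def sphere(x):
    """Illustrative objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


best_x, best_f = particle_swarm_minimize(sphere, [(-5.0, 5.0)] * 2)
```

Every call to `objective` here costs the same, which is the single-fidelity setting; the multi-fidelity approach in the abstract differs precisely in choosing between cheap and expensive evaluations under a fixed budget.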
Seq2Seq Mimic Games: A Signaling Perspective
Leni, Juan, Levine, John, Quigley, John
We study the emergence of communication in multiagent adversarial settings inspired by the classic Imitation game. A class of three-player games is used to explore how agents based on sequence-to-sequence (Seq2Seq) models can learn to communicate information in adversarial settings. We propose a modeling approach, present an initial set of experiments, and use signaling theory to support our analysis. In addition, we describe how we operationalize the learning process of actor-critic Seq2Seq-based agents in these communication games.
Adversarial Resilience Learning - Towards Systemic Vulnerability Analysis for Large and Complex Systems
Fischer, Lars, Memmen, Jan-Menno, Veith, Eric MSP, Tröschel, Martin
Current newspapers are full of horrific tales of "cyber-attackers" threatening our energy systems. And, if not for the notorious "evil state" actor, it is the ongoing digitization necessary to enable increasing renewable and volatile energy generation that threatens our energy supply and thus the stability of our society. And while the main approach seems to be to patch up the detected vulnerabilities of protocols, software, and controller devices, our approach is to research and develop the means to systematically design and test systems that are structurally resilient against failures and attackers alike. Security in cyber-systems should mostly be concerned with establishing asymmetric control in favour of the operator of a system. In order to achieve this on a structural level at design time, reproducible benchmark tests are required.