Are We Making Real Progress in Simulated Environments? Measuring the Sim2Real Gap in Embodied Visual Navigation
Kadian, Abhishek, Truong, Joanne, Gokaslan, Aaron, Clegg, Alexander, Wijmans, Erik, Lee, Stefan, Savva, Manolis, Chernova, Sonia, Batra, Dhruv
Does progress in simulation translate to progress in robotics? Specifically, if method A outperforms method B in simulation, how likely is the trend to hold in reality on a robot? We examine this question for embodied (PointGoal) navigation, developing engineering tools and a research paradigm for evaluating a simulator by its sim2real predictivity, revealing surprising findings about prior work. First, we develop Habitat-PyRobot Bridge (HaPy), a library for seamless execution of identical code on a simulated agent and a physical robot. Habitat-to-Locobot transfer with HaPy involves just one line change in config, essentially treating reality as just another simulator! Second, we investigate sim2real predictivity of Habitat-Sim for PointGoal navigation. We 3D-scan a physical lab space to create a virtualized replica, and run parallel tests of 9 different models in reality and simulation. We present a new metric called Sim-vs-Real Correlation Coefficient (SRCC) to quantify sim2real predictivity. Our analysis reveals several important findings. We find that SRCC for Habitat as used for the CVPR19 challenge is low (0.18 for the success metric), which suggests that performance improvements for this simulator-based challenge would not transfer well to a physical robot. We find that this gap is largely due to AI agents learning to 'cheat' by exploiting simulator imperfections: specifically, the way Habitat allows for 'sliding' along walls on collision. Essentially, the virtual robot is capable of cutting corners, leading to unrealistic shortcuts through non-navigable spaces. Naturally, such exploits do not work in the real world where the robot stops on contact with walls. Our experiments show that it is possible to optimize simulation parameters to enable robots trained in imperfect simulators to generalize learned skills to reality (e.g. improving $SRCC_{Succ}$ from 0.18 to 0.844).
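To make the SRCC idea concrete, here is a minimal sketch. It assumes SRCC is the Pearson correlation between per-model performance in simulation and in reality; the abstract names the metric but does not spell out the formula, so treat this as an assumption. The success rates below are made-up illustrative numbers, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (made-up) success rates, one entry per model.
sim_success = [0.90, 0.85, 0.95, 0.80, 0.88]
real_success = [0.40, 0.70, 0.45, 0.65, 0.50]

# Assumed definition: SRCC_Succ = correlation of sim vs. real success.
srcc_succ = pearson(sim_success, real_success)
print(f"SRCC_Succ (illustrative data): {srcc_succ:.3f}")
```

A value near 1 would mean that ranking models in simulation predicts their ranking on the robot; a value near 0 would mean simulation results say little about real-world performance.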
Recruitment-imitation Mechanism for Evolutionary Reinforcement Learning
Lü, Shuai, Han, Shuai, Zhou, Wenbo, Zhang, Junwei
Reinforcement learning, evolutionary algorithms and imitation learning are three principal methods for continuous control tasks. Reinforcement learning is sample efficient, yet it is sensitive to hyper-parameter settings and needs efficient exploration; evolutionary algorithms are stable, but have low sample efficiency; imitation learning is both sample efficient and stable, but it requires the guidance of expert data. In this paper, we propose the Recruitment-imitation Mechanism (RIM) for evolutionary reinforcement learning, a scalable framework that combines the advantages of the three methods mentioned above. The core of this framework is a dual-actor, single-critic reinforcement learning agent. This agent can recruit high-fitness actors from the population of the evolutionary algorithm, which guide it in learning from the experience replay buffer. At the same time, low-fitness actors in the evolutionary population can imitate the behavior patterns of the reinforcement learning agent and improve their adaptability. The reinforcement and imitation learners in this framework can be replaced with any off-policy actor-critic reinforcement learner or data-driven imitation learner. We evaluate RIM on a series of benchmarks for continuous control tasks in MuJoCo. The experimental results show that RIM outperforms prior evolutionary or reinforcement learning methods. The performance of RIM's components is significantly better than that of the components of the previous evolutionary reinforcement learning algorithm, and recruitment using a soft update enables the reinforcement learning agent to learn faster than recruitment using a hard update.
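The soft versus hard update distinction can be illustrated as follows. This is a minimal sketch that assumes "soft update" means the usual interpolation between two parameter vectors and "hard update" means a direct copy; the abstract does not give RIM's exact update rule, and the function names and the `tau` parameter are illustrative.

```python
def hard_update(target, source):
    """Replace the target parameters with a copy of the source parameters."""
    return list(source)


def soft_update(target, source, tau):
    """Move the target parameters a fraction tau of the way toward the source."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target) != len(source):
        raise ValueError("parameter vectors must have the same length")
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


# Hypothetical flattened actor parameters.
rl_actor = [0.0, 2.0]
recruited_actor = [1.0, 0.0]

blended = soft_update(rl_actor, recruited_actor, tau=0.5)
copied = hard_update(rl_actor, recruited_actor)
```

With `tau=1.0` the soft update reduces to the hard update, so the two can be seen as endpoints of the same family.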
Learning To Reach Goals Without Reinforcement Learning
Ghosh, Dibya, Gupta, Abhishek, Fu, Justin, Reddy, Ashwin, Devin, Coline, Eysenbach, Benjamin, Levine, Sergey
Dibya Ghosh* (1), Abhishek Gupta* (1), Justin Fu (1), Ashwin Reddy (1), Coline Devin (1), Benjamin Eysenbach (2), Sergey Levine (1). Affiliations: (1) University of California, Berkeley; (2) Carnegie Mellon University.
Abstract: Imitation learning algorithms provide a simple and straightforward approach for training control policies via supervised learning. By maximizing the likelihood of good actions provided by an expert demonstrator, supervised imitation learning can produce effective policies without the algorithmic complexities and optimization challenges of reinforcement learning, at the cost of requiring an expert demonstrator to provide the demonstrations. In this paper, we ask: can we take insights from imitation learning to design algorithms that can effectively acquire optimal policies from scratch without any expert demonstrations? The key observation that makes this possible is that, in the multi-task setting, trajectories that are generated by a suboptimal policy can still serve as optimal examples for other tasks. In particular, when tasks correspond to different goals, every trajectory is a successful demonstration for the goal state that it actually reaches. We propose a simple algorithm for learning goal-reaching behaviors without any demonstrations, complicated user-provided reward functions, or complex reinforcement learning methods. Our method simply maximizes the likelihood of actions the agent actually took in its own previous rollouts, conditioned on the goal being the state that it actually reached. Although related variants of this approach have been proposed previously in imitation learning with demonstrations, we show how this approach can effectively learn goal-reaching policies from scratch. We present a theoretical result linking self-supervised imitation learning and reinforcement learning, and empirical results showing that it performs competitively with more complex reinforcement learning methods on a range of challenging goal reaching problems, while yielding advantages in terms of stability and use of offline data.
1 Introduction: Reinforcement learning (RL) algorithms hold the promise of providing a broadly-applicable tool for automating control, and the combination of high-capacity deep neural network models with RL extends their applicability to settings with complex observations and that require intricate policies. However, RL with function approximation, including deep RL, presents a challenging optimization problem. Despite years of research, current deep RL methods are far from a turnkey solution: most popular methods lack convergence guarantees (Baird, 1995; Tsitsiklis & Van Roy, 1997) or require prohibitive numbers of samples (Schulman et al., 2015; Lillicrap et al., 2015).
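The relabeling idea described in the abstract above can be sketched in a few lines. This is a toy, tabular illustration and not the authors' implementation: it assumes hashable states, relabels each rollout with its final state as the goal, and fits a count-based policy. The function names and the one-dimensional example are hypothetical.

```python
from collections import Counter, defaultdict


def relabel_trajectory(states, actions):
    """Turn one rollout into (state, action, goal) examples.

    The goal for every step is the state the rollout actually reached,
    so each example is a valid demonstration for that goal.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    reached = states[-1]
    return [(s, a, reached) for s, a in zip(states[:-1], actions)]


def fit_tabular_policy(examples):
    """Pick, for each (state, goal) pair, the most frequent action.

    Empirical action frequencies are the maximum-likelihood estimate of a
    tabular policy; this returns the mode of that estimate.
    """
    counts = defaultdict(Counter)
    for state, action, goal in examples:
        counts[(state, goal)][action] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


# Hypothetical rollouts on a 1-D chain of integer states.
rollouts = [
    ([0, 1, 2], ["right", "right"]),
    ([2, 1, 0], ["left", "left"]),
]
dataset = [ex for states, actions in rollouts
           for ex in relabel_trajectory(states, actions)]
policy = fit_tabular_policy(dataset)
```

In a continuous setting, the count-based fit would be replaced by a supervised update of a goal-conditioned policy network on the same relabeled tuples.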
CLOSURE: Assessing Systematic Generalization of CLEVR Models
Bahdanau, Dzmitry, de Vries, Harm, O'Donnell, Timothy J., Murty, Shikhar, Beaudoin, Philippe, Bengio, Yoshua, Courville, Aaron
Dzmitry Bahdanau (1, 2, 3), Harm de Vries (2), Timothy J. O'Donnell (1, 4), Shikhar Murty (5), Philippe Beaudoin (2), Yoshua Bengio (1, 3, 6), Aaron Courville (1, 3, 6). Affiliations: (1) Mila, Quebec Artificial Intelligence Institute; (2) Element AI; (3) Université de Montréal; (4) McGill University; (5) Stanford University; (6) CIFAR Fellow.
Abstract: The CLEVR dataset of natural-looking questions about 3D-rendered scenes has recently received much attention from the research community. A number of models have been proposed for this task, many of which achieved very high accuracies of around 97-99%. In this work, we study how systematic the generalization of such models is, that is, to what extent they are capable of handling novel combinations of known linguistic constructs. To this end, we test models' understanding of referring expressions based on matching object properties (such as "the object that is the same size as the red ball") in novel contexts. Our experiments on the resulting CLOSURE benchmark show that state-of-the-art models often do not exhibit systematicity after being trained on CLEVR. Surprisingly, we find that an explicitly compositional Neural Module Network model also generalizes badly on CLOSURE, even when it has access to the ground-truth programs at test time. We improve the NMN's systematic generalization by developing a novel Vector-NMN module architecture with vector-valued inputs and outputs. Lastly, we investigate the extent to which few-shot transfer learning can help models that are pretrained on CLEVR to adapt to CLOSURE. Our few-shot learning experiments contrast the adaptation behavior of the models with intermediate discrete programs with that of the end-to-end continuous models.
1 Introduction: The ability to communicate in natural language and ground it effectively into our rich unstructured 3D reality is a crucial skill that we expect from artificial agents of the future. A popular task to benchmark progress towards this goal is Visual Question Answering (VQA), in which one must give a (typically short) answer to a question about the content of an image.
The PlayStation Reinforcement Learning Environment (PSXLE)
Purves, Carlos, Cangea, Cătălina, Veličković, Petar
We propose a new benchmark environment for evaluating Reinforcement Learning (RL) algorithms: the PlayStation Learning Environment (PSXLE), a PlayStation emulator modified to expose a simple control API that enables rich game-state representations. We argue that the PlayStation serves as a suitable progression for agent evaluation and propose a framework for such an evaluation. We build an action-driven abstraction for a PlayStation game with support for the OpenAI Gym interface and demonstrate its use by running OpenAI Baselines.
Graph Neural Networks for Decentralized Multi-Robot Path Planning
Li, Qingbiao, Gama, Fernando, Ribeiro, Alejandro, Prorok, Amanda
Efficient and collision-free navigation in multi-robot systems is fundamental to advancing mobility. Scenarios where the robots are restricted in observation and communication range call for decentralized solutions, whereby robots execute localized planning policies. From the point of view of an individual robot, however, its local decision-making system is incomplete, since other agents' unobservable states affect future values. The manner in which information is shared is crucial to the system's performance, yet is not well addressed by current approaches. To address these challenges, we propose a combined architecture, with the goal of learning a decentralized sequential action policy that yields efficient path plans for all robots. Our framework is composed of a convolutional neural network (CNN) that extracts adequate features from local observations, and a graph neural network (GNN) that communicates these features among robots. We train the model to imitate an expert algorithm, and use the resulting model online in decentralized planning involving only local communication. We evaluate our method in simulations involving teams of robots in cluttered workspaces. We measure the success rates and sum of costs over the planned paths. The results show a performance close to that of our expert algorithm, demonstrating the validity of our approach. In particular, we show our model's capability to generalize to previously unseen cases (involving larger environments and larger robot teams).
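The idea of sharing features only among robots within communication range can be sketched as a single neighborhood-aggregation step. This is an illustrative simplification, not the paper's architecture: the per-robot feature vectors stand in for CNN outputs, the scalar weights `w_self` and `w_neigh` stand in for learned parameters, and all names are hypothetical.

```python
import math


def communication_graph(positions, radius):
    """List, for each robot, the indices of robots within communication range."""
    n = len(positions)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def aggregate(features, neighbors, w_self=1.0, w_neigh=0.5):
    """One aggregation step: mix each robot's feature with its neighbors' sum."""
    updated = []
    for i, feat in enumerate(features):
        summed = [0.0] * len(feat)
        for j in neighbors[i]:
            for k, value in enumerate(features[j]):
                summed[k] += value
        updated.append([w_self * f + w_neigh * s for f, s in zip(feat, summed)])
    return updated


# Hypothetical robot positions and one-dimensional local features.
positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
features = [[1.0], [2.0], [4.0]]

graph = communication_graph(positions, radius=1.5)
shared = aggregate(features, graph)
```

Each robot only reads features from its direct neighbors, so the step can run in a decentralized way; stacking several such steps would let information travel over multiple hops.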
Game Design for Eliciting Distinguishable Behavior
Yang, Fan, Leqi, Liu, Wu, Yifan, Lipton, Zachary C., Ravikumar, Pradeep, Cohen, William W., Mitchell, Tom
The ability to infer latent psychological traits from human behavior is key to developing personalized human-interacting machine learning systems. Approaches to infer such traits range from surveys to manually-constructed experiments and games. However, these traditional games are limited because they are typically designed based on heuristics. In this paper, we formulate the task of designing \emph{behavior diagnostic games} that elicit distinguishable behavior as a mutual information maximization problem, which can be solved by optimizing a variational lower bound. Our framework is instantiated by using prospect theory to model varying player traits, and Markov Decision Processes to parameterize the games. We validate our approach empirically, showing that our designed games can successfully distinguish among players with different traits, outperforming manually-designed ones by a large margin.
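For readers unfamiliar with this kind of objective, a standard variational lower bound on mutual information (the Barber-Agakov bound) is sketched below. The symbols are assumptions introduced here for illustration: $z$ denotes the latent player trait, $x$ the behavior observed in a game with parameters $\theta$, and $q_\phi$ a learned variational posterior. The paper's exact objective may differ.

```latex
\begin{align}
I_\theta(z; x) &= H(z) - H_\theta(z \mid x) \\
               &= H(z) + \mathbb{E}_{p_\theta(z, x)}\left[\log p_\theta(z \mid x)\right] \\
               &\geq H(z) + \mathbb{E}_{p_\theta(z, x)}\left[\log q_\phi(z \mid x)\right]
\end{align}
```

The inequality follows because the gap between the last two lines is an expected KL divergence, $\mathbb{E}_{p_\theta(x)}\left[\mathrm{KL}\left(p_\theta(z \mid x) \,\|\, q_\phi(z \mid x)\right)\right] \geq 0$. Under these assumptions, maximizing the bound jointly over $\theta$ and $\phi$ favors games whose induced behavior makes the trait easy to predict.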
Extending Machine Language Models toward Human-Level Language Understanding
McClelland, James L., Hill, Felix, Rudolph, Maja, Baldridge, Jason, Schütze, Hinrich
Language is central to human intelligence. We review recent breakthroughs in machine language processing and consider what remains to be achieved. Recent approaches rely on domain general principles of learning and representation captured in artificial neural networks. Most current models, however, focus too closely on language itself. In humans, language is part of a larger system for acquiring, representing, and communicating about objects and situations in the physical and social world, and future machine language models should emulate such a system. We describe existing machine models linking language to concrete situations, and point toward extensions to address more abstract cases. Human language processing exploits complementary learning systems, including a deep neural network-like learning system that learns gradually as machine systems do, as well as a fast-learning system that supports learning new information quickly. Adding such a system to machine language models will be an important further step toward truly human-like language understanding.
Formal Verification of Debates in Argumentation Theory
Jha, Ria, Belardinelli, Francesco, Toni, Francesca
Humans engage in informal debates on a daily basis. By expressing their opinions and ideas in an argumentative fashion, they are able to gain a deeper understanding of a given problem and, in some cases, find the best possible course of action towards resolving it. In this paper, we develop a methodology to verify debates formalised as abstract argumentation frameworks. We first present a translation from debates to transition systems. Such transition systems can model debates and represent their evolution over time using a finite set of states. We then formalise relevant debate properties using temporal and strategy logics. These formalisations, along with a debate transition system, allow us to verify whether a given debate satisfies certain properties. The verification process can be automated using model checkers. Therefore, we also measure their performance when verifying debates, and use the results to discuss the feasibility of model checking debates.
Automatic Layout Generation with Applications in Machine Learning Engine Evaluation
Yang, Haoyu, Chen, Wen, Pathak, Piyush, Gennari, Frank, Lai, Ya-Chieh, Yu, Bei
Machine learning-based lithography hotspot detection has been deeply studied recently, from various feature extraction techniques to efficient learning models. It has been observed that such machine learning-based frameworks provide satisfactory metal layer hotspot prediction results on known public metal layer benchmarks. In this work, we seek to evaluate how these machine learning-based hotspot detectors generalize to complicated patterns. We first introduce an automatic layout generation tool that can synthesize various layout patterns given a set of design rules. The tool currently supports both metal layer and via layer generation. As a case study, we conduct hotspot detection on the generated via layer layouts with representative machine learning-based hotspot detectors, which shows that continued study of model robustness and generality is necessary to prototype and integrate the learning engines in DFM flows. The source code of the layout generation tool will be available at https://github.com/phdyang007/layout-generation.