Abramson, Josh, Ahuja, Arun, Brussee, Arthur, Carnevale, Federico, Cassin, Mary, Clark, Stephen, Dudzik, Andrew, Georgiev, Petko, Guy, Aurelia, Harley, Tim, Hill, Felix, Hung, Alden, Kenton, Zachary, Landon, Jessica, Lillicrap, Timothy, Mathewson, Kory, Muldal, Alistair, Santoro, Adam, Savinov, Nikolay, Varma, Vikrant, Wayne, Greg, Wong, Nathaniel, Yan, Chen, Zhu, Rui
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount.
Deep reinforcement learning (RL) agents trained in a limited set of environments tend to overfit and fail to generalize to unseen testing environments. To improve their generalizability, data augmentation approaches (e.g. cutout and random convolution) have previously been explored to increase the data diversity. However, we find these approaches only locally perturb the observations regardless of the training environments, showing limited effectiveness in enhancing the data diversity and the generalization performance. In this work, we introduce a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments and imposes linearity constraints on the observation interpolations and the supervision (e.g. associated reward) interpolations. Mixreg increases the data diversity more effectively and helps learn smoother policies. We verify its effectiveness in improving generalization by conducting extensive experiments on the large-scale Procgen benchmark. Results show mixreg outperforms the well-established baselines on unseen testing environments by a large margin. Mixreg is simple, effective and general. It can be applied to both policy-based and value-based RL algorithms. Code is available at https://github.com/kaixin96/mixreg .
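The interpolation idea described above can be illustrated with a minimal sketch. It is not taken from the released code: the function name `mix_batch`, the flat-list observation format, and the choice of drawing the mixing coefficient from a Beta(alpha, alpha) distribution (as in mixup-style augmentation) are assumptions made for illustration only. Each observation is paired with a randomly chosen partner from the same batch, and the same coefficient is applied to the observations and to their associated rewards, so the supervision stays linear in the interpolation.

```python
import random


def mix_batch(observations, rewards, alpha=0.2, rng=None):
    """Convexly combine each (observation, reward) with a shuffled partner.

    Hypothetical helper: observations are equal-length lists of floats,
    rewards is a list of floats of the same batch size.
    """
    if rng is None:
        rng = random.Random()
    n = len(observations)
    partner = list(range(n))
    rng.shuffle(partner)                 # partner[i] is the index mixed with i
    lam = rng.betavariate(alpha, alpha)  # assumed mixing distribution
    mixed_obs = [
        [lam * a + (1.0 - lam) * b
         for a, b in zip(observations[i], observations[j])]
        for i, j in enumerate(partner)
    ]
    # The same coefficient is applied to the supervision signal.
    mixed_rewards = [
        lam * rewards[i] + (1.0 - lam) * rewards[j]
        for i, j in enumerate(partner)
    ]
    return mixed_obs, mixed_rewards, lam, partner


if __name__ == "__main__":
    obs = [[0.0, 0.0], [1.0, 1.0], [2.0, 4.0]]
    rew = [0.0, 1.0, 2.0]
    mixed_obs, mixed_rew, lam, partner = mix_batch(obs, rew, rng=random.Random(0))
    print(lam, partner, mixed_obs, mixed_rew)
```

In a real agent the observations would be image tensors gathered from different training environments, and the mixed batch would feed the usual policy or value loss; the sketch only shows the interpolation step.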
Deep reinforcement learning (DRL) is an emerging methodology that is transforming the way many complicated transportation decision-making problems are tackled. Researchers have been increasingly turning to this powerful learning-based methodology to solve challenging problems across transportation fields. While many promising applications have been reported in the literature, there remains a lack of comprehensive synthesis of the many DRL algorithms and their uses and adaptations. The objective of this paper is to fill this gap by conducting a comprehensive, synthesized review of DRL applications in transportation. We start by offering an overview of the DRL mathematical background, popular and promising DRL algorithms, and some highly effective DRL extensions. Building on this overview, a systematic investigation of about 150 DRL studies that have appeared in the transportation literature, divided into seven different categories, is performed. Building on this review, we continue to examine the applicability, strengths, shortcomings, and common and application-specific issues of DRL techniques with regard to their applications in transportation. In the end, we recommend directions for future research and present available resources for actually implementing DRL.
In this paper, we consider the syntactic properties of languages that emerge in referential games, using unsupervised grammar induction (UGI) techniques originally designed to analyse natural language. We show that the considered UGI techniques are appropriate to analyse emergent languages and we then study whether the languages that emerge in a typical referential game setup exhibit syntactic structure, and to what extent this depends on the maximum message length and number of symbols that the agents are allowed to use. Our experiments demonstrate that a certain message length and vocabulary size are required for structure to emerge, but they also illustrate that more sophisticated game scenarios are required to obtain syntactic properties more akin to those observed in human language. We argue that UGI techniques should be part of the standard toolkit for analysing emergent languages and release a comprehensive library to facilitate such analysis for future researchers.
We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked to the predictive family or decoder of interest (e.g. linear classifier). We propose the Decodable Information Bottleneck (DIB) that considers information retention and compression from the perspective of the desired predictive family. As a result, DIB gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees. Empirically, we show that the framework can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization ability of neural networks.
In order to facilitate natural interaction, researchers in social robotics have focused on robots that can adapt to diverse conditions and to the different users with whom they interact. Recently, there has been great interest in the use of machine learning methods for adaptive social robots. Machine Learning (ML) algorithms can be categorized into three subfields: supervised learning, unsupervised learning and reinforcement learning. In supervised learning, correct input/output pairs are available and the goal is to find a correct mapping from input to output space. In unsupervised learning, output data is not available and the goal is to find patterns in the input data. Reinforcement Learning (RL) is a framework for decision-making problems in which an agent interacts through trial-and-error with its environment to discover an optimal behavior. The agent does not receive direct feedback of correctness; instead, it receives scarce feedback about the actions it has taken in the past.
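The trial-and-error loop described above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than something taken from the text: the five-state corridor environment, the use of tabular Q-learning with epsilon-greedy exploration, and all names and hyperparameters. The agent starts at the left end, receives a reward of 1 only when it reaches the right end, and must discover from that scarce signal that moving right is the best behavior.

```python
import random

N_STATES = 5         # corridor states 0..4; state 4 is the goal
ACTIONS = (-1, +1)   # move left or move right


def step(state, action):
    """Deterministic corridor: reward 1 only on reaching the goal state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: random action
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: no future value after the terminal state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Preferred action (-1 or +1) in each non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

No state is ever labeled with a correct action, which is the contrast with supervised learning drawn above: the preference for moving right emerges only as the goal reward propagates backwards through the value estimates.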
Friedrich, Sarah, Antes, Gerd, Behr, Sigrid, Binder, Harald, Brannath, Werner, Dumpert, Florian, Ickstadt, Katja, Kestler, Hans, Lederer, Johannes, Leitgöb, Heinz, Pauly, Markus, Steland, Ansgar, Wilhelm, Adalbert, Friede, Tim
The research on and application of artificial intelligence (AI) has triggered a comprehensive scientific, economic, social and political discussion. Here we argue that statistics, as an interdisciplinary scientific field, plays a substantial role both for the theoretical and practical understanding of AI and for its future development. Statistics might even be considered a core element of AI. With its specialist knowledge of data evaluation, starting with the precise formulation of the research question and passing through a study design stage on to analysis and interpretation of the results, statistics is a natural partner for other disciplines in teaching, research and practice. This paper aims at contributing to the current discussion by highlighting the relevance of statistical methodology in the context of AI development. In particular, we discuss contributions of statistics to the field of artificial intelligence concerning methodological development, planning and design of studies, assessment of data quality and data collection, differentiation of causality and associations and assessment of uncertainty in results. Moreover, the paper also deals with the equally necessary and meaningful extension of curricula in schools and universities.
Given the recent successes of Deep Learning in AI there has been increased interest in the role and need for explanations in machine learned theories. A distinct notion in this context is that of Michie's definition of Ultra-Strong Machine Learning (USML). USML is demonstrated by a measurable increase in human performance of a task following provision to the human of a symbolic machine learned theory for task performance. A recent paper demonstrates the beneficial effect of a machine learned logic theory for a classification task, yet no existing work has examined the potential harmfulness of a machine's involvement in human learning. This paper investigates the explanatory effects of a machine learned theory in the context of simple two-person games and proposes a framework for identifying the harmfulness of machine explanations based on the Cognitive Science literature. The approach involves a cognitive window consisting of two quantifiable bounds and it is supported by empirical evidence collected from human trials. Our quantitative and qualitative results indicate that human learning aided by a symbolic machine learned theory which satisfies a cognitive window achieves significantly higher performance than human self learning. Results also demonstrate that human learning aided by a symbolic machine learned theory that fails to satisfy this window leads to significantly worse performance than unaided human learning.
Recent successes have combined reinforcement learning algorithms with deep neural networks, yet reinforcement learning is still not widely applied to robotics and real-world scenarios. This can be attributed to the fact that current state-of-the-art, end-to-end reinforcement learning approaches still require thousands or millions of data samples to converge to a satisfactory policy and are subject to catastrophic failures during training. Conversely, in real-world scenarios and after just a few data samples, humans are able to either provide demonstrations of the task, intervene to prevent catastrophic actions, or simply evaluate if the policy is performing correctly. This research investigates how to integrate these human interaction modalities into the reinforcement learning loop, increasing sample efficiency and enabling real-time reinforcement learning in robotics and real-world scenarios. This novel theoretical foundation is called Cycle-of-Learning, a reference to how different human interaction modalities, namely, task demonstration, intervention, and evaluation, are cycled and combined with reinforcement learning algorithms. Results presented in this work show that the reward signal that is learned based upon human interaction accelerates the rate of learning of reinforcement learning algorithms and that learning from a combination of human demonstrations and interventions is faster and more sample efficient when compared to traditional supervised learning algorithms. Finally, Cycle-of-Learning develops an effective transition from policies learned using human demonstrations and interventions to reinforcement learning. The theoretical foundation developed by this research opens new research paths to human-agent teaming scenarios where autonomous agents are able to learn from human teammates and adapt to mission performance metrics in real-time and in real-world scenarios.
In June, OpenAI unveiled the largest language model in the world, a text-generating tool called GPT-3 that can write creative fiction, translate legalese into plain English, and answer obscure trivia questions. It's the latest feat of intelligence achieved by deep learning, a machine learning method patterned after the way neurons in the brain process and store information. But it came at a hefty price: at least $4.6 million and 355 years in computing time, assuming the model was trained on a standard neural network chip, or GPU. The model's colossal size -- 1,000 times larger than a typical language model -- is the main factor in its high cost. "You have to throw a lot more computation at something to get a little improvement in performance," says Neil Thompson, an MIT researcher who has tracked deep learning's unquenchable thirst for computing.