Reinforcement Learning
Ensemble Machine Learning in Python: Random Forest, AdaBoost
In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically fewer car accidents, simply because the element of human error has been removed.
Adaptive Skip Intervals: Temporal Abstraction for Recurrent Dynamical Models
Neitz, Alexander, Parascandolo, Giambattista, Bauer, Stefan, Schölkopf, Bernhard
We introduce a method which enables a recurrent dynamics model to be temporally abstract. Our approach, which we call Adaptive Skip Intervals (ASI), is based on the observation that in many sequential prediction tasks, the exact time at which events occur is irrelevant to the underlying objective. Moreover, in many situations, there exist prediction intervals which result in particularly easy-to-predict transitions. We show that there are prediction tasks for which we gain both computational efficiency and prediction accuracy by allowing the model to make predictions at a sampling rate which it can choose itself.
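The abstract does not spell out how the model "chooses" its sampling rate. One plausible reading, assumed here, is that during training the prediction from the current state is compared against each of the next few observed frames, and the frame it already matches best becomes the training target. The sketch below shows only that target-selection step, with scalar states, a fixed horizon `horizon`, and a hypothetical predictor `predict`; it is not the authors' training loop.

```python
from typing import Callable, List, Tuple

def choose_skip(
    predict: Callable[[float], float],
    trajectory: List[float],
    t: int,
    horizon: int,
) -> Tuple[int, float]:
    """Return (skip, squared_error) for the future frame, at most `horizon`
    steps ahead, that the prediction from trajectory[t] matches best."""
    prediction = predict(trajectory[t])
    best_skip, best_err = 1, float("inf")
    # Candidate targets are trajectory[t + 1] .. trajectory[t + horizon].
    for k in range(1, horizon + 1):
        if t + k >= len(trajectory):
            break
        err = (prediction - trajectory[t + k]) ** 2
        if err < best_err:
            best_skip, best_err = k, err
    return best_skip, best_err

def rollout_skips(
    predict: Callable[[float], float], trajectory: List[float], horizon: int
) -> List[int]:
    """Walk through the trajectory, jumping ahead by the chosen skip each time."""
    t, skips = 0, []
    while t < len(trajectory) - 1:
        k, _ = choose_skip(predict, trajectory, t, horizon)
        skips.append(k)
        t += k
    return skips
```

With a trajectory that drifts slowly and then jumps, such as `[0.0, 0.1, 0.2, 1.0, 1.1, 2.0]`, and a predictor that always adds 1.0, the selected skips are `[3, 2]`: the model skips the uninformative small steps and lands on the easy-to-predict transitions.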
Automatic Derivation Of Formulas Using Reinforcement Learning
This paper presents an artificial intelligence algorithm, called the automatic derivation machine, that can be used to derive formulas in various scientific disciplines. First, a formula is expressed abstractly as a multiway tree, and each step of a formula derivation is abstracted as a mapping between multiway trees. Similar derivation steps can be expressed as a reusable formula template by means of a multiway tree map. Next, the formula's multiway tree is eigen-encoded into a feature vector, and these vectors make up the feature space of formulas; a Q-learning model operating in this feature space learns to perform derivations from training data generated by derivation processes. Finally, the automatic formula derivation machine chooses the next derivation step based on the current state and the objective. We also give an example from nuclear reactor physics to show how the automatic derivation machine works.
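To make "a formula as a multiway tree" and "a derivation step as a tree mapping" concrete, here is a minimal sketch. It assumes a tuple encoding `(operator, *children)`, so `+` and `*` nodes can have any number of children, and a single hypothetical template (distributing a product over a sum). Neither the encoding nor the template is taken from the paper.

```python
from typing import Union

# A formula node is either a leaf (str) or a tuple (operator, *children),
# so "+" and "*" nodes may have any number of children (a multiway tree).
Expr = Union[str, tuple]

def distribute(expr: Expr) -> Expr:
    """Hypothetical template: a * (b + c + ...) -> a*b + a*c + ...,
    applied bottom-up to every matching node."""
    if isinstance(expr, str):
        return expr
    op, *children = expr
    children = [distribute(c) for c in children]
    if (
        op == "*"
        and len(children) == 2
        and isinstance(children[1], tuple)
        and children[1][0] == "+"
    ):
        factor, (_, *terms) = children
        return ("+", *[("*", factor, term) for term in terms])
    return (op, *children)

def to_str(expr: Expr) -> str:
    """Render a tree with explicit parentheses around every operator node."""
    if isinstance(expr, str):
        return expr
    op, *children = expr
    return "(" + f" {op} ".join(to_str(c) for c in children) + ")"
```

For example, `to_str(distribute(("*", "k", ("+", "x", "y", "z"))))` gives `((k * x) + (k * y) + (k * z))`. In the paper's framing, each such template is one candidate derivation step, and the Q-learning model decides which step to apply next.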
An Optimal Policy for Patient Laboratory Tests in Intensive Care Units
Cheng, Li-Fang, Prasad, Niranjani, Engelhardt, Barbara E
Laboratory testing is an integral tool in the management of patient care in hospitals, particularly in intensive care units (ICUs). There exists an inherent trade-off in the selection and timing of lab tests between the expected utility of a given test at a specific time for clinical decision-making and the associated cost or risk it poses to the patient. In this work, we introduce a framework that learns policies for ordering lab tests that optimize this trade-off. Our approach uses batch off-policy reinforcement learning with a composite reward function based on clinical imperatives, applied to data that include examples of clinicians ordering labs for patients. To this end, we develop and extend principles of Pareto optimality to improve the selection of actions based on multiple reward function components while respecting typical procedural considerations and prioritization of clinical goals in the ICU. Our experiments show that we can estimate a policy that reduces the frequency of lab tests and optimizes timing to minimize information redundancy. We also find that the estimated policies typically suggest ordering lab tests well ahead of critical onsets, such as mechanical ventilation or dialysis, that depend on the lab results. We evaluate our approach by quantifying how these policies may initiate earlier onset of treatment.
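The basic Pareto idea used for action selection can be sketched as follows: when each action has a vector of estimated values, one per reward component, discard every action that some other action beats or ties on every component and strictly beats on at least one. The action names and two-component values below are hypothetical, higher is assumed better on each component, and the authors' extensions (clinical prioritization, procedural constraints) are not reproduced.

```python
from typing import Dict, List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every component and
    strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_actions(q: Dict[str, Sequence[float]]) -> List[str]:
    """Return the actions whose value vectors no other action dominates."""
    return [
        a
        for a, qa in q.items()
        if not any(dominates(qb, qa) for b, qb in q.items() if b != a)
    ]

# Hypothetical per-component values: (cost avoided, information gained).
q_values = {
    "no_test": (0.9, 0.1),
    "order_lab": (0.4, 0.8),
    "order_both": (0.3, 0.7),
}
```

Here `pareto_actions(q_values)` keeps `["no_test", "order_lab"]`, because `order_both` is worse than `order_lab` on both components. Choosing among the surviving actions still requires a further rule, for example a priority order over reward components.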
Artificial intelligence system develops drugs from scratch
The research comes from the University of North Carolina at Chapel Hill and demonstrates how an artificial-intelligence system can teach itself to design new drug molecules from scratch. Such a system could accelerate the design of new drug candidates for use across pharmaceuticals and healthcare. The new system is named "Reinforcement Learning for Structural Evolution" (abbreviated to ReLeaSE). It is an algorithm built on two neural networks, which the researchers describe as being akin to a teacher and a student.
Intelligence is not Artificial
Summarizing, there are four desiderata that one would like to see in A.I. systems, if they have to compare well with human (or just animal) brains: meta-learning, learning by demonstration ("few-shot learning"), transfer learning and multi-task learning. Meta-learning is particularly relevant in the case of reinforcement learning. It is obvious that reinforcement learning is highly unnatural. DeepMind's AlphaGo and OpenAI Five need to learn from scratch via a huge number of trials. Animals, instead, use built-in or acquired "meta-skills" to learn new tasks in just a few trials. Modern computational theory of meta-learning (learning how to learn) dates back at least to the 1990s, when Schmidhuber published the manifesto "Simple Principles of Metalearning" (1996), followed by his student Sepp Hochreiter ("Learning to Learn Using Gradient Descent", 2001), and by Nicolas Schweighofer and Kenji Doya at Japan's ATR ("Meta-learning in Reinforcement Learning", 2001). Examples of "deep" meta-learning systems of the new generation are: RL² by Pieter Abbeel's student Yan Duan at UC Berkeley, based on Schulman's TRPO ("RL²: Fast Reinforcement Learning via Slow Reinforcement Learning", 2016); the "model-agnostic meta-learning" (MAML) of Sergey Levine's student Chelsea Finn at UC Berkeley ("Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks", 2017); Marcel Binz's thesis at KTH Royal Institute of Technology ("Learning Goal-Directed Behaviour", 2017); Jane Wang's "deep meta-reinforcement learning" at DeepMind ("Learning to Reinforcement Learn", 2017); and OpenAI's Reptile, developed by Alex Nichol and John Schulman, a generalization of Finn's MAML ("On First-Order Meta-Learning Algorithms", 2018).
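Of the systems listed, Reptile has the simplest update to write down: starting from the meta-parameters θ, run a few steps of SGD on one task to get adapted parameters φ, then move θ a fraction ε of the way toward φ. The toy below assumes 1-D linear-regression tasks y = a·x that differ only in slope a; the task family and all hyperparameters are illustrative, and tasks are cycled in turn rather than sampled at random to keep the run deterministic.

```python
from typing import List

def task_loss_grad(w: float, slope: float, xs: List[float]) -> float:
    """Gradient of the mean squared error of y_hat = w*x against y = slope*x."""
    return sum(2 * (w * x - slope * x) * x for x in xs) / len(xs)

def reptile(
    slopes: List[float],
    xs: List[float],
    meta_steps: int = 300,
    inner_steps: int = 5,
    inner_lr: float = 0.1,
    meta_lr: float = 0.1,
) -> float:
    theta = 0.0
    for step in range(meta_steps):
        # Reptile normally samples a task at random; cycling keeps this repeatable.
        slope = slopes[step % len(slopes)]
        phi = theta
        for _ in range(inner_steps):  # inner loop: plain SGD on one task
            phi -= inner_lr * task_loss_grad(phi, slope, xs)
        theta += meta_lr * (phi - theta)  # outer loop: move theta toward phi
    return theta
```

With slopes `[1.0, 2.0, 3.0]` and inputs `[-1.0, -0.5, 0.5, 1.0]`, θ ends up close to the mean slope 2.0: a starting point from which a few SGD steps get reasonably near any of the three tasks.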
DeepMind's neuroscientist Matthew Botvinick believes that Wang's deep meta-reinforcement learning could be a model for how our brain learns: the dopamine system trains another part of the brain, the prefrontal cortex, to operate as its own free-standing learning system ("Prefrontal Cortex as a Meta-reinforcement Learning System", 2018).
Directed Policy Gradient for Safe Reinforcement Learning with Human Advice
Plisnier, Hélène, Steckelmacher, Denis, Brys, Tim, Roijers, Diederik M., Nowé, Ann
Many currently deployed Reinforcement Learning agents work in an environment shared with humans, be they co-workers, users or clients. It is desirable that these agents adjust to people's preferences, learn faster thanks to their help, and act safely around them. We argue that most current approaches that learn from human feedback are unsafe: rewarding or punishing the agent a posteriori cannot immediately prevent it from doing wrong. In this paper, we extend Policy Gradient to make it robust to external directives, which would otherwise break the fundamentally on-policy nature of Policy Gradient. Our technique, Directed Policy Gradient (DPG), allows a teacher or backup policy to override the agent before it acts undesirably, while allowing the agent to leverage human advice or directives to learn faster. Our experiments demonstrate that DPG makes the agent learn much faster than reward-based approaches, while requiring an order of magnitude less advice.
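The override step described in the abstract can be sketched as: the agent samples an action from its softmax policy, and a teacher may replace it before it is executed. The `teacher` callback and the logits below are hypothetical. The sketch stops there; it does not reproduce how DPG corrects the policy-gradient update for overridden actions, which is exactly the part that keeps learning valid despite the off-policy interventions.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def act_with_override(
    logits: List[float],
    teacher: Callable[[int], Optional[int]],
    rng: random.Random,
) -> Tuple[int, bool]:
    """Sample the agent's action, then let a (hypothetical) teacher replace it
    before execution. Returns (executed_action, was_overridden)."""
    probs = softmax(logits)
    proposed = rng.choices(range(len(probs)), weights=probs)[0]
    replacement = teacher(proposed)
    if replacement is not None and replacement != proposed:
        return replacement, True
    return proposed, False

# Hypothetical teacher: action 2 is unsafe and is replaced by action 0.
block_unsafe = lambda a: 0 if a == 2 else None
```

The executed action and the override flag would both have to be recorded, since the learner needs to know which actions its own policy did not choose.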
Large-Scale Study of Curiosity-Driven Learning
Burda, Yuri, Edwards, Harri, Pathak, Deepak, Storkey, Amos, Darrell, Trevor, Efros, Alexei A.
Reinforcement learning algorithms rely on carefully engineered environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as reward signal. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, and a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments. (b) We investigate the effect of using different feature spaces for computing prediction error and show that random features are sufficient for many popular RL game benchmarks, but learned features appear to generalize better (e.g. to novel game levels in Super Mario Bros.). (c) We demonstrate limitations of the prediction-based rewards in stochastic setups. Game-play videos and code are at https://pathak22.github.io/large-scale-curiosity/
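A minimal sketch of curiosity as prediction error in a random feature space: observations are mapped through a fixed, never-trained random projection φ, a forward model predicts φ(s') from φ(s) and the action, and the intrinsic reward is the squared prediction error. As simplifying assumptions, φ is linear (the paper's random features come from a randomly initialized network), the forward model is linear and trained by online SGD, and all sizes and rates are illustrative.

```python
import random
from typing import List

class RandomFeatureCuriosity:
    """Intrinsic reward = forward-model error in a fixed random feature space."""

    def __init__(self, obs_dim: int, n_actions: int, feat_dim: int = 4,
                 lr: float = 0.05, seed: int = 0):
        rng = random.Random(seed)
        # Fixed random projection; it is never updated ("random features").
        self.proj = [[rng.gauss(0, 1) for _ in range(obs_dim)]
                     for _ in range(feat_dim)]
        # Linear forward model: predicts phi(s') from [phi(s), one_hot(a)].
        self.weights = [[0.0] * (feat_dim + n_actions) for _ in range(feat_dim)]
        self.n_actions = n_actions
        self.lr = lr

    def features(self, obs: List[float]) -> List[float]:
        return [sum(w * o for w, o in zip(row, obs)) for row in self.proj]

    def reward(self, obs: List[float], action: int, next_obs: List[float]) -> float:
        """Return the squared prediction error, then take one SGD step on it."""
        one_hot = [1.0 if i == action else 0.0 for i in range(self.n_actions)]
        x = self.features(obs) + one_hot
        target = self.features(next_obs)
        pred = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        errors = [p - t for p, t in zip(pred, target)]
        # Gradient of 0.5 * (pred_i - target_i)^2 w.r.t. row i is err_i * x.
        for row, err in zip(self.weights, errors):
            for j, xj in enumerate(x):
                row[j] -= self.lr * err * xj
        return sum(e * e for e in errors)
```

Feeding the same deterministic transition repeatedly drives its reward toward zero, so the agent is pushed toward transitions it has not yet learned to predict. If `next_obs` were random noise, the error could not shrink to zero, which illustrates the limitation in stochastic setups noted in (c).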
A Framework for Automated Cellular Network Tuning with Reinforcement Learning
Mismar, Faris B., Choi, Jinseok, Evans, Brian L.
Tuning cellular network performance against always-occurring wireless impairments can dramatically improve reliability for end users. In this paper, we formulate cellular network performance tuning as a reinforcement learning (RL) problem and provide a solution to improve the signal-to-interference-plus-noise ratio (SINR) for indoor and outdoor environments. By leveraging the ability of Q-learning to estimate future SINR improvement rewards, we propose two algorithms: (1) voice over LTE (VoLTE) downlink closed-loop power control (PC) and (2) self-organizing network (SON) fault management. The VoLTE PC algorithm uses RL to adjust the indoor base station transmit power so that the effective SINR meets the target SINR. The SON fault management algorithm uses RL to improve the performance of an outdoor cluster by resolving faults in the network through configuration management. Both algorithms exploit measurements from the connected users, wireless impairments, and relevant configuration parameters to solve a non-convex SINR optimization problem using RL. Simulation results show that our proposed RL-based algorithms outperform today's industry standards in realistic cellular communication environments. The tuning of network performance aims at providing the end user with excellent quality of experience (QoE). With over 1.5 billion smartphones used globally, demand patterns have shifted towards reliable packetized voice and applications with higher data rates and lower latencies [4].
The authors are with the Wireless Networking and Communications Group, Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, USA. This paper is an expanded journal version of [1] and [2].
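The VoLTE power-control idea can be caricatured as tabular Q-learning in which the state is the gap between effective and target SINR and the action is a transmit-power step. Everything in this toy is a hypothetical simplification, not the paper's formulation: integer dB gaps clipped to ±4, power steps of −1, 0 or +1 dB that shift the SINR one-for-one, and a reward of −|gap|.

```python
import random
from typing import Dict, Tuple

ACTIONS = (-1, 0, +1)  # transmit-power change in dB (hypothetical step size)
MAX_GAP = 4            # SINR gap clipped to [-MAX_GAP, MAX_GAP] dB

def step(gap: int, delta: int) -> Tuple[int, float]:
    """Toy environment: a power change of `delta` dB shifts the gap
    (effective SINR minus target SINR) by `delta` dB; reward is -|gap|."""
    new_gap = max(-MAX_GAP, min(MAX_GAP, gap + delta))
    return new_gap, -abs(new_gap)

def train(episodes: int = 500, steps: int = 20, alpha: float = 0.5,
          gamma: float = 0.9, epsilon: float = 0.2,
          seed: int = 0) -> Dict[Tuple[int, int], float]:
    rng = random.Random(seed)
    q = {(g, a): 0.0 for g in range(-MAX_GAP, MAX_GAP + 1) for a in ACTIONS}
    for _ in range(episodes):
        gap = rng.randint(-MAX_GAP, MAX_GAP)
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(gap, act)])
            new_gap, r = step(gap, a)
            # Q-learning update toward r + gamma * max_a' Q(s', a').
            best_next = max(q[(new_gap, act)] for act in ACTIONS)
            q[(gap, a)] += alpha * (r + gamma * best_next - q[(gap, a)])
            gap = new_gap
    return q

def greedy(q: Dict[Tuple[int, int], float], gap: int) -> int:
    return max(ACTIONS, key=lambda act: q[(gap, act)])
```

After training, the greedy policy raises power when the SINR is below target, lowers it when above, and holds at the target, which is the closed-loop behaviour the algorithm is meant to learn.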
Visual Sensor Network Reconfiguration with Deep Reinforcement Learning
We present an approach for reconfiguration of dynamic visual sensor networks with deep reinforcement learning (RL). Our RL agent uses a modified asynchronous advantage actor-critic framework and the recently proposed Relational Network module at the foundation of its network architecture. To address the issue of sample inefficiency in current approaches to model-free reinforcement learning, we train our system in an abstract simulation environment that represents inputs from a dynamic scene. Our system is validated using inputs from a real-world scenario and preexisting object detection and tracking algorithms.
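The agent builds on a modified asynchronous advantage actor-critic (A3C) framework; neither the modifications nor the Relational Network module is reproduced here. The sketch below shows only the standard advantage computation at the core of A3C-style learners, written without a deep-learning framework for clarity. In practice the log-probabilities and values come from the network, an autodiff library differentiates the losses, and an entropy bonus (omitted here) is usually added.

```python
from typing import List, Tuple

def nstep_returns_and_advantages(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Discounted n-step returns R_t = r_t + gamma * R_{t+1} and advantages
    A_t = R_t - V(s_t) for one rollout segment. `bootstrap_value` is V(s_T)
    for the state after the last step (0.0 if the episode terminated)."""
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages

def actor_critic_losses(
    log_probs: List[float], advantages: List[float]
) -> Tuple[float, float]:
    """Policy-gradient loss (advantages treated as constants) and value loss."""
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages))
    value_loss = 0.5 * sum(adv * adv for adv in advantages)
    return policy_loss, value_loss
```

For rewards `[1.0, 0.0, 1.0]`, value estimates of 0.5 everywhere, a bootstrap value of 2.0 and γ = 0.5, the returns are `[1.5, 1.0, 2.0]` and the advantages `[1.0, 0.5, 1.5]`.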