Reinforcement Learning


The Best Machine Learning Research of June 2019

#artificialintelligence

Machine learning and the data science industry are always changing. To keep you updated on the most recent discoveries, we've compiled the 5 most exciting machine learning research pieces that expand what we thought we knew about machine learning and the industries to which it relates.

Fairness in machine learning has been a heavily discussed topic since the beginnings of the technology, but now, in a paper by Candice Schumann, Xuezhi Wang, Alex Beutel, Jilin Chen, Hai Qian, and Ed H. Chi, we have theoretical models for ensuring fairness across different applications of a single machine learning model. They frame this issue as "domain adaptation problems: how can we use what we have learned in a source domain to debias in a new target domain, without directly debiasing on the target domain as if it is a completely new problem?" In the paper, they also offer "a modeling approach to transfer to data-sparse target domains… [and] empirical results validating the theory and showing that these modeling approaches can improve fairness metrics with less data."

In a recent paper by Ali Malik, Volodymyr Kuleshov, Jiaming Song, Danny Nemer, Harlan Seymour, and Stefano Ermon, the authors explore which uncertainties are needed for model-based reinforcement learning and argue that "good uncertainties must be calibrated."
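The calibration requirement quoted above can be made concrete with a simple check: a forecaster is calibrated if, for every level p, roughly a fraction p of the observed outcomes fall at or below its predicted p-quantile. The sketch below assumes Gaussian predictive distributions; the function name `calibration_curve` and the chosen levels are illustrative assumptions, and this is only a diagnostic, not the recalibration method proposed in the paper.

```python
import random
from statistics import NormalDist

def calibration_curve(means, stds, observations, levels=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return (p, observed frequency) pairs for Gaussian forecasts N(mean, std^2).

    For a calibrated forecaster the observed frequency is close to p at every level.
    """
    n = len(observations)
    curve = []
    for p in levels:
        z = NormalDist().inv_cdf(p)  # p-quantile of the standard normal
        # Count outcomes at or below the predicted p-quantile mu + sigma * z.
        hits = sum(1 for mu, sigma, y in zip(means, stds, observations)
                   if y <= mu + sigma * z)
        curve.append((p, hits / n))
    return curve

# Usage: the true outcomes are drawn from N(0, 1).
random.seed(0)
ys = [random.gauss(0.0, 1.0) for _ in range(10_000)]
print(calibration_curve([0.0] * len(ys), [1.0] * len(ys), ys))  # close to the diagonal
print(calibration_curve([0.0] * len(ys), [0.2] * len(ys), ys))  # overconfident forecasts
```

An overconfident model (predicted spread too small) puts far more than 10% of outcomes below its 0.1-quantile and far fewer than 90% below its 0.9-quantile, which is exactly the kind of miscalibration the authors argue harms model-based planning.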


AI Learning to Land a Rocket (Lunar Lander) Reinforcement Learning

#artificialintelligence

Reinforcement learning is one of the most discussed, followed and contemplated topics in artificial intelligence (AI), as it has the potential to transform most businesses. At the core of reinforcement learning is the concept that optimal behaviour or action is reinforced by a positive reward. This is similar to how toddlers learning to walk adjust their actions based on the outcomes they experience, such as taking a smaller step if the previous broad step made them fall. Machines and AI agents use reinforcement learning algorithms to determine the ideal behaviour based on feedback from the environment. An example of reinforcement learning in action is AlphaGo Zero, which made headlines in 2017.
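The idea that a positive reward reinforces the action that produced it can be illustrated with the incremental value update Q(a) ← Q(a) + α(r − Q(a)) on a toy two-action multi-armed bandit with ε-greedy action selection. This is a minimal sketch, not the Lunar Lander agent from the video; the reward probabilities, the step size α and the exploration rate ε are all illustrative assumptions.

```python
import random

def run_bandit(reward_probs, steps=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit.

    Pulling arm a yields reward 1 with probability reward_probs[a], else 0.
    Each estimate Q[a] moves toward the rewards observed for that arm, so
    actions that are rewarded more often end up with higher estimates and
    are chosen more often.
    """
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)       # value estimate per action
    counts = [0] * len(reward_probs)    # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(q))                  # explore: random action
        else:
            a = max(range(len(q)), key=q.__getitem__)  # exploit: best estimate
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        q[a] += alpha * (r - q[a])      # reinforce toward the observed reward
        counts[a] += 1
    return q, counts

q, counts = run_bandit([0.2, 0.8])
print(q, counts)  # the more rewarding second action dominates the counts
```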


Collision Avoidance with Deep Reinforcement Learning

#artificialintelligence

In the past decade, learning algorithms that play video games better than humans have become more common. Google's DeepMind Technologies developed learning algorithms that could play Atari video games and also demonstrated its famous AlphaGo algorithm, which outperformed professional Go players. However, little research has been done on learning algorithms for particularly difficult single-player games. In particular, much more research could be done on learning algorithms for mechanically challenging games such as "bullet hell" games. We believe that agents could learn to efficiently evade obstacles using deep reinforcement learning.


Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms

arXiv.org Artificial Intelligence

This paper takes a step towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms. The aim of these algorithms is to jointly learn an approximation of the state-value function (V) alongside an approximation of the state-action value function (Q). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm that has been shown to outperform popular techniques such as Deep Q-Learning (DQN) and Double Deep Q-Learning (DDQN) (Sabatelli et al., 2018). To investigate why DQV's learning dynamics allow this algorithm to perform so well, we formulate a set of research questions that help us characterize a new family of DRL algorithms. Among our results, we present some specific cases in which DQV's performance can be harmed, and we introduce a novel off-policy DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the V and Q functions learned by DQV and DQV-Max and show that both algorithms might perform so well on several DRL test-beds because they are less prone to the overestimation bias of the Q function.
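To make the difference between the two algorithms concrete, here is a tabular sketch of their update targets, based on our reading of the DQV papers rather than on the authors' code. In DQV, both V and Q are regressed toward r + γV(s'); in DQV-Max, V is regressed toward r + γ max_a' Q(s', a') while Q still uses r + γV(s'). The deep versions replace the tables with neural networks and use target networks, which are omitted here; the function names and hyperparameter values are hypothetical.

```python
from collections import defaultdict

def dqv_update(V, Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Tabular DQV-style step: V and Q share the target r + gamma * V(s')."""
    target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def dqv_max_update(V, Q, s, a, r, s_next, done, actions, alpha=0.1, gamma=0.99):
    """Tabular DQV-Max-style step.

    V is regressed toward r + gamma * max_a' Q(s', a') (a greedy, off-policy
    target), while Q is regressed toward r + gamma * V(s').
    Both targets are computed before either table is modified.
    """
    v_target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    q_target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (v_target - V[s])
    Q[(s, a)] += alpha * (q_target - Q[(s, a)])

# Usage on a single hypothetical transition s0 --(action 0, reward 1)--> s1.
V, Q = defaultdict(float), defaultdict(float)
dqv_update(V, Q, "s0", 0, 1.0, "s1", done=False)
print(V["s0"], Q[("s0", 0)])
```

Because Q's target in both variants comes from V rather than from a max over Q, the Q estimates are not bootstrapped on their own maximum, which is one intuition for the reduced overestimation bias reported in the abstract.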


An Open-Source Framework for Adaptive Traffic Signal Control

arXiv.org Artificial Intelligence

Developing optimal transportation control systems at the appropriate scale can be difficult, as cities' transportation systems can be large, complex and stochastic. Intersection traffic signal controllers are an important element of modern transportation infrastructure, where sub-optimal control policies can incur high costs for many users. Many adaptive traffic signal controllers have been proposed by the community, but research is lacking on their relative performance: which adaptive traffic signal controller is best remains an open question. This research contributes a framework for developing and evaluating different adaptive traffic signal controller models in simulation, both learning and non-learning, and demonstrates its capabilities. The framework is used, first, to investigate the performance variance of the modelled adaptive traffic signal controllers with respect to their hyperparameters and, second, to analyze the performance differences between controllers with optimal hyperparameters. The proposed framework contains implementations of some of the most popular adaptive traffic signal controllers from the literature (Webster's, Max-pressure and Self-Organizing Traffic Lights), along with deep Q-network and deep deterministic policy gradient reinforcement learning controllers. This framework will aid researchers by giving them a common starting point, allowing them to generate results faster and with less effort.
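As an example of one of the non-learning baselines, Webster's classic method sets the cycle length to C0 = (1.5L + 5) / (1 − Y), where L is the total lost time per cycle and Y is the sum of the critical flow ratios y_i (critical lane flow divided by saturation flow), and then splits the effective green time in proportion to y_i. The sketch below is the textbook formula only; the framework's own Webster's controller may measure flows or bound the cycle length differently, and the 4 s lost time per phase is an assumed default.

```python
def webster_timing(critical_flows, saturation_flows, lost_time_per_phase=4.0):
    """Webster's cycle length and effective green splits.

    critical_flows[i]: critical lane flow of phase i (veh/h)
    saturation_flows[i]: saturation flow of that lane (veh/h)
    lost_time_per_phase: assumed start-up plus clearance loss per phase (s)
    Returns (cycle_length_s, [effective_green_s for each phase]).
    """
    ratios = [q / s for q, s in zip(critical_flows, saturation_flows)]
    Y = sum(ratios)
    if Y >= 1.0:
        raise ValueError("oversaturated intersection: sum of flow ratios >= 1")
    L = lost_time_per_phase * len(ratios)           # total lost time per cycle (s)
    cycle = (1.5 * L + 5.0) / (1.0 - Y)             # Webster's optimal cycle length
    greens = [(cycle - L) * y / Y for y in ratios]  # green proportional to y_i
    return cycle, greens

# Two-phase example with hypothetical flows of 450 and 600 veh/h.
print(webster_timing([450, 600], [1800, 1800]))
```

An adaptive variant simply re-runs this calculation periodically with recently measured flows, so the timing plan follows demand without any learning.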


Dynamics-aware Embeddings

arXiv.org Artificial Intelligence

In this paper we consider self-supervised representation learning to improve sample efficiency in reinforcement learning (RL). We propose a forward prediction objective for simultaneously learning embeddings of states and actions. These embeddings capture the structure of the environment's dynamics, enabling efficient policy learning. We demonstrate that our action embeddings alone improve the sample efficiency and peak performance of model-free RL on control from low-dimensional states. By combining state and action embeddings, we achieve efficient learning of high-quality policies on goal-conditioned continuous control from pixel observations in only 1-2 million environment steps.


Reinforcement Learning Applications

#artificialintelligence

A state is constructed from a multidimensional discrete time series composed of 48 variables covering demographics, vital signs, premorbid status, laboratory values, and the intravenous fluids and vasopressors received as treatments. Clustering is used to define the state space, so that patients in the same cluster are similar with respect to their observable properties. An action, or a medical treatment, is defined by the total volume of intravenous fluids and the maximum dose of vasopressors over each 4-hour period. The dose of each treatment is discretized into 5 possible levels, resulting in 5 × 5 = 25 discrete actions when the two treatments are combined. A reward and a penalty are associated with survival and death, respectively, so that the learned policy aims to reduce patient mortality.
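The action encoding described above can be sketched as follows: each treatment gets a dose level from 0 to 4, and the pair is flattened into one of 25 action indices. The binning rule (level 0 for no drug, levels 1 to 4 from three cut points on nonzero doses), the cut-point values and the ±100 terminal reward are assumptions for illustration, not values taken from the text.

```python
from bisect import bisect_right

N_LEVELS = 5  # dose levels per treatment, as described above

def dose_to_level(dose, cut_points):
    """Level 0 = no drug; levels 1-4 from three ascending cut points on nonzero doses."""
    if dose <= 0:
        return 0
    return 1 + bisect_right(cut_points, dose)

def encode_action(fluid_level, vaso_level):
    """Flatten a (fluid, vasopressor) level pair into one action index 0..24."""
    if not (0 <= fluid_level < N_LEVELS and 0 <= vaso_level < N_LEVELS):
        raise ValueError("dose levels must be in 0..4")
    return fluid_level * N_LEVELS + vaso_level

def decode_action(action):
    """Inverse of encode_action: return (fluid_level, vaso_level)."""
    return divmod(action, N_LEVELS)

def terminal_reward(survived, magnitude=100.0):
    """Reward on survival, equal-sized penalty on death (magnitude is assumed)."""
    return magnitude if survived else -magnitude

# Hypothetical cut points for the 4-hour fluid total and the max vasopressor dose.
fluid_cuts, vaso_cuts = [50.0, 200.0, 500.0], [0.08, 0.2, 0.45]
a = encode_action(dose_to_level(300.0, fluid_cuts), dose_to_level(0.0, vaso_cuts))
print(a, decode_action(a))
```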


Deep Q-Learning with Python and TensorFlow 2.0

#artificialintelligence

In the previous two articles we started exploring the interesting universe of reinforcement learning. First we went through the basics of the third paradigm within machine learning: reinforcement learning. Just to refresh our memory, we saw that this type of learning differs from the previously explored supervised and unsupervised learning. In reinforcement learning, a self-learning agent learns how to interact with its environment. The agent tries to achieve a goal within that environment while it interacts with it, and this interaction is divided into time steps.
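The time-step interaction described above is the loop every reinforcement learning agent runs: observe a state, pick an action, receive a reward and the next state, then update. As a minimal sketch, the code below uses tabular Q-learning on a hypothetical six-cell corridor rather than the deep Q-network built with TensorFlow 2.0 in the article; the `Corridor` environment and all hyperparameters are illustrative assumptions.

```python
import random

class Corridor:
    """Toy environment: cells 0..n-1, start at 0, reward 1 on reaching cell n-1."""

    def __init__(self, n=6):
        self.n = n
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        """action 0 = left, 1 = right; walls keep the agent inside the corridor."""
        self.pos = max(0, min(self.n - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.n - 1
        return self.pos, (1.0 if done else 0.0), done

def q_learning(env, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n)]  # q[state][action]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:                      # each iteration is one time step
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)          # explore, or break ties randomly
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next, r, done = env.step(a)
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])  # Q-learning update
            s = s_next
    return q

env = Corridor()
q = q_learning(env)
print(["R" if row[1] > row[0] else "L" for row in q[:-1]])  # greedy action per cell
```

A deep Q-network keeps the same loop and target r + γ max Q(s', ·), but replaces the table with a neural network and usually adds experience replay and a target network.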


Collaborative Policy Learning for Open Knowledge Graph Reasoning

arXiv.org Artificial Intelligence

In recent years, there has been a surge of interest in interpretable graph reasoning methods. However, these models often suffer from limited performance when working on sparse and incomplete graphs, due to the lack of evidential paths that can reach target entities. Here we study open knowledge graph reasoning, a task that aims to infer missing facts over a graph augmented by a background text corpus. A key challenge of the task is to filter out "irrelevant" facts extracted from the corpus, in order to maintain an effective search space during path inference. We propose a novel reinforcement learning framework to jointly train two collaborative agents, namely a multi-hop graph reasoner and a fact extractor. The fact extraction agent generates fact triples from corpora to enrich the graph on the fly, while the reasoning agent provides feedback to the fact extractor and guides it towards promoting facts that are helpful for interpretable reasoning. Experiments on two public datasets demonstrate the effectiveness of the proposed approach. Source code and datasets used in this paper can be downloaded at https://github.com/shanzhenren/CPL
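The "lack of evidential paths" problem can be illustrated with a plain breadth-first search over triples: a query fails on the sparse graph and succeeds once a fact extracted from text is added. This sketch only illustrates why enrichment helps; it is not the paper's method, which trains reinforcement learning agents for both reasoning and extraction, and the entities, relations and hop limit are hypothetical.

```python
from collections import deque

def find_path(triples, source, target, max_hops=3):
    """Breadth-first search for a relation path of at most max_hops edges.

    Returns the path as a list of (head, relation, tail) triples, or None.
    """
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None

kg = [("Marie_Curie", "born_in", "Warsaw"), ("Paris", "capital_of", "France")]
extracted = [("Warsaw", "capital_of", "Poland")]  # hypothetical fact mined from text
print(find_path(kg, "Marie_Curie", "Poland"))  # None: no evidential path in the sparse graph
print(find_path(kg + extracted, "Marie_Curie", "Poland"))
```

The difficulty the paper addresses is that a real extractor also produces many triples that are irrelevant to the query and only enlarge the search space, which is why the extractor is guided by feedback from the reasoner instead of adding every extracted fact.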


Policy Certificates and Minimax-Optimal PAC Bounds for Episodic Reinforcement Learning

#artificialintelligence

Designing reinforcement learning methods that find a good policy with as few samples as possible is a key goal of both empirical and theoretical research. On the theoretical side, there are two main ways to measure and guarantee the sample efficiency of a method: regret bounds and PAC (probably approximately correct) bounds. Ideally, we would like algorithms that perform well according to both criteria, as they measure different aspects of sample efficiency, and we have shown previously [1] that one cannot simply go from one to the other. In a specific setting called tabular episodic MDPs, a recent algorithm achieved close to optimal regret bounds [2], but no method was known to be close to optimal according to the PAC criterion, despite a long line of research. In our work presented at ICML 2019, we close this gap with a new method that achieves minimax-optimal PAC (and regret) bounds, which match the statistical worst-case lower bounds in the dominating terms.
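For reference, the two criteria can be written as follows in one common formulation for episodic MDPs; the paper's exact definitions may differ in details. Let π_k be the policy used in episode k, s_{k,1} its initial state and Δ_k its optimality gap. Regret sums these gaps over T episodes, while an (ε, δ)-PAC guarantee bounds, with high probability, the number of episodes whose gap exceeds ε:

```latex
\begin{aligned}
\Delta_k &= V^{\star}_1(s_{k,1}) - V^{\pi_k}_1(s_{k,1}), \\
\mathrm{Regret}(T) &= \sum_{k=1}^{T} \Delta_k, \\
(\varepsilon,\delta)\text{-PAC:}\quad
&\Pr\Big( \sum_{k=1}^{\infty} \mathbf{1}\{\Delta_k > \varepsilon\} \le N(\varepsilon,\delta) \Big) \ge 1 - \delta,
\end{aligned}
```

where N(ε, δ) is polynomial in the number of states S, the number of actions A, the horizon H, 1/ε and log(1/δ). A minimax-optimal bound is one whose dominating terms match the corresponding worst-case lower bound.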