AITopics

Genre: Research Report > New Finding (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)

#artificialintelligence Sep-1-2019, 13:41:12 GMT

AI Learning to land a Rocket(Lunar Lander) Reinforcement Learning

Reinforcement learning is one of the most discussed, followed and contemplated topics in artificial intelligence (AI) as it has the potential to transform most businesses. At the core of reinforcement learning is the concept that optimal behaviour or action is reinforced by a positive reward. Just as toddlers learning to walk adjust their actions based on the outcomes they experience, such as taking a smaller step if the previous broad step made them fall, machines and AI agents use reinforcement learning algorithms to determine the ideal behaviour based on feedback from the environment. An example of reinforcement learning in action is AlphaGo Zero, which was in the headlines in 2017.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Industry: Leisure & Entertainment > Games > Go (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Sep-1-2019, 01:49:25 GMT

Collision Avoidance with Deep Reinforcement Learning

In the past decade, learning algorithms developed to play video games better than humans have become more common. Google's DeepMind Technologies developed learning algorithms that could play Atari video games and also demonstrated their famous AlphaGo algorithm, which outperformed professional Go players. However, little research has been done on learning algorithms developed to complete particularly difficult single-player games. In particular, much further research could be done on developing learning algorithms for mechanically challenging games such as "bullet hell" games. We believe that agents could learn to efficiently evade obstacles using deep reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (7 more...)

Industry: Leisure & Entertainment > Games > Go (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)

Sabatelli, Matthia, Louppe, Gilles, Geurts, Pierre, Wiering, Marco A.

Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms

arXiv.org Artificial Intelligence Sep-1-2019

This paper makes one step forward towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms. The aim of these algorithms is to jointly learn an approximation of the state-value function ($V$), alongside an approximation of the state-action value function ($Q$). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm which has been shown to outperform popular techniques such as Deep-Q-Learning (DQN) and Double-Deep-Q-Learning (DDQN) \cite{sabatelli2018deep}. Intending to investigate why DQV's learning dynamics allow this algorithm to perform so well, we formulate a set of research questions which help us characterize a new family of DRL algorithms. Among our results, we present some specific cases in which DQV's performance can get harmed and introduce a novel off-policy DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the $V$ and $Q$ functions that are learned by DQV and DQV-Max and show that both algorithms might perform so well on several DRL test-beds because they are less prone to suffer from the overestimation bias of the $Q$ function.
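The abstract does not spell out the update rules, so the following is only a tabular sketch of what jointly learning $V$ and $Q$ could look like. It rests on a stated assumption: both estimates are moved toward the same temporal-difference target built from $V$. The name `dqv_update`, the learning rate `alpha` and the discount `gamma` are hypothetical, and the algorithms in the paper use neural networks rather than lookup tables.

```python
from collections import defaultdict


def dqv_update(V, Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Apply one tabular update to both value estimates from one transition.

    Assumption (not confirmed by the abstract): V and Q share the target
    r + gamma * V[s_next], so Q never bootstraps from its own maximum.
    """
    # Shared TD target; terminal transitions do not bootstrap.
    target = r if done else r + gamma * V[s_next]
    # Move the state value toward the target.
    V[s] += alpha * (target - V[s])
    # Move the state-action value toward the same target.
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return target


# Both tables default to 0.0 for unseen states and state-action pairs.
V = defaultdict(float)
Q = defaultdict(float)
dqv_update(V, Q, s=0, a=1, r=1.0, s_next=1, done=False)
```

Because the target in this sketch never takes a maximum over $Q$, it gives one intuition for why such a scheme might be less exposed to overestimation, which is the effect the abstract reports.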

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1909.01779

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Genders, Wade, Razavi, Saiedeh

An Open-Source Framework for Adaptive Traffic Signal Control

arXiv.org Artificial Intelligence Sep-1-2019

Developing optimal transportation control systems at the appropriate scale can be difficult as cities' transportation systems can be large, complex and stochastic. Intersection traffic signal controllers are an important element of modern transportation infrastructure where sub-optimal control policies can incur high costs to many users. Many adaptive traffic signal controllers have been proposed by the community but research is lacking regarding their relative performance difference - which adaptive traffic signal controller is best remains an open question. This research contributes a framework for developing and evaluating different adaptive traffic signal controller models in simulation - both learning and non-learning - and demonstrates its capabilities. The framework is used first to investigate the performance variance of the modelled adaptive traffic signal controllers with respect to their hyperparameters and second to analyze the performance differences between controllers with optimal hyperparameters. The proposed framework contains implementations of some of the most popular adaptive traffic signal controllers from the literature: Webster's, Max-pressure and Self-Organizing Traffic Lights, along with deep Q-network and deep deterministic policy gradient reinforcement learning controllers. This framework will aid researchers by accelerating their work from a common starting point, allowing them to generate results faster with less effort.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

1909.00395

Country:

Asia (0.68)
Europe (0.67)
North America > United States (0.46)
North America > Canada > Ontario > Hamilton (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Whitney, William, Agarwal, Rajat, Cho, Kyunghyun, Gupta, Abhinav

Dynamics-aware Embeddings

arXiv.org Artificial Intelligence Sep-1-2019

In this paper we consider self-supervised representation learning to improve sample efficiency in reinforcement learning (RL). We propose a forward prediction objective for simultaneously learning embeddings of states and actions. These embeddings capture the structure of the environment's dynamics, enabling efficient policy learning. We demonstrate that our action embeddings alone improve the sample efficiency and peak performance of model-free RL on control from low-dimensional states. By combining state and action embeddings, we achieve efficient learning of high-quality policies on goal-conditioned continuous control from pixel observations in only 1-2 million environment steps.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1908.09357

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

#artificialintelligence Aug-31-2019, 13:43:01 GMT

Reinforcement Learning Applications

A state is constructed from a multidimensional discrete time series composed of 48 variables covering demographics, vital signs, premorbid status, laboratory values, and the intravenous fluids and vasopressors received as treatments. Clustering is used to define the state space so that patients in the same cluster are similar with respect to the observable properties. An action, or medical treatment, is defined by the total volume of intravenous fluids and the maximum dose of vasopressors over each 4-hour period. The dose of each treatment is divided into 5 possible choices, resulting in 25 discrete actions when the two treatments are combined. A reward and a penalty are associated with survival and death, respectively, so that the learned policy reduces patient mortality.
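The arithmetic behind the 25 actions can be made concrete with a small sketch. It assumes each treatment's dose is already binned into a level from 0 to 4; the function names, the flat index layout and the reward magnitudes of +1 and -1 are illustrative assumptions, not details given in the text.

```python
N_LEVELS = 5  # dose levels per treatment, as stated in the text


def encode_action(fluid_level, vaso_level):
    """Map a (fluid, vasopressor) pair of dose levels to one index in 0..24."""
    if not (0 <= fluid_level < N_LEVELS and 0 <= vaso_level < N_LEVELS):
        raise ValueError("dose levels must be in the range 0..4")
    return fluid_level * N_LEVELS + vaso_level


def decode_action(action):
    """Recover the (fluid, vasopressor) dose levels from a flat action index."""
    if not 0 <= action < N_LEVELS * N_LEVELS:
        raise ValueError("action index must be in the range 0..24")
    return divmod(action, N_LEVELS)


def terminal_reward(survived):
    """Reward for survival, penalty for death (magnitudes are assumed)."""
    return 1.0 if survived else -1.0


# Enumerate every combination of the two treatments.
all_actions = [encode_action(f, v) for f in range(N_LEVELS) for v in range(N_LEVELS)]
```

Here `len(all_actions)` is 25, one index per combination of the two 5-level treatments.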

artificial intelligence, machine learning, reinforcement learning application, (5 more...)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

#artificialintelligence Aug-31-2019, 08:31:33 GMT

Deep Q-Learning with Python and TensorFlow 2.0

In the previous two articles we started exploring the interesting universe of reinforcement learning. First we went through the basics of the third paradigm within machine learning: reinforcement learning. To refresh our memory, we saw that the approach of this type of learning is unlike the previously explored supervised and unsupervised learning. In reinforcement learning, a self-learning agent learns from the interaction between itself and the environment. The agent wants to achieve some kind of goal within that environment while it interacts with it. This interaction is divided into time steps.
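The interaction described here, an agent acting in an environment over discrete time steps, can be sketched as a plain loop. The `CorridorEnv` class, its reward scheme and the `run_episode` helper are hypothetical and exist only to show the loop; this is not the article's Deep Q-Learning code and no learning takes place.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (state, reward, done)."""
        move = 1 if action == 1 else -1
        # Clamp the position so the agent stays inside the corridor.
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (total reward, number of time steps)."""
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state)                   # the agent chooses an action
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward
        if done:
            return total_reward, t + 1
    return total_reward, max_steps


# A fixed policy that always moves right reaches the goal in four steps.
total, steps = run_episode(CorridorEnv(length=5), lambda state: 1)
```

A learning agent would replace the fixed policy with one that is updated from the rewards it observes at each time step.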

agent, q-learning, q-value, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence Aug-31-2019

Collaborative Policy Learning for Open Knowledge Graph Reasoning

Fu, Cong, Chen, Tong, Qu, Meng, Jin, Woojeong, Ren, Xiang

In recent years, there has been a surge of interest in interpretable graph reasoning methods. However, these models often suffer from limited performance when working on sparse and incomplete graphs, due to the lack of evidential paths that can reach target entities. Here we study open knowledge graph reasoning, a task that aims to reason for missing facts over a graph augmented by a background text corpus. A key challenge of the task is to filter out "irrelevant" facts extracted from the corpus, in order to maintain an effective search space during path inference. We propose a novel reinforcement learning framework to train two collaborative agents jointly, i.e., a multi-hop graph reasoner and a fact extractor. The fact extraction agent generates fact triples from corpora to enrich the graph on the fly, while the reasoning agent provides feedback to the fact extractor and guides it towards promoting facts that are helpful for interpretable reasoning. Experiments on two public datasets demonstrate the effectiveness of the proposed approach. Source code and datasets used in this paper can be downloaded at https://github.com/shanzhenren/CPL

extractor, reasoner, reasoning, (16 more...)

1909.0023

Country:

North America > United States > California (0.14)
Europe > Italy (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.72)

#artificialintelligence Aug-30-2019, 22:21:09 GMT

Policy Certificates and Minimax-Optimal PAC Bounds for Episodic Reinforcement Learning

Designing reinforcement learning methods which find a good policy with as few samples as possible is a key goal of both empirical and theoretical research. On the theoretical side there are two main ways to measure and guarantee the sample efficiency of a method: regret bounds and PAC (probably approximately correct) bounds. Ideally, we would like to have algorithms that perform well according to both criteria, as they measure different aspects of sample efficiency, and we have shown previously [1] that one cannot simply go from one to the other. In a specific setting called tabular episodic MDPs, a recent algorithm achieved close to optimal regret bounds [2], but no methods were known to be close to optimal according to the PAC criterion despite a long line of research. In our work presented at ICML 2019, we close this gap with a new method that achieves minimax-optimal PAC (and regret) bounds which match the statistical worst-case lower bounds in the dominating terms.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

Country: North America > United States > California > Santa Clara County > Palo Alto (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)