 Reinforcement Learning


Feedback-Based Tree Search for Reinforcement Learning

arXiv.org Artificial Intelligence

Inspired by recent successes of Monte-Carlo tree search (MCTS) in a number of artificial intelligence (AI) application domains, we propose a model-based reinforcement learning (RL) technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon Markov decision process. The terminal condition of the finite-horizon problems, or the leaf-node evaluator of the decision tree generated by MCTS, is specified using a combination of an estimated value function and an estimated policy function. The recommendations generated by the MCTS procedure are then provided as feedback in order to refine, through classification and regression, the leaf-node evaluator for the next iteration. We provide the first sample complexity bounds for a tree search-based RL algorithm. In addition, we show that a deep neural network implementation of the technique can create a competitive AI agent for the popular multi-player online battle arena (MOBA) game King of Glory.


Graph Signal Sampling via Reinforcement Learning

arXiv.org Artificial Intelligence

Modern information processing systems generate massive datasets which are often strongly heterogeneous, e.g., partially labeled mixtures of different media (audio, video, text). A quite successful approach to such datasets is based on representing the data as networks or graphs. In particular, we represent datasets by graph signals defined over an underlying graph, which reflects similarities between individual data points. The graph signal values encode label information which often conforms to a clustering hypothesis, i.e., the signal values (labels) of close-by nodes (similar data points) are similar. Two core problems considered within graph signal processing (GSP) are (i) how to sample them, i.e., which signal values provide the most information about the entire dataset, and (ii) how to recover the entire graph signal from these few signal values (samples). These problems have been studied in [1]-[6] which discussed convex optimization methods for recovering a graph signal from a small number of signal values observed on the nodes belonging to a given (small) sampling set. Sufficient conditions on the sampling set and clustering structure such that these convex methods are successful have been discussed in [4], [7].


Leveraging human knowledge in tabular reinforcement learning: A study of human subjects

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer's part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied, in practice, by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named SASS which is based on the notion of similarities in the agent's state-action space. Through this human study, consisting of 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical reward shaping technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. However, we further find that our proposed method SASS can be effectively and efficiently combined with reward shaping, and provides a beneficial alternative to using only a single speedup method with minimal human designer effort overhead.
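As a rough illustration of what reward shaping can look like in a tabular setting, the sketch below uses the potential-based form, in which a designer-supplied potential over states is turned into an extra reward term. This is one common variant and an assumption here; the abstract does not say which shaping scheme the study used. The function name `shaped_reward`, its parameters, and the example potential are hypothetical.

```python
def shaped_reward(reward, state, next_state, potential, gamma=0.99, done=False):
    """Return the environment reward plus a potential-based shaping term.

    The shaping term is gamma * potential(next_state) - potential(state).
    `potential` is a designer-supplied function from states to floats
    (hypothetical here); terminal states are given a potential of zero.
    """
    next_potential = 0.0 if done else potential(next_state)
    return reward + gamma * next_potential - potential(state)


# Example: a 1-D corridor with the goal at position 10.
# The potential is the negative distance to the goal (an assumed heuristic).
def distance_potential(position):
    return -abs(10 - position)
```

With this potential, a step toward the goal earns a small positive bonus and a step away earns a penalty, which is the kind of domain knowledge a human designer might inject.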


Data Science: Supervised Machine Learning in Python

@machinelearnbot

In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.


Data Science in 90 Seconds: Reinforcement Learning - DATAVERSITY

#artificialintelligence

Laura was born in a small town in North Carolina. She went on to earn a B.S. in Textile Engineering and a B.A. in Spanish at North Carolina State University. Laura thought this unique combination of majors would be amazing after attending a summer camp in high school where she played with bouncing polymers. While attending North Carolina State University, she earned a scholarship to study a summer term in Peru, where she fell in love with the Spanish language. Upon graduation, she moved to Washington, D.C. where she served in a variety of digital information roles.


Deep Reinforcement Learning Essential Prerequisite Review

#artificialintelligence

In this section we review all the background knowledge you need in order to understand Deep Reinforcement Learning. This includes:
• Markov Decision Processes (MDPs)
• Dynamic Programming
• Monte Carlo
• Temporal difference learning
• Deep Learning
• Approximation Methods
• State Transition Probabilities

Hope you enjoy it!


Advances in Experience Replay

arXiv.org Machine Learning

This project combines recent advances in experience replay techniques, namely, Combined Experience Replay (CER), Prioritized Experience Replay (PER), and Hindsight Experience Replay (HER). We show the results of combinations of these techniques with DDPG and DQN methods. CER always adds the most recent experience to the batch. PER chooses which experiences should be replayed based on how beneficial they will be towards learning. HER learns from failure by substituting the desired goal with the achieved goal and recomputing the reward function. The effectiveness of combinations of these experience replay techniques is tested in a variety of OpenAI gym environments.
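The following is a minimal sketch of two of the ideas described above, written from the one-sentence descriptions in the abstract rather than from the project's code. `CombinedReplayBuffer`, `her_relabel`, the transition layout `(state, action, reward, next_state, goal)`, and `reward_fn` are all hypothetical names chosen for illustration. PER is left out because it needs a priority structure that the abstract does not describe.

```python
import random
from collections import deque


class CombinedReplayBuffer:
    """Uniform replay buffer that always includes the newest transition (CER)."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped first
        self.rng = random.Random(seed)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if not self.buffer:
            raise ValueError("buffer is empty")
        latest = self.buffer[-1]
        older = list(self.buffer)[:-1]
        # Draw the rest of the batch uniformly, without replacement,
        # from everything except the newest transition.
        k = min(batch_size - 1, len(older))
        return self.rng.sample(older, k) + [latest]


def her_relabel(transition, achieved_goal, reward_fn):
    """Replace the desired goal with the achieved goal and recompute the reward.

    Assumes transitions are (state, action, reward, next_state, goal) tuples
    and that reward_fn(next_state, goal) returns the goal-conditioned reward.
    """
    state, action, _old_reward, next_state, _old_goal = transition
    new_reward = reward_fn(next_state, achieved_goal)
    return (state, action, new_reward, next_state, achieved_goal)
```

In this sketch a failed episode still yields useful data: relabeling a transition with the goal that was actually reached turns a failure reward into a success reward for that substituted goal.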


Curiosity-driven Exploration for Mapless Navigation with Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Deep Reinforcement Learning (DRL), deploying deep neural networks as function approximators for high-dimensional RL tasks, achieves state-of-the-art performance in various fields of research [1]. DRL algorithms have been studied in the context of learning navigation policies for mobile robots. Traditional navigation solutions in robotics generally require a system of procedures, such as Simultaneous Localization and Mapping (SLAM) [2], localization and path planning in a given map, etc. With the powerful representation learning capabilities of deep networks, DRL methods bring about the possibility of learning control policies directly from raw sensory inputs, bypassing all the intermediate steps. Eliminating the requirement for localization, mapping, or path planning procedures, several DRL works have been presented that learn successful navigation policies directly from raw sensor inputs: target-driven navigation [3], successor feature RL for transferring navigation policies [4], and using auxiliary tasks to boost DRL training [5]. Many follow-up works have also been proposed, such as embedding SLAM-like structures into DRL networks [6], or utilizing DRL for multi-robot collision avoidance [7]. In this paper, we focus specifically on mapless navigation, where the agent is expected to navigate to a designated goal location without knowledge of the map of its current environment.


A Study of AI Population Dynamics with Million-agent Reinforcement Learning

arXiv.org Artificial Intelligence

We conduct an empirical study on discovering the ordered collective dynamics obtained by a population of intelligent agents, driven by million-agent reinforcement learning. Our intention is to put intelligent agents into a simulated natural context and verify whether the principles developed in the real world could also be used to understand an artificially created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed using only findings, or their logical equivalents, that have been discovered in nature. We endow the agents with intelligence based on deep reinforcement learning (DRL). In order to scale the population size up to millions of agents, a large-scale DRL training platform with a redesigned experience buffer is proposed. Our results show that the population dynamics of AI agents, driven only by each agent's individual self-interest, reveal an ordered pattern that is similar to the Lotka-Volterra model studied in population biology. We further discover emergent behaviors of collective adaptation when studying how the agents' grouping behaviors change with the environmental resources. Both findings could be explained by the self-organization theory in nature.
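For readers unfamiliar with the Lotka-Volterra model mentioned above, the sketch below integrates its standard two-equation form with a simple Euler step: prey grow at rate alpha and are eaten at rate beta, while predators grow by delta per encounter and die at rate gamma. It is only a reference for the kind of oscillating pattern the abstract refers to, not a reproduction of the paper's simulation. The function names, parameter values, and step size are assumptions chosen for illustration.

```python
def lotka_volterra_step(prey, predators, alpha, beta, delta, gamma, dt):
    """Advance the predator-prey system by one explicit Euler step.

    d(prey)/dt      = alpha * prey - beta * prey * predators
    d(predators)/dt = delta * prey * predators - gamma * predators
    """
    d_prey = alpha * prey - beta * prey * predators
    d_predators = delta * prey * predators - gamma * predators
    return prey + dt * d_prey, predators + dt * d_predators


def simulate(prey, predators, steps,
             alpha=1.0, beta=0.1, delta=0.075, gamma=1.5, dt=0.001):
    """Return the list of (prey, predators) pairs, including the initial state."""
    trajectory = [(prey, predators)]
    for _ in range(steps):
        prey, predators = lotka_volterra_step(
            prey, predators, alpha, beta, delta, gamma, dt)
        trajectory.append((prey, predators))
    return trajectory
```

Starting away from the equilibrium point (prey = gamma / delta, predators = alpha / beta), both populations cycle: a rise in prey is followed by a rise in predators, which then drives prey back down.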


Deep Reinforcement Learning in Python - Introduction

#artificialintelligence

Requirements:
• Know reinforcement learning basics, MDPs, Dynamic Programming, Monte Carlo, TD Learning
• Calculus and probability at the undergraduate level
• Experience building machine learning models in Python and NumPy
• Know how to build feedforward, convolutional, and recurrent neural networks using Theano and TensorFlow

This course is all about the application of deep learning and neural networks to reinforcement learning. If you've taken my first reinforcement learning class, then you know that reinforcement learning is on the bleeding edge of what we can do with AI. Specifically, the combination of deep learning with reinforcement learning has led to AlphaGo beating a world champion in the strategy game Go, it has led to self-driving cars, and it has led to machines that can play video games at a superhuman level. Reinforcement learning has been around since the 1970s, but none of this has been possible until now. The world is changing at a very fast pace.