Goto

Collaborating Authors

 Reinforcement Learning


Getting Started With MarathonEnvs v0.5.0a

#artificialintelligence

I have spent the last two years learning Reinforcement Learning. I created Marathon Environments to help explore the applicability of robotics and locomotion research to Video Games in the domain of Active Ragdoll and Virtual Agents. This tutorial provides a primer on Marathon Environments. Marathon Environments re-implements the classic set of Continuous Control benchmarks typically seen in Deep Reinforcement Learning literature as Unity environments using the ML-Agents toolkit. Marathon Environments was released alongside Unity ML- Agents v0.5 and includes four continuous control environments.


Comparing Observation and Action Representations for Deep Reinforcement Learning in MicroRTS

arXiv.org Artificial Intelligence

This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games. Specifically, we compare two representations: (1) a global representation where the observation represents the whole game state, and the RL agent needs to choose which unit to issue actions to, and which actions to execute; and (2) a local representation where the observation is represented from the point of view of an individual unit, and the RL agent picks actions for each unit independently. We evaluate these representations in MicroRTS showing that the local representation seems to outperform the global representation when training agents with the task of harvesting resources.


Convergent Policy Optimization for Safe Reinforcement Learning

arXiv.org Machine Learning

We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions. For such a problem, we construct a sequence of surrogate convex constrained optimization problems by replacing the nonconvex functions locally with convex quadratic functions obtained from policy gradient estimators. We prove that the solutions to these surrogate problems converge to a stationary point of the original nonconvex problem. Furthermore, to extend our theoretical results, we apply our algorithm to examples of optimal control and multi-agent reinforcement learning with safety constraints.


ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations

arXiv.org Artificial Intelligence

Learning from demonstrations is a popular tool for accelerating and reducing the exploration requirements of reinforcement learning. When providing expert demonstrations to human students, we know that the demonstrations must fall within a particular range of difficulties called the "Zone of Proximal Development (ZPD)". If they are too easy the student learns nothing, but if they are too difficult the student is unable to follow along. This raises the question: Given a set of potential demonstrators, which among them is best suited for teaching any particular learner? Prior work, such as the popular Deep Q-learning from Demonstrations (DQfD) algorithm has generally focused on single demonstrators. In this work we consider the problem of choosing among multiple demonstrators of varying skill levels. Our results align with intuition from human learners: it is not always the best policy to draw demonstrations from the best performing demonstrator (in terms of reward). We show that careful selection of teaching strategies can result in sample efficiency gains in the learner's environment across nine Atari games


D-Point Trigonometric Path Planning based on Q-Learning in Uncertain Environments

arXiv.org Artificial Intelligence

Finding the optimum path for a robot for moving from start to the goal position through obstacles is still a challenging issue. Thi s paper presents a novel path planning method, named D - point trigonometric, based on Q - learning algorithm for dynamic and uncertain environments, in which all the obstacles and the target are moving. We define a new state, action and reward functions for t he Q - learning by which the agent can find the best action in every state to reach the goal in the most appropriate path. Moreover, the experiment s in Unity3D confirmed the high convergence speed, the high hit rate, as well as the low dependency on environmental parameters of the proposed method compared with an opponent approach. The planning has been considered as a challenging concern in video games [1], transportation systems [2], and mobile robots [3] [4] . A s the most important path planning issues, w e can refer to the dynamics and the uncertainty of the environment, the smoothness and the length of the path, obstacle avoidance, and the computation al cost . In the last few decades, researchers have done numerous research efforts to present new approaches to solve them [5] [6] [7] [8] . Generally, most of the path planning approaches are categorized to one of the following methods [9] [10] [11]: ( 1) Classical methods (a) Computational geometry (CG) (b) Probabilistic r oadmap (PRM) (c) Potential fields method (PFM) ( 2) Heuristic and meta heuristic methods (a) Soft computing (b) Hybrid algorithms Since the complexity and the execution time of CG methods were high [11], PRMs were proposed to red uce the search space using techniques like milestones [12] .


Reinforcement Learning Sessions at DataHack Summit 2019

#artificialintelligence

"If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake." Reinforcement learning is an intriguing and complex field. We at Analytics Vidhya are strongly behind the incredible potential of this domain and the breakthroughs and research by behemoths like DeepMind support our thought process. Reinforcement learning (RL) has been around for a while now. However, there is a general perception in the community that RL's use cases are limited to computer simulations.


Reinforcement Learning Explained: Overview, Comparisons and Applications in Business

#artificialintelligence

RL algorithm learns how to act best through many attempts and failures. Trial-and-error learning is connected with the so-called long-term reward. This reward is the ultimate goal the agent learns while interacting with an environment through numerous trials and errors. The algorithm gets short-term rewards that together lead to the cumulative, long-term one. So, the key goal of reinforcement learning used today is to define the best sequence of decisions that allow the agent to solve a problem while maximizing a long-term reward. And that set of coherent actions is learned through the interaction with environment and observation of rewards in every state. Reinforcement learning is distinguished from other training styles, including supervised and unsupervised learning, by its goal and, consequently, the learning approach. Three ML training styles compared.


Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

arXiv.org Machine Learning

We present relay policy learning, a method for imitation and reinforcement learning that can solve multi-stage, long-horizon robotic tasks. This general and universally-applicable, two-phase approach consists of an imitation learning stage that produces goal-conditioned hierarchical policies, and a reinforcement learning phase that finetunes these policies for task performance. Our method, while not necessarily perfect at imitation learning, is very amenable to further improvement via environment interaction, allowing it to scale to challenging long-horizon tasks. We simplify the long-horizon policy learning problem by using a novel data-relabeling algorithm for learning goal-conditioned hierarchical policies, where the low-level only acts for a fixed number of steps, regardless of the goal achieved. While we rely on demonstration data to bootstrap policy learning, we do not assume access to demonstrations of every specific tasks that is being solved, and instead leverage unstructured and unsegmented demonstrations of semantically meaningful behaviors that are not only less burdensome to provide, but also can greatly facilitate further improvement using reinforcement learning. We demonstrate the effectiveness of our method on a number of multi-stage, long-horizon manipulation tasks in a challenging kitchen simulation environment. Videos are available at https://relay-policy-learning.github.io/


Deep Q-Learning for Same-Day Delivery with a Heterogeneous Fleet of Vehicles and Drones

arXiv.org Machine Learning

In this paper, we consider same-day delivery with a heterogeneous fleet of vehicles and drones. Customers make delivery requests over the course of the day and the dispatcher dynamically dispatches vehicles and drones to deliver the goods to customers before their delivery deadline. Vehicles can deliver multiple packages in one route but travel relatively slowly due to the urban traffic. Drones travel faster, but they have limited capacity and require charging or battery swaps. To exploit the different strengths of the fleets, we propose a deep Q-learning approach. Our method learns the value of assigning a new customer to either drones or vehicles as well as the option to not offer service at all. To aid feature selection, we present an analytical analysis that demonstrates the role that different types of information have on the value function and decision making. In a systematic computational analysis, we show the superiority of our policy compared to benchmark policies and the effectiveness of our deep Q-learning approach.


On the convergence of projective-simulation-based reinforcement learning in Markov decision processes

arXiv.org Artificial Intelligence

In recent years, the interest in leveraging quantum effects for enhancing machine learning tasks has significantly increased. Many algorithms speeding up supervised and unsupervised learning were established. The first framework in which ways to exploit quantum resources specifically for the broader context of reinforcement learning were found is projective simulation. Projective simulation presents an agent-based reinforcement learning approach designed in a manner which may support quantum walk-based speed-ups. Although classical variants of projective simulation have been benchmarked against common reinforcement learning algorithms, very few formal theoretical analyses have been provided for its performance in standard learning scenarios. In this paper, we provide a detailed formal discussion of the properties of this model. Specifically, we prove that one version of the projective simulation model, understood as a reinforcement learning approach, converges to optimal behavior in a large class of Markov decision processes. This proof shows that a physically-inspired approach to reinforcement learning can guarantee to converge.