Reinforcement Learning
Expected Policy Gradients for Reinforcement Learning
Ciosek, Kamil, Whiteson, Shimon
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian policies and quadric critics and then extend it to an analytical method for the universal case, covering a broad class of actors and critics, including Gaussian, exponential families, and reparameterised policies with bounded support. For Gaussian policies, we show that it is optimal to explore using covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions. EPG also provides a general framework for reasoning about policy gradient methods, which we use to establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. Furthermore, we prove that EPG reduces the variance of the gradient estimates without requiring deterministic policies and with little computational overhead. Finally, we show that EPG outperforms existing approaches on six challenging domains involving the simulated control of physical systems.
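To illustrate the idea of summing across actions rather than relying on a single sampled action, here is a minimal sketch for one state with a discrete softmax policy. It is not the authors' implementation: the function names, the softmax parameterisation, and the fixed vector `q` standing in for a critic are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def score(logits, a):
    """Gradient of log pi(a) with respect to the logits of a softmax policy."""
    g = -softmax(logits)
    g[a] += 1.0
    return g

def spg_estimate(logits, q, rng):
    """Single-sample estimate: grad log pi(a) * Q(a) with a drawn from pi."""
    pi = softmax(logits)
    a = rng.choice(len(pi), p=pi)
    return score(logits, a) * q[a]

def epg_estimate(logits, q):
    """Expected estimate: sum over all actions of pi(a) * grad log pi(a) * Q(a)."""
    pi = softmax(logits)
    return sum(pi[a] * score(logits, a) * q[a] for a in range(len(pi)))

# Hypothetical critic values for three actions in a single state.
logits = np.array([0.5, -0.2, 0.1])
q = np.array([1.0, 0.0, 2.0])
rng = np.random.default_rng(0)

samples = np.array([spg_estimate(logits, q, rng) for _ in range(1000)])
print("summed gradient:      ", epg_estimate(logits, q))
print("mean of sampled grads:", samples.mean(axis=0))
print("std of sampled grads: ", samples.std(axis=0))
```

Both estimators target the same gradient for this state, but the summed version has no sampling noise over actions, which is the intuition behind the variance reduction claimed in the abstract.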
Machine Learning at Udacity Goes Deeper Udacity
We just unlocked a Free Preview of our Machine Learning Engineer Nanodegree Program! Discover amazing new content, and explore your future in Machine Learning, today! The Machine Learning Engineer Nanodegree program has been one of Udacity's benchmark programs for over 2 years. Thousands of students have graduated the program, and many have gone on to great careers at companies like Google, Amazon, and more. As technology evolves, so does our curriculum, and we think much of the program's success can be attributed to keeping the content up-to-the-minute current.
zuoxingdong/gym-maze
This repository contains a customizable gym environment for all kinds of mazes or gridworlds. Mazes and gridworlds are used very often in the reinforcement learning community, yet a standardized framework is still lacking, which is the motivation for this repository. The repo will be actively maintained; comments, feedback, and improvements are highly welcome. We have provided a Jupyter Notebook that illustrates how to make various maze environments and how to generate an animation of the agent's trajectory following the optimal actions computed by an A* planner.
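As a rough illustration of the kind of A* planning mentioned above, the following self-contained sketch finds a shortest path on a small grid. It does not use the gym-maze API; the grid encoding (0 for free cells, 1 for walls), the 4-connected moves, and the function name `astar` are assumptions chosen for this example.

```python
import heapq

def astar(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid is a list of rows where 0 is a free cell and 1 is a wall;
    start and goal are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]  # (f, g, cell)
    came_from = {start: None}
    cost = {start: 0}

    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        if g > cost[cur]:
            continue  # stale queue entry; a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_g = g + 1
            if new_g < cost.get(nxt, float("inf")):
                cost[nxt] = new_g
                came_from[nxt] = cur
                heapq.heappush(frontier, (new_g + heuristic(nxt), new_g, nxt))
    return None

maze = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(astar(maze, (0, 0), (2, 0)))
```

The returned list of cells can be turned into a sequence of actions by taking the difference between consecutive cells.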
Gated-Attention Architectures for Task-Oriented Language Grounding
Chaplot, Devendra Singh, Sathyendra, Kanthashree Mysore, Pasumarthi, Rama Kumar, Rajagopal, Dheeraj, Salakhutdinov, Ruslan
To perform tasks specified by natural language instructions, autonomous agents need to extract semantically meaningful representations of language and map them to visual elements and actions in the environment. This problem is called task-oriented language grounding. We propose an end-to-end trainable neural architecture for task-oriented language grounding in 3D environments which assumes no prior linguistic or perceptual knowledge and requires only raw pixels from the environment and the natural language instruction as input. The proposed model combines the image and text representations using a Gated-Attention mechanism and learns a policy to execute the natural language instruction using standard reinforcement and imitation learning methods. We show the effectiveness of the proposed model on unseen instructions as well as unseen maps, both quantitatively and qualitatively. We also introduce a novel environment based on a 3D game engine to simulate the challenges of task-oriented language grounding over a rich set of instructions and environment states.
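The abstract does not spell out how the image and text representations are combined. One plausible reading of a gated-attention mechanism, sketched below, maps the instruction embedding to one sigmoid gate per image feature map and multiplies each map by its gate. The function name, argument names, and shapes are assumptions for illustration, not the authors' code.

```python
import numpy as np

def gated_attention(feature_maps, instruction_embedding, weight, bias):
    """Scale each image feature map by a gate computed from the instruction.

    feature_maps:          array of shape (channels, height, width)
    instruction_embedding: array of shape (embed_dim,)
    weight:                array of shape (channels, embed_dim)
    bias:                  array of shape (channels,)
    """
    # One gate in (0, 1) per channel, computed from the text representation.
    gates = 1.0 / (1.0 + np.exp(-(weight @ instruction_embedding + bias)))
    # Broadcast each gate over the spatial dimensions of its feature map.
    return feature_maps * gates[:, None, None]

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5, 5))   # hypothetical convolutional output
embedding = rng.normal(size=(8,))       # hypothetical instruction embedding
weight = rng.normal(size=(4, 8))
bias = np.zeros(4)

fused = gated_attention(features, embedding, weight, bias)
print(fused.shape)
```

The fused tensor keeps the shape of the image features, so it can be passed to whatever policy network follows.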
Expressivity, Trainability, and Generalization in Machine Learning
Update 11/29: I'm looking for translators to help translate this post into different languages, particularly Chinese (中文), Spanish (Español), Korean (한국어), Russian (русский язык), and Japanese (日本語). When I read Machine Learning papers, I ask myself whether the contributions of the paper fall under improvements to 1) Expressivity, 2) Trainability, and/or 3) Generalization. I learned this categorization from my colleague Jascha Sohl-Dickstein at Google Brain, and the terminology is also introduced in this paper. I have found this categorization effective in thinking about how individual research papers (especially on the theoretical side) tie subfields of AI research (e.g. ...). In this blog post, I discuss how these concepts tie into current (Nov 2017) machine learning research on Supervised Learning, Unsupervised Learning, and Reinforcement Learning. I consider Generalization to be comprised of two categories, "weak" and "strong" generalization, and I will discuss them separately.
Applications of Deep Learning and Reinforcement Learning to Biological Data
Mahmud, Mufti, Kaiser, M. Shamim, Hussain, Amir, Vassanelli, Stefano
Rapid advances of hardware-based technologies during the past decades have opened up new possibilities for Life scientists to gather multimodal data in various application domains (e.g., Omics, Bioimaging, Medical Imaging, and [Brain/Body]-Machine Interfaces), thus generating novel opportunities for development of dedicated data intensive machine learning techniques. Overall, recent research in Deep learning (DL), Reinforcement learning (RL), and their combination (Deep RL) promises to revolutionize Artificial Intelligence. The growth in computational power accompanied by faster and increased data storage and declining computing costs have already allowed scientists in various fields to apply these techniques on datasets that were previously intractable for their size and complexity. This review article provides a comprehensive survey on the application of DL, RL, and Deep RL techniques in mining Biological data. In addition, we compare performances of DL techniques when applied to different datasets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.
Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations
This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. In our setting the model and algorithm do not decouple by agent. In order to find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In our numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to previous approaches using tabular representations. Moreover, with sub-optimal expert demonstrations our algorithms recover both reward functions and strategies with good quality.
Beginner's guide to Reinforcement Learning & its implementation in Python
One of the most fundamental questions for scientists across the globe has been: "How to learn a new skill?" The desire to understand the answer is obvious: if we can understand this, we can enable the human species to do things we might not have thought of before. Alternatively, we can train machines to do more "human" tasks and create true artificial intelligence. While we don't have a complete answer to the above question yet, there are a few things which are clear. Irrespective of the skill, we first learn by interacting with the environment.
Guest Post (Part I): Demystifying Deep Reinforcement Learning - Intel AI
Two years ago, a small company in London called DeepMind uploaded their pioneering paper "Playing Atari with Deep Reinforcement Learning" to arXiv. In this paper they demonstrated how a computer learned to play Atari 2600 video games by observing just the screen pixels and receiving a reward when the game score increased. The result was remarkable, because the games and the goals in every game were very different and designed to be challenging for humans. The same model architecture, without any change, was used to learn seven different games, and in three of them the algorithm performed even better than a human! It has been hailed since then as the first step towards general artificial intelligence: an AI that can survive in a variety of environments, instead of being confined to strict realms such as playing chess. No wonder DeepMind was immediately bought by Google and has been on the forefront of deep learning research ever since.