Reinforcement Learning
Hierarchical Reinforcement Learning for Multi-agent MOBA Game
Zhang, Zhijian, Li, Haozheng, Zhang, Luo, Zheng, Tianyin, Zhang, Ting, Hao, Xiong, Chen, Xiaoxin, Chen, Min, Xiao, Fangxu, Zhou, Wei
Although deep reinforcement learning has achieved great success recently, challenges remain in Real Time Strategy (RTS) games. Because of their large state and action spaces, as well as hidden information, RTS games require both macro strategies and micro-level manipulation to obtain satisfactory performance. In this paper, we present a novel hierarchical reinforcement learning model for mastering Multiplayer Online Battle Arena (MOBA) games, a sub-genre of RTS games. In this hierarchical framework, agents make macro strategies by imitation learning and perform micromanipulations through reinforcement learning. Moreover, we propose a simple self-learning method to get better sample efficiency for the reinforcement part, and we extract some global features with a multi-target detection method in the absence of a game engine or API. In 1v1 mode, our agent successfully learns to combat and defeat the built-in AI with a 100% win rate, and experiments show that our method can create a competitive multi-agent for King of Glory (KOG), a mobile MOBA game, in 5v5 mode.
Reinforcement Learning in Motion
Reinforcement Learning in Motion introduces you to the exciting world of machine systems that learn from their environments! This course breaks down key concepts like how RL systems learn, how to sense and process environmental data, and how to build and train AI agents. As you learn, you'll master the core algorithms and get to grips with tools like OpenAI Gym, NumPy, and Matplotlib. Reinforcement systems learn by doing, and so will you in this interactive, hands-on course! You'll build and train a variety of algorithms as you go, each with a specific purpose in mind.
Qrash Course: Reinforcement Learning 101 & Deep Q Networks in 10 Minutes
This article assumes no prior knowledge in Reinforcement Learning, but it does assume some basic understanding of neural networks. Out of all the different types of Machine Learning fields, the one fascinating me the most is Reinforcement Learning. For those who are less familiar with it -- while Supervised Learning deals with predicting values or classes based on labeled data and Unsupervised Learning deals with clustering and finding relations in unlabeled data, Reinforcement Learning deals with how some arbitrary being (formally referred to as an "Agent") should act and behave in a given environment. The way it is done is by giving the Agent rewards or punishments based on the actions it has performed on different scenarios. One of the first practical Reinforcement Learning methods I learned was Deep Q Networks, and I believe it's an excellent kickstart to this journey.
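To make the reward-driven idea concrete, here is a minimal sketch of the tabular Q-learning update, the rule that Deep Q Networks approximate with a neural network instead of a table. It is not taken from the article: the state names, the action names, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9, done=False):
    """Apply one Q-learning step and return the updated Q(state, action)."""
    # Value of the best action available in the next state (0 if the episode ended).
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Hypothetical two-step corridor: s0 -> s1 -> goal, with a reward only at the goal.
q = defaultdict(float)
actions = ["left", "right"]
q_learning_update(q, "s1", "right", 1.0, "goal", actions, done=True)
q_learning_update(q, "s0", "right", 0.0, "s1", actions)
print(dict(q))
```

The second update shows how the reward earned at the goal propagates one step back to `s0` through the `max` over next-state values.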
Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning
Lee, Joonho, Hwangbo, Jemin, Hutter, Marco
The ability to recover from a fall is an essential feature for a legged robot to navigate robustly in challenging environments. To date, there has been very little progress on this topic. Current solutions mostly build upon (heuristically) predefined trajectories, resulting in unnatural behaviors and requiring considerable effort in engineering system-specific components. In this paper, we present an approach based on model-free Deep Reinforcement Learning (RL) to control recovery maneuvers of quadrupedal robots using a hierarchical behavior-based controller. The controller consists of four neural network policies: three behaviors and one behavior selector to coordinate them. Each of them is trained individually in simulation and deployed directly on a real system. We experimentally validate our approach on the quadrupedal robot ANYmal, which is a dog-sized quadrupedal system with 12 degrees of freedom. With our method, ANYmal manifests dynamic and reactive recovery behaviors to recover from an arbitrary fall configuration in less than 5 seconds. We tested the recovery maneuver more than 100 times, and the success rate was higher than 97%.
Visual Imitation Learning with Recurrent Siamese Networks
Berseth, Glen, Pal, Christopher J.
People solve the difficult problem of understanding the salient features of both observations of others and the relationship to their own state when learning to imitate specific tasks. In this work, we train a comparator network which is used to compute distances between motions. Given a desired motion, the comparator can provide a reward signal to the agent via the distance between the desired motion and the agent's motion. We train an RNN-based comparator model to compute distances in space and time between motion clips while training an RL policy to minimize this distance. Furthermore, we examine a challenging form of this problem where a single demonstration is provided for a given task. We demonstrate our approach in the setting of deep learning based control for physical simulation of humanoid walking in both 2D with 10 degrees of freedom (DoF) and 3D with 38 DoF.
RL - Introduction to Deep Reinforcement Learning - Jonathan Hui - Medium
Deep reinforcement learning is about taking the best actions from what we see and hear. Unfortunately, reinforcement learning (RL) has a high barrier to entry in both its concepts and its jargon. In this article, we will cover deep RL with an overview of the general landscape. Yet, we will not shy away from equations and jargon. They provide the basics for understanding the concepts more deeply. We will not claim that it only takes 20 lines of code to tackle an RL problem. The official answer should be one! But we will try hard to make it approachable. In most AI topics, we create mathematical frameworks to tackle problems. For RL, the answer is the Markov Decision Process (MDP). It sounds complicated, but it provides an easy framework for modeling a complex problem. An agent (e.g. a human) observes the environment and takes actions. Rewards are given out, but they may be infrequent and delayed. Very often, long-delayed rewards make it extremely hard to untangle the information and trace back which sequence of actions contributed to the rewards.
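One standard way to spread a delayed reward back over the actions that preceded it is the discounted return, G_t = r_t + gamma * G_{t+1}. The sketch below is illustrative rather than taken from the article; the reward sequence and the discount factor `gamma` are assumed values.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A hypothetical episode where the only reward arrives at the final step.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(episode_rewards, gamma=0.5))
# -> [0.125, 0.25, 0.5, 1.0]
```

Earlier steps receive a smaller share of the final reward, which is one simple way of encoding that actions far from a reward are less likely to have caused it.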
Soft actor critic - Deep reinforcement learning with real-world robots
We are announcing the release of our state-of-the-art off-policy model-free reinforcement learning algorithm, soft actor-critic (SAC). This algorithm has been developed jointly at UC Berkeley and Google Brain, and we have been using it internally for our robotics experiments. Soft actor-critic is, to our knowledge, one of the most efficient model-free algorithms available today, making it especially well-suited for real-world robotic learning. We also release our implementation of SAC, which is particularly designed for real-world robotic systems. What makes an ideal deep RL algorithm for real-world systems?
A Short Survey on Probabilistic Reinforcement Learning
A reinforcement learning agent tries to maximize its cumulative payoff by interacting with an unknown environment. It is important for the agent to explore suboptimal actions as well as to pick actions with the highest known rewards. Yet, in sensitive domains, collecting more data through exploration is not always possible, while it remains important to find a policy with a certain performance guarantee. In this paper, we present a brief survey of methods available in the literature for balancing the exploration-exploitation trade-off and computing robust solutions from fixed samples in reinforcement learning.
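As a simple illustration of the exploration-exploitation trade-off, the sketch below runs epsilon-greedy action selection on a multi-armed bandit. It is one common baseline, not a method attributed to the survey; the arm means, the unit-variance Gaussian rewards, and the value of `epsilon` are assumptions chosen for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore a random arm with probability epsilon, else exploit the best-known arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda arm: q_values[arm])


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Estimate each arm's mean reward from noisy pulls; return (estimates, pull counts)."""
    rng = random.Random(seed)
    q_values = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        arm = epsilon_greedy(q_values, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # assumed noise model
        counts[arm] += 1
        # Incremental sample mean: no need to store past rewards.
        q_values[arm] += (reward - q_values[arm]) / counts[arm]
    return q_values, counts


estimates, pulls = run_bandit([0.0, 0.5, 2.0])
print("estimated means:", estimates)
print("pull counts:", pulls)
```

Setting `epsilon` to 0 makes the agent purely greedy, so it can lock onto a suboptimal arm; larger values spend more pulls on arms already known to be poor.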
NervanaSystems/coach
Coach is a Python reinforcement learning framework containing implementations of many state-of-the-art algorithms. It exposes a set of easy-to-use APIs for experimenting with new RL algorithms, and allows simple integration of new environments to solve. Basic RL components (algorithms, environments, neural network architectures, exploration policies, ...) are well decoupled, so that extending and reusing existing components is fairly painless. Contacting the Coach development team is also possible through the email coach@intel.com. One of the main challenges when building a research project, or a solution based on a published algorithm, is getting a concrete and reliable baseline that reproduces the algorithm's results, as reported by its authors.
Manning Deal of the Day
Grokking Deep Learning for Computer Vision teaches you the concepts and tools for building intelligent, scalable computer vision systems. Using Python, OpenCV, Keras, TensorFlow, and Amazon's MXNet, you'll discover advanced techniques for building amazing end-to-end CV projects! Use this same code to get half off GANs in Action and Grokking Deep Reinforcement Learning.