Reinforcement Learning


Playing FPS Games with Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present the first architecture to tackle 3D environments in first-person shooter games that involve partially observable states. Typically, deep reinforcement learning methods only utilize visual input for training. We present a method to augment these models to exploit game feature information, such as the presence of enemies or items, during the training phase. Our model is trained to simultaneously learn these features along with minimizing a Q-learning objective, which is shown to dramatically improve the training speed and performance of our agent. Our architecture is also modularized to allow different models to be independently trained for different phases of the game. We show that the proposed architecture substantially outperforms built-in AI agents of the game as well as average humans in deathmatch scenarios.
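The idea of learning game features alongside a Q-learning objective can be illustrated with a single combined loss. The sketch below is only an assumption about how such a joint objective might look: the function name joint_loss, the squared TD error, the binary cross-entropy term, and the feat_weight parameter are all hypothetical and are not taken from the paper.

```python
import math

def joint_loss(q_pred, q_target, feat_probs, feat_labels, feat_weight=1.0):
    """Hypothetical joint objective: squared TD error plus a feature loss.

    q_pred      -- Q-value predicted for the action that was taken
    q_target    -- bootstrapped target, e.g. r + gamma * max_a' Q(s', a')
    feat_probs  -- predicted probabilities for binary game features
    feat_labels -- ground-truth 0/1 labels (e.g. "enemy visible")
    feat_weight -- assumed weighting between the two terms
    """
    td_loss = (q_target - q_pred) ** 2

    eps = 1e-12  # keeps log() away from zero
    bce = 0.0
    for p, y in zip(feat_probs, feat_labels):
        p = min(max(p, eps), 1.0 - eps)
        bce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)

    return td_loss + feat_weight * bce
```

In a real agent both terms would be computed from a shared network and minimized by gradient descent; here they are plain numbers so the arithmetic is easy to follow.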


Distributional Bellman and the C51 Algorithm

@machinelearnbot

I got the chance to read this paper on Distributional Bellman published by DeepMind in July. Skimming it the first time, my impression was that it would be an important paper, since the theory was sound and the experimental results were promising. However, it did not generate as much noise in the reinforcement learning community as I would have hoped. Nevertheless, as I thought the idea of Distributional Bellman was pretty neat, I decided to implement it (in Keras) and test it out myself. I hope this article can help interested readers better understand the core concepts of Distributional Bellman. To understand Distributional Bellman, we first have to acquire a basic understanding of Q-learning.
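As a quick refresher on Q-learning, here is a minimal sketch of the standard tabular update, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). It is not the article's Keras code; the function name q_update, the dictionary-based table, and the default alpha and gamma are assumptions made for illustration.

```python
def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning step in place and return the new value.

    q maps each state to a list of action values (a hypothetical layout).
    """
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Usage: two states with two actions each.
q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
new_value = q_update(q_table, state=0, action=1, reward=1.0,
                     next_state=1, alpha=0.5, gamma=0.9)
```

Q-learning tracks only the expected return for each state-action pair; the distributional approach replaces that single number with a distribution over returns.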


Deep Reinforcement Learning for Dynamic Treatment Regimes on Medical Registry Data

arXiv.org Machine Learning

This paper presents the first deep reinforcement learning (DRL) framework to estimate the optimal Dynamic Treatment Regimes from observational medical data. This framework is more flexible and adaptive for high-dimensional action and state spaces than existing reinforcement learning methods to model real-life complexity in heterogeneous disease progression and treatment choices, with the goal of providing doctors and patients with data-driven, personalized decision recommendations. The proposed DRL framework comprises (i) a supervised learning step to predict the most probable expert actions, and (ii) a deep reinforcement learning step to estimate the long-term value function of Dynamic Treatment Regimes. Both steps depend on deep neural networks. As a key motivational example, we have implemented the proposed framework on a data set from the Center for International Bone Marrow Transplant Research (CIBMTR) registry database, focusing on the sequence of prevention and treatments for acute and chronic graft versus host disease after transplantation. In the experimental results, we have demonstrated promising accuracy in predicting human experts' decisions, as well as a high expected reward in the DRL-based dynamic treatment regimes.


Run, skeleton, run: skeletal model in a physics-based simulation

arXiv.org Machine Learning

In this paper, we present our approach to solving a physics-based reinforcement learning challenge, "Learning to Run", with the objective of training a physiologically based human model to navigate a complex obstacle course as quickly as possible. The environment is computationally expensive, has a high-dimensional continuous action space and is stochastic. We benchmark state-of-the-art policy-gradient methods and test several improvements, such as layer normalization, parameter noise, and action and state reflecting, to stabilize training and improve its sample efficiency. We found that the Deep Deterministic Policy Gradient method is the most efficient method for this environment and that the improvements we have introduced help to stabilize training. Learned models are able to generalize to new physical scenarios, e.g. different obstacle courses.


8 ways AI can help save the planet

#artificialintelligence

This nascent AI technique – which requires no input data, substantially less computing power, and in which the evolutionary-like AI learns from itself – could soon evolve to enable its application to real-world problems in the natural sciences. Collaboration with Earth scientists to identify the systems – from climate science, materials science, biology, and other areas – which can be codified to apply reinforcement learning for scientific progress and discovery is vital. For example, DeepMind co-founder, Demis Hassabis, has suggested that in materials science, a descendant of AlphaGo Zero could be used to search for a room temperature superconductor – a hypothetical substance that allows for incredibly efficient energy systems.


What is reinforcement learning: The next step in AI and deep learning

#artificialintelligence

Reinforcement learning has traditionally occupied a niche status in the world of artificial intelligence. But reinforcement learning has started to assume a larger role in many AI initiatives in the past few years. Its application sweet spot is in calculating optimal actions to be taken by agents in environmentally contextualized decision scenarios. Using trial-and-error approaches to maximize an algorithmic reward function, reinforcement learning is well suited to many adaptive-control and multiagent automation applications in IT operations management, energy, health care, commerce, finance, and transportation. And it's being used to train the AI that powers both its traditional focus areas--robotics, gaming, and simulation--and a new generation of AI solutions in edge analytics, natural language processing, machine translation, computer vision, and digital assistants.


Directions of AI Research in 2018

@machinelearnbot

Many existing Reinforcement Learning (RL) systems already rely on simulations to explore the solution space and solve complex problems. These include systems based on Self-Play for gaming applications. Self-Play is an essential part of the algorithms used by Google DeepMind in AlphaGo and in the more recent AlphaGo Zero reinforcement learning systems. These are the breakthrough approaches that have defeated the world champion at the ancient Chinese game of Go (D. Silver et al., 2017, https://www.nature.com/articles/nature24270). The newer AlphaGo Zero system has achieved a significant step forward compared to the original AlphaGo system.


On the Sample Complexity of the Linear Quadratic Regulator

arXiv.org Machine Learning

This paper addresses the optimal control problem known as the Linear Quadratic Regulator in the case when the dynamics are unknown. We propose a multi-stage procedure, called Coarse-ID control, that estimates a model from a few experimental trials, estimates the error in that model with respect to the truth, and then designs a controller using both the model and uncertainty estimate. Our technique uses contemporary tools from random matrix theory to bound the error in the estimation procedure. We also employ a recently developed approach to control synthesis called System Level Synthesis that enables robust control design by solving a convex optimization problem. We provide end-to-end bounds on the relative error in control cost that are nearly optimal in the number of parameters and that highlight salient properties of the system to be controlled such as closed-loop sensitivity and optimal control magnitude. We show experimentally that the Coarse-ID approach enables efficient computation of a stabilizing controller in regimes where simple control schemes that do not take the model uncertainty into account fail to stabilize the true system.
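To make the "estimate a model, then design a controller" idea concrete, here is a minimal sketch for a scalar system x[t+1] = a*x[t] + b*u[t]. It fits (a, b) by least squares and then computes a nominal LQR gain by iterating the scalar Riccati recursion. This is a plain certainty-equivalent design, not the paper's Coarse-ID procedure: it omits the uncertainty estimate and the System Level Synthesis step, and the function names and default costs are assumptions.

```python
def fit_scalar_dynamics(xs, us):
    """Least-squares estimate of (a, b) in x[t+1] = a*x[t] + b*u[t].

    xs holds len(us) + 1 observed states; us holds the applied inputs.
    """
    prev, nxt = xs[:-1], xs[1:]
    sxx = sum(x * x for x in prev)
    sxu = sum(x * u for x, u in zip(prev, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(prev, nxt))
    suy = sum(u * y for u, y in zip(us, nxt))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


def scalar_lqr_gain(a, b, q=1.0, r=1.0, iters=500):
    """Iterate the scalar discrete-time Riccati recursion; return k for u = -k*x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


# Usage: noise-free data from an unstable system with a = 1.1, b = 0.5.
inputs = [0.3, -0.2, 0.5, 0.1]
states = [1.0]
for u in inputs:
    states.append(1.1 * states[-1] + 0.5 * u)

a_hat, b_hat = fit_scalar_dynamics(states, inputs)
k = scalar_lqr_gain(a_hat, b_hat)
```

With noisy data the estimates would carry error, which is the regime where designing against the nominal model alone may fail to stabilize the true system.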


Understanding Supervised, Unsupervised, and Reinforcement Learning

#artificialintelligence

Once we start delving into the concepts behind Artificial Intelligence (AI) and Machine Learning (ML), we come across copious amounts of jargon related to this field of study. Understanding this jargon and how it affects the study of ML goes a long way toward comprehending the work that researchers and data scientists have done to get AI to its current state. In this article, I will be providing you with a comprehensive definition of supervised, unsupervised and reinforcement learning in the broader field of Machine Learning. You must have encountered these terms while browsing articles about the progress made in AI and the role played by ML in propelling this success forward. Understanding these concepts is essential and should not be neglected.


How Can Engineers Stop AI from Going Rogue?

#artificialintelligence

How do we stop artificial intelligence from going rogue? The idea is scary enough that robotics companies are proposing that the UN put a ban on killer autonomous robots. And it's made even scarier by mounting evidence that engineers actually understand very little about how AI algorithms do what they do. Doomsday singularity scenarios aside, rogue AI presents a very serious problem even in more everyday terms. What if the autonomous cars chauffeuring us around reach the wrong conclusions on how they should operate in traffic?