Reinforcement Learning


Intuitive RL: Intro to Advantage-Actor-Critic (A2C)

#artificialintelligence

Reinforcement learning (RL) practitioners have produced a number of excellent tutorials. Most, however, describe RL in terms of mathematical equations and abstract diagrams. We like to think of the field from a different perspective. RL itself is inspired by how animals learn, so why not translate the underlying RL machinery back into the natural phenomena it is designed to mimic? Humans learn best through stories.
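For readers who want one concrete anchor alongside the story-driven view, the "advantage" in A2C is commonly computed as the discounted return minus the critic's value estimate. The following is a minimal sketch under that common formulation, not the tutorial's own code; the function names, the `bootstrap_value` parameter, and the example numbers are assumptions made for illustration.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float,
                       bootstrap_value: float = 0.0) -> List[float]:
    """Return the discounted return G_t for every step of one rollout.

    bootstrap_value is the critic's estimate for the state after the last
    step (0.0 if the episode ended there). Hypothetical helper.
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each step reuses the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards: List[float], values: List[float], gamma: float,
               bootstrap_value: float = 0.0) -> List[float]:
    """Advantage A_t = G_t - V(s_t): how much better the outcome was than
    the critic expected at step t."""
    returns = discounted_returns(rewards, gamma, bootstrap_value)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    # Three steps with reward 1 each; the critic predicted 1.0 everywhere.
    print(advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5))
```

A positive advantage nudges the actor toward the action it took, and a negative one pushes it away; the critic is trained separately to make its value estimates match the returns.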


Diverse Behavior Is What Game AI Needs: Generating Varied Human-Like Playing Styles Using Evolutionary Multi-Objective Deep Reinforcement Learning

arXiv.org Machine Learning

Designing artificial intelligence for games (Game AI) has long been recognized as a notoriously challenging task in the game industry, as it mainly relies on manual design and requires plenty of domain knowledge. More frustratingly, even with a lot of effort, a satisfying Game AI is still hard to achieve by manual design due to the almost infinite search space. The recent success of deep reinforcement learning (DRL) sheds light on advancing automated game design, significantly relaxing human competitive intelligent support. However, existing DRL algorithms mostly focus on training a Game AI to win the game rather than on the way it wins (style). To bridge the gap, we introduce EMO-DRL, an end-to-end game design framework that leverages evolutionary algorithms, DRL and multi-objective optimization (MOO) to perform intelligent and automatic game design. Firstly, EMO-DRL proposes style-oriented learning to bypass manual reward shaping in DRL and directly learn a Game AI with an expected style in an end-to-end fashion. On this basis, prioritized multi-objective optimization is introduced to achieve more diverse, natural and human-like Game AI. Large-scale evaluations on an Atari game and a commercial massively multiplayer online game are conducted. The results demonstrate that EMO-DRL, compared to existing algorithms, achieves better game designs in an intelligent and automatic way.


Autonomous Industrial Management via Reinforcement Learning: Self-Learning Agents for Decision-Making -- A Review

arXiv.org Artificial Intelligence

Industry has always pursued greater economic efficiency, and the current focus has been on reducing human labour using modern technologies. Even with cutting-edge technologies, which range from packaging robots to AI for fault detection, there is still some ambiguity about the aims of some new systems, namely, whether they are automated or autonomous. In this paper we indicate the distinctions between automated and autonomous systems, review the current literature, and identify the core challenges in creating learning mechanisms for autonomous agents. We discuss using different types of extended realities, such as digital twins, to train reinforcement learning agents to learn specific tasks through generalization. Once generalization is achieved, we discuss how these can be used to develop self-learning agents. We then introduce self-play scenarios and how they can be used to teach self-learning agents through a supportive environment which focuses on how the agents can adapt to different real-world environments.


Policy Learning for Malaria Control

arXiv.org Artificial Intelligence

Sequential decision making is a typical problem in reinforcement learning, with plenty of algorithms to solve it. However, only a few of them can work effectively with a very small number of observations. In this report, we describe our progress in learning a policy for Malaria Control as a Reinforcement Learning problem in the KDD Cup Challenge 2019 and propose diverse solutions to deal with the limited observations problem. We apply the Genetic Algorithm, Bayesian Optimization, and Q-learning with sequence breaking to find the optimal policy for five years in a row with only 20 episodes/100 evaluations. We evaluate those algorithms and compare their performance with Random Search as a baseline. Among these algorithms, Q-Learning with sequence breaking was submitted to the challenge and ranked 7th in the KDD Cup.
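As background for the Q-learning entry above, here is a minimal sketch of the standard tabular Q-learning update. It is not the report's "sequence breaking" variant, whose details are not given in the abstract; the function name `q_update`, the state and action labels, and the default `alpha` and `gamma` values are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, next_actions: Iterable[Hashable],
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply one standard Q-learning update and return the new Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    An empty next_actions marks a terminal next state (no future value).
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q: QTable = defaultdict(float)  # unseen (state, action) pairs start at 0
    # Hypothetical transition: year 1 -> year 2 with reward 1.0.
    print(q_update(q, "year1", "policy_a", 1.0, "year2", ["policy_a", "policy_b"]))
```

With only a handful of episodes available, each update of this kind carries a lot of weight, which is one reason sample-efficient alternatives such as Bayesian Optimization are natural comparison points.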


RLScheduler: Learn to Schedule HPC Batch Jobs Using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

We present RLScheduler, a deep reinforcement learning based job scheduler for scheduling independent batch jobs in high-performance computing (HPC) environments. Starting from knowing nothing about scheduling, RLScheduler is able to autonomously learn how to effectively schedule HPC batch jobs, targeting a given optimization goal. This is achieved by deep reinforcement learning with the help of specially designed neural network structures and various optimizations to stabilize and accelerate the learning. Our results show that RLScheduler can outperform existing heuristic scheduling algorithms, including a manually fine-tuned machine learning-based scheduler on the same workload. More importantly, we show that RLScheduler does not blindly over-fit the given workload to achieve such optimization; instead, it learns general rules for scheduling batch jobs which can be further applied to different workloads and systems to achieve similarly optimized performance. We also demonstrate that RLScheduler is capable of adjusting itself along with changing goals and workloads, making it an attractive solution for future autonomous HPC management.


Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

arXiv.org Artificial Intelligence

This paper studies Learning from Observations (LfO) for imitation learning with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, LfO is more practical in leveraging previously inapplicable resources (e.g. videos), yet more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and its difference from LfD from both theoretical and practical perspectives. We first prove that the gap between LfD and LfO actually lies in the disagreement of inverse dynamics models between the imitator and the expert, if following the modeling approach of GAIL. More importantly, the upper bound of this gap is revealed by a negative causal entropy which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM), which enhances the conventional LfO method through further bridging the gap to LfD. Considerable empirical results on challenging benchmarks indicate that our method attains consistent improvements over other LfO counterparts.



Azalia Mirhoseini

#artificialintelligence

Azalia Mirhoseini, a research scientist at Google Brain, is using artificial intelligence itself to make better chips for artificial intelligence. Many microchips that are used for AI weren't specifically built for it. Most are repurposed from hardware used in video and gaming. As a result, these older, human-engineered designs leave much to be desired in terms of energy efficiency, cost, and functionality. Mirhoseini's system--which trained itself using trial and error, based on the AI concept of reinforcement learning--can produce chip designs in just a few hours.


A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

Effective coordination is crucial to solve multi-agent collaborative (MAC) problems. While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training. In this paper, we consider MAC problems with some intrinsic notion of locality (e.g., geographic proximity) such that interactions between agents and tasks are locally limited. By leveraging this property, we introduce a novel structured prediction approach to assign agents to tasks. At each step, the assignment is obtained by solving a centralized optimization problem (the inference procedure) whose objective function is parameterized by a learned scoring model. We propose different combinations of inference procedures and scoring models able to represent coordination patterns of increasing complexity. The resulting assignment policy can be efficiently learned on small problem instances and readily reused in problems with more agents and tasks (i.e., zero-shot generalization). We report experimental results on a toy search and rescue problem and on several target selection scenarios in StarCraft: Brood War, in which our model significantly outperforms strong rule-based baselines on instances with 5 times more agents and tasks than those seen during training.
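To make the "assignment as centralized optimization over a scoring model" idea concrete, the following is a minimal sketch that picks the agent-to-task assignment with the highest total score by brute force. It is not the paper's inference procedure or its learned model: the `score` function (negative Manhattan distance, a nod to the locality assumption) and `best_assignment` are hypothetical stand-ins, and exhaustive search is only workable for tiny instances.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def score(agent: Point, task: Point) -> float:
    """Hypothetical stand-in for a learned scoring model: closer is better."""
    return -(abs(agent[0] - task[0]) + abs(agent[1] - task[1]))


def best_assignment(agents: Sequence[Point],
                    tasks: Sequence[Point]) -> Tuple[List[int], float]:
    """Return (task index per agent, total score) maximizing the summed score.

    Assumes len(agents) <= len(tasks) and that each task gets at most one
    agent. Tries every one-to-one assignment, so cost grows factorially.
    """
    best_total = None
    best_perm: Tuple[int, ...] = ()
    for perm in permutations(range(len(tasks)), len(agents)):
        total = sum(score(agents[i], tasks[j]) for i, j in enumerate(perm))
        if best_total is None or total > best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total


if __name__ == "__main__":
    agents = [(0, 0), (5, 5)]
    tasks = [(5, 4), (0, 1)]
    print(best_assignment(agents, tasks))
```

Because the scoring function looks only at one agent-task pair at a time, the same function can be reused unchanged when the number of agents and tasks grows, which is the intuition behind reusing a policy learned on small instances.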


Explainable AI: Deep Reinforcement Learning Agents for Residential Demand Side Cost Savings in Smart Grids

arXiv.org Artificial Intelligence

Motivated by the recent advancements in deep Reinforcement Learning (RL), we develop an RL agent that manages the operation of storage devices in a household and is designed to maximize demand-side cost savings. The proposed technique is data-driven, and the RL agent learns from scratch how to efficiently use the energy storage device under variable tariff structures. In contrast to the "black box" view, in which the techniques learned by the agent are ignored, we explain the learning progression of the RL agent and the strategies it follows based on the capacity of the storage device.