Reinforcement Learning
Microsoft proposes AI that improves when you smile
Positive affectivity, the characteristic that describes how people experience affects (e.g., sensations, emotions, and sentiments) and interact with others as a consequence, has been linked to increased interest and curiosity as well as satisfaction in learning. Inspired by this, a team of Microsoft researchers proposes imbuing reinforcement learning, an AI training technique that employs rewards to spur systems toward goals, with positive affect, which they assert might drive exploration useful in gathering experiences critical to learning. As the researchers explain, reinforcement learning is commonly implemented via policy-specific rewards designed for a predefined goal. Problematically, these extrinsic rewards are narrow in scope and can be difficult to define, as opposed to intrinsic rewards that are task-independent and quickly indicate success or failure. In pursuit of an intrinsic policy, the researchers developed a framework comprising mechanisms motivated by human affect, one that motivates agents through drives like delight.
Ubisoft uses AI to teach a car to drive itself in a racing game
Reinforcement learning, an AI training technique that employs rewards to drive software policies toward goals, has been applied successfully to domains from industrial robotics to drug discovery. But while firms including OpenAI and Alphabet's DeepMind have investigated its efficacy in video games like Dota 2, Quake III Arena, and StarCraft 2, few to date have studied its use under constraints like those encountered in the game industry. That's presumably why Ubisoft La Forge, game developer Ubisoft's eponymous prototyping space, proposed in a recent paper an algorithm that's able to handle both discrete and continuous video game actions in a "principled" and predictable way. They set it loose on a "commercial game" (likely The Crew or The Crew 2, though neither is explicitly mentioned) and report that it's competitive with the state of the art on benchmark tasks. "Reinforcement Learning applications in video games have recently seen massive advances coming from the research community, with agents trained to play Atari games from pixels or to be competitive with the best players in the world in complicated imperfect information games," wrote the coauthors of a paper describing the work.
Reimagining Reinforcement Learning – Upside Down
Summary: For all the hype around winning game play and self-driving cars, traditional Reinforcement Learning (RL) has yet to deliver as a reliable tool for ML applications. Here we explore the main drawbacks as well as an innovative approach to RL that dramatically reduces the training compute requirement and time to train. Ever since Reinforcement Learning (RL) was recognized as a legitimate third style of machine learning alongside supervised and unsupervised learning, we've been waiting for that killer app to prove its value. Yes, RL has had some press-worthy wins in game play (AlphaGo), self-driving cars (not here yet), drone control, and even dialogue systems like personal assistants, but the big breakthrough isn't here yet. RL ought to be our go-to solution for any problem requiring sequential decisions, and these individual successes might make you think that RL is ready for prime time, but the reality is that it's not.
SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning
Loon, Keng Wah, Graesser, Laura, Cvitkovic, Milan
We introduce SLM Lab, a software framework for reproducible reinforcement learning (RL) research. SLM Lab implements a number of popular RL algorithms, provides synchronous and asynchronous parallel experiment execution, hyperparameter search, and result analysis. RL algorithms in SLM Lab are implemented in a modular way such that differences in algorithm performance can be confidently ascribed to differences between algorithms, not between implementations. In this work we present the design choices behind SLM Lab and use it to produce a comprehensive single-codebase RL algorithm benchmark. In addition, as a consequence of SLM Lab's modular design, we introduce and evaluate a discrete-action variant of the Soft Actor-Critic algorithm (Haarnoja et al., 2018) and a hybrid synchronous/asynchronous training method for RL agents.
Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach
Wang, Jun, Zhang, Hefu, Liu, Qi, Pan, Zhen, Tao, Hanqing
Recent years have witnessed increasing interest in research on crowdfunding mechanisms. In this area, dynamics tracking is a significant issue that remains underexplored. Existing studies either fit the fluctuations of time series or employ regularization terms to constrain learned tendencies. However, few of them take into account the inherent decision-making process between investors and crowdfunding dynamics. To address this problem, we propose a Trajectory-based Continuous Control for Crowdfunding (TC3) algorithm to predict funding progress in crowdfunding. Specifically, actor-critic frameworks are employed to model the relationship between investors and campaigns, where all of the investors are viewed as a single agent that interacts with an environment derived from the real dynamics of campaigns. Then, to further explore the in-depth implications of patterns (i.e., typical characters) in funding series, we propose to subdivide them into $\textit{fast-growing}$ and $\textit{slow-growing}$ ones. Moreover, to switch between different kinds of patterns, the actor component of TC3 is extended with a structure of options, yielding TC3-Options. Finally, extensive experiments on the Indiegogo dataset not only demonstrate the effectiveness of our methods, but also validate our assumption that the entire pattern learned by TC3-Options is indeed U-shaped.
Observational Overfitting in Reinforcement Learning
Song, Xingyou, Jiang, Yiding, Tu, Stephen, Du, Yilun, Neyshabur, Behnam
A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP). We provide a general framework for analyzing this scenario, which we use to design multiple synthetic benchmarks by modifying only the observation space of an MDP. When an agent overfits to different observation spaces even if the underlying MDP dynamics are fixed, we term this observational overfitting. Our experiments expose intriguing properties, especially with regard to implicit regularization, and also corroborate results from previous works in RL generalization and supervised learning (SL).
Training Reinforcement Learning Agents to Ask the Right Questions
That paradigm assumes that the target knowledge is already embedded in the dataset and doesn't require any further clarification, but that rarely resembles how humans learn. When presented with a new subject, we are constantly forced to ask questions and seek clarifications about it. What if we could build the same skill into artificial intelligence (AI) models? The ability to formulate questions is a fundamental element of the human cognition process. The cornerstone of human dialog is our ability to express questions in a myriad of ways in order to obtain a specific answer.
Quasi-Newton Trust Region Policy Optimization
Jha, Devesh, Raghunathan, Arvind, Romeres, Diego
We propose a trust region method for policy optimization that employs a Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance when used in reinforcement learning across a wide range of tasks. However, the algorithm suffers from a number of drawbacks, including the lack of a step-size selection criterion and slow convergence. We investigate the use of a trust region method using a dogleg step and a Quasi-Newton approximation for the Hessian for policy optimization. We demonstrate through numerical experiments over a wide range of challenging continuous control tasks that our particular choice is efficient in terms of number of samples and improves performance.
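For readers unfamiliar with the dogleg step mentioned in the abstract, the following is a minimal sketch of the generic textbook version, not the QNTRPO implementation. It assumes a Euclidean trust region of radius `delta`, a symmetric positive definite Hessian approximation `B` (for example from a quasi-Newton update), and that its inverse `B_inv` is available; all function and parameter names are illustrative.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def dogleg_step(g, B, B_inv, delta):
    """Approximately minimize g.p + 0.5 * p.B.p subject to ||p|| <= delta.

    g: gradient, B: SPD Hessian approximation, B_inv: its inverse.
    """
    # 1. Full quasi-Newton step; accept it if it fits in the trust region.
    p_newton = [-x for x in matvec(B_inv, g)]
    if norm(p_newton) <= delta:
        return p_newton

    # 2. Minimizer of the quadratic model along steepest descent (Cauchy point).
    scale = dot(g, g) / dot(g, matvec(B, g))
    p_cauchy = [-scale * x for x in g]
    if norm(p_cauchy) >= delta:
        # Even the Cauchy point is outside: truncate steepest descent.
        g_norm = norm(g)
        return [-delta * x / g_norm for x in g]

    # 3. Walk from the Cauchy point toward the Newton point until the
    #    boundary is hit: solve ||p_cauchy + tau * d|| = delta for tau in [0, 1].
    d = [n - c for n, c in zip(p_newton, p_cauchy)]
    a = dot(d, d)
    b = 2.0 * dot(p_cauchy, d)
    c = dot(p_cauchy, p_cauchy) - delta ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [pc + tau * di for pc, di in zip(p_cauchy, d)]


# Illustrative 2-D problem with a diagonal Hessian approximation.
B = [[1.0, 0.0], [0.0, 10.0]]
B_inv = [[1.0, 0.0], [0.0, 0.1]]
g = [1.0, 1.0]
step_large = dogleg_step(g, B, B_inv, delta=2.0)    # full Newton step fits
step_mid = dogleg_step(g, B, B_inv, delta=0.5)      # lands on the dogleg path
step_small = dogleg_step(g, B, B_inv, delta=0.05)   # truncated steepest descent
```

The three calls exercise the three branches: a large radius returns the quasi-Newton step unchanged, while smaller radii return a step whose length equals the radius. The radius therefore plays the role of a step-size selection rule, which is the gap in plain gradient descent that the abstract points to.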
A Survey of Deep Reinforcement Learning in Video Games
Shao, Kun, Tang, Zhentao, Zhu, Yuanheng, Li, Nannan, Zhao, Dongbin
Deep reinforcement learning (DRL) has made great achievements since it was proposed. Generally, DRL agents receive high-dimensional inputs at each step and take actions according to deep-neural-network-based policies. This learning mechanism updates the policy to maximize the return with an end-to-end method. In this paper, we survey the progress of DRL methods, including value-based, policy gradient, and model-based algorithms, and compare their main techniques and properties. DRL also plays an important role in game artificial intelligence (AI). We review the achievements of DRL in various video games, including classical arcade games, first-person perspective games, and multi-agent real-time strategy games, from 2D to 3D, and from single-agent to multi-agent. A large number of video game AIs with DRL have achieved super-human performance, while there are still some challenges in this domain. Therefore, we also discuss some key points when applying DRL methods to this field, including exploration-exploitation, sample efficiency, generalization and transfer, multi-agent learning, imperfect information, and delayed sparse rewards, as well as some research directions.
Simulation-based reinforcement learning for real-world autonomous driving
Osiński, Błażej, Jakubowski, Adam, Miłoś, Piotr, Zięcina, Paweł, Galias, Christopher, Homoceanu, Silviu, Michalewski, Henryk
We use synthetic data and a reinforcement learning algorithm to train a system controlling a full-size real-world vehicle in a number of restricted driving scenarios. The driving policy uses RGB images as input. We analyze how design decisions about perception, control and training impact the real-world performance.