Reinforcement Learning
Sequential decision making
In artificial intelligence, sequential decision making refers to algorithms that take the dynamics[clarification needed] of the world into consideration,[1] thus delay parts of the problem until it must be solved[clarification needed]. It can be described as a procedural approach to decision-making, or as a step by step decision theory. Sequential decision making has as a consequence the intertemporal choice problem, where earlier decisions influences the later available choices.[2] This mathematics-related article is a stub. You can help Wikipedia by expanding it.
Learning One Representation to Optimize All Rewards
We introduce the forward-backward (FB) representation of the dynamics of a reward-free Markov decision process. It provides explicit near-optimal policies for any reward specified a posteriori. During an unsupervised phase, we use reward-free interactions with the environment to learn two representations via off-the-shelf deep learning methods and temporal difference (TD) learning. In the test phase, a reward representation is estimated either from observations or an explicit reward description (e.g., a target state). The optimal policy for that reward is directly obtained from these representations, with no planning. The unsupervised FB loss is well-principled: if training is perfect, the policies obtained are provably optimal for any reward function. With imperfect training, the sub-optimality is proportional to the unsupervised approximation error. The FB representation learns long-range relationships between states and actions, via a predictive occupancy map, without having to synthesize states as in model-based approaches. This is a step towards learning controllable agents in arbitrary black-box stochastic environments. This approach compares well to goal-oriented RL algorithms on discrete and continuous mazes, pixel-based MsPacman, and the FetchReach virtual robot arm. We also illustrate how the agent can immediately adapt to new tasks beyond goal-oriented RL.
Modelling Behavioural Diversity for Learning in Open-Ended Games
Nieves, Nicolas Perez, Yang, Yaodong, Slumbers, Oliver, Mguni, David Henry, Wang, Jun
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.
Simulation Studies on Deep Reinforcement Learning for Building Control with Human Interaction
Lee, Donghwan, He, Niao, Lee, Seungjae, Karava, Panagiota, Hu, Jianghai
The building sector consumes the largest energy in the world, and there have been considerable research interests in energy consumption and comfort management of buildings. Inspired by recent advances in reinforcement learning (RL), this paper aims at assessing the potential of RL in building climate control problems with occupant interaction. We apply a recent RL approach, called DDPG (deep deterministic policy gradient), for the continuous building control tasks and assess its performance with simulation studies in terms of its ability to handle (a) the partial state observability due to sensor limitations; (b) complex stochastic system with high-dimensional state-spaces, which are jointly continuous and discrete; (c) uncertainties due to ambient weather conditions, occupant's behavior, and comfort feelings. Especially, the partial observability and uncertainty due to the occupant interaction significantly complicate the control problem. Through simulation studies, the policy learned by DDPG demonstrates reasonable performance and computational tractability.
Investigating Value of Curriculum Reinforcement Learning in Autonomous Driving Under Diverse Road and Weather Conditions
Ozturk, Anil, Gunel, Mustafa Burak, Dagdanov, Resul, Vural, Mirac Ekim, Yurdakul, Ferhat, Dal, Melih, Ure, Nazim Kemal
Applications of reinforcement learning (RL) are popular in autonomous driving tasks. That being said, tuning the performance of an RL agent and guaranteeing the generalization performance across variety of different driving scenarios is still largely an open problem. In particular, getting good performance on complex road and weather conditions require exhaustive tuning and computation time. Curriculum RL, which focuses on solving simpler automation tasks in order to transfer knowledge to complex tasks, is attracting attention in RL community. The main contribution of this paper is a systematic study for investigating the value of curriculum reinforcement learning in autonomous driving applications. For this purpose, we setup several different driving scenarios in a realistic driving simulator, with varying road complexity and weather conditions. Next, we train and evaluate performance of RL agents on different sequences of task combinations and curricula. Results show that curriculum RL can yield significant gains in complex driving tasks, both in terms of driving performance and sample complexity. Results also demonstrate that different curricula might enable different benefits, which hints future research directions for automated curriculum training.
Reinforcement Learning, Bit by Bit
Lu, Xiuyuan, Van Roy, Benjamin, Dwaracherla, Vikranth, Ibrahimi, Morteza, Osband, Ian, Wen, Zheng
Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deeper understanding of information acquisition and representation. We develop concepts and establish a regret bound that together offer principled guidance. The bound sheds light on questions of what information to seek, how to seek that information, and what information to retain. To illustrate concepts, we design simple agents that build on them and present computational results that demonstrate improvements in data efficiency. Other learning paradigms are about minimization; reinforcement learning is about maximization.
Image Recognition A.I. Has a Weakness. This Could Fix It
You're probably familiar with deepfakes, the digitally altered "synthetic media" that's capable of fooling people into seeing or hearing things that never actually happened. Adversarial examples are like deepfakes for image-recognition A.I. systems -- and while they don't look even slightly strange to us, they're capable of befuddling the heck out of machines. Several years ago, researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) found that they could fool even sophisticated image recognition algorithms into confusing objects simply by slightly altering their surface texture. In the researchers' demonstration, they showed that it was possible to get a cutting-edge neural network to look at a 3D-printed turtle and see a rifle instead. Or to gaze upon a baseball and come away with the conclusion that it is an espresso.
Hybrid computer approach to train a machine learning system
This book chapter describes a novel approach to training machine learning systems by means of a hybrid computer setup i.e. a digital computer tightly coupled with an analog computer. As an example a reinforcement learning system is trained to balance an inverted pendulum which is simulated on an analog computer, thus demonstrating a solution to the major challenge of adequately simulating the environment for reinforcement learning.
A Survey of Embodied AI: From Simulators to Research Tasks
Duan, Jiafei, Yu, Samson, Tan, Hui Li, Zhu, Hongyuan, Tan, Cheston
There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", whereby AI algorithms and agents no longer simply learn from datasets of images, videos or text curated primarily from the internet. Instead, they learn through embodied physical interactions with their environments, whether real or simulated. Consequently, there has been substantial growth in the demand for embodied AI simulators to support a diversity of embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of artificial general intelligence, but there is no contemporary and comprehensive survey of this field. This paper comprehensively surveys state-of-the-art embodied AI simulators and research, mapping connections between these. By benchmarking nine state-of-the-art embodied AI simulators in terms of seven features, this paper aims to understand the simulators in their provision for use in embodied AI research. Finally, based upon the simulators and a pyramidal hierarchy of embodied AI research tasks, this paper surveys the main research tasks in embodied AI -- visual exploration, visual navigation and embodied question answering (QA), covering the state-of-the-art approaches, evaluation and datasets.
The successor representation, gamma-models, and infinite-horizon prediction
Reinforcement learning algorithms are frequently categorized by whether they predict future states at any point in their decision-making process. Those that do are called model-based, and those that do not are dubbed model-free. This classification is so common that we mostly take it for granted these days; I am guilty of using it myself. However, this distinction is not as clear-cut as it may initially seem. In this post, I will talk about an alternative view that emphases the mechanism of prediction instead of the content of prediction.