AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Sequential decision making

#artificialintelligenceMar-14-2021, 05:05:31 GMT

In artificial intelligence, sequential decision making refers to algorithms that take the dynamics[clarification needed] of the world into consideration,[1] thus delay parts of the problem until it must be solved[clarification needed]. It can be described as a procedural approach to decision-making, or as a step by step decision theory. Sequential decision making has as a consequence the intertemporal choice problem, where earlier decisions influences the later available choices.[2] This mathematics-related article is a stub. You can help Wikipedia by expanding it.

clarification, sequential decision

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Communications > Social Media (0.71)

Add feedback

Learning One Representation to Optimize All Rewards

Touati, Ahmed, Ollivier, Yann

arXiv.org Artificial IntelligenceMar-14-2021

We introduce the forward-backward (FB) representation of the dynamics of a reward-free Markov decision process. It provides explicit near-optimal policies for any reward specified a posteriori. During an unsupervised phase, we use reward-free interactions with the environment to learn two representations via off-the-shelf deep learning methods and temporal difference (TD) learning. In the test phase, a reward representation is estimated either from observations or an explicit reward description (e.g., a target state). The optimal policy for that reward is directly obtained from these representations, with no planning. The unsupervised FB loss is well-principled: if training is perfect, the policies obtained are provably optimal for any reward function. With imperfect training, the sub-optimality is proportional to the unsupervised approximation error. The FB representation learns long-range relationships between states and actions, via a predictive occupancy map, without having to synthesize states as in model-based approaches. This is a step towards learning controllable agents in arbitrary black-box stochastic environments. This approach compares well to goal-oriented RL algorithms on discrete and continuous mazes, pixel-based MsPacman, and the FetchReach virtual robot arm. We also illustrate how the agent can immediately adapt to new tasks beyond goal-oriented RL.

optimal policy, representation, reward function, (15 more...)

arXiv.org Artificial Intelligence

2103.07945

Country:

North America > United States > New York (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Modelling Behavioural Diversity for Learning in Open-Ended Games

Nieves, Nicolas Perez, Yang, Yaodong, Slumbers, Oliver, Mguni, David Henry, Wang, Jun

arXiv.org Artificial IntelligenceMar-14-2021

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.

cardinality, diversity, submission and formatting instruction, (12 more...)

arXiv.org Artificial Intelligence

2103.07927

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

Simulation Studies on Deep Reinforcement Learning for Building Control with Human Interaction

Lee, Donghwan, He, Niao, Lee, Seungjae, Karava, Panagiota, Hu, Jianghai

arXiv.org Artificial IntelligenceMar-14-2021

The building sector consumes the largest energy in the world, and there have been considerable research interests in energy consumption and comfort management of buildings. Inspired by recent advances in reinforcement learning (RL), this paper aims at assessing the potential of RL in building climate control problems with occupant interaction. We apply a recent RL approach, called DDPG (deep deterministic policy gradient), for the continuous building control tasks and assess its performance with simulation studies in terms of its ability to handle (a) the partial state observability due to sensor limitations; (b) complex stochastic system with high-dimensional state-spaces, which are jointly continuous and discrete; (c) uncertainties due to ambient weather conditions, occupant's behavior, and comfort feelings. Especially, the partial observability and uncertainty due to the occupant interaction significantly complicate the control problem. Through simulation studies, the policy learned by DDPG demonstrates reasonable performance and computational tractability.

air temperature, algorithm, reinforcement, (13 more...)

arXiv.org Artificial Intelligence

2103.07919

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.14)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(6 more...)

Genre: Research Report (0.50)

Industry:

Construction & Engineering (0.95)
Energy > Renewable (0.48)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

Investigating Value of Curriculum Reinforcement Learning in Autonomous Driving Under Diverse Road and Weather Conditions

Ozturk, Anil, Gunel, Mustafa Burak, Dagdanov, Resul, Vural, Mirac Ekim, Yurdakul, Ferhat, Dal, Melih, Ure, Nazim Kemal

arXiv.org Artificial IntelligenceMar-14-2021

Applications of reinforcement learning (RL) are popular in autonomous driving tasks. That being said, tuning the performance of an RL agent and guaranteeing the generalization performance across variety of different driving scenarios is still largely an open problem. In particular, getting good performance on complex road and weather conditions require exhaustive tuning and computation time. Curriculum RL, which focuses on solving simpler automation tasks in order to transfer knowledge to complex tasks, is attracting attention in RL community. The main contribution of this paper is a systematic study for investigating the value of curriculum reinforcement learning in autonomous driving applications. For this purpose, we setup several different driving scenarios in a realistic driving simulator, with varying road complexity and weather conditions. Next, we train and evaluate performance of RL agents on different sequences of task combinations and curricula. Results show that curriculum RL can yield significant gains in complex driving tasks, both in terms of driving performance and sample complexity. Results also demonstrate that different curricula might enable different benefits, which hints future research directions for automated curriculum training.

agent, curriculum, scenario, (14 more...)

arXiv.org Artificial Intelligence

2103.07903

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reinforcement Learning, Bit by Bit

Lu, Xiuyuan, Van Roy, Benjamin, Dwaracherla, Vikranth, Ibrahimi, Morteza, Osband, Ian, Wen, Zheng

arXiv.org Artificial IntelligenceMar-14-2021

Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deeper understanding of information acquisition and representation. We develop concepts and establish a regret bound that together offer principled guidance. The bound sheds light on questions of what information to seek, how to seek that information, and what information to retain. To illustrate concepts, we design simple agents that build on them and present computational results that demonstrate improvements in data efficiency. Other learning paradigms are about minimization; reinforcement learning is about maximization.

agent, information, value function, (16 more...)

arXiv.org Artificial Intelligence

2103.04047

Country:

Europe > Kosovo > District of Gjilan > Kamenica (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre: Research Report (0.81)

Industry:

Leisure & Entertainment > Games > Computer Games (0.92)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)

Add feedback

Image Recognition A.I. Has a Weakness. This Could Fix It

#artificialintelligenceMar-13-2021, 19:55:41 GMT

You're probably familiar with deepfakes, the digitally altered "synthetic media" that's capable of fooling people into seeing or hearing things that never actually happened. Adversarial examples are like deepfakes for image-recognition A.I. systems -- and while they don't look even slightly strange to us, they're capable of befuddling the heck out of machines. Several years ago, researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) found that they could fool even sophisticated image recognition algorithms into confusing objects simply by slightly altering their surface texture. In the researchers' demonstration, they showed that it was possible to get a cutting-edge neural network to look at a 3D-printed turtle and see a rifle instead. Or to gaze upon a baseball and come away with the conclusion that it is an espresso.

agent, algorithm, real world, (10 more...)

#artificialintelligence

Country: North America > United States > Massachusetts (0.25)

Industry:

Information Technology > Security & Privacy (0.72)
Automobiles & Trucks (0.71)
Leisure & Entertainment > Games > Computer Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Hybrid computer approach to train a machine learning system

Holzer, Mirko, Ulmann, Bernd

arXiv.org Artificial IntelligenceMar-13-2021

This book chapter describes a novel approach to training machine learning systems by means of a hybrid computer setup i.e. a digital computer tightly coupled with an analog computer. As an example a reinforcement learning system is trained to balance an inverted pendulum which is simulated on an analog computer, thus demonstrating a solution to the major challenge of adequately simulating the environment for reinforcement learning.

analog computer, computer, ws-rv9x6 book title, (13 more...)

arXiv.org Artificial Intelligence

2103.07802

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment > Games > Chess (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

A Survey of Embodied AI: From Simulators to Research Tasks

Duan, Jiafei, Yu, Samson, Tan, Hui Li, Zhu, Hongyuan, Tan, Cheston

arXiv.org Artificial IntelligenceMar-13-2021

There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", whereby AI algorithms and agents no longer simply learn from datasets of images, videos or text curated primarily from the internet. Instead, they learn through embodied physical interactions with their environments, whether real or simulated. Consequently, there has been substantial growth in the demand for embodied AI simulators to support a diversity of embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of artificial general intelligence, but there is no contemporary and comprehensive survey of this field. This paper comprehensively surveys state-of-the-art embodied AI simulators and research, mapping connections between these. By benchmarking nine state-of-the-art embodied AI simulators in terms of seven features, this paper aims to understand the simulators in their provision for use in embodied AI research. Finally, based upon the simulators and a pyramidal hierarchy of embodied AI research tasks, this paper surveys the main research tasks in embodied AI -- visual exploration, visual navigation and embodied question answering (QA), covering the state-of-the-art approaches, evaluation and datasets.

agent, navigation, simulator, (15 more...)

arXiv.org Artificial Intelligence

2103.04918

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre:

Overview (1.00)
Research Report (0.83)

Industry:

Leisure & Entertainment > Games (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

The successor representation, gamma-models, and infinite-horizon prediction

AIHubMar-12-2021, 14:36:00 GMT

Reinforcement learning algorithms are frequently categorized by whether they predict future states at any point in their decision-making process. Those that do are called model-based, and those that do not are dubbed model-free. This classification is so common that we mostly take it for granted these days; I am guilty of using it myself. However, this distinction is not as clear-cut as it may initially seem. In this post, I will talk about an alternative view that emphases the mechanism of prediction instead of the content of prediction.

prediction, successor representation, value function, (14 more...)

AIHub

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Add feedback