AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction, Section 1.1. MIT Press, Cambridge, MA, 1998.
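The trial-and-error discovery described in the quotation can be sketched with an ε-greedy multi-armed bandit, the kind of example Sutton and Barto use early in the book. This is a minimal illustration only: the three Bernoulli arm probabilities, the step count, ε = 0.1, and the function name are arbitrary choices made here, not taken from the text.

```python
import random


def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which arm pays best purely by trying arms and observing rewards."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms          # how often each arm has been tried
    estimates = [0.0] * n_arms     # running mean reward of each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # Numerical reward signal: 1 with the arm's (unknown) probability, else 0.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(3), key=lambda a: estimates[a])
print(best, [round(e, 2) for e in estimates], counts)
```

The learner is never told that the third arm is best; it finds out only by sampling all arms and comparing the rewards it receives.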


Real World Games Look Like Spinning Tops

Czarnecki, Wojciech Marian, Gidel, Gauthier, Tracey, Brendan, Tuyls, Karl, Omidshafiei, Shayegan, Balduzzi, David, Jaderberg, Max

arXiv.org Machine Learning, Apr-20-2020

This paper investigates the geometrical properties of real-world games (e.g. Tic-Tac-Toe, Go, StarCraft II). We hypothesise that their geometrical structure resembles a spinning top, with the upright axis representing transitive strength, and the radial axis, which corresponds to the number of cycles that exist at a particular transitive strength, representing the non-transitive dimension. We prove the existence of this geometry for a wide class of real-world games, exposing their temporal nature. Additionally, we show that this unique structure has consequences for learning: it clarifies why populations of strategies are necessary for training agents, and how population size relates to the structure of the game. Finally, we empirically validate these claims on a selection of nine real-world two-player zero-sum symmetric games, showing 1) that the spinning top structure is revealed and can be easily reconstructed using a new method of Nash clustering to measure the interaction between transitive and cyclical strategy behaviour, and 2) the effect that population size has on convergence in these games.
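The transitive/cyclic distinction can be made concrete with a simple decomposition of a symmetric zero-sum payoff matrix. This sketch is not the paper's Nash clustering; it assumes an antisymmetric payoff matrix and uses the least-squares split in which each strategy's rating is its average payoff, the transitive part is the matrix of rating differences, and whatever remains is cyclic. The function names and toy games are illustrative.

```python
def decompose(payoff):
    """Split an antisymmetric payoff matrix into transitive + cyclic parts.

    payoff[i][j] is the payoff of strategy i against strategy j, with
    payoff[i][j] == -payoff[j][i] (two-player symmetric zero-sum game).
    """
    n = len(payoff)
    # Least-squares rating of each strategy: its average payoff against all others.
    rating = [sum(row) / n for row in payoff]
    transitive = [[rating[i] - rating[j] for j in range(n)] for i in range(n)]
    # The cyclic remainder has zero average payoff for every strategy.
    cyclic = [[payoff[i][j] - transitive[i][j] for j in range(n)] for i in range(n)]
    return rating, transitive, cyclic


def cyclic_share(payoff):
    """Fraction of the matrix's squared norm carried by the cyclic part."""
    _, _, cyclic = decompose(payoff)
    total = sum(x * x for row in payoff for x in row)
    cyc = sum(x * x for row in cyclic for x in row)
    return cyc / total if total else 0.0


rock_paper_scissors = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
strengths = [0.0, 1.0, 2.0]
purely_transitive = [[a - b for b in strengths] for a in strengths]

print(cyclic_share(rock_paper_scissors))  # 1.0
print(cyclic_share(purely_transitive))    # 0.0
```

Rock-paper-scissors is entirely cyclic, while a game decided only by fixed strength values is entirely transitive; games that mix both fall in between.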

geometry, pure strategy, real world game

arXiv.org Machine Learning

2004.09468

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (0.49)
Leisure & Entertainment > Games > Chess (0.46)
Leisure & Entertainment > Games > Tic-Tac-Toe (0.35)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (0.93)


Reinforcement Learning - AI Experts Explain

#artificialintelligence, Apr-19-2020, 06:50:25 GMT

We put together a directory of over 100 resources which you can see here.

agent, learning, reinforcement learning

#artificialintelligence

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)
North America > United States > New York (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry:

Information Technology (0.96)
Health & Medicine (0.69)
Leisure & Entertainment > Games > Computer Games (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Intention Propagation for Multi-agent Reinforcement Learning

Qu, Chao, Li, Hui, Liu, Chang, Xiong, Junwu, Zhang, James, Chu, Wei, Qi, Yuan, Song, Le

arXiv.org Machine Learning, Apr-19-2020

Collaborative multi-agent reinforcement learning is an important sub-field of multi-agent reinforcement learning (MARL), in which agents learn to coordinate to achieve joint success. It has wide applications in traffic control [Kuyer et al., 2008], autonomous driving [Shalev-Shwartz et al., 2016], and smart grids [Yang et al., 2018]. Learning to coordinate requires interaction between agents. For instance, humans can reason about others' behaviors, or learn other people's intentions through communication, and then determine an effective coordination plan. However, designing such an interaction mechanism in a principled way that also scales to large real-world applications remains a challenging problem. Recently, there has been a surge of interest in solving the collaborative MARL problem [Foerster et al., 2018, Qu et al., 2019, Lowe et al., 2017]. Among these approaches, joint-policy methods have demonstrated their superiority [Rashid et al., 2018, Sunehag et al., 2018, Oliehoek et al., 2016]. A straightforward approach is to replace the action in single-agent reinforcement learning with the joint action $a = (a_1, a_2, \ldots, a_N)$, but this obviously suffers from an exponentially large action space.
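The exponential blow-up of the joint action is easy to see by counting: with |A| actions per agent and N agents there are |A|^N joint actions. The five-move action set and the `joint_actions` helper below are hypothetical examples, not taken from the paper.

```python
from itertools import product


def joint_actions(per_agent_actions, n_agents):
    """Enumerate every joint action a = (a_1, ..., a_N) for N identical agents."""
    return list(product(per_agent_actions, repeat=n_agents))


moves = ["up", "down", "left", "right", "stay"]
for n in (1, 2, 4, 8):
    # |A| ** N joint actions: exponential in the number of agents N.
    print(n, len(moves) ** n)

print(joint_actions(["a", "b"], 2))
```

Eight agents with five moves each already give 390,625 joint actions, which is why methods that avoid acting over the full joint space are attractive.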

agent, algorithm, average reward

arXiv.org Machine Learning

2004.08883

Country:

North America > United States (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.67)
Transportation > Ground > Road (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)


Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination

Hafez, Muhammad Burhan, Weber, Cornelius, Kerzel, Matthias, Wermter, Stefan

arXiv.org Machine Learning, Apr-19-2020

Combining model-based and model-free learning systems has been shown to improve the sample efficiency of learning to perform complex robotic tasks. However, dual-system approaches fail to consider the reliability of the learned model when it is applied to make multiple-step predictions, resulting in compounding prediction errors and performance degradation. In this paper, we present a novel dual-system motor learning approach in which a meta-controller arbitrates online between model-based and model-free decisions based on an estimate of the local reliability of the learned model. The reliability estimate is used to compute an intrinsic feedback signal, encouraging actions that lead to data that improves the model. Our approach also integrates arbitration with imagination: a learned latent-space model generates imagined experiences, based on its local reliability, to be used as additional training data. We evaluate our approach against baseline and state-of-the-art methods on learning vision-based robotic grasping in simulation and in the real world. The results show that our approach outperforms the compared methods and learns near-optimal grasping policies in dense- and sparse-reward environments.
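One way to picture the meta-controller is a rule that uses the model-based planner only where recent model prediction errors are small, plus an intrinsic reward for data that reduces model error. This is a hypothetical sketch, not the authors' method: the per-region error window, the fixed threshold, the `learning_progress` reward, and all names are assumptions.

```python
from collections import defaultdict, deque


class ReliabilityArbiter:
    """Hypothetical meta-controller that trusts a learned model only where it
    has recently predicted well (an illustrative rule, not the paper's)."""

    def __init__(self, threshold=0.1, window=5):
        self.threshold = threshold  # max mean error at which the model counts as reliable
        # Recent one-step prediction errors, kept per (discretised) state region.
        self.errors = defaultdict(lambda: deque(maxlen=window))

    def record_error(self, region, prediction_error):
        self.errors[region].append(prediction_error)

    def mean_error(self, region):
        errs = self.errors[region]
        return sum(errs) / len(errs) if errs else float("inf")  # no data: unreliable

    def choose(self, region, model_based_action, model_free_action):
        # Planner's action where the model is locally reliable,
        # otherwise fall back to the model-free policy.
        if self.mean_error(region) < self.threshold:
            return model_based_action
        return model_free_action


def learning_progress(old_error, new_error):
    """Intrinsic reward: how much new data reduced the model's prediction error."""
    return max(0.0, old_error - new_error)


arb = ReliabilityArbiter(threshold=0.1)
for err in (0.5, 0.3):
    arb.record_error("near_object", err)
for err in (0.02, 0.04):
    arb.record_error("open_space", err)

print(arb.choose("near_object", "plan", "policy"))  # policy
print(arb.choose("open_space", "plan", "policy"))   # plan
print(arb.choose("unseen", "plan", "policy"))       # policy
print(learning_progress(0.4, 0.25))
```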

learning, reliability, world model

arXiv.org Machine Learning

2004.0883

Country:

North America > United States > Massachusetts (0.04)
Europe > Germany > Hamburg (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Language may help AI navigate new environments

#artificialintelligence, Apr-18-2020, 17:04:21 GMT

In a new study published this week on the preprint server arXiv.org, […] Both it and several baseline models will soon be available on GitHub. One of the most powerful techniques in machine learning, reinforcement learning (which spurs software agents toward goals via rewards), is also one of the most flawed. It is sample-inefficient, meaning it needs a large number of training interactions, and therefore many compute cycles, to learn a task; and without additional data to cover variations, it adapts poorly to environments that differ from the training environment. It has been theorized that prior knowledge of tasks, expressed in structured language, could be combined with reinforcement learning to mitigate these shortcomings, and BabyAI was designed to put this theory to the test.

agent, babyai, help ai navigate new environment

#artificialintelligence

AI-Alerts: 2020 > 2020-04 > AAAI AI-Alert for Apr 21, 2020 (1.00)

Country: North America > Canada > Ontario > Toronto (0.17)

Genre: Research Report (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)


Macro-Action-Based Deep Multi-Agent Reinforcement Learning

Xiao, Yuchen, Hoffman, Joshua, Amato, Christopher

arXiv.org Artificial Intelligence, Apr-18-2020

In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions and the scalability of our approaches.
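Because a macro-action can run for a variable number of primitive steps, a DQN-style target for it has to discount by that duration. A minimal sketch of the standard semi-Markov (SMDP) Q-learning target, assuming each replay entry stores the primitive rewards collected while the macro-action ran; it is not the paper's exact decentralized or centralized update, and the names are illustrative.

```python
def macro_action_target(rewards, next_q_values, gamma=0.95, terminal=False):
    """Target for a macro-action that ran for len(rewards) primitive steps.

    rewards: primitive rewards collected while the macro-action executed.
    next_q_values: Q-values of the macro-actions available when it terminated.
    """
    tau = len(rewards)
    # Discounted reward accumulated during the macro-action.
    cumulative = sum((gamma ** t) * r for t, r in enumerate(rewards))
    if terminal:
        return cumulative
    # Bootstrap from the next decision point, discounted by the macro-action's duration.
    return cumulative + (gamma ** tau) * max(next_q_values)


def q_update(q, target, lr=0.1):
    """Move a tabular Q-value toward its target (a network would take a gradient step)."""
    return q + lr * (target - q)


target = macro_action_target([0.0, 0.0, 1.0], [2.0, 0.5], gamma=0.9)
print(target, q_update(0.0, target))
```

Discounting the bootstrap by `gamma ** tau` rather than `gamma` is what lets agents whose macro-actions finish at different times share one consistent value scale.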

agent, robot, turtlebot

arXiv.org Artificial Intelligence

2004.08646

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


Modeling Survival in model-based Reinforcement Learning

Moazami, Saeed, Doerschuk, Peggy

arXiv.org Machine Learning, Apr-18-2020

Although recent model-free reinforcement learning algorithms have been shown to be capable of mastering complicated decision-making tasks, the sample complexity of these methods has remained a hurdle to using them in many real-world applications. In this regard, model-based reinforcement learning offers some remedies. Yet model-based methods are inherently more computationally expensive and susceptible to sub-optimality. One reason is that model-generated data are always less accurate than real data, and this often leads to inaccurate transition and reward function models. To mitigate this problem, this work presents the notion of survival, discussing cases in which the agent's goal is to survive and the analogy between survival and maximizing expected reward. To that end, a substitute for the reward function approximator is introduced that learns to avoid terminal states rather than to maximize accumulated rewards from safe states. Because terminal states make up only a small fraction of the state space, focusing on them drastically reduces training effort. Next, a model-based reinforcement learning method, Survive, is proposed to train an agent to avoid dangerous states through a safety map model built upon temporal credit assignment in the vicinity of terminal states. Finally, the performance of the presented algorithm is investigated and compared with current methods.
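The idea of assigning credit only in the vicinity of terminal states can be sketched with a tabular danger map built from failed trajectories. This is a hypothetical illustration, not the paper's Survive algorithm: the `gamma ** k` danger for a state visited k steps before failure, the max-aggregation, and the one-step lookahead action choice are all assumptions.

```python
def safety_map(trajectories, gamma=0.5):
    """Hypothetical danger estimate from (states, failed) trajectories.

    A state visited k steps before a terminal (failure) state gets danger
    gamma ** k; trajectories that end safely contribute nothing, so only
    states near terminal states ever receive an entry.
    """
    danger = {}
    for states, failed in trajectories:
        if not failed:
            continue
        last = len(states) - 1  # index of the terminal state
        for idx, s in enumerate(states):
            danger[s] = max(danger.get(s, 0.0), gamma ** (last - idx))
    return danger


def safe_action(state, actions, transition, danger):
    """Pick the action whose predicted next state is least dangerous."""
    return min(actions, key=lambda a: danger.get(transition(state, a), 0.0))


# 1-D corridor with a cliff (terminal state) at position 0.
trajectories = [([3, 2, 1, 0], True), ([3, 4, 5], False)]
danger = safety_map(trajectories)
step = lambda s, a: s + a  # assumed known dynamics: move left (-1) or right (+1)

print(danger)                                   # {3: 0.125, 2: 0.25, 1: 0.5, 0: 1.0}
print(safe_action(1, [-1, 1], step, danger))    # 1 (step away from the cliff)
```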

agent, model-based reinforcement, reinforcement

arXiv.org Machine Learning

2004.08648

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Time Adaptive Reinforcement Learning

Reinke, Chris

arXiv.org Machine Learning, Apr-18-2020

Reinforcement learning (RL) can solve complex tasks such as Go, often with stronger performance than humans. However, the learned behaviors are usually fixed to specific tasks and unable to adapt to different contexts. Here we consider the case of adapting RL agents to different time restrictions, such as finishing a task within a given time limit that might change from one task execution to the next. We define such problems as Time Adaptive Markov Decision Processes and introduce two model-free, value-based algorithms: the Independent Gamma-Ensemble and the n-Step Ensemble. Unlike classical approaches, they allow zero-shot adaptation between different time restrictions. The proposed approaches are general mechanisms for handling time-adaptive tasks, which makes them compatible with many existing RL methods, algorithms, and scenarios.
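A simplified way to see zero-shot adaptation to a time limit is to keep one set of action values per remaining horizon and act with the set that matches the current limit. This sketch uses exact finite-horizon value iteration on a hypothetical deterministic toy MDP rather than the paper's learned, model-free n-Step Ensemble; the states, rewards, and names are made up for illustration.

```python
# Hypothetical deterministic toy MDP: TRANSITIONS[state][action] = (reward, next_state).
TRANSITIONS = {
    "start": {"near": (1.0, "done"), "far": (0.0, "f1")},
    "f1": {"go": (0.0, "f2")},
    "f2": {"go": (5.0, "done")},
    "done": {"stay": (0.0, "done")},
}


def horizon_q_tables(transitions, max_horizon):
    """q[n][s][a] = best undiscounted return from s taking a, with n steps left."""
    values = {s: 0.0 for s in transitions}  # V_0: no time left, no more reward
    q = [None]                              # q[0] unused
    for _ in range(max_horizon):
        q_n = {
            s: {a: r + values[s2] for a, (r, s2) in acts.items()}
            for s, acts in transitions.items()
        }
        q.append(q_n)
        values = {s: max(q_n[s].values()) for s in transitions}  # V_n
    return q


def act(q, state, steps_left):
    """Greedy action for the current time limit; no retraining when it changes."""
    table = q[steps_left][state]
    return max(table, key=table.get)


q = horizon_q_tables(TRANSITIONS, max_horizon=5)
for limit in (1, 2, 3, 5):
    print(limit, act(q, "start", limit))
```

With one or two steps left the agent takes the small nearby reward; with three or more it goes for the larger distant one.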

module, objective, terminal state

arXiv.org Machine Learning

2004.086

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Vietnam > Long An Province (0.04)

Genre: Research Report (0.50)

Industry: Consumer Products & Services > Restaurants (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


What Is Deep Reinforcement Learning?

#artificialintelligence, Apr-17-2020, 14:17:50 GMT

Along with supervised and unsupervised learning, reinforcement learning is one of the most common ways of building AI systems. Deep reinforcement learning can lead to astonishingly impressive results, because it pairs reinforcement learning's trial-and-error optimization with deep neural networks that can approximate value functions or policies over high-dimensional inputs such as images. Let's take a look at precisely how deep reinforcement learning operates. Note that this article won't delve too deeply into the formulas used in deep reinforcement learning; rather, it aims to give the reader a high-level intuition for how the process works. Before we dive into deep reinforcement learning, it might be a good idea to refresh ourselves on how regular reinforcement learning works.
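The regular reinforcement learning core that deep Q-learning builds on is the tabular Q-learning update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). A minimal sketch on a hypothetical five-state corridor with a reward at the right end; the environment, hyperparameters, and names are illustrative. Deep Q-learning replaces the table with a neural network trained toward the same target.

```python
import random

N_STATES, GOAL = 5, 4   # corridor states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, 1)       # move left, move right


def step(state, action):
    """Deterministic corridor: walls at both ends, reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Uniformly random behaviour; Q-learning is off-policy, so it still
            # learns the values of the greedy policy.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # TD target r + gamma * max_a' Q(s', a'); no bootstrapping past the goal.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]: always move right
```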

q-value, reinforcement, reinforcement learning

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)


Deep Reinforcement Learning for Adaptive Learning Systems

Li, Xiao, Xu, Hanchen, Zhang, Jinming, Chang, Hua-hua

arXiv.org Machine Learning, Apr-17-2020

In this paper, we formulate the adaptive learning problem faced in adaptive learning systems (the problem of finding an individualized learning plan, called a policy, that chooses the most appropriate learning materials based on a learner's latent traits) as a Markov decision process (MDP). We assume latent traits to be continuous with an unknown transition model. We apply a model-free deep reinforcement learning algorithm, deep Q-learning, that can effectively find the optimal learning policy from data on learners' learning processes without knowing the actual transition model of the learners' continuous latent traits. To use the available data efficiently, we also develop a transition model estimator that emulates the learner's learning process using neural networks. The transition model estimator can be used within the deep Q-learning algorithm so that it discovers the optimal learning policy for a learner more efficiently. Numerical simulation studies verify that the proposed algorithm is very efficient at finding a good learning policy; in particular, with the aid of a transition model estimator, it can find the optimal learning policy after training on a small number of learners.
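The benefit of a transition model estimator can be illustrated Dyna-style: after learning from real transitions, the agent replays transitions from its learned model for extra Q-learning updates. This is a hedged sketch under simplifying assumptions: the model is a lookup table of observed deterministic transitions (the paper uses neural networks over continuous latent traits), planning sweeps over all remembered transitions, and the "mastery level" states and `lesson` action are hypothetical.

```python
def dyna_q_episode(trajectory, q, model, actions, alpha=0.5, gamma=0.9, planning_sweeps=0):
    """Learn from one real episode, then from simulated experience drawn from the model.

    trajectory: list of (state, action, reward, next_state, done) real transitions.
    model: dict (state, action) -> (reward, next_state, done), learned from data.
    """
    def update(s, a, r, s2, done):
        best_next = 0.0 if done else max(q.get((s2, b), 0.0) for b in actions)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (r + gamma * best_next - old)

    for s, a, r, s2, done in trajectory:
        update(s, a, r, s2, done)        # learn from the real transition
        model[(s, a)] = (r, s2, done)    # update the (deterministic) model estimate
    for _ in range(planning_sweeps):
        for (s, a), (r, s2, done) in list(model.items()):
            update(s, a, r, s2, done)    # learn from model-generated experience
    return q


# Hypothetical learner: mastery levels 0..4, reward 1 when level 4 is reached.
trajectory = [(0, "lesson", 0.0, 1, False), (1, "lesson", 0.0, 2, False),
              (2, "lesson", 0.0, 3, False), (3, "lesson", 1.0, 4, True)]
actions = ("lesson",)

no_plan = dyna_q_episode(trajectory, {}, {}, actions)
with_plan = dyna_q_episode(trajectory, {}, {}, actions, planning_sweeps=5)
print(no_plan[(0, "lesson")], with_plan[(0, "lesson")])
```

After one real episode, plain Q-learning has only credited the final step, while the planning sweeps propagate value back to the starting level without any additional real learners.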

learner, machine learning, reinforcement learning

arXiv.org Machine Learning

2004.0841

Country:

North America > United States > Illinois (0.04)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.93)
Education > Educational Setting > Online (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
