AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Computational Theories of Curiosity-Driven Learning

Oudeyer, Pierre-Yves

arXiv.org Artificial IntelligenceJun-18-2018

What are the functions of curiosity? What are the mechanisms of curiosity-driven learning? We approach these questions about the living using concepts and tools from machine learning and developmental robotics. We argue that curiosity-driven learning enables organisms to make discoveries to solve complex problems with rare or deceptive rewards. By fostering exploration and discovery of a diversity of behavioural skills, and ignoring these rewards, curiosity can be efficient to bootstrap learning when there is no information, or deceptive information, about local improvement towards these problems. We also explain the key role of curiosity for efficient learning of world models. We review both normative and heuristic computational frameworks used to understand the mechanisms of curiosity in humans, conceptualizing the child as a sense-making organism. These frameworks enable us to discuss the bi-directional causal links between curiosity and learning, and to provide new hypotheses about the fundamental role of curiosity in self-organizing developmental structures through curriculum learning. We present various developmental robotics experiments that study these mechanisms in action, both supporting these hypotheses to understand better curiosity in humans and opening new research avenues in machine learning and artificial intelligence. Finally, we discuss challenges for the design of experimental paradigms for studying curiosity in psychology and cognitive neuroscience. Keywords: Curiosity, intrinsic motivation, lifelong learning, predictions, world model, rewards, free-energy principle, learning progress, machine learning, AI, developmental robotics, development, curriculum learning, self-organization.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1802.10546

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Bergen County > Mahwah (0.04)
North America > United States > Nebraska (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.55)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Task-Relevant Object Discovery and Categorization for Playing First-person Shooter Games

Liang, Junchi, Boularias, Abdeslam

arXiv.org Machine LearningJun-17-2018

We consider the problem of learning to play first-person shooter (FPS) video games using raw screen images as observations and keyboard inputs as actions. The high-dimensionality of the observations in this type of applications leads to prohibitive needs of training data for model-free methods, such as the deep Q-network (DQN), and its recurrent variant DRQN. Thus, recent works focused on learning low-dimensional representations that may reduce the need for data. This paper presents a new and efficient method for learning such representations. Salient segments of consecutive frames are detected from their optical flow, and clustered based on their feature descriptors. The clusters typically correspond to different discovered categories of objects. Segments detected in new frames are then classified based on their nearest clusters. Because only a few categories are relevant to a given task, the importance of a category is defined as the correlation between its occurrence and the agent's performance. The result is encoded as a vector indicating objects that are in the frame and their locations, and used as a side input to DRQN. Experiments on the game Doom provide a good evidence for the benefit of this approach.

category, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1806.06392

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
North America > United States > New York > New York County > New York City (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
(2 more...)

Add feedback

DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills

Peng, Xue Bin, Abbeel, Pieter, Levine, Sergey, van de Panne, Michiel

arXiv.org Artificial IntelligenceJun-16-2018

A longstanding goal in character animation is to combine data-driven specification of behavior with a system that can execute a similar behavior in a physical simulation, thus enabling realistic responses to perturbations and environmental variation. We show that well-known reinforcement learning (RL) methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals. Our method handles keyframed motions, highly-dynamic actions such as motion-captured flips and spins, and retargeted motions. By combining a motion-imitation objective with a task objective, we can train characters that react intelligently in interactive settings, e.g., by walking in a desired direction or throwing a ball at a user-specified target. This approach thus combines the convenience and motion quality of using motion clips to define the desired style and appearance, with the flexibility and generality afforded by RL methods and physics-based animation. We further explore a number of methods for integrating multiple clips into the learning process to develop multi-skilled agents capable of performing a rich repertoire of diverse skills. We demonstrate results using multiple characters (human, Atlas robot, bipedal dinosaur, dragon) and a large variety of skills, including locomotion, acrobatics, and martial arts.

machine learning, reference motion, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1804.02717

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Policy Gradients in a Nutshell – Towards Data Science

#artificialintelligenceJun-15-2018, 13:21:01 GMT

Reinforcement Learning (RL) refers to both the learning problem and the sub-field of machine learning which has lately been in the news for great reasons. RL based systems have now beaten world champions of Go, helped operate datacenters better and mastered a wide variety of Atari games. The research community is seeing many more promising results. With enough motivation, let us now take a look at the Reinforcement Learning problem. Reinforcement Learning is the most general description of the learning problem where the aim is to maximize a long-term objective.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

#artificialintelligence

Industry:

Education > Focused Education > Special Education (0.67)
Leisure & Entertainment > Games > Computer Games (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Add feedback

An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method

Joseph, Ajin George, Bhatnagar, Shalabh

arXiv.org Machine LearningJun-15-2018

In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, \emph{i.e.}, estimating the value function of a model-free Markov reward process using the linear function approximation architecture and with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ the multi-timescale stochastic approximation variant of the very popular cross entropy (CE) optimization method which is a model based search method to find the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regards to computational efficiency, accuracy and stability.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1806.0672

Country:

North America > United States > New York (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.62)

Add feedback

On Machine Learning and Structure for Mobile Robots

Wulfmeier, Markus

arXiv.org Artificial IntelligenceJun-15-2018

Due to recent advances - compute, data, models - the role of learning in autonomous systems has expanded significantly, rendering new applications possible for the first time. While some of the most significant benefits are obtained in the perception modules of the software stack, other aspects continue to rely on known manual procedures based on prior knowledge on geometry, dynamics, kinematics etc. Nonetheless, learning gains relevance in these modules when data collection and curation become easier than manual rule design. Building on this coarse and broad survey of current research, the final sections aim to provide insights into future potentials and challenges as well as the necessity of structure in current practical applications.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1806.06003

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
(2 more...)

Genre: Overview (1.00)

Industry:

Automobiles & Trucks (0.93)
Transportation > Ground > Road (0.68)
Information Technology > Robotics & Automation (0.46)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Improving width-based planning with compact policies

Junyent, Miquel, Jonsson, Anders, Gómez, Vicenç

arXiv.org Artificial IntelligenceJun-15-2018

Optimal action selection in decision problems characterized by sparse, delayed rewards is still an open challenge. For these problems, current deep reinforcement learning methods require enormous amounts of data to learn controllers that reach human-level performance. In this work, we propose a method that interleaves planning and learning to address this issue. The planning step hinges on the Iterated-Width (IW) planner, a state of the art planner that makes explicit use of the state representation to perform structured exploration. IW is able to scale up to problems independently of the size of the state space. From the state-actions visited by IW, the learning step estimates a compact policy, which in turn is used to guide the planning step. The type of exploration used by our method is radically different than the standard random exploration used in RL. We evaluate our method in simple problems where we show it to have superior performance than the state-of-the-art reinforcement learning algorithms A2C and Alpha Zero. Finally, we present preliminary results in a subset of the Atari games suite.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1806.05898

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reinforcement Q-Learning from Scratch in Python with OpenAI Gym – LearnDataSci

#artificialintelligenceJun-14-2018, 09:41:29 GMT

Essentially, Q-learning lets the agent use the environment's rewards to learn, over time, the best action to take in a given state. In our Taxi environment, we have the reward table, P, that the agent will learn from. It does thing by looking receiving a reward for taking an action in the current state, then updating a Q-value to remember if that action was beneficial. The values store in the Q-table are called a Q-values, and they map to a (state, action) combination. A Q-value for a particular state-action combination is representative of the "quality" of an action taken from that state.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)

Add feedback

Stochastic Variance-Reduced Policy Gradient

Papini, Matteo, Binaghi, Damiano, Canonaco, Giuseppe, Pirotta, Matteo, Restelli, Marcello

arXiv.org Machine LearningJun-14-2018

In this paper, we propose a novel reinforcement- learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective func- tion; II) approximations in the full gradient com- putation; and III) a non-stationary sampling pro- cess. The result is SVRPG, a stochastic variance- reduced policy gradient algorithm that leverages on importance weights to preserve the unbiased- ness of the gradient estimate. Under standard as- sumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.

machine learning, reinforcement learning, variance, (16 more...)

arXiv.org Machine Learning

1806.05618

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Italy > Lombardy > Milan (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.63)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

Implicit Quantile Networks for Distributional Reinforcement Learning

Dabney, Will, Ostrovski, Georg, Silver, David, Munos, Rémi

arXiv.org Artificial IntelligenceJun-14-2018

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm's implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1806.06923

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback