AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
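
The trial-and-error idea in the quotation can be illustrated with a minimal sketch. It assumes a toy multi-armed bandit with Gaussian rewards and an epsilon-greedy learner; the function name `run_epsilon_greedy`, the reward means, and all parameter values are hypothetical choices made for illustration and are not taken from the book.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate action values by trial and error on a toy bandit (illustrative only)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running mean of observed reward per action
    counts = [0] * n_actions       # number of times each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action, since the learner is never told the best one.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest estimated reward so far.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment returns a noisy numerical reward signal (assumed Gaussian).
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental update of the sample mean for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Hypothetical reward means; the learner only ever sees sampled rewards.
estimates, counts = run_epsilon_greedy([0.2, 0.5, 1.5])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated best action:", best, "pull counts:", counts)
```

The learner is given no labels for correct actions; it only keeps a running average of the rewards it has observed and occasionally explores, which is one simple way to trade off trying new actions against repeating good ones.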

[Perspective] Why does time seem to fly when we're having fun?

Science, Dec-8-2016, 20:45:40 GMT

Animals use the neurotransmitter dopamine to encode the relationship between their responses and reward. Reinforcement learning theory (1) successfully explains the role of phasic bursts of dopamine in terms of future reward maximization. Yet, dopamine clearly plays other roles in shaping behavior that have no obvious relationship to reinforcement learning, including modulating the rate at which our subjective sense of time grows in real time. On page 1273 of this issue, Soares et al. (2) closely examine the role of dopamine in mice performing a task in which they keep track of the time between two events and make decisions about this temporal duration. The results suggest the need to reassess the leading theory of dopamine function in timing--the dopamine clock hypothesis (3). They may also help explain empirical phenomena that challenge the reinforcement learning account of dopamine function.

artificial intelligence, machine learning, reinforcement learning, (1 more...)

Science

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

Chen, Yichen, Wang, Mengdi

arXiv.org Machine Learning, Dec-7-2016

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods use small storage and have low computational complexity per iteration.
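
For readers unfamiliar with the minimax duality mentioned in this abstract, one standard textbook way to write it comes from the linear-programming form of the Bellman equation for a discounted MDP. The notation below is an assumption made for illustration and may differ from the paper's own formulation: $v$ is the value estimate, $\mu \ge 0$ is a dual variable over state-action pairs (from which a policy can be read off), $\xi$ is an initial state distribution, $\gamma$ is the discount factor, $r$ is the reward, and $P$ is the transition kernel.

```latex
\min_{v}\ \max_{\mu \ge 0}\
(1-\gamma)\sum_{s}\xi(s)\,v(s)
+ \sum_{s,a}\mu(s,a)\Big[\, r(s,a) + \gamma\sum_{s'}P(s' \mid s,a)\,v(s') - v(s) \Big]
```

Each bracketed term involves only one state-action pair and its successor states, which gives some intuition for why a single observed transition can be used to update a few coordinates of $v$ and $\mu$ at a time.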

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1612.02516

Country: North America > United States (0.67)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Model-based Adversarial Imitation Learning

Baram, Nir, Anschel, Oron, Mannor, Shie

arXiv.org Machine Learning, Dec-7-2016

Generative adversarial learning is a popular new approach to training generative models that has proven successful for other related problems as well. The general idea is to maintain an oracle $D$ that discriminates between the expert's data distribution and that of the generative model $G$. The generative model is trained to capture the expert's distribution by maximizing the probability of $D$ misclassifying the data it generates. Overall, the system is differentiable end-to-end and is trained using basic backpropagation. This type of learning was successfully applied to the problem of policy imitation in a model-free setup. However, a model-free approach does not allow the system to be differentiable, which requires the use of high-variance gradient estimations. In this paper we introduce the Model-based Adversarial Imitation Learning (MAIL) algorithm, a model-based approach to the problem of adversarial imitation learning. We show how to use a forward model to make the system fully differentiable, which enables us to train policies using the (stochastic) gradient of $D$. Moreover, our approach requires relatively few environment interactions and fewer hyper-parameters to tune. We test our method on the MuJoCo physics simulator and report initial results that surpass the current state-of-the-art.
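
The adversarial setup described in this abstract is commonly written as a two-player minimax objective. The form below is the standard generative adversarial objective, given here as a sketch; it is not necessarily the exact loss used by MAIL. The symbols $p_E$ (the expert's data distribution) and $p_G$ (the distribution of data produced by the generative model) are notation assumed for illustration.

```latex
\min_{G}\ \max_{D}\
\mathbb{E}_{x \sim p_{E}}\big[\log D(x)\big]
+ \mathbb{E}_{x \sim p_{G}}\big[\log\big(1 - D(x)\big)\big]
```

Here $D(x)$ is read as the probability that $x$ came from the expert. $D$ is trained to raise the objective by telling the two sources apart, while $G$ is trained to lower it by producing data that $D$ misclassifies as expert data.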

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1612.02179

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Elon Musk-backed OpenAI reveals Universe – a universal training ground for computers

#artificialintelligence, Dec-5-2016, 22:55:33 GMT

Hoping to teach AI agents the common sense they need to solve arbitrary tasks without specific training, OpenAI on Monday will introduce Universe, a collection of virtualized video games, browser interfaces, and applications that serve as a training ground for code-based decision making. Universe is open-source middleware that supports Gym, the organization's toolkit for developing and evaluating reinforcement learning (RL) algorithms. RL is used to train software to perform specific actions, such as playing a video game or making a 3D model walk, under a framework that prioritizes actions through a reward scheme. Universe aims to accelerate the education of AI agents by broadening the number of available training resources. Previously, according to OpenAI, the largest RL resource consisted of 55 Atari games, the Arcade Learning Environment.

large language model, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Country:

North America > United States > New York (0.06)
Europe > Sweden > Skåne County > Malmö (0.06)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.92)
(2 more...)

Tony Hadfield Ph.D. on Twitter

#artificialintelligence, Dec-3-2016, 06:35:04 GMT

deep learning, reinforcement learning, tony hadfield ph, (3 more...)

#artificialintelligence

Industry: Information Technology (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.42)

Project Malmo: Enabling AI technology that can collaborate with humans - Microsoft Research

#artificialintelligence, Dec-2-2016, 13:35:09 GMT

Project Malmo, a platform that uses the world of Minecraft as a testing ground for advanced artificial intelligence research and innovation, is available for novice to experienced programmers on GitHub via an open-source license. The system is primarily designed to help researchers develop sophisticated AI that can do things like learn, converse, make decisions and complete complex tasks. It supports research on a range of methods such as reinforcement learning, deep learning and symbolic AI, allowing researchers to compare and integrate different approaches to advance AI understanding, reasoning, learning and communications. Project Malmo is available at aka.ms/github-malmo

machine learning, project malmo, reinforcement learning, (5 more...)

#artificialintelligence

Country: Europe > Sweden > Skåne County > Malmö (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes

Killian, Taylor, Konidaris, George, Doshi-Velez, Finale

arXiv.org Machine Learning, Dec-1-2016

Due to physiological variation, patients diagnosed with the same condition may exhibit divergent, but related, responses to the same treatments. Hidden Parameter Markov Decision Processes (HiP-MDPs) tackle this transfer-learning problem by embedding these tasks into a low-dimensional space. However, the original formulation of HiP-MDP had a critical flaw: the embedding uncertainty was modelled independently of the agent's state uncertainty, requiring an unnatural training procedure in which all tasks visited every part of the state space--possible for robots that can be moved to a particular location, impossible for human patients. We update the HiP-MDP framework and extend it to more robustly develop personalized medicine strategies for HIV treatment.

artificial intelligence, machine learning, reinforcement learning, (9 more...)

arXiv.org Machine Learning

1612.00475

Country: Europe > Spain (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)

Playing Doom with SLAM-Augmented Deep Reinforcement Learning

Bhatti, Shehroze, Desmaison, Alban, Miksik, Ondrej, Nardelli, Nantas, Siddharth, N., Torr, Philip H. S.

arXiv.org Machine Learning, Dec-1-2016

A number of recent approaches to policy learning in 2D game domains have been successful going directly from raw input images to actions. However, when employed in complex 3D environments, they typically suffer from challenges related to partial observability, combinatorial exploration spaces, path planning, and a scarcity of rewarding scenarios. Inspired by prior work in human cognition that indicates how humans employ a variety of semantic concepts and abstractions (object categories, localisation, etc.) to reason about the world, we build an agent-model that incorporates such abstractions into its policy-learning framework. We augment the raw image input to a Deep Q-Learning Network (DQN) by adding details of objects and structural elements encountered, along with the agent's localisation. The different components are automatically extracted and composed into a topological representation using on-the-fly object detection and 3D-scene reconstruction. We evaluate the efficacy of our approach in "Doom", a 3D first-person combat game that exhibits a number of the challenges discussed, and show that our augmented framework consistently learns better, more effective policies.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1612.0038

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.68)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

Jiang, Nan, Krishnamurthy, Akshay, Agarwal, Alekh, Langford, John, Schapire, Robert E.

arXiv.org Machine Learning, Dec-1-2016

We introduce a new model, called contextual decision processes, that unifies and generalizes most prior settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings. Our second contribution is a new reinforcement learning algorithm that engages in systematic exploration to learn contextual decision processes with low Bellman rank. Our algorithm provably learns near-optimal behavior with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for reinforcement learning with function approximation.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1610.09512

Country: North America > United States > Massachusetts (0.27)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

Tessler, Chen, Givony, Shahar, Zahavy, Tom, Mankowitz, Daniel J., Mannor, Shie

arXiv.org Artificial Intelligence, Nov-30-2016

We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game which is an unsolved and high-dimensional lifelong learning problem. These reusable skills, which we refer to as Deep Skill Networks, are then incorporated into our novel Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using two techniques: (1) a deep skill array and (2) skill distillation, our novel variation of policy distillation (Rusu et al. 2015) for learning skills. Skill distillation enables the H-DRLN to efficiently retain knowledge, and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network. The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q Network (Mnih et al. 2015) in sub-domains of Minecraft.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1604.07255

Genre:

Instructional Material (0.79)
Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education > Educational Setting > Continuing Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games > Computer Games (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
