AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Non-Markovian Control with Gated End-to-End Memory Policy Networks

Perez, Julien, Silander, Tomi

arXiv.org Machine LearningMay-31-2017

Partially observable environments present an important open challenge in the domain of sequential control learning with delayed rewards. Despite numerous attempts during the two last decades, the majority of reinforcement learning algorithms and associated approximate models, applied to this context, still assume Markovian state transitions. In this paper, we explore the use of a recently proposed attention-based model, the Gated End-to-End Memory Network, for sequential control. We call the resulting model the Gated End-to-End Memory Policy Network. More precisely, we use a model-free value-based algorithm to learn policies for partially observed domains using this memory-enhanced neural network. This model is end-to-end learnable and it features unbounded memory. Indeed, because of its attention mechanism and associated non-parametric memory, the proposed model allows us to define an attention mechanism over the observation stream unlike recurrent models. We show encouraging results that illustrate the capability of our attention-based model in the context of the continuous-state non-stationary control problem of stock trading. We also present an OpenAI Gym environment for simulated stock exchange and explain its relevance as a benchmark for the field of non-Markovian decision process learning.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1705.10993

Country:

Europe (0.68)
North America > United States > New York (0.17)
North America > Canada > Quebec (0.15)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Trading (1.00)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Experience Replay Using Transition Sequences

Karimpanal, Thommen George, Bouffanais, Roland

arXiv.org Artificial IntelligenceMay-30-2017

Experience replay is one of the most commonly used approaches to improve the sample efficiency of reinforcement learning algorithms. In this work, we propose an approach to select and replay sequences of transitions in order to accelerate the learning of a reinforcement learning agent in an off-policy setting. In addition to selecting appropriate sequences, we also artificially construct transition sequences using information gathered from previous agent-environment interactions. These sequences, when replayed, allow value function information to trickle down to larger sections of the state/state-action space, thereby making the most of the agent's experience. We demonstrate our approach on modified versions of standard reinforcement learning tasks such as the mountain car and puddle world problems and empirically show that it enables better learning of value functions as compared to other forms of experience replay. Further, we briefly discuss some of the possible extensions to this work, as well as applications and situations where this approach could be particularly useful.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.3389/fnbot.2018.00032

1705.10834

Country: North America > Canada (0.68)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Artificial Intelligence will Speak Its Own Language Soon

#artificialintelligenceMay-29-2017, 04:09:24 GMT

The article is about a system that invents a language which is tied to perception of the world. In sum, the post reveals possibilities that might be opened via researches related to an artificial language. At least the language will be similar to a signal language typical for animals. Further languages will be evolved into more complex technologies. There is no such thing as an evolution of languages. There is an evolution of the ability to use languages.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Add feedback

Latent Intention Dialogue Models

Wen, Tsung-Hsien, Miao, Yishu, Blunsom, Phil, Young, Steve

arXiv.org Machine LearningMay-29-2017

Developing a dialogue agent that is capable of making autonomous decisions and communicating by natural language is one of the long-term goals of machine learning research. Traditional approaches either rely on hand-crafting a small state-action set for applying reinforcement learning that is not scalable or constructing deterministic models for learning dialogue sentences that fail to capture natural conversational variability. In this paper, we propose a Latent Intention Dialogue Model (LIDM) that employs a discrete latent variable to learn underlying dialogue intentions in the framework of neural variational inference. In a goal-oriented dialogue scenario, these latent intentions can be interpreted as actions guiding the generation of machine responses, which can be further refined autonomously by reinforcement learning. The experimental evaluation of LIDM shows that the model out-performs published benchmarks for both corpus-based and human evaluation, demonstrating the effectiveness of discrete latent variable models for learning goal-oriented dialogues.

intention, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1705.10229

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Curious AI learns by exploring game worlds and making mistakes

New ScientistMay-26-2017, 18:40:03 GMT

I wonder what will happen if I press this button? Algorithms armed with a sense of curiosity are teaching themselves to discover and solve problems they've never encountered before. Faced with level one of Super Mario Bros, a curiosity-driven AI learned how to explore, avoid pits, and dodge and kill enemies. This might not sound impressive – algorithms have been thrashing humans at video games for a few years now – but this AI's skills all were all learned thanks to an inbuilt desire to discover more about the game world. Conventional AI algorithms are taught through positive reinforcement.

artificial intelligence, machine learning, reinforcement learning, (8 more...)

New Scientist

Country: North America > United States > California > Alameda County > Berkeley (0.05)

Industry: Leisure & Entertainment > Games > Computer Games (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)
Information Technology > Artificial Intelligence > Games > Go (0.33)
Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.32)

Add feedback

Personalizing a Dialogue System with Transfer Reinforcement Learning

Mo, Kaixiang, Li, Shuangyin, Zhang, Yu, Li, Jiajun, Yang, Qiang

arXiv.org Artificial IntelligenceMay-26-2017

It is difficult to train a personalized task-oriented dialogue system because the data collected from each individual is often insufficient. Personalized dialogue systems trained on a small dataset can overfit and make it difficult to adapt to different user needs. One way to solve this problem is to consider a collection of multiple users' data as a source domain and an individual user's data as a target domain, and to perform a transfer learning from the source to the target domain. By following this idea, we propose "PETAL"(PErsonalized Task-oriented diALogue), a transfer-learning framework based on POMDP to learn a personalized dialogue system. The system first learns common dialogue knowledge from the source domain and then adapts this knowledge to the target user. This framework can avoid the negative transfer problem by considering differences between source and target users. The policy in the personalized POMDP can learn to choose different actions appropriately for different users. Experimental results on a real-world coffee-shopping data and simulation data show that our personalized dialogue system can choose different optimal actions for different users, and thus effectively improve the dialogue quality under the personalized setting.

machine learning, natural language, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1610.02891

Country: Asia > China (0.46)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)

Add feedback

State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning

Sahni, Himanshu, Kumar, Saurabh, Tejani, Farhan, Schroecker, Yannick, Isbell, Charles

arXiv.org Machine LearningMay-24-2017

Typical reinforcement learning (RL) agents learn to complete tasks specified by reward functions tailored to their domain. As such, the policies they learn do not generalize even to similar domains. To address this issue, we develop a framework through which a deep RL agent learns to generalize policies from smaller, simpler domains to more complex ones using a recurrent attention mechanism. The task is presented to the agent as an image and an instruction specifying the goal. This meta-controller guides the agent towards its goal by designing a sequence of smaller subtasks on the part of the state space within the attention, effectively decomposing it. As a baseline, we consider a setup without attention as well. Our experiments show that the meta-controller learns to create subgoals within the attention.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1705.08997

Country: North America > United States (0.30)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

aslanides/aixijs

#artificialintelligenceMay-23-2017, 08:47:02 GMT

AIXIjs is a JavaScript demo for running General Reinforcement Learning (RL) agents in the browser. In particular, it provides a general and extensible framework for running experiments on Bayesian RL agents in general (partially observable, non-Markov, non-ergodic) environments. UPDATE (May 2017): I'll be presenting a conference paper containing a literature survey along with some experiments based on AIXIjs at IJCAI 2017, in Melbourne, Australia. The paper (to appear) is: J. S. Aslanides, Jan Leike, and Marcus Hutter. See the main site for more background, documentation, references, and demos.

experiment, machine learning, reinforcement learning, (7 more...)

#artificialintelligence

Country: Oceania > Australia > Victoria > Melbourne (0.27)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A unified view of entropy-regularized Markov decision processes

Neu, Gergely, Jonsson, Anders, Gómez, Vicenç

arXiv.org Machine LearningMay-22-2017

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally, we illustrate empirically the effects of using various regularization techniques on learning performance in a simple reinforcement learning setup.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1705.07798

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning

Mhamdi, El Mahdi El, Guerraoui, Rachid, Hendrikx, Hadrien, Maurer, Alexandre

arXiv.org Machine LearningMay-22-2017

In reinforcement learning, agents learn by performing actions and observing their outcomes. Sometimes, it is desirable for a human operator to \textit{interrupt} an agent in order to prevent dangerous situations from happening. Yet, as part of their learning process, agents may link these interruptions, that impact their reward, to specific states and deliberately avoid them. The situation is particularly challenging in a multi-agent context because agents might not only learn from their own past interruptions, but also from those of other agents. Orseau and Armstrong defined \emph{safe interruptibility} for one learner, but their work does not naturally extend to multi-agent systems. This paper introduces \textit{dynamic safe interruptibility}, an alternative definition more suited to decentralized learning problems, and studies this notion in two learning frameworks: \textit{joint action learners} and \textit{independent learners}. We give realistic sufficient conditions on the learning algorithm to enable dynamic safe interruptibility in the case of joint action learners, yet show that these conditions are not sufficient for independent learners. We show however that if agents can detect interruptions, it is possible to prune the observations to ensure dynamic safe interruptibility even for independent learners.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1704.02882

Genre: Research Report (0.40)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback