AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Deep reinforcement learning for de novo drug design

#artificialintelligenceJul-26-2018, 19:31:56 GMT

We have devised and implemented a novel computational strategy for de novo design of molecules with desired properties termed ReLeaSE (Reinforcement Learning for Structural Evolution). On the basis of deep and reinforcement learning (RL) approaches, ReLeaSE integrates two deep neural networks--generative and predictive--that are trained separately but are used jointly to generate novel targeted chemical libraries. ReLeaSE uses simple representation of molecules by their simplified molecular-input line-entry system (SMILES) strings only. Generative models are trained with a stack-augmented memory network to produce chemically feasible SMILES strings, and predictive models are derived to forecast the desired properties of the de novo–generated compounds. In the first phase of the method, generative and predictive models are trained separately with a supervised learning algorithm. In the second phase, both models are trained jointly with the RL approach to bias the generation of new ...

artificial intelligence, machine learning, reinforcement learning, (21 more...)

#artificialintelligence

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Estimating scale-invariant future in continuous time

Tiganj, Zoran, Gershman, Samuel J., Sederberg, Per B., Howard, Marc W.

arXiv.org Artificial IntelligenceJul-26-2018

Natural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Critically, the learner cannot in general know a priori the relevant time scale over which meaningful relationships will be observed. Widely used reinforcement learning algorithms discretize continuous time and use the Bellman equation to estimate exponentially-discounted future reward. However, exponential discounting introduces a time scale to the computation of value. Scaling is a serious problem in continuous time: efficient learning with scaled algorithms requires prior knowledge of the relevant scale. That is, with scaled algorithms one must know at least part of the solution to a problem prior to attempting a solution. We present a computational mechanism, developed based on work in psychology and neuroscience, for computing a scale-invariant timeline of future events. This mechanism efficiently computes a model for future time on a logarithmically-compressed scale, and can be used to generate a scale-invariant power-law-discounted estimate of expected future reward. Moreover, the representation of future time retains information about what will happen when, enabling flexible decision making based on future events. The entire timeline can be constructed in a single parallel operation.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1802.06426

Country:

North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

ToriLLE: Learning Environment for Hand-to-Hand Combat

Kanervisto, Anssi, Hautamäki, Ville

arXiv.org Artificial IntelligenceJul-26-2018

Toribash is a MuJoCo-like environment of two humanoid character fighting each other hand-to-hand, controlled by changing states of body joints. Competitive nature of Toribash lends itself to two-agent experiments, and active player-base can be used for human baselines. This white paper describes the environment with its pros, cons and limitations as well experimentally show ToriLLE's applicability as a learning environment by successfully training reinforcement learning agents that improved over time. The code is available at https: //github.com/Miffyli/ToriLLE.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1807.1011

Country:

Europe > Finland > North Karelia > Joensuu (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Add feedback

Variational Option Discovery Algorithms

Achiam, Joshua, Edwards, Harrison, Amodei, Dario, Abbeel, Pieter

arXiv.org Artificial IntelligenceJul-26-2018

We explore methods for option discovery based on variational inference and make two algorithmic contributions. First: we highlight a tight connection between variational option discovery methods and variational autoencoders, and introduce Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection. In VALOR, the policy encodes contexts from a noise distribution into trajectories, and the decoder recovers the contexts from the complete trajectories. Second: we propose a curriculum learning approach where the number of contexts seen by the agent increases whenever the agent's performance is strong enough (as measured by the decoder) on the current set of contexts. We show that this simple trick stabilizes training for VALOR and prior variational option discovery methods, allowing a single agent to learn many more modes of behavior than it could with a fixed context distribution. Finally, we investigate other topics related to variational option discovery, including fundamental limitations of the general approach and the applicability of learned options to downstream tasks.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1807.10299

Country: North America > United States > Massachusetts (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Multi-modal Feedback for Affordance-driven Interactive Reinforcement Learning

Cruz, Francisco, Parisi, German I., Wermter, Stefan

arXiv.org Artificial IntelligenceJul-26-2018

Interactive reinforcement learning (IRL) extends traditional reinforcement learning (RL) by allowing an agent to interact with parent-like trainers during a task. In this paper, we present an IRL approach using dynamic audio-visual input in terms of vocal commands and hand gestures as feedback. Our architecture integrates multi-modal information to provide robust commands from multiple sensory cues along with a confidence value indicating the trustworthiness of the feedback. The integration process also considers the case in which the two modalities convey incongruent information. Additionally, we modulate the influence of sensory-driven feedback in the IRL task using goal-oriented knowledge in terms of contextual affordances. We implement a neural network architecture to predict the effect of performed actions with different objects to avoid failed-states, i.e., states from which it is not possible to accomplish the task. In our experimental setup, we explore the interplay of multimodal feedback and task-specific affordances in a robot cleaning scenario. We compare the learning performance of the agent under four different conditions: traditional RL, multi-modal IRL, and each of these two setups with the use of contextual affordances. Our experiments show that the best performance is obtained by using audio-visual feedback with affordancemodulated IRL. The obtained results demonstrate the importance of multi-modal sensory processing integrated with goal-oriented knowledge in IRL tasks.

affordance, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1807.09991

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Germany > Hamburg (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

Add feedback

Variational Bayesian Reinforcement Learning with Regret Bounds

O'Donoghue, Brendan

arXiv.org Artificial IntelligenceJul-25-2018

We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.

agent, bayesian inference, upstream oil & gas, (18 more...)

arXiv.org Artificial Intelligence

1807.09647

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Multi-Agent Generative Adversarial Imitation Learning

Song, Jiaming, Ren, Hongyu, Sadigh, Dorsa, Ermon, Stefano

arXiv.org Artificial IntelligenceJul-25-2018

Imitation learning algorithms can be used to learn a policy from expert demonstrations without access to a reward signal. However, most existing approaches are not applicable in multi-agent settings due to the existence of multiple (Nash) equilibria and non-stationary environments. We propose a new framework for multi-agent imitation learning for general Markov games, where we build upon a generalized notion of inverse reinforcement learning. We further introduce a practical multi-agent actor-critic algorithm with good empirical performance. Our method can be used to imitate complex behaviors in high-dimensional environments with multiple cooperative or competing agents.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1807.09936

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

Kapoor, Sanyam

arXiv.org Artificial IntelligenceJul-24-2018

Reinforcement Learning (RL) is a learning paradigm concerned with learning to control a system so as to maximize an objective over the long term. This approach to learning has received immense interest in recent times and success manifests itself in the form of human-level performance on games like \textit{Go}. While RL is emerging as a practical component in real-life systems, most successes have been in Single Agent domains. This report will instead specifically focus on challenges that are unique to Multi-Agent Systems interacting in mixed cooperative and competitive environments. The report concludes with advances in the paradigm of training Multi-Agent Systems called \textit{Decentralized Actor, Centralized Critic}, based on an extension of MDPs called \textit{Decentralized Partially Observable MDP}s, which has seen a renewed interest lately.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1807.09427

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Temporal Difference Reinforcement Learning Theory of Emotion: unifying emotion, cognition and adaptive behavior

Broekens, Joost

arXiv.org Artificial IntelligenceJul-24-2018

Emotions are intimately tied to motivation and the adaptation of behavior, and many animal species show evidence of emotions in their behavior. Therefore, emotions must be related to powerful mechanisms that aid survival, and, emotions must be evolutionary continuous phenomena. How and why did emotions evolve in nature, how do events get emotionally appraised, how do emotions relate to cognitive complexity, and, how do they impact behavior and learning? In this article I propose that all emotions are manifestations of reward processing, in particular Temporal Difference (TD) error assessment. Reinforcement Learning (RL) is a powerful computational model for the learning of goal oriented tasks by exploration and feedback. Evidence indicates that RL-like processes exist in many animal species. Key in the processing of feedback in RL is the notion of TD error, the assessment of how much better or worse a situation just became, compared to what was previously expected (or, the estimated gain or loss of utility - or well-being - resulting from new evidence). I propose a TDRL Theory of Emotion and discuss its ramifications for our understanding of emotions in humans, animals and machines, and present psychological, neurobiological and computational evidence in its support.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1807.08941

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)

Add feedback

Using Deep Q-Learning in FIFA 18 to perfect the art of free-kicks

#artificialintelligenceJul-23-2018, 02:58:35 GMT

A code tutorial in Tensorflow that uses Reinforcement Learning to take free kicks. In my previous article, I presented an AI bot trained to play the game of FIFA using Supervised Learning technique. However, the training data required to improve it further quickly became cumbersome to gather and provided little-to-no improvements, making this approach very time consuming. For this sake, I decided to switch to Reinforcement Learning, as suggested by almost everyone who commented on that article! In this article, I'll provide a short description of what Reinforcement Learning is and how I applied it to this game.

machine learning, reinforcement learning, training data, (16 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback