AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

RUDDER: Return Decomposition for Delayed Rewards

Arjona-Medina, Jose A., Gillhofer, Michael, Widrich, Michael, Unterthiner, Thomas, Hochreiter, Sepp

arXiv.org Artificial IntelligenceJun-20-2018

We propose a novel reinforcement learning approach for finite Markov decision processes (MDPs) with delayed rewards. In this work, biases of temporal difference (TD) estimates are proved to be corrected only exponentially slowly in the number of delay steps. Furthermore, variances of Monte Carlo (MC) estimates are proved to increase the variance of other estimates, the number of which can exponentially grow in the number of delay steps. We introduce RUDDER, a return decomposition method, which creates a new MDP with same optimal policies as the original MDP but with redistributed rewards that have largely reduced delays. If the return decomposition is optimal, then the new MDP does not have delayed rewards and TD estimates are unbiased. In this case, the rewards track Q-values so that the future expected reward is always zero. We experimentally confirm our theoretical results on bias and variance of TD and MC estimates. On artificial tasks with different lengths of reward delays, we show that RUDDER is exponentially faster than TD, MC, and MC Tree Search (MCTS). RUDDER outperforms rainbow, A3C, DDQN, Distributional DQN, Dueling DDQN, Noisy DQN, and Prioritized DDQN on the delayed reward Atari game Venture in only a fraction of the learning time. RUDDER considerably improves the state-of-the-art on the delayed reward Atari game Bowling in much less learning time. Source code is available at https://github.com/ml-jku/baselines-rudder, with demonstration videos at https://goo.gl/EQerZV.

machine learning, reinforcement learning, variance, (19 more...)

arXiv.org Artificial Intelligence

1806.07857

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
(13 more...)

Genre:

Workflow (0.46)
Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

Sim-to-Real Reinforcement Learning for Deformable Object Manipulation

Matas, Jan, James, Stephen, Davison, Andrew J.

arXiv.org Artificial IntelligenceJun-20-2018

We have seen much recent progress in rigid object manipulation, but interaction with deformable objects has notably lagged behind. Due to the large configuration space of deformable objects, solutions using traditional modelling approaches require significant engineering work. Perhaps then, bypassing the need for explicit modelling and instead learning the control in an end-to-end manner serves as a better approach? Despite the growing interest in the use of end-to-end robot learning approaches, only a small amount of work has focused on their applicability to deformable object manipulation. Moreover, due to the large amount of data needed to learn these end-to-end solutions, an emerging trend is to learn to control policies in simulation and then transfer them over to the real world. To-date, no work has explored whether it is possible to learn and transfer deformable object policies. We believe that if sim-to-real methods are the way forward, then it should be possible to learn to interact with a wide variety of objects, and not just rigid objects. In this work, we use a combination of state-of-the-art deep reinforcement learning algorithms to solve the problem of manipulating deformable objects (specifically cloth). We evaluate our approach on three tasks --- folding a towel up to a mark, folding a face towel diagonally, and draping a piece of cloth over a hanger. Our agents are fully trained in simulation with domain randomisation, and then successfully deployed in the real world without having seen any real deformable objects.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1806.07851

Country: Europe > Spain > Aragón (0.04)

Genre:

Research Report (0.50)
Overview (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Deep Learning in the Real World: Tim's Presentation at Thug Think 2017

#artificialintelligenceJun-19-2018, 13:56:10 GMT

Our own Chief Data Scientist Dr. Tim Oates recently give a talk at the Thug Think conference in Portland, Oregon. He focused in on reinforcement learning, a branch of machine learning with exciting applications. Whether you're familiar or brand new to the subject, his engaging presentation is going to be of interest to you. If you want to hear more of Tim's thoughts on reinforcement learning, read "Exploration vs. Exploitation"

deep learning, machine learning, reinforcement learning, (4 more...)

#artificialintelligence

Country: North America > United States > Oregon > Multnomah County > Portland (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback

Workshop Deep Reinforcement Learning - YouTube

#artificialintelligenceJun-19-2018, 13:56:04 GMT

artificial intelligence, machine learning, reinforcement learning, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Unsupervised Imitation Learning

Curi, Sebastian, Levy, Kfir Y., Krause, Andreas

arXiv.org Machine LearningJun-19-2018

We introduce a novel method to learn a policy from unsupervised demonstrations of a process. Given a model of the system and a set of sequences of outputs, we find a policy that has a comparable performance to the original policy, without requiring access to the inputs of these demonstrations. We do so by first estimating the inputs of the system from observed unsupervised demonstrations. Then, we learn a policy by applying vanilla supervised learning algorithms to the (estimated)input-output pairs. For the input estimation, we present a new adaptive linear estimator (AdaL-IE) that explicitly trades-off variance and bias in the estimation. As we show empirically, AdaL-IE produces estimates with lower error compared to the state-of-the-art input estimation method, (UMV-IE) [Gillijns and De Moor, 2007]. Using AdaL-IE in conjunction with imitation learning enables us to successfully learn control policies that consistently outperform those using UMV-IE.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1806.072

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Vietnam > Long An Province (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.66)

Add feedback

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning

Gruslys, Audrunas, Dabney, Will, Azar, Mohammad Gheshlaghi, Piot, Bilal, Bellemare, Marc, Munos, Remi

arXiv.org Artificial IntelligenceJun-19-2018

In this work we present a new agent architecture, called Reactor, which combines multiple algorithmic and architectural contributions to produce an agent with higher sample-efficiency than Prioritized Dueling DQN (Wang et al., 2016) and Categorical DQN (Bellemare et al., 2017), while giving better run-time performance than A3C (Mnih et al., 2016). Our first contribution is a new policy evaluation algorithm called Distributional Retrace, which brings multi-step off-policy updates to the distributional reinforcement learning setting. The same approach can be used to convert several classes of multi-step policy evaluation algorithms designed for expected value evaluation into distributional ones. Next, we introduce the \b{eta}-leave-one-out policy gradient algorithm which improves the trade-off between variance and bias by using action values as a baseline. Our final algorithmic contribution is a new prioritized replay algorithm for sequences, which exploits the temporal locality of neighboring observations for more efficient replay prioritization. Using the Atari 2600 benchmarks, we show that each of these innovations contribute to both the sample efficiency and final agent performance. Finally, we demonstrate that Reactor reaches state-of-the-art performance after 200 million frames and less than a day of training.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1704.04651

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Continual Reinforcement Learning with Complex Synapses

Kaplanis, Christos, Shanahan, Murray, Clopath, Claudia

arXiv.org Artificial IntelligenceJun-19-2018

Unlike humans, who are capable of continual learning over their lifetimes, artificial neural networks have long been known to suffer from a phenomenon known as catastrophic forgetting, whereby new learning can lead to abrupt erasure of previously acquired knowledge. Whereas in a neural network the parameters are typically modelled as scalar values, an individual synapse in the brain comprises a complex network of interacting biochemical components that evolve at different timescales. In this paper, we show that by equipping tabular and deep reinforcement learning agents with a synaptic model that incorporates this biological complexity (Benna & Fusi, 2016), catastrophic forgetting can be mitigated at multiple timescales. In particular, we find that as well as enabling continual learning across sequential training of two simple tasks, it can also be used to overcome within-task forgetting by reducing the need for an experience replay database.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1802.07239

Country: Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.93)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Free Online Sources To Learn Machine Learning – AiMantra – Medium

#artificialintelligenceJun-18-2018, 08:51:05 GMT

Above two are intro course to deep learning. This is a Youtube channel which contains courses by Prof. Andrew Ng on various topics in deep learning. This a 14 week course, taught by Jeremy Howard. It cover most of the topics in deep learning. This course is a gentle introduction to Reinforcement Learning. It walks you through most of the topics in Reinforcement Learning in high level. This course is not taught at a basic level, so you need to be familiar with basic concepts and perhaps a little more. For more stories follow AiMantra.

deep learning, machine learning, reinforcement learning, (3 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Educational Setting (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)

Add feedback

Theoretical Analysis of Image-to-Image Translation with Adversarial Learning

Pan, Xudong, Zhang, Mi, Ding, Daizong

arXiv.org Machine LearningJun-18-2018

Recently, a unified model for image-to-image translation tasks within adversarial learning framework has aroused widespread research interests in computer vision practitioners. Their reported empirical success however lacks solid theoretical interpretations for its inherent mechanism. In this paper, we reformulate their model from a brand-new geometrical perspective and have eventually reached a full interpretation on some interesting but unclear empirical phenomenons from their experiments. Furthermore, by extending the definition of generalization for generative adversarial nets to a broader sense, we have derived a condition to control the generalization capability of their model. According to our derived condition, several practical suggestions have also been proposed on model design and dataset construction as a guidance for further empirical researches.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1806.07001

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)

Add feedback

A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress

Arora, Saurabh, Doshi, Prashant

arXiv.org Machine LearningJun-18-2018

Inverse reinforcement learning is the problem of inferring the reward function of an observed agent, given its policy or behavior. Researchers perceive IRL both as a problem and as a class of methods. By categorically surveying the current literature in IRL, this article serves as a reference for researchers and practitioners in machine learning to understand the challenges of IRL and select the approaches best suited for the problem on hand. The survey formally introduces the IRL problem along with its central challenges which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size. The article elaborates how the current methods mitigate these challenges. We further discuss the extensions of traditional IRL methods: (i) inaccurate and incomplete perception, (ii) incomplete model, (iii) multiple rewards, and (iv) non-linear reward functions. This discussion concludes with some broad advances in the research area and currently open research questions.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1806.06877

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Georgia > Clarke County > Athens (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(8 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation (0.68)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback