 Reinforcement Learning


One-Shot Imitation Learning

arXiv.org Artificial Intelligence

Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering. In this paper, we propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning. Specifically, we consider the setting where there is a very large set of tasks, and each task has many instantiations. For example, a task could be to stack all blocks on a table into a single tower, another task could be to place all blocks on a table into two-block towers, etc. In each case, different instances of the task would consist of different sets of blocks with different initial states. At training time, our algorithm is presented with pairs of demonstrations for a subset of all tasks. A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration. At test time, a demonstration of a single instance of a new task is presented, and the neural net is expected to perform well on new instances of this new task. The use of soft attention allows the model to generalize to conditions and tasks unseen in the training data. We anticipate that by training this model on a much greater variety of tasks and settings, we will obtain a general system that can turn any demonstrations into robust policies that can accomplish an overwhelming variety of tasks. Videos available at https://bit.ly/nips2017-oneshot .
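The abstract above credits soft attention for the model's ability to generalize. As a rough illustration only, the following sketch shows generic dot-product soft attention, where an embedding of the current state is used to weight the steps of a demonstration. The function name, the toy embeddings, and the use of plain dot-product scores are all assumptions made for this example; the paper's actual architecture is not described in the abstract and is likely more involved.

```python
import math

def soft_attention(query, keys, values):
    """Generic dot-product soft attention (illustrative, not the paper's model).

    Returns (weights, context): softmax weights over the keys and the
    weighted average of the values.
    """
    # Similarity score between the query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Numerically stable softmax: subtract the max score before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Hypothetical toy data: three demonstration steps with 2-D embeddings.
demo_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
demo_values = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
current_state = [0.0, 2.0]

weights, context = soft_attention(current_state, demo_keys, demo_values)
print(weights, context)
```

Because the weights are a softmax, every demonstration step contributes a little, and the steps most similar to the current state contribute most. This is what makes the attention "soft" rather than a hard selection of one step.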


Google's AI Built Its Own AI That Outperforms Any Made by Humans

#artificialintelligence

In May 2017, researchers at Google Brain announced the creation of AutoML, an artificial intelligence (AI) that's capable of generating its own AIs. More recently, they decided to present AutoML with its biggest challenge to date, and the AI that can build AI created a 'child' that outperformed all of its human-made counterparts. The Google researchers automated the design of machine learning models using an approach called reinforcement learning. AutoML acts as a controller neural network that develops a child AI network for a specific task. For this particular child AI, which the researchers called NASNet, the task was recognising objects - people, cars, traffic lights, handbags, backpacks, etc. - in a video in real-time.


Thinking Fast and Slow with Deep Learning and Tree Search

arXiv.org Artificial Intelligence

Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most recent Olympiad Champion player to be publicly released.


Model-based Reinforcement Learning with Neural Network Dynamics

@machinelearnbot

A learned neural network dynamics model enables a hexapod robot to learn to run and follow desired trajectories, using just 17 minutes of real-world experience. Enabling robots to act autonomously in the real world is difficult. Even with expensive robots and teams of world-class researchers, robots still have difficulty autonomously navigating and interacting in complex, unstructured environments. Why are autonomous robots not out in the world among us? Engineering systems that can cope with all the complexities of our world is hard.


Model-based reinforcement learning with neural network dynamics

Robohub

Enabling robots to act autonomously in the real world is difficult. Even with expensive robots and teams of world-class researchers, robots still have difficulty autonomously navigating and interacting in complex, unstructured environments. A learned neural network dynamics model enables a hexapod robot to learn to run and follow desired trajectories, using just 17 minutes of real-world experience. Why are autonomous robots not out in the world among us? Engineering systems that can cope with all the complexities of our world is hard.



Mashable

If you've ever trained a puppy before, you know just how valuable food rewards can be. After little Sparky realizes that the act of rolling over instantly earns him a mouthful of peanut butter, he starts performing the trick with increased enthusiasm and speed. This type of behavioral psychology -- getting something to act a certain way so that it maximizes its rewards -- has inspired a new approach to artificial intelligence called reinforcement learning. Named one of the 10 Breakthrough Technologies of 2017 by the MIT Technology Review, this revolutionary kind of machine learning allows computers to learn new things without human intervention through the mere act of experimenting.


5 Ways to Get Started with Reinforcement Learning

@machinelearnbot

Machine learning algorithms, and neural networks in particular, are considered to be the cause of a new AI 'revolution'. In this article I will introduce the concept of reinforcement learning but with limited technical details so that readers with a variety of backgrounds can understand the essence of the technique, its capabilities and limitations. At the end of the article, I will provide links to a few resources for implementing RL. Broadly speaking, data-driven algorithms can be categorized into three types: Supervised, Unsupervised, and Reinforcement learning. The first two are generally used to perform tasks such as image classification, detection, etc.


Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation

arXiv.org Machine Learning

In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on epsilon-greedy exploration, thus subjecting the user to a random choice of action during learning. Alternative approaches such as Gaussian Process SARSA (GPSARSA) estimate uncertainties and are sample efficient, leading to better user experience, but at the expense of greater computational complexity. This paper examines approaches to extract uncertainty estimates from deep Q-networks (DQN) in the context of dialogue management. We perform an extensive benchmark of deep Bayesian methods to extract uncertainty estimates, namely Bayes-By-Backprop, dropout, its concrete variation, bootstrapped ensemble and alpha-divergences, combining them with the DQN algorithm.
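For readers unfamiliar with the epsilon-greedy exploration mentioned above, here is a minimal sketch of the standard rule: with probability epsilon pick a uniformly random action, otherwise pick the action with the highest estimated value. The function name, the `rng` parameter, and the toy Q-values are assumptions for illustration and are not taken from the paper.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Choose an action index from a list of estimated action values.

    With probability `epsilon` the choice is uniformly random (exploration);
    otherwise it is the index of the largest value (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q-value estimates for three dialogue actions.
toy_q = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(toy_q, epsilon=0.0)   # always exploits
random_action = epsilon_greedy(toy_q, epsilon=1.0)   # always explores
```

The random branch ignores the value estimates entirely, which is why the abstract describes the user as being subjected to a random choice of action during learning.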


A Glance at Reinforcement Learning - ADG Efficiency

#artificialintelligence

A professional highlight of 2017 has been teaching A Glance at Reinforcement Learning – an introductory course I've developed. You can find the course materials on GitHub. This one-day course is aimed at data scientists with a grasp of supervised machine learning but no prior understanding of reinforcement learning.

Course scope:
– introduction to the fundamental concepts of reinforcement learning
– value function methods: dynamic programming, Monte Carlo, temporal difference, Q-Learning, DQN
– policy gradient methods: score function, REINFORCE, advantage actor-critic, A3C
– AlphaGo
– practical concerns: reward scaling, mistakes I've made, advice from Vlad Mnih & John Schulman
– literature highlights: distributional perspective, auxiliary loss functions, inverse RL

I've given this course to three batches at Data Science Retreat in Berlin and once to a group of startups from Entrepreneur First in London. Each time I've had great questions, kind feedback and improved my own understanding.


Variational Deep Q Network

arXiv.org Machine Learning

We propose a framework that directly tackles the probability distribution of the value function parameters in Deep Q Network (DQN), with powerful variational inference subroutines to approximate the posterior of the parameters. We will establish the equivalence between our proposed surrogate objective and variational inference loss. Our new algorithm achieves efficient exploration and performs well on large-scale chain Markov Decision Process (MDP). Deep reinforcement learning (RL) has enjoyed numerous recent successes in video games, board games, and robotics control [17, 3, 9, 18]. Deep RL algorithms typically apply naive exploration schemes such as ε-greedy [12, 19], directly injecting noise into actions [10], and action level entropy regularization [24].
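To make the idea of a distribution over value function parameters more concrete, the sketch below samples the weights of a tiny linear Q-function from independent Gaussians and then acts greedily on the sampled values. This is one common way such a distribution can drive exploration; whether the paper selects actions this way is not stated in the excerpt above, and the linear model, function names, and numbers are all assumptions for illustration.

```python
import random

def sample_weights(means, stds, rng):
    """Draw one weight matrix from a factorized Gaussian (one row per action)."""
    return [[rng.gauss(m, s) for m, s in zip(mean_row, std_row)]
            for mean_row, std_row in zip(means, stds)]

def q_values(weights, features):
    """Linear Q-function: one dot product per action."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def act(means, stds, features, rng):
    """Sample parameters once, then act greedily under the sampled Q-values."""
    q = q_values(sample_weights(means, stds, rng), features)
    return max(range(len(q)), key=lambda a: q[a])

# Hypothetical approximate posterior for 2 actions and 2 state features.
means = [[1.0, 0.0], [0.0, 1.0]]
no_uncertainty = [[0.0, 0.0], [0.0, 0.0]]
some_uncertainty = [[0.5, 0.5], [0.5, 0.5]]
state = [0.2, 0.8]
rng = random.Random(0)

certain_action = act(means, no_uncertainty, state, rng)      # uses the means exactly
exploring_action = act(means, some_uncertainty, state, rng)  # may vary between draws
```

With zero standard deviation the sampled weights equal the means and the agent is purely greedy. Larger standard deviations make the chosen action vary between draws, so exploration comes from parameter uncertainty rather than from random actions.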