 Reinforcement Learning


Deep Reinforcement Learning: Pong from Pixels

#artificialintelligence

This is a long overdue blog post on Reinforcement Learning (RL). You may have noticed that computers can now automatically learn to play ATARI games (from raw game pixels!), they are beating world champions at Go, simulated quadrupeds are learning to run and leap, and robots are learning how to perform complex manipulation tasks that defy explicit programming. It turns out that all of these advances fall under the umbrella of RL research. I also became interested in RL myself over the last year: I worked through Richard Sutton's book, read through David Silver's course, watched John Schulman's lectures, wrote an RL library in JavaScript, interned at DeepMind over the summer working in the DeepRL group, and most recently pitched in a little with the design/development of OpenAI Gym, a new RL benchmarking toolkit. So I've certainly been on this funwagon for at least a year, but until now I haven't gotten around to writing up a short post on why RL is a big deal, what it's about, how it all developed and where it might be going. It's interesting to reflect on the nature of recent progress in RL. Similar to what happened in Computer Vision, the progress in RL is not driven as much as you might reasonably assume by new amazing ideas. In Computer Vision, the 2012 AlexNet was mostly a scaled up (deeper and wider) version of 1990s ConvNets. Similarly, the ATARI Deep Q Learning paper from 2013 is an implementation of a standard algorithm (Q Learning with function approximation, which you can find in the standard RL book of Sutton 1998), where the function approximator happened to be a ConvNet. AlphaGo uses policy gradients with Monte Carlo Tree Search (MCTS); these are also standard components.


Reinforcement Learning and DQN, learning to play from pixels - Ruben Fiszel's website

#artificialintelligence

My 2 month summer internship at Skymind (the company behind the open source deep learning library DL4J) is coming to an end, and this post summarizes what I have been working on: building a deep reinforcement learning library for DL4J: … (drum roll) … RL4J! This post begins with an introduction to reinforcement learning, follows with a detailed explanation of DQN (Deep Q-Network) for pixel inputs, and concludes with an RL4J example. I will assume the reader has some familiarity with neural networks. But first, let's talk about the core concepts of reinforcement learning. A "simple aspect of science" may be defined as one which, through good fortune, I happen to understand. Reinforcement Learning is an exciting area of machine learning. It is basically the learning of an efficient strategy in a given environment. Informally, this is very similar to Pavlovian conditioning: you assign a reward for a given behavior and, over time, the agent learns to reproduce that behavior in order to receive more rewards. It is an iterative trial and error process. Formally, an environment is defined as a Markov Decision Process (MDP). Note: It is usually more convenient to use the set of actions \(A_s\), which is the set of moves available from a given state \(s\), than the complete set \(A\). \(A_s\) is simply the set of elements \(a\) in \(A\) such that \(P(s' \mid s, a) > 0\) for some next state \(s'\).
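As a rough illustration of the note on \(A_s\), here is a minimal Python sketch. The toy transition table `P`, the state and action names, and the helper `available_actions` are all hypothetical and are not part of RL4J or DL4J; the sketch assumes an action counts as available in a state when at least one next state has nonzero transition probability.

```python
from typing import Dict, Set, Tuple

State = str
Action = str

# Hypothetical toy transition model (an assumption for illustration):
# P[(s, a)] maps each next state s' to the probability P(s' | s, a).
P: Dict[Tuple[State, Action], Dict[State, float]] = {
    ("start", "right"): {"middle": 1.0},
    ("middle", "left"): {"start": 0.9, "middle": 0.1},
    ("middle", "right"): {"goal": 0.8, "middle": 0.2},
    ("middle", "jump"): {"goal": 0.0},  # listed, but never leads anywhere
}


def available_actions(
    transitions: Dict[Tuple[State, Action], Dict[State, float]],
    s: State,
) -> Set[Action]:
    """Return A_s: the actions a with P(s' | s, a) > 0 for some next state s'."""
    return {
        a
        for (state, a), next_states in transitions.items()
        if state == s and any(p > 0 for p in next_states.values())
    }


if __name__ == "__main__":
    for s in ("start", "middle", "goal"):
        print(s, sorted(available_actions(P, s)))
```

In this toy model, `jump` is excluded from the actions of `middle` because every transition probability for it is zero, and the terminal state `goal` has no available actions at all.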



Deep Deterministic Policy Gradients in TensorFlow

#artificialintelligence

Deep Reinforcement Learning has recently gained a lot of traction in the machine learning community due to the significant amount of progress that has been made in the past few years. Traditionally, reinforcement learning algorithms were constrained to tiny, discretized grid worlds, which seriously inhibited them from gaining credibility as being viable machine learning tools. Here's a classic example from Richard Sutton's book, which I will be referencing a lot. After Deep Q-Networks [4] became a hit, people realized that deep learning methods could be used to solve high-dimensional problems. One of the subsequent challenges that the reinforcement learning community faced was figuring out how to deal with continuous action spaces.


Single-shot Adaptive Measurement for Quantum-enhanced Metrology

arXiv.org Machine Learning

Quantum-enhanced metrology aims to estimate an unknown parameter such that the precision scales better than the shot-noise bound. Single-shot adaptive quantum-enhanced metrology (AQEM) is a promising approach that uses feedback to tweak the quantum process according to previous measurement outcomes. Techniques and formalism for the adaptive case are quite different from the usual non-adaptive quantum metrology approach due to the causal relationship between measurements and outcomes. We construct a formal framework for AQEM by modeling the procedure as a decision-making process, and we derive the imprecision and the Cramér-Rao lower bound with explicit dependence on the feedback policy. We also explain the reinforcement learning approach for generating quantum control policies, which is adopted due to the optimal policy being non-trivial to devise. Applying a learning algorithm based on differential evolution enables us to attain imprecision for adaptive interferometric phase estimation, which turns out to be SQL when non-entangled particles are used in the scheme.


Teaching Machines to Direct Traffic through Deep Reinforcement Learning

#artificialintelligence

Rush hour: the dreaded time of day when traffic conditions seem bent on making you late. As your car slowly creeps in line behind countless others stuck at a stop light, you think to yourself, "Why aren't these lights changing faster?" Traffic control scientists have long tried to solve this signaling problem. Unfortunately, the complexity of traffic situations has made the job extremely hard. A recent study suggests that machines can learn how to plan traffic signals just right to reduce wait times and make traffic queues shorter.


Physical Review Letters - Accepted Paper: Quantum-enhanced machine learning

#artificialintelligence

The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention.


aikorea/awesome-rl

#artificialintelligence

A curated list of resources dedicated to reinforcement learning. We are looking for more contributors and maintainers!


Teaching machines to direct traffic through deep reinforcement learning

#artificialintelligence

Rush hour--the dreaded time of day when traffic conditions seem bent on making you late. As your car slowly creeps in line behind countless others stuck at a stop light, you think to yourself, "Why aren't these lights changing faster?" Traffic control scientists have long tried to solve this signaling problem. Unfortunately, the complexity of traffic situations makes the job extremely hard. A recent study suggests that machines can learn how to plan traffic signals just right to reduce wait times and make traffic queues shorter.


How deep reinforcement learning can help chatbots

#artificialintelligence

In March this year, Microsoft CEO Satya Nadella talked about the industry trend of using human language more pervasively for interaction with computing devices, a trend he called "conversation as a platform." He also announced several bot initiatives, including the company's bot framework. In April, Facebook launched its Messenger platform with bots. Then, in May, Google announced its attempt to develop AI-powered bots, called Google Assistant. Since then, bots have been widely regarded as a new user interface (UI) to fundamentally change how computing will be experienced by people.