Reinforcement Learning
Torch Dueling Deep Q-Networks
Deep Q-networks (DQNs) [1] have reignited interest in neural networks for reinforcement learning, proving their abilities on the challenging Arcade Learning Environment (ALE) benchmark [2]. The ALE is a reinforcement learning interface to over 50 video games for the Atari 2600; with a single architecture and choice of hyperparameters, the DQN was able to achieve superhuman scores on over half of these games. The original work has since been superseded by several advancements, a number of which can be found on GitHub. As training on the ALE can take over a week on a GPU, the code is also set up to learn how to play a simpler game of catch in a couple of hours on a CPU. Most recent deep learning research has focused on supervised learning, which involves finding a mapping from input data \(x\) to target data \(y\).
Q-learning with Neural Networks
We've made it to what we've all been waiting for: Q-learning with neural networks. Since I'm sure a lot of people didn't follow parts 1 and 2 because they were kind of boring, I will attempt to make this post relatively (but not completely) self-contained. In this post, we will dive into using Q-learning to teach an agent (player) how to play Gridworld. Gridworld is a simple text-based game in which there is a 4x4 grid of tiles and 4 objects placed therein: a player, pit, goal, and a wall. The player can move up/down/left/right (\(a \in A = \{up, down, left, right\}\)) and the point of the game is to get to the goal, where the player will receive a numerical reward. Unfortunately, we have to avoid a pit, because if we land on the pit we are penalized with a negative 'reward'.
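As a rough illustration of the game described above, here is a minimal sketch of Q-learning on a 4x4 Gridworld. Note that it uses a plain lookup table rather than the neural network the post goes on to build, so it only shows the update rule, not the post's method. The object positions, the reward values (+10 goal, -10 pit, -1 per move), and the hyperparameters `ALPHA`, `GAMMA` and `epsilon` are all assumptions chosen for the example.

```python
import random

# Assumed layout and constants; none of these values come from the excerpt.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE = 4
GOAL, PIT, WALL = (3, 3), (1, 1), (2, 2)
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor


def step(state, action):
    """Move the player; return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    # Moves off the grid or into the wall leave the player where it was.
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE) or nxt == WALL:
        nxt = state
    if nxt == GOAL:
        return nxt, 10.0, True
    if nxt == PIT:
        return nxt, -10.0, True
    return nxt, -1.0, False


def q_update(q, state, action, reward, nxt, done):
    """One Q-learning update: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)


def train(episodes=500, epsilon=0.1, seed=0):
    """Epsilon-greedy training from a fixed start tile; returns the Q-table."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done, steps = (0, 0), False, 0
        while not done and steps < 100:  # cap episode length
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            q_update(q, state, action, reward, nxt, done)
            state = nxt
            steps += 1
    return q
```

Calling `train()` returns a dictionary keyed by `(state, action)` pairs; a neural-network version would replace that dictionary with a function approximator trained toward the same target.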
TensorFlow in Action: TensorBoard, Training a Model, and Deep Q-learning - Blog on All Things Cloud Foundry
Peter Morgan is a published author and computer science industry veteran with twenty years' experience working within the IT industry. Before entering industry, he solved high energy physics problems while enrolled in the PhD program in physics at the University of Massachusetts at Amherst. After spending three years as a Research Associate on an experiment led by Stanford University to measure the mass of the neutrino, Peter now works as a technical director at Data Science Partnership, a company he co-founded, overseeing business development and helping clients to design and implement their deep learning solutions.
An Introduction to Semi-supervised Reinforcement Learning
As usual, our goal is to quickly learn a policy which receives a high reward per episode. We can apply a traditional RL algorithm to the semi-supervised setting by simply ignoring all of the unlabelled episodes. This will generally result in very slow learning. The interesting challenge is to learn efficiently from the unlabelled episodes. I think that semi-supervised RL is a valuable ingredient for AI control, as well as an interesting research problem in reinforcement learning.
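The baseline mentioned above, ignoring every unlabelled episode, can be sketched in a few lines. The episode format here (a dict with an `actions` list and a `reward` that is `None` when unlabelled) is a hypothetical representation invented for the example, not something the post specifies.

```python
def labelled_only(episodes):
    """Keep only the episodes whose reward was actually observed."""
    return [ep for ep in episodes if ep["reward"] is not None]


def average_labelled_reward(episodes):
    """Average reward over labelled episodes, or None if there are none."""
    labelled = labelled_only(episodes)
    if not labelled:
        return None
    return sum(ep["reward"] for ep in labelled) / len(labelled)


# Made-up data: half of the episodes carry no reward label.
episodes = [
    {"actions": ["left", "right"], "reward": 1.0},
    {"actions": ["left", "left"], "reward": None},
    {"actions": ["right", "right"], "reward": 0.0},
    {"actions": ["right", "left"], "reward": None},
]
```

With this data, half of the collected experience is thrown away before learning starts, which is one way to see why the baseline tends to learn slowly.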
Deep Reinforcement Learning
Goal: In this week's summary we introduce the basic concepts behind reinforcement learning and some ways it is applied in very controlled environments. Motivation: Reinforcement learning methods recently attracted a great deal of attention when AlphaGo ranked alongside the best human Go players. Furthermore, the complexity of Go might ease the transfer of reinforcement learning to very large NLP tasks like dialog handling. Steps: Reinforcement learning is usually applied to tasks where an environment is partially observable and a certain action has to be taken. Basically any kind of game fits this description.
NN with Q-learning: which activation function with which cost function? • /r/MachineLearning
I've been messing around with Q-learning adapted with NN, after I read these two articles. I'm not yet ready to understand and implement conv NN, so I just fooled around with a normal NN. I've been told to use sigmoid as the activation function and cross-entropy as the cost function. The problem is that it doesn't seem to work well with Q-learning, since I want my output to be a real number; using a probability output seems like a bad hack to me. The papers I read seem to use the quadratic cost function, but they give no detail about the activation function. I checked the GitHub of someone who implemented all of this, and he seems not to use any activation function at all.
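One way to read the situation in the question: if the network's output is meant to be a Q-value, the training target is a real number, so a quadratic cost paired with an output unit that applies no squashing function is a natural combination. The sketch below illustrates that reading only; it is not taken from the papers or the repository the poster mentions, and the function names and the default `gamma` are made up for the example.

```python
import math


def td_target(reward, next_q_values, done, gamma=0.9):
    """Regression target for the action taken: r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)


def squared_error(predicted_q, target):
    """Quadratic cost on a single Q-value."""
    return 0.5 * (predicted_q - target) ** 2


def linear_output(weights, bias, hidden):
    """Output unit with no activation: any real number is reachable."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias


def sigmoid_output(weights, bias, hidden):
    """Same unit squashed into (0, 1); it cannot represent a target like -10."""
    return 1.0 / (1.0 + math.exp(-linear_output(weights, bias, hidden)))
```

For example, a terminal penalty gives `td_target(-10.0, [], True) == -10.0`, a value the sigmoid unit can never output however the weights are set, while the linear unit can.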
openai/gym
OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to an ever-growing variety of environments. You can use it from Python code, and soon from other languages. If you're not sure where to start, we recommend beginning with the docs on our site. There are two basic concepts in reinforcement learning: the environment (namely, the outside world) and the agent (namely, the algorithm you are writing).
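To make the environment/agent split concrete, here is a small self-contained sketch. It does not import gym: `CoinFlipEnv` and `CopyAgent` are hypothetical stand-ins written for this example. The `reset()` / `step(action)` shape, with `step` returning `(observation, reward, done, info)`, is modelled on the interface Gym exposed around the time of this text; later releases may use a different signature, so treat that as an assumption.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of gym): match the shown coin."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        """Score the action, advance time, return (obs, reward, done, info)."""
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        self.coin = self.rng.randint(0, 1)
        done = self.t >= self.episode_length
        return self.coin, reward, done, {}


class CopyAgent:
    """The 'algorithm you are writing': here it just echoes the observation."""

    def act(self, observation):
        return observation


def run_episode(env, agent):
    """The basic interaction loop between an agent and an environment."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        action = agent.act(observation)
        observation, reward, done, _info = env.step(action)
        total += reward
    return total
```

Because the agent only ever talks to the environment through `reset` and `step`, the same loop can be reused with a different environment or a different agent.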
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Dann, Christoph, Brunskill, Emma
Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \ln\frac 1 \delta)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \ln \frac 1 {\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal S|$. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs which have a time-horizon dependency of at least $H^3$.
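As a back-of-the-envelope check on the gap described in the abstract (a reader's sketch, not a statement from the paper), dividing the leading terms of the two bounds and ignoring the logarithmic factors gives

$$\frac{|\mathcal S|^2 |\mathcal A| H^2 / \epsilon^2}{|\mathcal S| |\mathcal A| H^2 / \epsilon^2} = |\mathcal S|,$$

so the upper and lower bounds agree in their dependence on $|\mathcal A|$, $H$ and $\epsilon$, and differ by one factor of the number of states. For an MDP with $|\mathcal S| = 100$, for instance, the upper bound is about 100 times the lower bound, up to log-terms.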
A New 'Gym' for Building and Testing A.I. - Dice Insights
If you're interested in working with machine learning and artificial-intelligence algorithms, but unsure of how to start, check out the OpenAI Gym, now in beta. The premise behind OpenAI Gym is simple: it's a toolkit for building reinforcement learning (RL) algorithms, which govern bots' decision-making and motor-control capabilities. Reinforcement learning is a key element in A.I. development, as it allows software to deal with random, unpredictable environments; one "classic" problem involves balancing an untethered pole on a rolling cart. OpenAI is a non-profit "artificial intelligence research company" funded by some heavy hitters in the tech world, including Tesla CEO Elon Musk and venture capitalist Peter Thiel. Its altruistic goal is to develop open-source A.I. software that's "friendly" to humanity. According to a blog posting accompanying the launch of OpenAI Gym, RL research is slowed by two factors: a need for better benchmarks, and a lack of standardization of environments used in publications.