Reinforcement Learning
PAC Reinforcement Learning with Rich Observations
Krishnamurthy, Akshay, Agarwal, Alekh, Langford, John
We propose and study a new model for reinforcement learning with rich observations, generalizing contextual bandits to sequential decision making. These models require an agent to take actions based on observations (features) with the goal of achieving long-term performance competitive with a large set of policies. To avoid barriers to sample-efficient learning associated with large observation spaces and general POMDPs, we focus on problems that can be summarized by a small number of hidden states and have long-term rewards that are predictable by a reactive function class. In this setting, we design and analyze a new reinforcement learning algorithm, Least Squares Value Elimination by Exploration. We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation.
5 EBooks to Read Before Getting into A Machine Learning Career
Don't know where to start? If you are looking for something more, you could look here for an overview of MOOCs and online lectures from freely-available university lectures. Of course, nothing substitutes rigorous formal education, but let's say that isn't in the cards for whatever reason. Not all machine learning positions require a PhD; it really depends where on the machine learning spectrum one wants to fit in. Check out this motivating and inspirational post, the author of which went from little understanding of machine learning to actively and effectively utilizing techniques in their job within a year.
paulhendricks/gym-R
OpenAI Gym is a open-source Python toolkit for developing and comparing reinforcement learning algorithms. This R package is a wrapper for the OpenAI Gym API, and enables access to an ever-growing variety of environments. If you encounter a clear bug, please file a minimal reproducible example on github.
Simple reinforcement learning methods to learn CartPole
I've been experimenting with OpenAI gym recently, and one of the simplest environments is CartPole. The problem consists of balancing a pole connected with one joint on top of a moving cart. The only actions are to add a force of -1 or 1 to the cart, pushing it left or right. In this post, I will be going over some of the methods described in the CartPole request for research, including implementations and some intuition behind how they work. In CartPole's environment, there are four observations at any given state, representing information such as the angle of the pole and the position of the cart.
A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning
One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key strategies in scaling up reinforcement learning algorithms. In this setting, we have effective and reasonably well understood algorithms for adapting the learning-rate parameter, online during learning. Such meta-learning approaches can improve robustness of learning and enable specialization to current task, improving learning speed. For temporal-difference learning algorithms which we study here, there is yet another parameter, $\lambda$, that similarly impacts learning speed and stability in practice. Unfortunately, unlike the learning-rate parameter, $\lambda$ parametrizes the objective function that temporal-difference methods optimize. Different choices of $\lambda$ produce different fixed-point solutions, and thus adapting $\lambda$ online and characterizing the optimization is substantially more complex than adapting the learning-rate parameter. There are no meta-learning method for $\lambda$ that can achieve (1) incremental updating, (2) compatibility with function approximation, and (3) maintain stability of learning under both on and off-policy sampling. In this paper we contribute a novel objective function for optimizing $\lambda$ as a function of state rather than time. We derive a new incremental, linear complexity $\lambda$-adaption algorithm that does not require offline batch updating or access to a model of the world, and present a suite of experiments illustrating the practicality of our new algorithm in three different settings. Taken together, our contributions represent a concrete step towards black-box application of temporal-difference learning methods in real world problems.
5 EBooks to Read Before Getting into A Machine Learning Career
Note that, while there are numerous machine learning ebooks available for free online, including many which are very well-known, I have opted to move past these "regulars" and seek out lesser-known and more niche options for readers. Don't know where to start? If you are looking for something more, you could look here for an overview of MOOCs and online lectures from freely-available university lectures. Of course, nothing substitutes rigorous formal education, but let's say that isn't in the cards for whatever reason. Not all machine learning positions require a PhD; it really depends where on the machine learning spectrum one wants to fit in.
Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation
Parisi, Simone, Pirotta, Matteo, Restelli, Marcello
Many real-world control applications, from economics to robotics, are characterized by the presence of multiple conflicting objectives. In these problems, the standard concept of optimality is replaced by Pareto-optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an important challenge. In this paper, we propose a reinforcement learning policy gradient approach to learn a continuous approximation of the Pareto frontier in multi-objective Markov Decision Problems (MOMDPs). Differently from previous policy gradient algorithms, where n optimization routines are executed to have n solutions, our approach performs a single gradient ascent run, generating at each step an improved continuous approximation of the Pareto frontier. The idea is to optimize the parameters of a function defining a manifold in the policy parameters space, so that the corresponding image in the objectives space gets as close as possible to the true Pareto frontier. Besides deriving how to compute and estimate such gradient, we will also discuss the non-trivial issue of defining a metric to assess the quality of the candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two problems, a linear-quadratic Gaussian regulator and a water reservoir control task.
Quantum artificial intelligence could lead to super-smart machines
Quantum physics has some spooky, anti-intuitive effects, but it could also be essential to how actual intuition works, at least in regards to artificial intelligence. In a new study, researcher Vedran Dunjko and co-authors applied a quantum analysis to a field within artificial intelligence called reinforcement learning, which deals with how to program a machine to make appropriate choices to maximize a cumulative reward. The field is surprisingly complex and must take into account everything from game theory to information theory. Dunjko and his team found that quantum effects, when applied to reinforcement learning in artificial intelligence systems, could provide quadratic improvements in learning efficiency, reports Phys.org . Exponential improvements might even be possible over short-term performance tasks.
Towards deep symbolic reinforcement learning
Every now and then I read a paper that makes a really strong connection with me, one where I can't stop thinking about the implications and I can't wait to share it with all of you. For me, this is one such paper. In the great see-saw of popularity for artificial intelligence techniques, symbolic reasoning and neural networks have taken turns, each having their dominant decade(s). The popular wisdom is that data-driven learning techniques (machine learning) won. Symbolic reasoning systems were just too hard and fragile to be successful at scale.