AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Optimization on a Budget: A Reinforcement Learning Approach

Ruvolo, Paul L., Fasel, Ian, Movellan, Javier R.

Neural Information Processing Systems, Feb-15-2020, 03:13:50 GMT

Many popular optimization algorithms, like the Levenberg-Marquardt algorithm (LMA), use heuristic-based "controllers" that modulate the behavior of the optimizer during the optimization process. For example, in the LMA a damping parameter is dynamically modified based on a set of rules that were developed using various heuristic arguments. Reinforcement learning (RL) is a machine learning approach to learn optimal controllers by examples and thus is an obvious candidate to improve the heuristic-based controllers implicit in the most popular and heavily used optimization algorithms. Improving the performance of off-the-shelf optimizers is particularly important for time-constrained optimization problems. For example, the LMA has become popular for many real-time computer vision problems, including object tracking from video, where only a small amount of time can be allocated to the optimizer on each incoming video frame.
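To make the damping heuristic mentioned in this abstract concrete, here is a minimal sketch of a scalar Levenberg-Marquardt loop. It is an illustration only, not the authors' method: the model `y = exp(a * x)`, the function name `fit_decay_rate`, and the factor-of-10 damping rule are assumptions chosen because they are a common textbook form of the heuristic.

```python
import math


def fit_decay_rate(xs, ys, a=0.0, damping=1.0, iters=50):
    """Fit y = exp(a * x) for a single parameter a (hypothetical toy model).

    The damping update below (divide by 10 on success, multiply by 10 on
    failure) is an assumed textbook heuristic, the kind of hand-written
    controller the abstract proposes to replace with a learned one.
    """
    def sse(a_val):
        # Sum of squared residuals for a candidate parameter value.
        return sum((math.exp(a_val * x) - y) ** 2 for x, y in zip(xs, ys))

    cost = sse(a)
    for _ in range(iters):
        residuals = [math.exp(a * x) - y for x, y in zip(xs, ys)]
        jac = [x * math.exp(a * x) for x in xs]  # d(residual_i) / da
        grad = sum(j * r for j, r in zip(jac, residuals))
        curvature = sum(j * j for j in jac)  # Gauss-Newton term J^T J
        step = -grad / (curvature + damping)  # damped Gauss-Newton step
        new_cost = sse(a + step)
        if new_cost < cost:
            # Step helped: accept it and trust the quadratic model more.
            a, cost = a + step, new_cost
            damping /= 10.0
        else:
            # Step hurt: reject it and move toward gradient descent.
            damping *= 10.0
    return a, damping


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(-0.7 * x) for x in xs]
a_hat, final_damping = fit_decay_rate(xs, ys)
```

The `if`/`else` branch is the entire "controller" in this sketch; a learned policy would choose the next damping value from features of the optimization state instead.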

algorithm, controller, reinforcement learning approach, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms

Roberts, John W., Tedrake, Russ

Neural Information Processing Systems, Feb-15-2020, 03:12:43 GMT

Policy gradient (PG) reinforcement learning algorithms have strong (local) convergence guarantees, but their learning performance is typically limited by a large variance in the estimate of the gradient. In this paper, we formulate the variance reduction problem by describing a signal-to-noise ratio (SNR) for policy gradient algorithms, and evaluate this SNR carefully for the popular Weight Perturbation (WP) algorithm. We confirm that SNR is a good predictor of long-term learning performance, and that in our episodic formulation, the cost-to-go function is indeed the optimal baseline. We then propose two modifications to traditional model-free policy gradient algorithms in order to optimize the SNR. First, we examine WP using anisotropic sampling distributions, which introduces a bias into the update but increases the SNR; this bias can be interpreted as following the natural gradient of the cost function.
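For readers unfamiliar with Weight Perturbation, the basic update can be sketched as follows. This is a generic, isotropic version under stated assumptions, not the paper's SNR-optimized variants: the function name, the quadratic test cost, the step size, and the use of the unperturbed cost as the baseline are all illustrative choices.

```python
import random


def weight_perturbation(cost, w, sigma=0.1, lr=2.0, steps=500, seed=0):
    """Minimize cost(w) using only cost evaluations (no analytic gradient).

    Each step samples isotropic Gaussian noise z, measures how the cost
    changes under w + z, and moves against the correlation between that
    change and z. Subtracting the baseline cost(w) reduces the variance of
    the update without changing its expected direction.
    """
    rng = random.Random(seed)
    w = list(w)
    for _ in range(steps):
        z = [rng.gauss(0.0, sigma) for _ in w]
        baseline = cost(w)
        perturbed = cost([wi + zi for wi, zi in zip(w, z)])
        # In expectation this step is approximately -lr * sigma**2 * grad cost(w).
        w = [wi - lr * (perturbed - baseline) * zi for wi, zi in zip(w, z)]
    return w


def quadratic(w):
    # Hypothetical test cost with its minimum at the origin.
    return sum(x * x for x in w)


start = [1.0, -2.0]
final = weight_perturbation(quadratic, start)
```

An anisotropic variant, as discussed in the abstract, would draw `z` from a Gaussian with a non-identity covariance instead of a single shared `sigma`.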

policy gradient algorithm, signal-to-noise ratio analysis, snr

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)

Biasing Approximate Dynamic Programming with a Lower Discount Factor

Petrik, Marek, Scherrer, Bruno

Neural Information Processing Systems, Feb-15-2020, 02:58:22 GMT

Most algorithms for solving Markov decision processes rely on a discount factor, which ensures their convergence. In fact, it is often used in problems that have no intrinsic motivation for discounting. In this paper, we show that when used in approximate dynamic programming, an artificially low discount factor may significantly improve the performance on some problems, such as Tetris. We propose two explanations for this phenomenon. Our first justification follows directly from the standard approximation error bounds: using a lower discount factor may decrease the approximation error bounds.
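The listing does not say which error bounds the authors have in mind. As an assumed example, one commonly cited bound for approximate policy iteration states that a per-iteration evaluation error of at most `eps` (in max norm) leads to an asymptotic loss of at most `2 * gamma * eps / (1 - gamma) ** 2`. The hypothetical helper below computes that multiplier to show how quickly it shrinks as the discount factor is lowered.

```python
def error_bound_factor(gamma):
    """Multiplier on the per-iteration error in the assumed bound.

    Uses the bound 2 * gamma / (1 - gamma)**2 for approximate policy
    iteration; this specific bound is an assumption, not taken from the
    abstract above.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 2.0 * gamma / (1.0 - gamma) ** 2


# Compare the multiplier across a few discount factors.
factors = {gamma: error_bound_factor(gamma) for gamma in (0.8, 0.9, 0.99)}
```

Under this bound, moving from a discount factor of 0.99 to 0.9 shrinks the multiplier by roughly two orders of magnitude, which is the kind of effect the first explanation in the abstract points to. The bound says nothing about the bias that a lower discount factor introduces.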

biasing approximate dynamic programming, discount factor, lower discount factor, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

A Generalized Natural Actor-Critic Algorithm

Morimura, Tetsuro, Uchibe, Eiji, Yoshimoto, Junichiro, Doya, Kenji

Neural Information Processing Systems, Feb-15-2020, 02:57:17 GMT

Policy gradient Reinforcement Learning (RL) algorithms have received much attention in seeking stochastic policies that maximize the average rewards. In addition, extensions based on the concept of the Natural Gradient (NG) show promising learning efficiency because they take the metric of the task into account. Though there are two candidate metrics, Kakade's Fisher Information Matrix (FIM) and Morimura's FIM, all RL algorithms with NG have followed Kakade's approach. In this paper, we describe a generalized Natural Gradient (gNG) by linearly interpolating the two FIMs and propose an efficient implementation for the gNG learning based on a theory of the estimating function, generalized Natural Actor-Critic (gNAC). The gNAC algorithm involves a near optimal auxiliary function to reduce the variance of the gNG estimates.

algorithm, generalized natural actor-critic algorithm, natural gradient, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Multi-resolution Exploration in Continuous Spaces

Nouri, Ali, Littman, Michael L.

Neural Information Processing Systems, Feb-15-2020, 02:56:42 GMT

The essence of exploration is acting to try to decrease uncertainty. We propose a new methodology for representing uncertainty in continuous-state control problems. Our approach, multi-resolution exploration (MRE), uses a hierarchical mapping to identify regions of the state space that would benefit from additional samples. We demonstrate MRE's broad utility by using it to speed up learning in a prototypical model-based and value-based reinforcement-learning method. Empirical results show that MRE improves upon state-of-the-art exploration approaches.

continuous space, multi-resolution exploration

Neural Information Processing Systems

Genre: Research Report (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

Bhatnagar, Shalabh, Precup, Doina, Silver, David, Sutton, Richard S., Maei, Hamid R., Szepesvári, Csaba

Neural Information Processing Systems, Feb-15-2020, 02:42:59 GMT

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD($\lambda$), Q-learning and Sarsa, have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximation, can cause these algorithms to become unstable (i.e., the parameters of the approximator may diverge). Sutton et al. (2009a,b) solved the problem of off-policy learning with linear TD algorithms by introducing a new objective function, related to the Bellman error, and algorithms that perform stochastic gradient descent on this function. In this paper, we generalize their work to nonlinear function approximation.
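As background for the conventional methods this abstract contrasts itself with, the following is a minimal sketch of on-policy TD(0) with linear function approximation. It is not the gradient-based algorithm the paper introduces; the function name, the three-state chain, and the one-hot features are hypothetical choices made so that the result is easy to check by hand.

```python
def td0_linear(features, transitions, gamma=0.9, alpha=0.1, sweeps=500):
    """Conventional (semi-gradient) TD(0) with a linear value function.

    features:    dict mapping each state to its feature list.
    transitions: list of (state, reward, next_state) tuples, where
                 next_state is None when the episode terminates.
    """
    n_features = len(next(iter(features.values())))
    theta = [0.0] * n_features

    def value(state):
        # Terminal states have value zero by convention.
        if state is None:
            return 0.0
        return sum(t * f for t, f in zip(theta, features[state]))

    for _ in range(sweeps):
        for state, reward, next_state in transitions:
            # TD error: one-step bootstrapped target minus current estimate.
            delta = reward + gamma * value(next_state) - value(state)
            theta = [t + alpha * delta * f
                     for t, f in zip(theta, features[state])]
    return theta


# Deterministic chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
features = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [0.0, 0.0, 1.0]}
transitions = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
theta = td0_linear(features, transitions)
```

With one-hot features this reduces to tabular TD(0), so the learned weights approach the true discounted returns of the chain. The instability described in the abstract arises when the samples are off-policy or when `value` is a nonlinear function of `theta`, neither of which this sketch exercises.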

algorithm, arbitrary smooth function approximation, convergent temporal-difference learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Solving Stochastic Games

Dermed, Liam M., Isbell, Charles L.

Neural Information Processing Systems, Feb-15-2020, 02:42:31 GMT

Solving multi-agent reinforcement learning problems has proven difficult because of the lack of tractable algorithms. We provide the first approximation algorithm which solves stochastic games to within $\epsilon$ relative error of the optimal game-theoretic solution, in time polynomial in $1/\epsilon$. Our algorithm extends Murray and Gordon's (2007) modified Bellman equation, which determines the set of all possible achievable utilities; this provides us with a truly general framework for multi-agent learning. Further, we empirically validate our algorithm and find the computational cost to be orders of magnitude less than what the theory predicts. Papers published at the Neural Information Processing Systems Conference.

algorithm, epsilon, stochastic game

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

PAC-Bayesian Model Selection for Reinforcement Learning

Fard, Mahdi M., Pineau, Joelle

Neural Information Processing Systems, Feb-15-2020, 02:28:23 GMT

This paper introduces the first set of PAC-Bayesian bounds for the batch reinforcement learning problem in finite state spaces. These bounds hold regardless of the correctness of the prior distribution. We demonstrate how such bounds can be used for model-selection in control problems where prior information is available either on the dynamics of the environment, or on the value of actions. Our empirical results confirm that PAC-Bayesian model-selection is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, ignores them when they are misleading.

pac-bayesian model selection, reinforcement learning

Neural Information Processing Systems

Genre: Research Report (0.71)

Industry: Education > Focused Education > Special Education (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining

Konidaris, George, Barto, Andrew G.

Neural Information Processing Systems, Feb-15-2020, 02:26:56 GMT

We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains, that builds chains of skills leading to an end-of-task reward. We demonstrate experimentally that it creates skills that result in performance benefits in a challenging continuous domain.

artificial intelligence, continuous reinforcement learning domain, machine learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Policy Search for Motor Primitives in Robotics

Kober, Jens, Peters, Jan R.

Neural Information Processing Systems, Feb-15-2020, 02:13:31 GMT

Many motor skills in humanoid robotics can be learned using parametrized motor primitives, as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems, often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results in a general, common framework that is also connected to policy gradient methods and that yields a novel algorithm for policy learning by assuming a form of exploration that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable to complex motor learning tasks.

artificial intelligence, machine learning, reinforcement learning, (4 more...)

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)
