AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Compatible Reward Inverse Reinforcement Learning

Metelli, Alberto Maria, Pirotta, Matteo, Restelli, Marcello

Neural Information Processing Systems, Feb-14-2020, 09:28:43 GMT

Inverse Reinforcement Learning (IRL) is an effective approach to recover a reward function that explains the behavior of an expert by observing a set of demonstrations. This paper presents a novel model-free IRL approach that, unlike most existing IRL algorithms, does not require specifying a function space in which to search for the expert's reward function. Leveraging the fact that the policy gradient must be zero for any optimal policy, the algorithm generates a set of basis functions that span the subspace of reward functions that make the policy gradient vanish. Within this subspace, using a second-order criterion, we search for the reward function that most penalizes a deviation from the expert's policy. After introducing our approach for finite domains, we extend it to continuous ones.

compatible reward inverse reinforcement learning, inverse reinforcement learning, reward function, (3 more...)

Genre: Research Report (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques

Richardson, Elad, Herskovitz, Rom, Ginsburg, Boris, Zibulevsky, Michael

Neural Information Processing Systems, Feb-14-2020, 09:12:30 GMT

SEBOOST applies a secondary optimization process in the subspace spanned by the last steps and descent directions. The method was inspired by the SESOP optimization method for large-scale problems, and has been adapted for the stochastic learning framework. It can be applied on top of any existing optimization method with no need to tweak the internal algorithm. We show that the method is able to boost the performance of different algorithms, and make them more robust to changes in their hyper-parameters. As the boosting steps of SEBOOST are applied between large sets of descent steps, the additional subspace optimization hardly increases the overall computational burden.

seboost, stochastic learning, subspace optimization technique, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning

Petrik, Marek, Subramanian, Dharmashankar

Neural Information Processing Systems, Feb-14-2020, 08:59:27 GMT

We describe how to use robust Markov decision processes for value function approximation with state aggregation. The robustness serves to reduce the sensitivity to the approximation error of sub-optimal policies in comparison to classical methods such as fitted value iteration. Our experimental results show that using the robust representation can significantly improve the solution quality with minimal additional computational cost.

approximating aggregated mdp, reinforcement learning, robustness, (1 more...)

Genre: Research Report (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Repeated Inverse Reinforcement Learning

Amin, Kareem, Jiang, Nan, Singh, Satinder

Neural Information Processing Systems, Feb-14-2020, 08:58:28 GMT

We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks in which it surprises the human by acting suboptimally with respect to how the human would have acted. Each time the human is surprised, the agent is provided a demonstration of the desired behavior by the human. We formalize this problem, including how the sequence of tasks is chosen, in a few different ways and provide some foundational results.

agent, repeated inverse reinforcement learning, sequence

Industry: Education > Focused Education > Special Education (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Log-normality and Skewness of Estimated State/Action Values in Reinforcement Learning

Zhang, Liangpeng, Tang, Ke, Yao, Xin

Neural Information Processing Systems, Feb-14-2020, 08:58:18 GMT

Under- or overestimation of state/action values is harmful for reinforcement learning agents. In this paper, we show that a state/action value estimated using the Bellman equation can be decomposed into a weighted sum of path-wise values that follow log-normal distributions. Since log-normal distributions are skewed, the distribution of estimated state/action values can also be skewed, leading to an imbalanced likelihood of under/overestimation. The degree of such imbalance can vary greatly among actions and policies within a single problem instance, making the agent prone to select actions/policies that have inferior expected return and higher likelihood of overestimation. We present a comprehensive analysis of such skewness, examine its factors and impacts through both theoretical and empirical results, and discuss possible ways to reduce its undesirable effects.

log-normality and skewness, reinforcement learning, state action value, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
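
The skewness argument in the abstract above can be illustrated with a small simulation. This is only a sketch of the general property of log-normal distributions, not the paper's decomposition: the parameters `mu=0.0` and `sigma=1.0`, the sample size, and the helper name `lognormal_samples` are all assumptions chosen for illustration.

```python
import random
import statistics


def lognormal_samples(n, mu=0.0, sigma=1.0, seed=0):
    """Draw n log-normal samples (hypothetical helper, fixed seed for repeatability)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


samples = lognormal_samples(20_000)
mean = statistics.fmean(samples)
median = statistics.median(samples)

# Fraction of draws that fall below the sample mean. For a symmetric
# distribution this would be close to 0.5; for a right-skewed one it is larger.
frac_below_mean = sum(x < mean for x in samples) / len(samples)

print(f"mean={mean:.3f} median={median:.3f} below-mean fraction={frac_below_mean:.3f}")
```

Because the mean sits above the median, a single draw is more likely to land below the mean than above it. That asymmetry is the kind of imbalance between under- and overestimation the abstract refers to.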

Unifying Count-Based Exploration and Intrinsic Motivation

Bellemare, Marc, Srinivasan, Sriram, Ostrovski, Georg, Schaul, Tom, Saxton, David, Munos, Remi

Neural Information Processing Systems, Feb-14-2020, 08:58:14 GMT

We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across states. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels.

count-based exploration and intrinsic motivation, density model, unifying count-based exploration, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)
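
The abstract above does not state how the pseudo-count is derived from a density model. The sketch below uses the form recalled from the paper, N̂(x) = ρ(x)(1 − ρ′(x)) / (ρ′(x) − ρ(x)), where ρ(x) is the model's probability of x before one more observation of x and ρ′(x) is its probability afterwards. The function name `pseudo_count` and the toy numbers are assumptions; exact fractions are used so the check is free of rounding.

```python
from fractions import Fraction


def pseudo_count(rho, rho_next):
    """Pseudo-count from a density before (rho) and after (rho_next) seeing x once more.

    Hypothetical helper; assumes the model's probability of x increases
    after observing x (rho_next > rho).
    """
    if not rho_next > rho:
        raise ValueError("density must increase after observing x")
    return rho * (1 - rho_next) / (rho_next - rho)


# Sanity check with an empirical (count-based) density model:
# x has been seen N times out of n observations in total.
N, n = 3, 10
rho = Fraction(N, n)               # probability of x before the new observation
rho_next = Fraction(N + 1, n + 1)  # probability of x after observing it once more

recovered = pseudo_count(rho, rho_next)
print(recovered)
```

For a purely count-based model the pseudo-count reduces to the true visit count N, which is what makes it a generalization of count-based exploration to arbitrary density models.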

Simple random search of static linear policies is competitive for reinforcement learning

Mania, Horia, Guy, Aurelia, Recht, Benjamin

Neural Information Processing Systems, Feb-14-2020, 08:43:34 GMT

Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling dynamical systems without requiring models of the system dynamics. We introduce a model-free random search algorithm for training static, linear policies for continuous control problems. Common evaluation methodology shows that our method matches state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks. Nonetheless, more rigorous evaluation reveals that the assessment of performance on these benchmarks is optimistic. We evaluate the performance of our method over hundreds of random seeds and many different hyperparameter configurations for each benchmark task.

reinforcement, simple random search, static linear policy, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
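
The random search idea in the abstract above can be sketched as follows. This is a generic two-sided (antithetic) random search over a flat parameter vector, not the authors' exact algorithm. The `reward` function is a hypothetical stand-in for a rollout return, and the step size, noise scale, and number of directions are assumed values.

```python
import random


def reward(theta, target=(1.0, -2.0, 0.5)):
    """Hypothetical stand-in for a rollout return: peaks when theta equals target."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def random_search(theta, steps=200, n_dirs=8, step_size=0.05, noise=0.1, seed=0):
    """Basic random search with two-sided perturbations (illustrative sketch)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _step in range(steps):
        update = [0.0] * len(theta)
        for _k in range(n_dirs):
            # Sample a random direction and evaluate the reward on both sides of theta.
            delta = [rng.gauss(0.0, 1.0) for _ in theta]
            plus = [t + noise * d for t, d in zip(theta, delta)]
            minus = [t - noise * d for t, d in zip(theta, delta)]
            diff = reward(plus) - reward(minus)
            # Weight the direction by how much better the "plus" side was.
            for i, d in enumerate(delta):
                update[i] += diff * d
        # Move along the averaged, reward-weighted directions.
        theta = [t + step_size / n_dirs * u for t, u in zip(theta, update)]
    return theta


start = [0.0, 0.0, 0.0]
learned = random_search(start)
print(reward(start), reward(learned))
```

No gradients of the reward are computed. Only reward evaluations at perturbed parameters are needed, which is why this kind of search applies when the system dynamics are unknown.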

Temporal Regularization for Markov Decision Process

Thodoroff, Pierre, Durand, Audrey, Pineau, Joelle, Precup, Doina

Neural Information Processing Systems, Feb-14-2020, 08:43:13 GMT

Several applications of Reinforcement Learning suffer from instability due to high variance. This is especially prevalent in high dimensional domains. Regularization is a commonly used technique in machine learning to reduce variance, at the cost of introducing some bias. Most existing regularization techniques focus on spatial (perceptual) regularization. Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization.

markov decision process, temporal regularization, variance

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.56)

"How hard is my MDP?" The distribution-norm to the rescue

Maillard, Odalric-Ambrym, Mann, Timothy A., Mannor, Shie

Neural Information Processing Systems, Feb-14-2020, 08:42:42 GMT

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$. In many problems, a good approximation of $p$ is not needed. For instance, if from one state-action pair $(s,a)$, one can only transit to states with the same value, learning $p(\cdot \mid s,a)$ accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) we call the distribution-norm. The distribution-norm w.r.t. a measure $\nu$ is defined on zero $\nu$-mean functions $f$ by the standard variation of $f$ with respect to $\nu$.

concentration inequality, hardness measure, state-action pair, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)
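
The definition at the end of the abstract above can be written out as a formula. This is a sketch that reads "standard variation" as the standard deviation, which is an assumption, and the notation $\|f\|_{\nu}$ is chosen here for illustration rather than taken from the paper.

```latex
\[
  \|f\|_{\nu} \;=\; \sqrt{\mathbb{E}_{x \sim \nu}\!\left[ f(x)^{2} \right]}
  \qquad \text{for } f \text{ such that } \mathbb{E}_{x \sim \nu}\!\left[ f(x) \right] = 0 .
\]
```

Under this reading, a value function that is constant on the support of $\nu$ becomes identically zero once centered, so its norm is zero. That matches the abstract's example in which only the support of $p(\cdot \mid s,a)$ matters.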

Genetic-Gated Networks for Deep Reinforcement Learning

Chang, Simyung, Yang, John, Choi, Jaeseok, Kwak, Nojun

Neural Information Processing Systems, Feb-14-2020, 08:42:34 GMT

We introduce Genetic-Gated Networks (G2Ns), simple neural networks that incorporate a gate vector composed of binary genetic genes in the hidden layer(s) of the network. Our method combines the advantages of gradient-free and gradient-based optimization: the former is effective for problems with multiple local minima, while the latter can quickly find local minima. In addition, multiple chromosomes can define different models, which makes it easy to construct multiple models and lets the method be applied effectively to problems that require them. We show that G2Ns can be applied to typical reinforcement learning algorithms to achieve a large improvement in sample efficiency and performance.

deep reinforcement learning, genetic-gated network, local minima

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
