Goto

Collaborating Authors

 Sherstan, Craig


Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation

arXiv.org Machine Learning

Here we propose using the successor representation (SR) to accelerate learning in a constructive knowledge system based on general value functions (GVFs). In real-world settings like robotics for unstructured and dynamic environments, it is infeasible to model all meaningful aspects of a system and its environment by hand due to both complexity and size. Instead, robots must be capable of learning and adapting to changes in their environment and task, incrementally constructing models from their own experience. GVFs, taken from the field of reinforcement learning (RL), are a way of modeling the world as predictive questions. One approach to such models proposes a massive network of interconnected and interdependent GVFs, which are incrementally added over time. It is reasonable to expect that new, incrementally added predictions can be learned more swiftly if the learning process leverages knowledge gained from past experience. The SR provides such a means of separating the dynamics of the world from the prediction targets and thus capturing regularities that can be reused across multiple GVFs. As a primary contribution of this work, we show that using SR-based predictions can improve sample efficiency and learning speed in a continual learning setting where new predictions are incrementally added and learned over time. We analyze our approach in a grid-world and then demonstrate its potential on data from a physical robot arm.


Directly Estimating the Variance of the {\lambda}-Return Using Temporal-Difference Methods

arXiv.org Artificial Intelligence

This paper investigates estimating the variance of a temporal-difference learning agent's update target. Most reinforcement learning methods use an estimate of the value function, which captures how good it is for the agent to be in a particular state and is mathematically expressed as the expected sum of discounted future rewards (called the return). These values can be straightforwardly estimated by averaging batches of returns using Monte Carlo methods. However, if we wish to update the agent's value estimates during learning--before terminal outcomes are observed--we must use a different estimation target called the {\lambda}-return, which truncates the return with the agent's own estimate of the value function. Temporal difference learning methods estimate the expected {\lambda}-return for each state, allowing these methods to update online and incrementally, and in most cases achieve better generalization error and faster learning than Monte Carlo methods. Naturally one could attempt to estimate higher-order moments of the {\lambda}-return. This paper is about estimating the variance of the {\lambda}-return. Prior work has shown that given estimates of the variance of the {\lambda}-return, learning systems can be constructed to (1) mitigate risk in action selection, and (2) automatically adapt the parameters of the learning process itself to improve performance. Unfortunately, existing methods for estimating the variance of the {\lambda}-return are complex and not well understood empirically. We contribute a method for estimating the variance of the {\lambda}-return directly using policy evaluation methods from reinforcement learning. Our approach is significantly simpler than prior methods that independently estimate the second moment of the {\lambda}-return. Empirically our new approach behaves at least as well as existing approaches, but is generally more robust.