This paper summarizes research on a new emerging framework for learning to plan using the Markov decision process model (MDP). In this paradigm, two approaches to learning to plan have traditionally been studied: the indirect model-based approach infers the state transition matrix and reward function from samples, and then solves the Bellman equation to find the optimal (action) value function; the direct model-free approach, most notably Q-learning, estimates the action value function directly. This paper describes a new harmonic analysis framework for planning based on estimating a diffusion model that captures information flow on a graph (discrete state space) or a manifold (continuous state space) using the Laplace heat equation. Diffusion models are significantly easier to learn than transition models, and yet provide similar speedups in performance over model-free methods. Two methods for constructing novel plan representations from diffusion models are described: Fourier methods diagonalize a symmetric diffusion operator called the Laplacian; Wavelet methods dilate unit basis functions progressively using powers of the diffusion operator. A new variant of policy iteration - called representation policy iteration - is described consisting of an outer loop that estimates new basis functions by diagonalization or dilation, and an inner loop that learns the best policy representable within the linear span of the current basis functions. Results from continuous and discrete MDPs are provided to illustrate the new approach.
N) operations, where n is the number of distinct eigenvalues, and T op is the complexity of multiplying T by a vector. This approach is based on diagonalizing the restriction of the operator to the Krylov space spanned by the operator and a projected function. Even further savings can be accrued by constructing an approximate Lanczos tridiagonal representation of the Krylov-space restricted operator. A key novelty of this paper is the use of Krylov-subspace modulated Lanczos acceleration for multi-resolution wavelet analysis. A challenging problem of learning to control a robot arm is used to test the proposed approach.
Harmonic analysis includes Fourier analysis, where new eigenvector representations are constructed by diagonalization of operators, and wavelet analysis, where new representations are constructed by dilation. The approach is presented specifically in the context of Markov decision processes (MDPs), a widely studied model of planning under uncertainty, although the approach is applicable more broadly to other areas of AI as well. This paper describes a novel harmonic analysis framework for planning based on estimating a diffusion model that models flow of information on a graph (discrete state space) or a manifold (continuous state space) using a discrete form of the Laplace heat equation. Two methods for constructing novel plan representations from diffusion models are described: Fourier methods diagonalize a symmetric diffusion operator called the Laplacian; wavelet methods dilate unit basis functions progressively using powers of the diffusion operator. A new planning framework called Representation Policy Iteration (RPI) is described consisting of an outer loop that estimates new basis functions by diagonalization or dilation, and an inner loop that learns the best policy representable within the linear span of the current basis functions. We demonstrate the flexibility of the approach, which allows basis functions to be adapted to a particular task or reward function, and the hierarchical temporally extended nature of actions.
Data sets are often modeled as point clouds in $R^D$, for $D$ large. It is often assumed that the data has some interesting low-dimensional structure, for example that of a $d$-dimensional manifold $M$, with $d$ much smaller than $D$. When $M$ is simply a linear subspace, one may exploit this assumption for encoding efficiently the data by projecting onto a dictionary of $d$ vectors in $R^D$ (for example found by SVD), at a cost $(n+D)d$ for $n$ data points. When $M$ is nonlinear, there are no "explicit" constructions of dictionaries that achieve a similar efficiency: typically one uses either random dictionaries, or dictionaries obtained by black-box optimization. In this paper we construct data-dependent multi-scale dictionaries that aim at efficient encoding and manipulating of the data. Their construction is fast, and so are the algorithms that map data points to dictionary coefficients and vice versa. In addition, data points are guaranteed to have a sparse representation in terms of the dictionary. We think of dictionaries as the analogue of wavelets, but for approximating point clouds rather than functions.
We investigate the problem of automatically constructing efficient representations orbasis functions for approximating value functions based on analyzing the structure and topology of the state space. In particular, twonovel approaches to value function approximation are explored based on automatically constructing basis functions on state spaces that can be represented as graphs or manifolds: one approach uses the eigenfunctions ofthe Laplacian, in effect performing a global Fourier analysis on the graph; the second approach is based on diffusion wavelets, which generalize classical wavelets to graphs using multiscale dilations induced by powers of a diffusion operator or random walk on the graph. Together, these approaches form the foundation of a new generation of methods for solving large Markov decision processes, in which the underlying representation andpolicies are simultaneously learned.