Collaborating Authors

 Moore, Andrew W.


Generalization in Reinforcement Learning: Safely Approximating the Value Function

Neural Information Processing Systems

Reinforcement learning, the problem of getting an agent to learn to act from sparse, delayed rewards, has been advanced by techniques based on dynamic programming (DP). These algorithms compute a value function which gives, for each state, the minimum possible long-term cost commencing in that state. For the high-dimensional and continuous state spaces characteristic of real-world control tasks, a discrete representation of the value function is intractable; some form of generalization is required. A natural way to incorporate generalization into DP is to use a function approximator, rather than a lookup table, to represent the value function. This approach, which dates back to uses of Legendre polynomials in DP [Bellman et al., 1963], has recently worked well on several dynamic control problems [Mahadevan and Connell, 1990, Lin, 1993] and succeeded spectacularly on the game of backgammon [Tesauro, 1992, Boyan, 1992]. On the other hand, many sensible implementations have been less successful [Bradtke, 1993, Schraudolph et al., 1994]. Indeed, given the well-established success ...
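
How a function approximator can stand in for the DP lookup table is easy to picture as fitted value iteration. The sketch below is a minimal illustration only, assuming a known one-step model (the next_state and cost callables) and a generic scikit-learn regressor; it is not the construction analyzed in the paper.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.exceptions import NotFittedError

    def fitted_value_iteration(sample_states, actions, next_state, cost,
                               gamma=0.95, iterations=50):
        """sample_states: (N, d) array of representative states; next_state(s, a)
        and cost(s, a) form an assumed, known one-step model of the task."""
        model = LinearRegression()   # any regressor could be substituted here

        def predict(s):
            try:
                return float(model.predict(np.asarray(s).reshape(1, -1))[0])
            except NotFittedError:   # before the first sweep, assume zero cost-to-go
                return 0.0

        for _ in range(iterations):
            # One sweep of Bellman backups, with values read through the approximator
            # rather than a table: target(s) = min_a [cost(s, a) + gamma * V(next(s, a))].
            targets = [min(cost(s, a) + gamma * predict(next_state(s, a))
                           for a in actions)
                       for s in sample_states]
            model.fit(np.asarray(sample_states), np.asarray(targets))
        return model

Nothing in this loop guarantees that the backup step and the fitting step cooperate; as the abstract notes, apparently sensible combinations have failed, which is the issue the paper takes up.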


The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces

Neural Information Processing Systems

Parti-game is a new algorithm for learning from delayed rewards in high-dimensional real-valued state-spaces. In high dimensions it is essential that learning does not explore or plan over state space uniformly. Parti-game maintains a decision-tree partitioning of state-space and applies game-theory and computational geometry techniques to efficiently and reactively concentrate high resolution only on critical areas. Many simulated problems have been tested, ranging from 2-dimensional to 9-dimensional state-spaces, including mazes, path planning, nonlinear dynamics, and uncurling snake robots in restricted spaces. In all cases, a good solution is found in less than twenty trials and a few minutes. Reinforcement learning [Samuel, 1959, Sutton, 1984, Watkins, 1989, Barto et al., 1991] is a promising method for control systems to program and improve themselves.
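
The variable-resolution partition at the heart of the algorithm can be pictured with a short kd-tree-style sketch: cells are hyper-rectangles, and only cells flagged as critical are split. The Cell class, the longest-axis split rule, and the notion of "critical states" below are illustrative assumptions; the game-theoretic planning that decides where resolution is actually needed is omitted entirely.

    from dataclasses import dataclass
    from typing import List, Optional, Tuple

    @dataclass
    class Cell:
        low: List[float]                                # lower corner of the hyper-rectangle
        high: List[float]                               # upper corner
        children: Optional[Tuple["Cell", "Cell"]] = None

        def contains(self, x):
            return all(l <= xi <= h for l, xi, h in zip(self.low, x, self.high))

        def split(self):
            """Split this cell in half along its longest axis."""
            axis = max(range(len(self.low)),
                       key=lambda i: self.high[i] - self.low[i])
            mid = 0.5 * (self.low[axis] + self.high[axis])
            left_high = list(self.high)
            left_high[axis] = mid
            right_low = list(self.low)
            right_low[axis] = mid
            self.children = (Cell(list(self.low), left_high),
                             Cell(right_low, list(self.high)))
            return self.children

    def leaf_for(root, x):
        """Walk the decision tree down to the leaf cell containing state x."""
        cell = root
        while cell.children is not None:
            cell = next(c for c in cell.children if c.contains(x))
        return cell

    def refine(root, critical_states):
        """Raise resolution only where it matters, leaving the rest of space coarse."""
        for x in critical_states:
            leaf_for(root, x).split()

In the algorithm itself the critical cells are, roughly, those on the boundary between the region from which the planner currently believes the goal is reachable and the region from which it does not.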


Memory-Based Methods for Regression and Classification

Neural Information Processing Systems

Memory-based learning methods operate by storing all (or most) of the training data and deferring analysis of that data until "run time" (i.e., when a query is presented and a decision or prediction must be made). When a query is received, these methods generally answer the query by retrieving and analyzing a small subset of the training data, namely the data in the immediate neighborhood of the query point. In short, memory-based methods are "lazy" (they wait until the query) and "local" (they use only a local neighborhood). The purpose of this workshop was to review the state of the art in memory-based methods and to understand their relationship to "eager" and "global" learning algorithms such as batch backpropagation. There are two essential components to any memory-based algorithm: the method for defining the "local neighborhood" and the learning method that is applied to the training examples in the local neighborhood.
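
Both components can be made concrete in a few lines with a nearest-neighbor predictor: one step picks the local neighborhood, the other applies a local learner to it. Euclidean k-nearest neighbors and a distance-weighted average are stand-in choices for illustration, not recommendations from the workshop.

    import numpy as np

    def memory_based_predict(X_train, y_train, query, k=5):
        """Lazy prediction: nothing is fit until the query arrives."""
        X = np.asarray(X_train, dtype=float)
        y = np.asarray(y_train, dtype=float)
        # 1. Define the "local neighborhood": the k stored examples closest to the query.
        dists = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
        neighbors = np.argsort(dists)[:k]
        # 2. Apply a local learning method: here, a distance-weighted average of their labels.
        weights = 1.0 / (dists[neighbors] + 1e-8)
        return float(np.dot(weights, y[neighbors]) / weights.sum())

    # Example: predict at a query point from six stored (x, y) pairs.
    X_train = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y_train = [0.0, 1.1, 3.9, 9.2, 15.8, 25.1]
    print(memory_based_predict(X_train, y_train, [2.5], k=3))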


Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping

Neural Information Processing Systems

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare Prioritized Sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real-time problems with which other methods have difficulty.
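
The mechanism described here, using experience both to build a model and to decide which Bellman backups are worth doing next, can be sketched with a priority queue keyed by how much a state's value is expected to change. The sketch below assumes a deterministic learned model, tabular Q-values in nested dictionaries, and illustrative values for the threshold and sweep budget; the paper's stochastic-model bookkeeping and exploration heuristics are omitted.

    import heapq
    import itertools
    from collections import defaultdict

    def make_tables():
        """Q[s][a] defaults to 0.0; the model and predecessor sets start empty."""
        Q = defaultdict(lambda: defaultdict(float))
        model = {}                        # (s, a) -> (reward, next_state), deterministic
        predecessors = defaultdict(set)   # s' -> {(s, a) observed to lead into s'}
        return Q, model, predecessors

    _tie = itertools.count()              # tie-breaker so the heap never compares states

    def prioritized_sweeping_update(s, a, r, s_next, Q, model, predecessors,
                                    gamma=0.95, alpha=1.0, theta=1e-4, n_sweeps=10):
        """Absorb one real experience (s, a, r, s_next), then spend n_sweeps
        backups on whichever state-action pairs currently look most in need."""
        model[(s, a)] = (r, s_next)
        predecessors[s_next].add((s, a))
        queue = []
        surprise = abs(r + gamma * max(Q[s_next].values(), default=0.0) - Q[s][a])
        if surprise > theta:
            heapq.heappush(queue, (-surprise, next(_tie), s, a))
        for _ in range(n_sweeps):
            if not queue:
                break
            _, _, s_i, a_i = heapq.heappop(queue)       # most urgent backup first
            r_i, s_i_next = model[(s_i, a_i)]
            target = r_i + gamma * max(Q[s_i_next].values(), default=0.0)
            Q[s_i][a_i] += alpha * (target - Q[s_i][a_i])
            # A change here may make backups at predecessor states worthwhile too.
            for (s_p, a_p) in predecessors[s_i]:
                r_p, _ = model[(s_p, a_p)]
                p = abs(r_p + gamma * max(Q[s_i].values(), default=0.0) - Q[s_p][a_p])
                if p > theta:
                    heapq.heappush(queue, (-p, next(_tie), s_p, a_p))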