Robust, Efficient, Globally-Optimized Reinforcement Learning with the Parti-Game Algorithm

Neural Information Processing Systems

Parti-game (Moore 1994a; Moore 1994b; Moore and Atkeson 1995) is a reinforcement learning (RL) algorithm that has a lot of promise in overcoming the curse of dimensionality that can plague RL algorithms when applied to high-dimensional problems. In this paper we introduce modifications to the algorithm that further improve its performance and robustness. In addition, while parti-game solutions can be improved locally by standard local path-improvement techniques, we introduce an add-on algorithm in the same spirit as parti-game that instead tries to improve solutions in a non-local manner.

1 INTRODUCTION

Parti-game operates on goal problems by dynamically partitioning the space into hyperrectangular cells of varying sizes, represented using a k-d tree data structure. It assumes the existence of a pre-specified local controller that can be commanded to proceed from the current state to a given state. The algorithm uses a game-theoretic approach to assign costs to cells based on past experiences using a minimax algorithm.
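The minimax cost assignment described above can be illustrated with a small sketch. It assumes a fixed set of cells and a table of observed outcomes for each "aim at neighbour" action; the function name `minimax_costs`, the `outcomes` layout, and the unit step cost are illustrative assumptions, not the paper's implementation, and the k-d tree partitioning is left out entirely.

```python
# Hypothetical sketch of minimax cell costs; names and data layout are assumptions.
INF = float("inf")

def minimax_costs(cells, goal, outcomes):
    """Return the worst-case number of cell transitions needed to reach the goal.

    outcomes[(i, k)] is the set of cells observed so far when the local
    controller starts in cell i and aims at neighbouring cell k.
    """
    cost = {c: INF for c in cells}
    cost[goal] = 0
    changed = True
    while changed:
        changed = False
        for i in cells:
            if i == goal:
                continue
            best = INF
            for (src, _target), reached in outcomes.items():
                if src != i:
                    continue
                # The adversary picks the worst outcome seen for this aim.
                worst = max(cost[j] for j in reached)
                # The agent picks the aim with the best worst case.
                best = min(best, 1 + worst)
            if best < cost[i]:
                cost[i] = best
                changed = True
    return cost

# Example: aiming from A at C has sometimes left the agent stuck in A.
example_outcomes = {
    ("A", "B"): {"B"},
    ("A", "C"): {"C", "A"},
    ("B", "G"): {"G"},
    ("C", "G"): {"G"},
}
example_costs = minimax_costs(["A", "B", "C", "G"], "G", example_outcomes)
```

In this toy example the cost of A is decided by the aim at B, because the aim at C has a worse observed outcome. A cell whose every aim can loop back on itself keeps an infinite cost.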


Convergence of the Wake-Sleep Algorithm

Neural Information Processing Systems

The WS (Wake-Sleep) algorithm is a simple learning rule for models with hidden variables. It has been shown that this algorithm can be applied to a factor analysis model, which is a linear version of the Helmholtz machine. But even for a factor analysis model, general convergence has not been proved theoretically. In this article, we describe a geometrical understanding of the WS algorithm in contrast with the EM (Expectation Maximization) algorithm and the em algorithm. As a result, we prove the convergence of the WS algorithm for the factor analysis model. We also show the condition for convergence in general models.


Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes

Neural Information Processing Systems

Partially Observable Markov Decision Processes (POMDPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-learning may no longer be effective, and memory-based methods which remove partial observability via state-estimation are notoriously expensive. An alternative approach is to seek a stochastic memoryless policy which for each observation of the environment prescribes a probability distribution over available actions that maximizes the average reward per timestep. A reinforcement learning algorithm which learns a locally optimal stochastic memoryless policy has been proposed by Jaakkola, Singh and Jordan, but not empirically verified. We present a variation of this algorithm, discuss its implementation, and demonstrate its viability using four test problems.
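As a rough illustration of what a stochastic memoryless policy looks like as a data structure, the sketch below maps each observation to a probability distribution over actions and samples from it. The observation and action names and both function names are hypothetical, and the learning rule of Jaakkola, Singh and Jordan is not shown.

```python
import random

def sample_action(policy, observation, rng=random):
    """Sample an action from the distribution the policy assigns to an observation.

    policy[observation] is a dict mapping each action to its probability.
    The policy is memoryless: only the current observation is used.
    """
    actions, probs = zip(*policy[observation].items())
    return rng.choices(actions, weights=probs, k=1)[0]

def average_reward(rewards):
    """Average reward per timestep over one observed run."""
    return sum(rewards) / len(rewards)

# Hypothetical two-observation policy.
example_policy = {
    "corridor": {"left": 0.5, "right": 0.5},
    "wall": {"left": 1.0},
}
```

A deterministic memoryless policy is the special case where every distribution puts probability 1 on a single action, as for the "wall" observation here.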


Reinforcement Learning Based on On-Line EM Algorithm

Neural Information Processing Systems

The actor and the critic are approximated by Normalized Gaussian Networks (NGnet), which are networks of local linear regression units. The NGnet is trained by the online EM algorithm proposed in our previous paper. We apply our RL method to the task of swinging-up and stabilizing a single pendulum and the task of balancing a double pendulum near the upright position. The experimental results show that our RL method can be applied to optimal control problems having continuous state/action spaces and that the method achieves good control with a small number of trial-and-errors.

1 INTRODUCTION

Reinforcement learning (RL) methods (Barto et al., 1990) have been successfully applied to various Markov decision problems having finite state/action spaces, such as the backgammon game (Tesauro, 1992) and a complex task in a dynamic environment (Lin, 1992). On the other hand, applications to continuous state/action problems (Werbos, 1990; Doya, 1996; Sofge & White, 1992) are much more difficult than the finite state/action cases. Good function approximation methods and fast learning algorithms are crucial for successful applications.
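To make "networks of local linear regression units" concrete, here is a minimal forward-pass sketch for a one-dimensional NGnet under a common formulation: each unit has a Gaussian activation, the activations are normalized to sum to one, and they weight the units' linear predictions. The tuple layout `(center, width, slope, bias)` and the name `ngnet_predict` are assumptions for illustration; the online EM training from the paper is not shown.

```python
import math

def ngnet_predict(x, units):
    """Forward pass of a 1-D normalized Gaussian network.

    units is a list of (center, width, slope, bias) tuples (hypothetical layout).
    Assumes x is close enough to some center that the activations do not
    all underflow to zero.
    """
    # Unnormalized Gaussian activation of each unit.
    acts = [math.exp(-0.5 * ((x - c) / w) ** 2) for c, w, _, _ in units]
    total = sum(acts)
    # Normalized activations weight each unit's local linear prediction.
    return sum(a / total * (s * x + b) for a, (_, _, s, b) in zip(acts, units))
```

With a single unit the normalization makes the network exactly linear; with several units the output blends smoothly between the local linear models.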



Viewing Classifier Systems as Model Free Learning in POMDPs

Neural Information Processing Systems

Classifier systems are now viewed as disappointing because of problems such as the rule strength vs. rule set performance problem and the credit assignment problem. To solve these problems, we have developed a hybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. to find a locally optimal stochastic policy when a set of rule conditions is given. GLS uses a GA to search for the best set of rule conditions.

1 INTRODUCTION

Classifier systems (CSs) (Holland 1986) have been among the most widely used in reinforcement learning.


Classification in Non-Metric Spaces

Neural Information Processing Systems

A key question in vision is how to represent our knowledge of previously encountered objects to classify new ones. The answer depends on how we determine the similarity of two objects. Similarity tells us how relevant each previously seen object is in determining the category to which a new object belongs.


Learning a Continuous Hidden Variable Model for Binary Data

Neural Information Processing Systems

A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images.

Introduction

Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high dimensional vector space, most of the variability in the data is captured by a much lower dimensional manifold.


A Randomized Algorithm for Pairwise Clustering

Neural Information Processing Systems

We present a stochastic clustering algorithm based on pairwise similarity of data points. Our method extends existing deterministic methods, including agglomerative algorithms, min-cut graph algorithms, and connected components. Thus it provides a common framework for all these methods. Our graph-based method differs from existing stochastic methods which are based on analogy to physical systems. The stochastic nature of our method makes it more robust against noise, including accidental edges and small spurious clusters. We demonstrate the superiority of our algorithm using an example with 3 spiraling bands and a lot of noise.

1 Introduction

Clustering algorithms can be divided into two categories: those that require a vectorial representation of the data, and those which use only pairwise representation. In the former case, every data item must be represented as a vector in a real normed space, while in the second case only pairwise relations of similarity or dissimilarity are used.
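For orientation, the sketch below shows one of the deterministic baselines named above, connected components on a thresholded similarity graph, using only pairwise similarities. It is not the paper's randomized algorithm; the function name, the dense-matrix input, and the threshold parameter are assumptions.

```python
def connected_components(similarity, threshold):
    """Cluster items by linking every pair whose similarity meets the threshold.

    similarity is a symmetric n x n list of lists; no vector representation
    of the items is needed.
    """
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

A single accidental edge above the threshold is enough to merge two clusters in this deterministic version, which is the kind of sensitivity to noise that the abstract says the stochastic method is meant to reduce.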


A Theory of Mean Field Approximation

Neural Information Processing Systems

I present a theory of mean field approximation based on information geometry. This theory includes in a consistent way the naive mean field approximation, as well as the TAP approach and the linear response theorem in statistical physics, giving clear information-theoretic interpretations to them.

1 INTRODUCTION

Many problems of neural networks, such as learning and pattern recognition, can be cast as statistical estimation problems. How difficult it is to solve a particular problem depends on the statistical model one employs in solving the problem. For Boltzmann machines [1], for example, it is computationally very hard to evaluate expectations of state variables from the model parameters. Mean field approximation [2], which originated in statistical physics, has been frequently used in practical situations in order to circumvent this difficulty.
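As a concrete reference point for the naive mean field approximation mentioned above, the sketch below iterates the standard fixed-point equations m_i = tanh(h_i + sum_j w_ij m_j) for a Boltzmann machine with units taking values -1 and +1. This is the textbook naive scheme, not the information-geometric theory of the paper; the function name, the damping factor, and the iteration count are assumptions.

```python
import math

def naive_mean_field(weights, biases, iterations=100, damping=0.5):
    """Approximate the expectations of +/-1 state variables by fixed-point iteration.

    weights is a symmetric n x n list of lists (diagonal ignored),
    biases is a list of n external fields.
    """
    n = len(biases)
    m = [0.0] * n
    for _ in range(iterations):
        new_m = []
        for i in range(n):
            # Effective field on unit i from its bias and the other units' means.
            field = biases[i] + sum(weights[i][j] * m[j] for j in range(n) if j != i)
            new_m.append(math.tanh(field))
        # Damped update to reduce oscillation between iterations.
        m = [damping * old + (1 - damping) * new for old, new in zip(m, new_m)]
    return m
```

With all couplings set to zero the units are independent and the iteration recovers the exact expectations tanh(h_i); with nonzero couplings the result is only an approximation.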