Asia
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
Williams, John K., Singh, Satinder P.
Partially Observable Markov Decision Processes (pO "MOPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-Iearning may no longer be effective, and memory-based methods which remove partial observability via state-estimation are notoriously expensive. An alternative approach is to seek a stochastic memoryless policy which for each observation of the environment prescribes a probability distribution over available actions that maximizes the average reward per timestep. A reinforcement learning algorithm which learns a locally optimal stochastic memoryless policy has been proposed by Jaakkola, Singh and Jordan, but not empirically verified. We present a variation of this algorithm, discuss its implementation, and demonstrate its viability using four test problems.
On the Optimality of Incremental Neural Network Algorithms
We study the approximation of functions by two-layer feedforward neural networks, focusing on incremental algorithms which greedily add units, estimating single unit parameters at each stage. As opposed to standard algorithms for fixed architectures, the optimization at each stage is performed over a small number of parameters, mitigating many of the difficult numerical problems inherent in high-dimensional nonlinear optimization. We establish upper bounds on the error incurred by the algorithm, when approximating functions from the Sobolev class, thereby extending previous results which only provided rates of convergence for functions in certain convex hulls of functional spaces. By comparing our results to recently derived lower bounds, we show that the greedy algorithms are nearly optimal. Combined with estimation error results for greedy algorithms, a strong case can be made for this type of approach.
Unsupervised and Supervised Clustering: The Mutual Information between Parameters and Observations
Herschkowitz, Didier, Nadal, Jean-Pierre
Recent works in parameter estimation and neural coding have demonstrated that optimal performance are related to the mutual information between parameters and data. We consider the mutual information in the case where the dependency in the parameter (a vector 8) of the conditional p.d.f. of each observation (a vector
Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks
Bartlett, Peter L., Maiorov, Vitaly, Meir, Ron
We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC dimension grows as W log W, where W is the number of parameters in the network. The VC dimension is an important measure of the complexity of a class of binaryvalued functions, since it characterizes the amount of data required for learning in the PAC setting (see [BEHW89, Vap82]). In this paper, we establish upper and lower bounds on the VC dimension of a specific class of multi-layered feedforward neural networks. Let F be the class of binary-valued functions computed by a feed forward neural network with W weights and k computational (non-input) units, each with a piecewise polynomial activation function.
A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory
Suematsu, Nobuo, Hayashi, Akira
We have proved that the model learned by BLHT converges to the optimal model in given hypothesis space, 1{, which provides the most accurate predictions of percepts and rewards, given short-term memory. We believe this fact provides a solid basis for BLHT, and BLHT can be compared favorably with other methods using short-term memory.
Blind Separation of Filtered Sources Using State-Space Approach
Zhang, Liqing, Cichocki, Andrzej
In this paper we present a novel approach to multichannel blind separation/generalized deconvolution, assuming that both mixing and demixing models are described by stable linear state-space systems. Based on the minimization of Kullback-Leibler Divergence, we develop a novel learning algorithm to train the matrices in the output equation. To estimate the state of the demixing model, we introduce a new concept, called hidden innovation, to numerically implement the Kalman filter. Computer simulations are given to show the validity and high effectiveness of the state-space approach. The blind source separation problem is to recover independent sources from sensor outputs without assuming any priori knowledge of the original signals besides certain statistic features.
A Randomized Algorithm for Pairwise Clustering
Gdalyahu, Yoram, Weinshall, Daphna, Werman, Michael
We present a stochastic clustering algorithm based on pairwise similarity of datapoints. Our method extends existing deterministic methods, including agglomerative algorithms, min-cut graph algorithms, and connected components. Thus it provides a common framework for all these methods. Our graph-based method differs from existing stochastic methods which are based on analogy to physical systems. The stochastic nature of our method makes it more robust against noise, including accidental edges and small spurious clusters. We demonstrate the superiority of our algorithm using an example with 3 spiraling bands and a lot of noise. 1 Introduction Clustering algorithms can be divided into two categories: those that require a vectorial representation of the data, and those which use only pairwise representation. In the former case, every data item must be represented as a vector in a real normed space, while in the second case only pairwise relations of similarity or dissimilarity are used.
Risk Sensitive Reinforcement Learning
Neuneier, Ralph, Mihatsch, Oliver
A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images. Introduction Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high dimensional vector space, most of the variability in the data is captured by a much lower climensional manifold. In particular for PCA, this manifold is described by a linear hyperplane whose characteristic directions are given by the eigenvectors of the correlation matrix with the largest eigenvalues. The success of PCA and closely related techniques such as Factor Analysis (FA) and PCA mixtures clearly indicate that much real world data exhibit the low dimensional manifold structure assumed by these models [2, 3]. However, the linear manifold structure of PCA is not appropriate for data with binary valued variables.