Balancing Multiple Sources of Reward in Reinforcement Learning
For many problems which would be natural for reinforcement learning, the reward signal is not a single scalar value but has multiple scalar components. Examples of such problems include agents with multiple goals and agents with multiple users. Creating a single reward value by combining the multiple components can throw away vital information and can lead to incorrect solutions. We describe the multiple reward source problem and discuss the problems with applying traditional reinforcement learning. We then present a new algorithm for finding a solution and results on simulated environments.
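The central claim here, that collapsing a vector-valued reward into one scalar can discard information the learner needs, is easy to make concrete. The sketch below is not the paper's algorithm; it only shows, with made-up numbers, how two different fixed weightings of the same two-component reward prefer different actions, so no single pre-chosen scalarization is safe.

```python
import numpy as np

# Hypothetical two-component rewards (e.g. two users' satisfaction) for two actions.
# These numbers are illustrative only; they are not from the paper.
rewards = {
    "action_a": np.array([1.0, 0.0]),   # great for source 1, useless for source 2
    "action_b": np.array([0.4, 0.4]),   # moderate for both sources
}

for weights in (np.array([0.5, 0.5]), np.array([0.2, 0.8])):
    scalarized = {a: float(weights @ r) for a, r in rewards.items()}
    best = max(scalarized, key=scalarized.get)
    print(f"weights={weights}: scalarized={scalarized}, preferred action={best}")
# The preferred action flips with the weighting: with equal weights action_a wins
# (0.5 vs 0.4), with weights favouring source 2 action_b wins (0.2 vs 0.4). That
# flip is exactly the information a single combined reward hides from the learner.
```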
Natural Sound Statistics and Divisive Normalization in the Auditory System
Schwartz, Odelia, Simoncelli, Eero P.
We explore the statistical properties of natural sound stimuli preprocessed with a bank of linear filters. The responses of such filters exhibit a striking form of statistical dependency, in which the response variance of each filter grows with the response amplitude of filters tuned for nearby frequencies. These dependencies may be substantially reduced using an operation known as divisive normalization, in which the response of each filter is divided by a weighted sum of the rectified responses of other filters. The weights may be chosen to maximize the independence of the normalized responses for an ensemble of natural sounds. We demonstrate that the resulting model accounts for nonlinearities in the response characteristics of the auditory nerve, by comparing model simulations to electrophysiological recordings. In previous work (NIPS, 1998) we demonstrated that an analogous model derived from the statistics of natural images accounts for nonlinear properties of neurons in primary visual cortex.
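The normalization operation described above, each filter's response divided by a weighted sum of the rectified responses of the other filters, can be sketched directly. The squared rectification, the additive constant sigma, and the toy filter responses below are assumptions for illustration; the abstract does not fix the exact form of the denominator or how the weights are learned.

```python
import numpy as np

def divisive_normalization(responses, weights, sigma=0.01):
    """Hedged sketch of divisive normalization as described in the abstract.

    responses : (n_filters, n_samples) linear filter outputs
    weights   : (n_filters, n_filters) nonnegative normalization weights
    sigma     : small additive constant in the denominator (an assumption here)
    """
    rectified = responses ** 2                 # squared rectification (assumed form)
    denom = sigma + weights @ rectified        # weighted sum of other filters' responses
    return rectified / denom

# Toy example: random "filter bank" responses whose variances differ across filters.
rng = np.random.default_rng(0)
r = rng.standard_normal((4, 1000)) * np.linspace(1.0, 2.0, 4)[:, None]
w = np.full((4, 4), 0.25)
normalized = divisive_normalization(r, w)
print(normalized.shape)  # (4, 1000)
```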
Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
Wainwright, Martin J., Sudderth, Erik B., Willsky, Alan S.
We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1 Introduction Graphical models are an invaluable tool for defining and manipulating probability distributions. In modeling stochastic processes with graphical models, two basic problems arise: (i) specifying a class of graphs with which to model or approximate the process; and (ii) determining efficient techniques for statistical inference. At one extreme are tree-structured graphs: although they lead to highly efficient algorithms for estimation [1, 2], their modeling power is often limited.
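A hedged reading of "exactly solving a series of modified problems on embedded spanning trees" is the matrix splitting sketched below: zero the off-tree edges of the information matrix, solve the resulting tree-structured system, feed the cut terms back in, and iterate. The dense solve stands in for the fast tree-structured solver, and the 4-node cycle is an illustrative example, not from the paper.

```python
import numpy as np

def embedded_tree_iteration(J, h, tree_mask, n_iters=50):
    """Sketch of an embedded-trees style iteration for a Gaussian model.

    J         : (n, n) information (inverse covariance) matrix of the loopy model
    h         : (n,) potential vector; the conditional mean solves J x = h
    tree_mask : (n, n) boolean mask selecting edges kept in the spanning tree
    The splitting J = J_tree - K and the fixed-point update below are an assumed
    reading of the abstract; a real implementation would solve the tree system
    with a linear-time message-passing sweep rather than a dense solve."""
    J_tree = np.where(tree_mask, J, 0.0)
    np.fill_diagonal(J_tree, np.diag(J))       # always keep the node terms
    K = J_tree - J                             # contribution of the cut edges
    x = np.zeros_like(h)
    for _ in range(n_iters):
        x = np.linalg.solve(J_tree, h + K @ x) # "tree" solve each iteration
    return x

# Tiny 4-node cycle as a loopy example.
J = np.array([[ 2.0, -0.5,  0.0, -0.5],
              [-0.5,  2.0, -0.5,  0.0],
              [ 0.0, -0.5,  2.0, -0.5],
              [-0.5,  0.0, -0.5,  2.0]])
h = np.ones(4)
tree = np.ones((4, 4), dtype=bool)
tree[0, 3] = tree[3, 0] = False                # cut one edge to form a spanning tree
print(np.allclose(embedded_tree_iteration(J, h, tree), np.linalg.solve(J, h)))
```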
Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
Malzahn, Dörthe, Opper, Manfred
Based on a statistical mechanics approach, we develop a method for approximately computing average case learning curves for Gaussian process regression models. The approximation works well in the large sample size limit and for arbitrary dimensionality of the input space. We explain how the approximation can be systematically improved and argue that similar techniques can be applied to general likelihood models. 1 Introduction Gaussian process (GP) models have gained considerable interest in the Neural Computation Community (see e.g. [1, 2, 3, 4]) in recent years. Being nonparametric models by construction, their theoretical understanding seems to be less well developed compared to simpler parametric models like neural networks. We are especially interested in developing theoretical approaches which will at least give good approximations to generalization errors when the number of training data is sufficiently large.
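The object being approximated is the average-case learning curve: the expected generalization error of GP regression as a function of training set size. The sketch below estimates such a curve by brute-force Monte Carlo over random training sets, which is the baseline an analytic approximation is meant to replace; the RBF kernel, noise level, and target function are all assumptions for illustration.

```python
import numpy as np

def gp_posterior_mean(X_train, y_train, X_test, lengthscale=0.5, noise=0.1):
    """Standard GP regression posterior mean with an RBF kernel (assumed setup)."""
    def k(A, B):
        d = A[:, None, :] - B[None, :, :]
        return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / lengthscale ** 2)
    K = k(X_train, X_train) + noise ** 2 * np.eye(len(X_train))
    return k(X_test, X_train) @ np.linalg.solve(K, y_train)

def target(X):
    return np.sin(3 * X[:, 0])        # illustrative target function, not from the paper

# Monte Carlo estimate of the learning curve: average test error versus sample size.
rng = np.random.default_rng(0)
X_test = rng.uniform(-1, 1, size=(200, 1))
for n in (5, 10, 20, 40, 80):
    errs = []
    for _ in range(30):               # average over random training sets of size n
        X = rng.uniform(-1, 1, size=(n, 1))
        y = target(X) + 0.1 * rng.standard_normal(n)
        errs.append(np.mean((gp_posterior_mean(X, y, X_test) - target(X_test)) ** 2))
    print(f"n={n:3d}  mean squared error ≈ {np.mean(errs):.4f}")
```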
The Unscented Particle Filter
Merwe, Rudolph van der, Doucet, Arnaud, Freitas, Nando de, Wan, Eric A.
In this paper, we propose a new particle filter based on sequential importance sampling. The algorithm uses a bank of unscented filters to obtain the importance proposal distribution. This proposal has two very "nice" properties. Firstly, it makes efficient use of the latest available information and, secondly, it can have heavy tails. As a result, we find that the algorithm outperforms standard particle filtering and other nonlinear filtering methods very substantially.
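A structural sketch of the idea, a particle filter whose proposal distribution is produced by running a per-particle filter update on the latest observation, is given below. Because the toy model is linear-Gaussian, an exact Kalman update stands in for the unscented transform; the paper's algorithm uses a full unscented Kalman filter per particle for nonlinear models, and all constants here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a, q, r = 0.9, 0.5, 0.5            # x_t = a x_{t-1} + N(0, q);  y_t = x_t + N(0, r)
n_particles, T = 200, 50

# Simulate a trajectory and its observations.
x_true, ys = 0.0, []
for _ in range(T):
    x_true = a * x_true + np.sqrt(q) * rng.standard_normal()
    ys.append(x_true + np.sqrt(r) * rng.standard_normal())

def log_normal_pdf(x, mean, var):
    return -0.5 * ((x - mean) ** 2 / var + np.log(2 * np.pi * var))

particles = np.zeros(n_particles)
for y in ys:
    # Proposal step: per-particle filter update that incorporates the latest observation.
    pred_mean, pred_var = a * particles, q
    gain = pred_var / (pred_var + r)
    prop_mean = pred_mean + gain * (y - pred_mean)
    prop_var = (1 - gain) * pred_var
    new_particles = prop_mean + np.sqrt(prop_var) * rng.standard_normal(n_particles)

    # Importance weights: prior * likelihood / proposal.
    log_w = (log_normal_pdf(new_particles, pred_mean, q)
             + log_normal_pdf(y, new_particles, r)
             - log_normal_pdf(new_particles, prop_mean, prop_var))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Resample.
    particles = new_particles[rng.choice(n_particles, n_particles, p=w)]

print("final state estimate:", particles.mean(), " true state:", x_true)
```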
A Tighter Bound for Graphical Models
Leisink, Martijn A. R., Kappen, Hilbert J.
The neurons in these networks are the random variables, whereas the connections between them model the causal dependencies. Usually, some of the nodes have a direct relation with the random variables in the problem and are called 'visibles'. The other nodes, known as 'hiddens', are used to model more complex probability distributions. Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set can be computed. In general the time this takes scales exponentially with the number of hidden neurons.
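The exponential cost mentioned in the last sentence is the sum over all configurations of the hidden units needed to evaluate the likelihood of the visibles. The brute-force sketch below makes that cost explicit for a small binary pairwise model; the Boltzmann-style energy and the random couplings are assumptions, since the abstract fragment does not pin down the model family.

```python
import numpy as np
from itertools import product

def visible_log_likelihood(v, W, n_hidden):
    """Brute-force log p(v) for a small binary pairwise model (an assumed
    Boltzmann-style energy; the abstract does not specify the model family).

    The sums below run over every hidden (and visible) configuration, which is
    the exponential cost in the number of hidden neurons that the abstract
    refers to and that bounding techniques aim to avoid."""
    v = np.asarray(v, dtype=float)
    hiddens = [np.array(h, dtype=float) for h in product([0, 1], repeat=n_hidden)]
    visibles = [np.array(u, dtype=float) for u in product([0, 1], repeat=len(v))]
    unnorm = lambda x: np.exp(0.5 * x @ W @ x)
    Z = sum(unnorm(np.concatenate([u, h])) for u in visibles for h in hiddens)
    p_v = sum(unnorm(np.concatenate([v, h])) for h in hiddens)
    return np.log(p_v / Z)

rng = np.random.default_rng(0)
n_visible, n_hidden = 3, 4
W = rng.standard_normal((n_visible + n_hidden, n_visible + n_hidden)) * 0.1
W = (W + W.T) / 2                      # symmetric couplings
print(visible_log_likelihood([1, 0, 1], W, n_hidden))
```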
Reinforcement Learning with Function Approximation Converges to a Region
Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(0) and V(0); the latter algorithm was used in the well-known TD-Gammon program. 1 Introduction Although there are convergent online algorithms (such as TD(λ) [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation of either the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms. For example, fitted value iteration can diverge even for Markov processes [2]; Q-learning with linear function approximators can diverge, even when the states are updated according to a fixed update policy [3]; and SARSA(0) can oscillate between multiple policies with different value functions [4].
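One of the two algorithms covered by the result, SARSA(0) with a linear function approximator, has a one-line weight update, and it is those iterates that the paper shows stay inside a bounded region. The sketch below shows that update on a toy two-state chain; the environment, features, and step sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_features = 2, 2, 4
phi = rng.standard_normal((n_states, n_actions, n_features))   # fixed feature vectors

def step(s, a):
    """Toy MDP: action 0 stays, action 1 switches state; small random rewards."""
    s_next = s if a == 0 else 1 - s
    return s_next, float(rng.normal(loc=1.0 if s_next == 1 else 0.0, scale=0.1))

def epsilon_greedy(w, s, eps=0.1):
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(phi[s] @ w))

w = np.zeros(n_features)
alpha, gamma = 0.05, 0.9
s = 0
a = epsilon_greedy(w, s)
for _ in range(5000):
    s_next, r = step(s, a)
    a_next = epsilon_greedy(w, s_next)
    # SARSA(0) temporal-difference update on the linear weights.
    td_error = r + gamma * phi[s_next, a_next] @ w - phi[s, a] @ w
    w += alpha * td_error * phi[s, a]
    s, a = s_next, a_next
print("final weights:", w)
```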
From Mixtures of Mixtures to Adaptive Transform Coding
Archer, Cynthia, Leen, Todd K.
We establish a principled framework for adaptive transform coding. Transform coders are often constructed by concatenating an ad hoc choice of transform with suboptimal bit allocation and quantizer design. Instead, we start from a probabilistic latent variable model in the form of a mixture of constrained Gaussian mixtures. From this model we derive a transform coding algorithm, which is a constrained version of the generalized Lloyd algorithm for vector quantizer design. A byproduct of our derivation is the introduction of a new transform basis, which unlike other transforms (PCA, DCT, etc.) is explicitly optimized for coding.
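For contrast with the model-based approach described above, the sketch below spells out the ad hoc pipeline the abstract argues against: a fixed PCA transform followed by scalar quantizers designed with plain generalized Lloyd iterations. It is not the paper's constrained-mixture algorithm; the toy Gaussian data and the codebook sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], size=2000)

# Transform step: project onto principal components (the "ad hoc choice of transform").
centered = data - data.mean(0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
coeffs = centered @ Vt.T

def lloyd_quantizer(x, n_levels=4, n_iters=50):
    """Generalized Lloyd algorithm in 1-D: alternate nearest-codeword assignment
    and centroid updates to design a scalar quantizer codebook."""
    codebook = np.quantile(x, np.linspace(0.05, 0.95, n_levels))
    for _ in range(n_iters):
        assign = np.argmin(np.abs(x[:, None] - codebook[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(assign == k):
                codebook[k] = x[assign == k].mean()
    return codebook

for i in range(coeffs.shape[1]):
    cb = lloyd_quantizer(coeffs[:, i])
    print(f"component {i}: codebook = {np.round(cb, 3)}")
```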