Learning Informative Statistics: A Nonparametric Approach

Neural Information Processing Systems

We discuss an information theoretic approach for categorizing and modeling dynamic processes. The approach can learn a compact and informative statistic which summarizes past states to predict future observations. Furthermore, the uncertainty of the prediction is characterized nonparametrically by a joint density over the learned statistic and present observation. We discuss the application of the technique to both noise driven dynamical systems and random processes sampled from a density which is conditioned on the past. In the first case we show results in which both the dynamics of random walk and the statistics of the driving noise are captured. In the second case we present results in which a summarizing statistic is learned on noisy random telegraph waves with differing dependencies on past states. In both cases the algorithm yields a principled approach for discriminating processes with differing dynamics and/or dependencies. The method is grounded in ideas from information theory and nonparametric statistics.
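As a rough illustration of the nonparametric ingredient, the sketch below builds a Parzen (Gaussian-kernel) estimate of the joint density over a summarizing statistic and the present observation. Everything here (the moving-average statistic, the bandwidth h, the grids) is an invented stand-in, not the authors' construction.

```python
import numpy as np

def parzen_joint_density(s, x, s_grid, x_grid, h=1.0):
    """Parzen (Gaussian-kernel) estimate of the joint density p(s, x)
    from paired samples of a summarizing statistic s and observation x."""
    S, X = np.meshgrid(s_grid, x_grid, indexing="ij")
    density = np.zeros_like(S)
    for si, xi in zip(s, x):
        density += np.exp(-((S - si) ** 2 + (X - xi) ** 2) / (2 * h ** 2))
    return density / (len(s) * 2 * np.pi * h ** 2)

# Toy usage: the "statistic" is a crude moving average of a random walk,
# paired with the next observation it is meant to predict.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500))               # noise-driven random walk
s = np.convolve(x, np.ones(5) / 5, mode="same")   # placeholder summarizing statistic
grid = np.linspace(x.min(), x.max(), 50)
p = parzen_joint_density(s[:-1], x[1:], grid, grid)
```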


Rules and Similarity in Concept Learning

Neural Information Processing Systems

This paper argues that two apparently distinct modes of generalizing concepts - abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework. Bayes explains the specific workings of these two modes - which rules are abstracted, how similarity is measured - as well as why generalization should appear rule- or similarity-based in different situations. This analysis also suggests why the rules/similarity distinction, even if not computationally fundamental, may still be useful at the algorithmic level as part of a principled approximation to fully Bayesian learning.
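To make the framework concrete, here is a toy sketch in the spirit of the paper's Bayesian account (the hypothesis space and numbers are invented for illustration): with a few examples, the size principle concentrates the posterior on a sharp rule-like hypothesis, while broader hypotheses yield graded, similarity-like generalization.

```python
# Hypothetical concept-learning domain: integers 1..100 with two hypotheses,
# one rule-like and sparse, one broad. Both are invented for this toy example.
hypotheses = {
    "powers_of_two": {2, 4, 8, 16, 32, 64},
    "even_numbers": set(range(2, 101, 2)),
}
prior = {"powers_of_two": 0.5, "even_numbers": 0.5}

def posterior(examples):
    """Bayes with the size principle: likelihood (1/|h|)^n if h covers the data."""
    post = {}
    for name, h in hypotheses.items():
        post[name] = prior[name] * ((1.0 / len(h)) ** len(examples)
                                    if set(examples) <= h else 0.0)
    z = sum(post.values())
    return {name: p / z for name, p in post.items()}

def p_generalize(y, examples):
    """Probability that a new item y falls under the concept."""
    return sum(p for name, p in posterior(examples).items() if y in hypotheses[name])

print(posterior([16]))           # one example: both hypotheses stay plausible
print(posterior([16, 8, 2]))     # three examples: rule-like hypothesis dominates
print(p_generalize(32, [16, 8, 2]))
```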


Lower Bounds on the Complexity of Approximating Continuous Functions by Sigmoidal Neural Networks

Neural Information Processing Systems

Sigmoidal neural networks have been shown to be able to approximate any continuous function arbitrarily well. This universal approximation property is one of the theoretical results most frequently cited to justify the use of sigmoidal neural networks in applications. Numerous results in the literature have established variants of this property by considering distinct function classes to be approximated by network architectures using different types of neural activation functions with respect to various approximation criteria; see for instance [1, 2, 3, 5, 6, 11, 12, 14, 15].
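As a concrete instance of the property being discussed, the sketch below fits a one-hidden-layer network of sigmoidal units to a continuous target. Fixing random hidden weights and solving least squares for the output layer is an illustrative shortcut, not one of the constructions used in the cited approximation results.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
target = np.sin(2 * np.pi * x)            # continuous function to approximate

# One hidden layer of n sigmoidal units with random weights and biases;
# only the output weights are fit, via least squares (illustrative shortcut).
n = 20
w = rng.normal(scale=10.0, size=n)
b = rng.normal(scale=10.0, size=n)
H = sigmoid(np.outer(x, w) + b)            # (200, n) hidden-unit activations
c, *_ = np.linalg.lstsq(H, target, rcond=None)

approx = H @ c
print("max approximation error:", np.max(np.abs(approx - target)))
```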


Reinforcement Learning Using Approximate Belief States

Neural Information Processing Systems

The problem of developing good policies for partially observable Markov decision problems (POMDPs) remains one of the most challenging areas of research in stochastic planning. One line of research in this area involves the use of reinforcement learning with belief states, probability distributions over the underlying model states. This is a promising method for small problems, but its application is limited by the intractability of computing or representing a full belief state for large problems. Recent work shows that, in many settings, we can maintain an approximate belief state, which is fairly close to the true belief state. In particular, great success has been shown with approximate belief states that marginalize out correlations between state variables. In this paper, we investigate two methods of full belief state reinforcement learning and one novel method for reinforcement learning using factored approximate belief states. We compare the performance of these algorithms on several well-known problems from the literature. Our results demonstrate the importance of approximate belief state representations for large problems.
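A minimal sketch of the two belief representations being compared (the toy transition and observation matrices are invented): the exact Bayesian belief update, and a factored approximation that marginalizes out correlations between two state variables by keeping only their per-variable marginals.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact POMDP belief update:
    b'(s') is proportional to O[o, s'] * sum_s T[a, s, s'] * b(s)."""
    b_new = O[o] * (b @ T[a])
    return b_new / b_new.sum()

def factored_approx(b, shape):
    """Approximate a joint belief over two state variables by the outer
    product of its marginals; correlations between variables are discarded."""
    joint = b.reshape(shape)
    return np.outer(joint.sum(axis=1), joint.sum(axis=0)).ravel()

# Toy problem: 4 model states (two binary state variables), 1 action, 2 observations.
rng = np.random.default_rng(1)
T = rng.dirichlet(np.ones(4), size=(1, 4))   # T[a, s, s'] = P(s' | s, a)
O = rng.dirichlet(np.ones(2), size=4).T      # O[o, s'] = P(o | s')
b = np.full(4, 0.25)                         # uniform initial belief
b = belief_update(b, a=0, o=1, T=T, O=O)
print("exact belief:   ", b)
print("factored approx:", factored_approx(b, (2, 2)))
```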


Neural System Model of Human Sound Localization

Neural Information Processing Systems

This paper examines the role of biological constraints in the human auditory localization process. A psychophysical and neural system modeling approach was undertaken in which performance comparisons between competing models and a human subject explore the relevant biologically plausible "realism constraints". The directional acoustical cues, upon which sound localization is based, were derived from the human subject's head-related transfer functions (HRTFs). Sound stimuli were generated by convolving bandpass noise with the HRTFs and were presented to both the subject and the model. The input stimuli to the model were processed using the Auditory Image Model of cochlear processing.
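A sketch of the stimulus-generation step described above, assuming the HRTF impulse responses are available as arrays (the noise and impulse responses below are placeholders; real HRTFs would be measured from the subject, as in the paper):

```python
import numpy as np

def spatialize(noise, hrtf_left, hrtf_right):
    """Render a sound source at the HRTF's measured direction by convolving
    the stimulus with the left- and right-ear impulse responses."""
    left = np.convolve(noise, hrtf_left, mode="full")
    right = np.convolve(noise, hrtf_right, mode="full")
    return left, right

# Placeholder noise burst and toy impulse responses standing in for HRTFs.
rng = np.random.default_rng(0)
noise = rng.normal(size=4096)
hrtf_left = rng.normal(size=128) * np.exp(-np.arange(128) / 20.0)
hrtf_right = np.roll(hrtf_left, 8)   # crude interaural-delay stand-in
left, right = spatialize(noise, hrtf_left, hrtf_right)
```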



Online Independent Component Analysis with Local Learning Rate Adaptation

Neural Information Processing Systems

Stochastic meta-descent (SMD) is a new technique for online adaptation of local learning rates in arbitrary twice-differentiable systems. Like matrix momentum it uses full second-order information while retaining O(n) computational complexity by exploiting the efficient computation of Hessian-vector products. Here we apply SMD to independent component analysis, and employ the resulting algorithm for the blind separation of time-varying mixtures. By matching individual learning rates to the rate of change in each source signal's mixture coefficients, our technique is capable of simultaneously tracking sources that move at very different, a priori unknown speeds.

1 Introduction

Independent component analysis (ICA) methods are typically run in batch mode in order to keep the stochasticity of the empirical gradient low. Often this is combined with a global learning rate annealing scheme that negotiates the tradeoff between fast convergence and good asymptotic performance.
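For orientation, here is a generic sketch of an SMD-style update (the clipping constant, meta-rate, and finite-difference Hessian-vector product are illustrative simplifications; the paper computes exact Hessian-vector products for the ICA objective rather than using differences):

```python
import numpy as np

def smd_step(w, p, v, grad_fn, mu=0.05, lam=0.9, eps=1e-6):
    """One stochastic meta-descent step with per-parameter learning rates p.
    grad_fn(w) returns the (possibly stochastic) gradient at w; v tracks the
    long-term dependence of the parameters on their log learning rates."""
    g = grad_fn(w)
    # Multiplicative learning-rate update, clipped below for stability.
    p = p * np.maximum(0.5, 1.0 - mu * g * v)
    # Hessian-vector product H v approximated by a gradient difference.
    Hv = (grad_fn(w + eps * v) - g) / eps
    v = lam * v - p * (g + lam * Hv)
    w = w - p * g
    return w, p, v

# Toy usage on a quadratic bowl with very different curvatures per axis.
grad = lambda w: np.array([1.0, 100.0]) * w
w, p, v = np.array([1.0, 1.0]), np.full(2, 0.01), np.zeros(2)
for _ in range(100):
    w, p, v = smd_step(w, p, v, grad)
print(w)   # both coordinates approach the minimum despite the curvature gap
```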


Data Visualization and Feature Selection: New Algorithms for Nongaussian Data

Neural Information Processing Systems

Visualization of input data and feature selection are intimately related. A good feature selection algorithm can identify meaningful coordinate projections for low dimensional data visualization. Conversely, a good visualization technique can suggest meaningful features to include in a model. Input variable selection is the most important step in the model selection process. Given a target variable, a set of input variables can be selected as explanatory variables by some prior knowledge.
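One generic way to make "suggest meaningful features" concrete (an illustration of the idea, not necessarily the algorithm proposed in the paper) is to rank input variables by an estimated mutual information with the target, which makes no Gaussianity assumption:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X; Y) in nats; crude but distribution-free."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Toy data: the target depends nonlinearly on x0 and not at all on x1.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=2000)
scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
print(scores)   # feature 0 should score much higher than feature 1
```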


A MCMC Approach to Hierarchical Mixture Modelling

Neural Information Processing Systems

There are many hierarchical clustering algorithms available, but these lack a firm statistical basis. Here we set up a hierarchical probabilistic mixture model, where data is generated in a hierarchical tree-structured manner. Markov chain Monte Carlo (MCMC) methods are demonstrated which can be used to sample from the posterior distribution over trees containing variable numbers of hidden units.
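For readers unfamiliar with the machinery, below is the kind of Metropolis-Hastings step such samplers are built from, applied to a deliberately simple flat mixture rather than the paper's tree-structured model (all modeling choices here are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])

def log_post(mu):
    """Log posterior (up to a constant) for the two component means of an
    equal-weight, unit-variance Gaussian mixture with flat priors."""
    ll = np.logaddexp(-0.5 * (data - mu[0]) ** 2, -0.5 * (data - mu[1]) ** 2)
    return ll.sum()

mu = np.zeros(2)
samples = []
for _ in range(5000):
    prop = mu + rng.normal(scale=0.2, size=2)   # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop                               # Metropolis accept/reject
    samples.append(mu.copy())
# Posterior mean after burn-in; up to label switching this recovers (-2, 3).
print(np.mean(samples[1000:], axis=0))
```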


Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel Classifiers

Neural Information Processing Systems

We present a variational Bayesian method for model selection over families of kernel classifiers like Support Vector machines or Gaussian processes. The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training cases for validation. This opens the possibility to use sophisticated families of kernels in situations where the small "standard kernel" classes are clearly inappropriate. We relate the method to other work done on Gaussian processes and clarify the relation between Support Vector machines and certain Gaussian process models.

1 Introduction

Bayesian techniques have been widely and successfully used in the neural networks and statistics community and are appealing because of their conceptual simplicity, generality and consistency with which they solve learning problems. In this paper we present a new method for applying the Bayesian methodology to Support Vector machines. We will briefly review Gaussian Process and Support Vector classification in this section and clarify their relationship by pointing out the common roots. Although we focus on classification here, it is straightforward to apply the methods to regression problems as well. In section 2 we introduce our algorithm and show relations to existing methods. Finally, we present experimental results in section 3 and close with a discussion in section 4. Let X be a measure space (e.g.
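For context, the sketch below shows the standard evidence-based route to kernel selection that such methods build on: scoring an RBF lengthscale by the Gaussian-process log marginal likelihood on a toy regression problem. Grid search over a single parameter is exactly the kind of approach that breaks down with many kernel parameters, which is the regime the paper targets; the regression setup here is an illustrative assumption.

```python
import numpy as np

def log_evidence(X, y, lengthscale, noise=0.1):
    """GP log marginal likelihood log p(y | X, theta) with an RBF kernel."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / lengthscale ** 2) + noise ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return float(-0.5 * y @ alpha - np.log(np.diag(L)).sum()
                 - 0.5 * len(y) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
for ls in (0.1, 0.5, 1.0, 2.0):
    print(ls, log_evidence(X, y, ls))   # pick the lengthscale with highest evidence
```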