Goto

Collaborating Authors

 Europe



Sparse Kernel Principal Component Analysis

Neural Information Processing Systems

'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisationof the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation intoa feature space wherein standard PCA is performed. Unfortunately, thetechnique is not'sparse', since the components thus obtained are expressed in terms of kernels associated with every trainingvector. This paper shows that by approximating the covariance matrix in feature space by a reduced number of example vectors,using a maximum-likelihood approach, we may obtain a highly sparse form of kernel PCA without loss of effectiveness. 1 Introduction Principal component analysis (PCA) is a well-established technique for dimensionality reduction,and examples of its many applications include data compression, image processing, visualisation, exploratory data analysis, pattern recognition and time series prediction.


A Mathematical Programming Approach to the Kernel Fisher Algorithm

Neural Information Processing Systems

We investigate a new kernel-based classifier: the Kernel Fisher Discriminant (KFD).A mathematical programming formulation based on the observation thatKFD maximizes the average margin permits an interesting modification of the original KFD algorithm yielding the sparse KFD. We find that both, KFD and the proposed sparse KFD, can be understood in an unifying probabilistic context. Furthermore, we show connections to Support Vector Machines and Relevance Vector Machines. From this understanding, we are able to outline an interesting kernel-regression technique based upon the KFD algorithm.


Computing with Finite and Infinite Networks

Neural Information Processing Systems

Using statistical mechanics results, I calculate learning curves (average generalization error) for Gaussian processes (GPs) and Bayesian neural networks (NNs) used for regression. Applying the results to learning a teacher defined by a two-layer network, I can directly compare GP and Bayesian NN learning.


Occam's Razor

Neural Information Processing Systems

The Bayesian paradigm apparently only sometimes gives rise to Occam's Razor; at other times very large models perform well. We give simple examples of both kinds of behaviour. The two views are reconciled when measuring complexity of functions, rather than of the machinery used to implement them. We analyze the complexity of functions for some linear in the parameter models that are equivalent to Gaussian Processes, and always find Occam's Razor at work. 1 Introduction Occam's Razor is a well known principle of "parsimony of explanations" which is influential inscientific thinking in general and in problems of statistical inference in particular. In this paper we review its consequences for Bayesian statistical models, where its behaviour can be easily demonstrated and quantified.


Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations

Neural Information Processing Systems

Based on a statistical mechanics approach, we develop a method for approximately computing average case learning curves for Gaussian processregression models. The approximation works well in the large sample size limit and for arbitrary dimensionality of the input space. We explain how the approximation can be systematically improvedand argue that similar techniques can be applied to general likelihood models. 1 Introduction Gaussian process (GP) models have gained considerable interest in the Neural Computation Community(see e.g.[I, 2, 3, 4]) in recent years. Being nonparametric models by construction their theoretical understanding seems to be less well developed comparedto simpler parametric models like neural networks. We are especially interested in developing theoretical approaches which will at least give good approximations togeneralization errors when the number of training data is sufficiently large. In this paper we present a step in this direction which is based on a statistical mechanics approach.In contrast to most previous applications of statistical mechanics to learning theory we are not limited to the so called "thermodynamic" limit which would require a high dimensional input space.


Foundations for a Circuit Complexity Theory of Sensory Processing

Neural Information Processing Systems

We introduce total wire length as salient complexity measure for an analysis ofthe circuit complexity of sensory processing in biological neural systems and neuromorphic engineering. This new complexity measure is applied to a set of basic computational problems that apparently need to be solved by circuits for translation-and scale-invariant sensory processing. Weexhibit new circuit design strategies for these new benchmark functions that can be implemented within realistic complexity bounds, in particular with linear or almost linear total wire length. 1 Introduction Circuit complexity theory is a classical area of theoretical computer science, that provides estimates for the complexity of circuits for computing specific benchmark functions, such as binary addition, multiplication and sorting (see, e.g.


Finding the Key to a Synapse

Neural Information Processing Systems

Experimental data have shown that synapses are heterogeneous: different synapses respond with different sequences of amplitudes of postsynaptic responses to the same spike train. Neither the role of synaptic dynamics itself nor the role of the heterogeneity of synaptic dynamics for computations inneural circuits is well understood. We present in this article methods that make it feasible to compute for a given synapse with known synaptic parameters the spike train that is optimally fitted to the synapse, for example in the sense that it produces the largest sum of postsynaptic responses.To our surprise we find that most of these optimally fitted spike trains match common firing patterns of specific types of neurons that are discussed in the literature. 1 Introduction A large number of experimental studies have shown that biological synapses have an inherent dynamics,which controls how the pattern of amplitudes of postsynaptic responses depends on the temporal pattern of the incoming spike train. Various quantitative models have been proposed involving a small number of characteristic parameters, that allow us to predict the response of a given synapse to a given spike train once proper values for these characteristic synaptic parameters have been found. The analysis of this article is based on the model of [1], where three parameters U, F, D control the dynamics of a synapse and a fourth parameter A - which corresponds to the synaptic "weight" in static synapse models - scales the absolute sizes of the postsynaptic responses. The resulting model predicts theamplitude Ak for the kth spike in a spike train with interspike intervals (lSI's) .60


Hippocampally-Dependent Consolidation in a Hierarchical Model of Neocortex

Neural Information Processing Systems

In memory consolidation, declarative memories which initially require the hippocampus for their recall, ultimately become independent of it. Consolidation has been the focus of numerous experimental and qualitative modelingstudies, but only little quantitative exploration. We present a consolidation model in which hierarchical connections in the cortex, that initially instantiate purely semantic information acquired through probabilistic unsupervised learning, come to instantiate episodic information aswell. The hippocampus is responsible for helping complete partial input patterns before consolidation is complete, while also training thecortex to perform appropriate completion by itself.


Toward Conversational Human-Computer Interaction

AI Magazine

The belief that humans will be able to interact with computers in conversational speech has long been a favorite subject in science fiction, reflecting the persistent belief that spoken dialogue would be the most natural and powerful user interface to computers. With recent improvements in computer technology and in speech and language processing, such systems are starting to appear feasible. There are significant technical problems that still need to be solved before speech-driven interfaces become truly conversational. This article describes the results of a 10-year effort building robust spoken dialogue systems at the University of Rochester.