Feature Selection in Mixture-Based Clustering
Law, Martin H., Jain, Anil K., Figueiredo, Mário
There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difficult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the first one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami's mutual-information-based feature relevance criterion to the unsupervised case. Feature selection is then carried out by a backward search scheme. This scheme can be classified as a "wrapper", since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.
Reconstructing Stimulus-Driven Neural Networks from Spike Times
We present a method to distinguish direct connections between two neurons from common input originating from other, unmeasured neurons. The distinction is computed from the spike times of the two neurons in response to a white noise stimulus. Although the method is based on a highly idealized linear-nonlinear approximation of neural response, we demonstrate via simulation that the approach can work with a more realistic, integrate-and-fire neuron model. We propose that the approach exemplified by this analysis may yield viable tools for reconstructing stimulus-driven neural networks from data gathered in neurophysiology experiments.
Artefactual Structure from Least-Squares Multidimensional Scaling
Hughes, Nicholas P., Lowe, David
We consider the problem of illusory or artefactual structure from the visualisation of high-dimensional structureless data. In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. We show that the use of a squared Euclidean metric (i.e. the SS
A Bilinear Model for Sparse Coding
Grimes, David B., Rao, Rajesh P. N.
Recent algorithms for sparse coding and independent component analysis (ICA) have demonstrated how localized features can be learned from natural images. However, these approaches do not take image transformations into account. As a result, they produce image codes that are redundant because the same feature is learned at multiple locations. We describe an algorithm for sparse coding based on a bilinear generative model of images. By explicitly modeling the interaction between image features and their transformations, the bilinear approach helps reduce redundancy in the image code and provides a basis for transformation-invariant vision.
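To make the idea of a bilinear generative model concrete, the following is a minimal sketch, assuming the common form in which each pixel is a sum over products of a feature coefficient and a transformation coefficient. The function name `bilinear_synthesize`, the nested-list layout of `basis`, and the toy values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def bilinear_synthesize(basis, x, y):
    """Synthesize an image as z[k] = sum_i sum_j basis[i][j][k] * x[i] * y[j].

    basis[i][j] is the image (a list of pixels) of feature i under transformation j,
    x holds feature coefficients, and y holds transformation coefficients.
    """
    n_pixels = len(basis[0][0])
    image = [0.0] * n_pixels
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            for k in range(n_pixels):
                image[k] += basis[i][j][k] * xi * yj
    return image


# Toy 3-pixel example (hypothetical values): transformation 1 shifts each feature right by one pixel.
toy_basis = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # feature 0 under transformations 0 and 1
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # feature 1 under transformations 0 and 1
]
shifted_feature_0 = bilinear_synthesize(toy_basis, [1.0, 0.0], [0.0, 1.0])
```

With the transformation coefficients held fixed, the output is linear in the feature coefficients, and vice versa. This is what allows one feature code to be reused across transformations instead of learning a separate copy of the feature at each location.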
Dynamical Constraints on Computing with Spike Timing in the Cortex
Banerjee, Arunava, Pouget, Alexandre
If the cortex uses spike timing to compute, the timing of the spikes must be robust to perturbations. Based on a recent framework that provides a simple criterion to determine whether a spike sequence produced by a generic network is sensitive to initial conditions, and numerical simulations of a variety of network architectures, we argue, within the limits set by our model of the neuron, that it is unlikely that precise sequences of spike timings are used for computation under conditions typically found in the cortex.
Approximate Inference and Protein-Folding
Side-chain prediction is an important subtask in the protein-folding problem. We show that finding a minimal energy side-chain configuration is equivalent to performing inference in an undirected graphical model. The graphical model is relatively sparse yet has many cycles. We used this equivalence to assess the performance of approximate inference algorithms in a real-world setting. Specifically we compared belief propagation (BP), generalized BP (GBP) and naive mean field (MF).
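The equivalence between energy minimization and inference can be illustrated on a toy pairwise model. The sketch below is not the side-chain model from the paper: the three-node cycle, the energy tables, and the function names are hypothetical. It assumes one symmetric pairwise energy table shared by all edges, finds the minimal-energy configuration by exhaustive search, and runs naive mean field with coordinate-wise updates of the form q_i(s) proportional to exp(-(unary energy + expected pairwise energy)).

```python
import itertools
import math


def total_energy(config, unary, pairwise, edges):
    """Energy of a full assignment: unary terms plus pairwise terms on each edge."""
    energy = sum(unary[i][s] for i, s in enumerate(config))
    energy += sum(pairwise[config[i]][config[j]] for i, j in edges)
    return energy


def brute_force_min(unary, pairwise, edges):
    """Exhaustively search for the minimal-energy configuration (tiny models only)."""
    n_states = len(unary[0])
    configs = itertools.product(range(n_states), repeat=len(unary))
    return min(configs, key=lambda c: total_energy(c, unary, pairwise, edges))


def naive_mean_field(unary, pairwise, edges, sweeps=50):
    """Naive mean field: fully factorized beliefs q[i][s], updated one node at a time."""
    n, k = len(unary), len(unary[0])
    q = [[1.0 / k] * k for _ in range(n)]
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(sweeps):
        for i in range(n):
            # Expected energy of each state of node i under the neighbors' current beliefs.
            field = [
                unary[i][s]
                + sum(q[j][t] * pairwise[s][t] for j in neighbors[i] for t in range(k))
                for s in range(k)
            ]
            low = min(field)  # subtract the minimum for numerical stability
            weights = [math.exp(-(f - low)) for f in field]
            z = sum(weights)
            q[i] = [w / z for w in weights]
    return q


# Hypothetical toy model: a 3-node cycle with an attractive (agreement-favoring) pairwise energy.
toy_unary = [[0.0, 3.0], [0.0, 0.5], [0.2, 0.0]]
toy_pairwise = [[0.0, 1.0], [1.0, 0.0]]
toy_edges = [(0, 1), (1, 2), (0, 2)]
best_config = brute_force_min(toy_unary, toy_pairwise, toy_edges)
beliefs = naive_mean_field(toy_unary, toy_pairwise, toy_edges)
```

Exhaustive search is only feasible for very small models, which is why approximate methods such as BP, GBP and mean field are of interest on sparse graphs with many cycles. Mean field returns approximate marginals rather than a guaranteed minimum, so its most probable states need not coincide with the true minimal-energy configuration in general.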
An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition
Hidden Markov Models (HMMs) are very well suited to handle discrete or continuous sequences of varying sizes. Moreover, an efficient training algorithm (EM) is available, as well as an efficient decoding algorithm (Viterbi), which provides the optimal sequence of states (and the corresponding sequence of high-level events) associated with a given sequence of low-level data. On the other hand, multimodal information processing is currently a very challenging framework of applications including multimodal person authentication, multimodal speech recognition, multimodal event analyzers, etc. In that framework, the same sequence of events is represented not only by a single sequence of data but by a series of sequences of data, each of them possibly coming from a different modality: video streams with various viewpoints, audio stream(s), etc. One such task, which will be presented in this paper, is multimodal speech recognition using both a microphone and a camera recording a speaker simultaneously while he or she speaks.
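As a reminder of what Viterbi decoding computes, here is a minimal sketch for an ordinary (synchronous) discrete HMM. It is not the asynchronous model of the paper. The function name, the argument layout, and the toy parameters are assumptions made for illustration, and the code assumes all probabilities are strictly positive so that logarithms are defined.

```python
import math


def viterbi(obs, init, trans, emit):
    """Most likely state sequence of a discrete HMM, computed in log space.

    obs: list of observation indices; init[s]: initial state probabilities;
    trans[r][s]: probability of moving from state r to state s;
    emit[s][o]: probability of emitting observation o in state s.
    """
    n_states = len(init)
    # delta[s]: best log-probability of any state path ending in s at the current step
    delta = [math.log(init[s]) + math.log(emit[s][obs[0]]) for s in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: delta[r] + math.log(trans[r][s]))
            pointers.append(best_prev)
            new_delta.append(
                delta[best_prev] + math.log(trans[best_prev][s]) + math.log(emit[s][o])
            )
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state, three-symbol toy model.
toy_init = [0.6, 0.4]
toy_trans = [[0.7, 0.3], [0.4, 0.6]]
toy_emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
decoded = viterbi([0, 1, 2], toy_init, toy_trans, toy_emit)  # -> [0, 0, 1]
```

The dynamic program keeps only the best-scoring path into each state at each time step, so its cost grows linearly with sequence length rather than exponentially.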
Effective Dimension and Generalization of Kernel Learning
We investigate the generalization performance of some learning problems in Hilbert function spaces. We introduce a concept of scale-sensitive effective data dimension, and show that it characterizes the convergence rate of the underlying learning problem. Using this concept, we can naturally extend results for parametric estimation problems in finite dimensional spaces to nonparametric kernel learning methods. We derive upper bounds on the generalization performance and show that the resulting convergence rates are optimal under various circumstances.
Convergence Properties of Some Spike-Triggered Analysis Techniques
All of our results are obtained in the setting of a (possibly multidimensional) linear-nonlinear (LN) cascade model for stimulus-driven neural activity. We start by giving exact rate of convergence results for the common spike-triggered average (STA) technique. Next, we analyze a spike-triggered covariance method, variants of which have been recently exploited successfully by Bialek, Simoncelli, and colleagues. These first two methods suffer from extraneous conditions on their convergence; therefore, we introduce an estimator for the LN model parameters which is designed to be consistent under general conditions. We provide an algorithm for the computation of this estimator and derive its rate of convergence. We close with a brief discussion of the efficiency of these estimators and an application to data recorded from the primary motor cortex of awake, behaving primates.
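The spike-triggered average can be sketched on simulated data as follows. This is an illustrative example under stated assumptions, not the estimators analyzed in the paper: the stimulus is Gaussian white noise, the LN model has a single linear filter followed by a logistic nonlinearity and Bernoulli spiking, and the names `simulate_ln`, `spike_triggered_average`, `gain`, and the filter values are hypothetical.

```python
import math
import random


def simulate_ln(filt, n_samples, rng, gain=3.0):
    """Simulate an LN cascade: linear filter -> logistic nonlinearity -> Bernoulli spike."""
    stimuli, spikes = [], []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in filt]          # white-noise stimulus frame
        drive = sum(k * xi for k, xi in zip(filt, x))    # linear stage
        p_spike = 1.0 / (1.0 + math.exp(-gain * drive))  # nonlinear stage
        stimuli.append(x)
        spikes.append(1 if rng.random() < p_spike else 0)
    return stimuli, spikes


def spike_triggered_average(stimuli, spikes):
    """Average of the stimuli that were followed by a spike."""
    n_spikes = sum(spikes)
    sta = [0.0] * len(stimuli[0])
    for x, s in zip(stimuli, spikes):
        if s:
            for d, xd in enumerate(x):
                sta[d] += xd
    return [v / n_spikes for v in sta]


def cosine(a, b):
    """Cosine similarity between two vectors, used to compare filter directions."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / (math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b)))


rng = random.Random(0)
true_filter = [0.5, -0.5, 0.5, -0.5, 0.0]  # hypothetical unit-norm filter
stimuli, spikes = simulate_ln(true_filter, 5000, rng)
sta = spike_triggered_average(stimuli, spikes)
alignment = cosine(sta, true_filter)
```

For a Gaussian stimulus and a nonlinearity that is asymmetric around zero drive, the STA points along the filter direction up to a scale factor, so the cosine similarity approaches 1 as the number of spikes grows. With a symmetric nonlinearity the STA would average to zero, which is one example of the kind of condition on convergence that the text refers to.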
Kernel Dependency Estimation
Weston, Jason, Chapelle, Olivier, Vapnik, Vladimir, Elisseeff, André, Schölkopf, Bernhard
We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be, for example, vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.
1 Introduction
In this article we consider the rather general learning problem of finding a dependency between inputs $x \in \mathcal{X}$ and outputs $y \in \mathcal{Y}$ given a training set $(x_1, y_1), \ldots, (x_m, y_m) \in \mathcal{X} \times \mathcal{Y}$. This includes conventional pattern recognition and regression estimation. It also encompasses more complex dependency estimation tasks, e.g. mapping of a certain class of strings to a certain class of graphs (as in text parsing) or the mapping of text descriptions to images.