Rate-coded Restricted Boltzmann Machines for Face Recognition

Neural Information Processing Systems

We describe a neurally-inspired, unsupervised learning algorithm that builds a nonlinear generative model for pairs of face images from the same individual. Individuals are then recognized by finding the highest relative probability pair among all pairs that consist of a test image and an image whose identity is known. Our method compares favorably with other methods in the literature. The generative model consists of a single layer of rate-coded, nonlinear feature detectors and it has the property that, given a data vector, the true posterior probability distribution over the feature detector activities can be inferred rapidly without iteration or approximation. The weights of the feature detectors are learned by comparing the correlations of pixel intensities and feature activations in two phases: when the network is observing real data and when it is observing reconstructions of real data generated from the feature activations.
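The two-phase learning rule described above can be illustrated with a minimal sketch. Note the assumptions: this uses a plain binary restricted Boltzmann machine without biases and a single reconstruction step, not the rate-coded model of the paper, and the function name `cd1_update`, the learning rate, and the toy data are all hypothetical.

```python
import numpy as np

def cd1_update(W, v_data, rng, lr=0.1):
    """One two-phase weight update for a bias-free binary RBM (illustrative only)."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    # Phase 1: feature activations while the network observes real data.
    h_prob = sigmoid(v_data @ W)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
    # Phase 2: reconstruct the data from the features, then re-infer the features.
    v_recon = sigmoid(h_sample @ W.T)
    h_recon = sigmoid(v_recon @ W)
    # Compare pixel-feature correlations in the two phases.
    pos = v_data.T @ h_prob
    neg = v_recon.T @ h_recon
    return W + lr * (pos - neg) / v_data.shape[0]

rng = np.random.default_rng(0)
v = (rng.random((20, 6)) < 0.5).astype(float)   # 20 toy "images" of 6 pixels
W = 0.01 * rng.standard_normal((6, 4))          # 6 pixels, 4 feature detectors
W_new = cd1_update(W, v, rng)
```

The posterior over features factorizes given the data, which is why `h_prob` is computed in one matrix product with no iteration.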


Redundancy and Dimensionality Reduction in Sparse-Distributed Representations of Natural Objects in Terms of Their Local Features

Neural Information Processing Systems

Low-dimensional representations are key to solving problems in high-level vision, such as face compression and recognition. Factorial coding strategies for reducing the redundancy present in natural images on the basis of their second-order statistics have been successful in accounting for both psychophysical and neurophysiological properties of early vision. Class-specific representations are presumably formed later, at the higher-level stages of cortical processing. Here we show that when retinotopic factorial codes are derived for ensembles of natural objects, such as human faces, not only redundancy, but also dimensionality is reduced. We also show that objects are built from parts in a non-Gaussian fashion which allows these local-feature codes to have dimensionalities that are substantially lower than the respective Nyquist sampling rates.


Learning and Tracking Cyclic Human Motion

Neural Information Processing Systems

We estimate a statistical model of typical activities from a large set of 3D periodic human motion data by segmenting these data automatically into "cycles". Then the mean and the principal components of the cycles are computed using a new algorithm that accounts for missing information and enforces smooth transitions between cycles. The learned temporal model provides a prior probability distribution over human motions that can be used in a Bayesian framework for tracking human subjects in complex monocular video sequences and recovering their 3D motion.

1 Introduction

The modeling and tracking of human motion in video is important for problems as varied as animation, video database search, sports medicine, and human-computer interaction. Technically, the human body can be approximated by a collection of articulated limbs and its motion can be thought of as a collection of time-series describing the joint angles as they evolve over time. A key challenge in modeling these joint angles involves decomposing the time-series into suitable temporal primitives.
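As a rough illustration of computing the mean and principal components of a set of cycles, the sketch below applies ordinary PCA via the SVD to synthetic, already-segmented joint-angle cycles. It is an assumption-laden stand-in: it does not handle missing information or enforce smooth transitions between cycles as the paper's algorithm does, and `cycle_pca` and the toy data are hypothetical.

```python
import numpy as np

def cycle_pca(cycles, k):
    """Mean cycle, top-k principal components, and their variances (plain PCA)."""
    mean = cycles.mean(axis=0)
    # Rows of Vt are orthonormal directions of variation across cycles.
    _, s, Vt = np.linalg.svd(cycles - mean, full_matrices=False)
    variances = s[:k] ** 2 / (len(cycles) - 1)
    return mean, Vt[:k], variances

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
amplitudes = rng.normal(1.0, 0.2, size=30)        # 30 toy cycles of one joint angle
cycles = amplitudes[:, None] * np.sin(t) + 0.01 * rng.standard_normal((30, 50))
mean, components, variances = cycle_pca(cycles, k=2)
```

In this toy data the cycles differ mainly in amplitude, so the first component carries almost all of the variance.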


Learning Sparse Image Codes using a Wavelet Pyramid Architecture

Neural Information Processing Systems

We show how a wavelet basis may be adapted to best represent natural images in terms of sparse coefficients. The wavelet basis, which may be either complete or overcomplete, is specified by a small number of spatial functions which are repeated across space and combined in a recursive fashion so as to be self-similar across scale. These functions are adapted to minimize the estimated code length under a model that assumes images are composed of a linear superposition of sparse, independent components. When adapted to natural images, the wavelet bases take on different orientations and they evenly tile the orientation domain, in stark contrast to the standard, non-oriented wavelet bases used in image compression. When the basis set is allowed to be overcomplete, it also yields higher coding efficiency than standard wavelet bases.

1 Introduction

The general problem we address here is that of learning efficient codes for representing natural images.


Partially Observable SDE Models for Image Sequence Recognition Tasks

Neural Information Processing Systems

This paper explores a framework for recognition of image sequences using partially observable stochastic differential equation (SDE) models. Monte-Carlo importance sampling techniques are used for efficient estimation of sequence likelihoods and sequence likelihood gradients. Once the network dynamics are learned, we apply the SDE models to sequence recognition tasks in a manner similar to the way hidden Markov models (HMMs) are commonly applied. The potential advantage of SDEs over HMMs is the use of continuous state dynamics. We present encouraging results for a video sequence recognition task in which SDE models provided excellent performance when compared to hidden Markov models.

1 Introduction

This paper explores a framework for recognition of image sequences using partially observable stochastic differential equations (SDEs). In particular we use SDE models of low-power nonlinear RC circuits with a significant thermal noise component. We call them diffusion networks. A diffusion network consists of a set of n nodes coupled via a vector of adaptive impedance parameters which are tuned to optimize the network's behavior.
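To make the idea of continuous state dynamics with a noise component concrete, here is a minimal sketch that simulates sample paths of a scalar SDE with the Euler-Maruyama scheme. The drift function, noise level, and step size are illustrative assumptions and are not the diffusion network equations of the paper; `euler_maruyama` is a hypothetical helper.

```python
import numpy as np

def euler_maruyama(drift, sigma, x0, dt, n_steps, n_paths, rng):
    """Simulate dx = drift(x) dt + sigma dW for several independent paths."""
    x = np.full(n_paths, x0, dtype=float)
    path = np.empty((n_steps + 1, n_paths))
    path[0] = x
    for t in range(n_steps):
        noise = rng.standard_normal(n_paths)
        # Deterministic drift plus Brownian increment scaled by sqrt(dt).
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * noise
        path[t + 1] = x
    return path

rng = np.random.default_rng(0)
# Assumed toy dynamics: a leaky state pulled toward zero (Ornstein-Uhlenbeck).
paths = euler_maruyama(lambda x: -x, sigma=0.5, x0=1.0,
                       dt=0.01, n_steps=500, n_paths=2000, rng=rng)
```

Averaging a quantity over many such simulated paths is the basic Monte-Carlo ingredient; the importance sampling used for likelihood estimation in the paper is not shown here.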


Learning Segmentation by Random Walks

Neural Information Processing Systems

We present a new view of image segmentation by pairwise similarities. We interpret the similarities as edge flows in a Markov random walk and study the eigenvalues and eigenvectors of the walk's transition matrix. This interpretation shows that spectral methods for clustering and segmentation have a probabilistic foundation. In particular, we prove that the Normalized Cut method arises naturally from our framework. Finally, the framework provides a principled method for learning the similarity function as a combination of features.
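The random-walk view can be sketched on a toy problem: normalize a similarity matrix S by its row sums to obtain a transition matrix P, then split the points by the sign of the second eigenvector of P. The Gaussian similarity, the six 1-D points, and the name `random_walk_segments` are assumptions for illustration; the eigenvectors are obtained from the symmetric matrix D^{-1/2} S D^{-1/2}, which shares its eigenvalues with P.

```python
import numpy as np

def random_walk_segments(S):
    """Transition matrix of the random walk on S and a two-way split."""
    d = S.sum(axis=1)
    P = S / d[:, None]                    # row-stochastic: P = D^{-1} S
    M = S / np.sqrt(np.outer(d, d))       # symmetric, same eigenvalues as P
    evals, evecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    # If M u = lam * u, then P (D^{-1/2} u) = lam * (D^{-1/2} u).
    second = evecs[:, -2] / np.sqrt(d)
    return P, evals[::-1], second > 0

points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = np.exp(-0.5 * (points[:, None] - points[None, :]) ** 2)
P, evals, labels = random_walk_segments(S)
```

The largest eigenvalue of P is always 1; a second eigenvalue close to 1 indicates that the walk rarely crosses between the two groups.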


Color Opponency Constitutes a Sparse Representation for the Chromatic Structure of Natural Scenes

Neural Information Processing Systems

The human visual system encodes the chromatic signals conveyed by the three types of retinal cone photoreceptors in an opponent fashion. This color opponency has been shown to constitute an efficient encoding by spectral decorrelation of the receptor signals. We analyze the spatial and chromatic structure of natural scenes by decomposing the spectral images into a set of linear basis functions such that they constitute a representation with minimal redundancy. Independent component analysis finds the basis functions that transform the spatiochromatic data such that the outputs (activations) are statistically as independent as possible, i.e. least redundant. The resulting basis functions show strong opponency along an achromatic direction (luminance edges), along a blue-yellow direction, and along a red-blue direction.


Keeping Flexible Active Contours on Track using Metropolis Updates

Neural Information Processing Systems

Condensation, a form of likelihood-weighted particle filtering, has been successfully used to infer the shapes of highly constrained "active" contours in video sequences. However, when the contours are highly flexible (e.g. for tracking fingers of a hand), a computationally burdensome number of particles is needed to successfully approximate the contour distribution. We show how the Metropolis algorithm can be used to update a particle set representing a distribution over contours at each frame in a video sequence. We compare this method to condensation using a video sequence that requires highly flexible contours, and show that the new algorithm performs dramatically better than the condensation algorithm. We discuss the incorporation of this method into the "active contour" framework where a shape-subspace is used to constrain shape variation.
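A Metropolis update of a particle set can be sketched as follows. Each particle proposes a small random perturbation and accepts it with probability min(1, p_new / p_old). The Gaussian log-density standing in for the contour posterior, the step size, and the name `metropolis_update` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def metropolis_update(particles, log_target, rng, step=0.5, n_sweeps=200):
    """Move every particle with random-walk Metropolis steps (illustrative)."""
    particles = particles.copy()
    logp = log_target(particles)
    for _ in range(n_sweeps):
        proposal = particles + step * rng.standard_normal(particles.shape)
        logp_new = log_target(proposal)
        # Accept with probability min(1, exp(logp_new - logp)).
        accept = np.log(rng.random(len(particles))) < (logp_new - logp)
        particles[accept] = proposal[accept]
        logp[accept] = logp_new[accept]
    return particles

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])                      # assumed mode of the toy target
log_target = lambda x: -0.5 * np.sum((x - mu) ** 2, axis=1)
start = np.zeros((500, 2))                      # particles from a previous frame
updated = metropolis_update(start, log_target, rng)
```

Unlike likelihood-weighted resampling, the particles here move toward high-probability regions instead of only being reweighted.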


Feature Correspondence: A Markov Chain Monte Carlo Approach

Neural Information Processing Systems

When trying to recover 3D structure from a set of images, the most difficult problem is establishing the correspondence between the measurements. Most existing approaches assume that features can be tracked across frames, whereas methods that exploit rigidity constraints to facilitate matching do so only under restricted camera motion. In this paper we propose a Bayesian approach that avoids the brittleness associated with singling out one "best" correspondence, and instead considers the distribution over all possible correspondences. We treat both a fully Bayesian approach that yields a posterior distribution, and a MAP approach that makes use of EM to maximize this posterior. We show how Markov chain Monte Carlo methods can be used to implement these techniques in practice, and present experimental results on real data.


The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference

Neural Information Processing Systems

Our focus, however, is on the discovery of scene statistics which are useful for solving visual inference problems. For example, in related work [5] we have analyzed the statistics of filter responses on and off edges and hence derived effective edge detectors.