Country
Redundancy and Dimensionality Reduction in Sparse-Distributed Representations of Natural Objects in Terms of Their Local Features
Low-dimensional representations are key to solving problems in highlevel vision, such as face compression and recognition. Factorial coding strategies for reducing the redundancy present in natural images on the basis of their second-order statistics have been successful in accounting for both psychophysical and neurophysiological properties of early vision. Class-specific representations are presumably formed later, at the higher-level stages of cortical processing. Here we show that when retinotopic factorial codes are derived for ensembles of natural objects, such as human faces, not only redundancy, but also dimensionality is reduced. We also show that objects are built from parts in a non-Gaussian fashion which allows these local-feature codes to have dimensionalities that are substantially lower than the respective Nyquist sampling rates.
Learning Sparse Image Codes using a Wavelet Pyramid Architecture
Olshausen, Bruno A., Sallee, Phil, Lewicki, Michael S.
We show how a wavelet basis may be adapted to best represent natural images in terms of sparse coefficients. The wavelet basis, which may be either complete or overcomplete, is specified by a small number of spatial functions which are repeated across space and combined in a recursive fashion so as to be self-similar across scale. These functions are adapted to minimize the estimated code length under a model that assumes images are composed of a linear superposition of sparse, independent components. When adapted to natural images, the wavelet bases take on different orientations and they evenly tile the orientation domain, in stark contrast to the standard, non-oriented wavelet bases used in image compression. When the basis set is allowed to be overcomplete, it also yields higher coding efficiency than standard wavelet bases. 1 Introduction The general problem we address here is that of learning efficient codes for representing natural images.
Partially Observable SDE Models for Image Sequence Recognition Tasks
Movellan, Javier R., Mineiro, Paul, Williams, Ruth J.
This paper explores a framework for recognition of image sequences using partially observable stochastic differential equation (SDE) models. Monte-Carlo importance sampling techniques are used for efficient estimation of sequence likelihoods and sequence likelihood gradients. Once the network dynamics are learned, we apply the SDE models to sequence recognition tasks in a manner similar to the way Hidden Markov models (HMMs) are commonly applied. The potential advantage of SDEs over HMMS is the use of continuous state dynamics. We present encouraging results for a video sequence recognition task in which SDE models provided excellent performance when compared to hidden Markov models. 1 Introduction This paper explores a framework for recognition of image sequences using partially observable stochastic differential equations (SDEs). In particular we use SDE models of low-power nonlinear RC circuits with a significant thermal noise component. We call them diffusion networks. A diffusion network consists of a set of n nodes coupled via a vector of adaptive impedance parameters ' which are tuned to optimize the network's behavior.
Learning Segmentation by Random Walks
We present a new view of image segmentation by pairwise similarities. We interpret the similarities as edge flows in a Markov random walk and study the eigenvalues and eigenvectors of the walk's transition matrix. This interpretation shows that spectral methods for clustering and segmentation have a probabilistic foundation. In particular, we prove that the Normalized Cut method arises naturally from our framework. Finally, the framework provides a principled method for learning the similarity function as a combination of features.
Color Opponency Constitutes a Sparse Representation for the Chromatic Structure of Natural Scenes
Lee, Te-Won, Wachtler, Thomas, Sejnowski, Terrence J.
The human visual system encodes the chromatic signals conveyed by the three types of retinal cone photoreceptors in an opponent fashion. This color opponency has been shown to constitute an efficient encoding by spectral decorrelation of the receptor signals. We analyze the spatial and chromatic structure of natural scenes by decomposing the spectral images into a set of linear basis functions such that they constitute a representation with minimal redundancy. Independent component analysis finds the basis functions that transforms the spatiochromatic data such that the outputs (activations) are statistically as independent as possible, i.e. least redundant. The resulting basis functions show strong opponency along an achromatic direction (luminance edges), along a blueyellow direction, and along a red-blue direction.
Keeping Flexible Active Contours on Track using Metropolis Updates
Kristjansson, Trausti T., Frey, Brendan J.
Condensation, a form of likelihood-weighted particle filtering, has been successfully used to infer the shapes of highly constrained "active" contours in video sequences. However, when the contours are highly flexible (e.g. for tracking fingers of a hand), a computationally burdensome number of particles is needed to successfully approximate the contour distribution. We show how the Metropolis algorithm can be used to update a particle set representing a distribution over contours at each frame in a video sequence. We compare this method to condensation using a video sequence that requires highly flexible contours, and show that the new algorithm performs dramatically better that the condensation algorithm. We discuss the incorporation of this method into the "active contour" framework where a shape-subspace is used constrain shape variation.
Feature Correspondence: A Markov Chain Monte Carlo Approach
Dellaert, Frank, Seitz, Steven M., Thrun, Sebastian, Thorpe, Charles E.
When trying to recover 3D structure from a set of images, the most difficult problem is establishing the correspondence between the measurements. Most existing approaches assume that features can be tracked across frames, whereas methods that exploit rigidity constraints to facilitate matching do so only under restricted camera motion. In this paper we propose a Bayesian approach that avoids the brittleness associated with singling out one "best" correspondence, and instead consider the distribution over all possible correspondences. We treat both a fully Bayesian approach that yields a posterior distribution, and a MAP approach that makes use of EM to maximize this posterior. We show how Markov chain Monte Carlo methods can be used to implement these techniques in practice, and present experimental results on real data.
Shape Context: A New Descriptor for Shape Matching and Object Recognition
Belongie, Serge, Malik, Jitendra, Puzicha, Jan
We develop an approach to object recognition based on matching shapes and using a resulting measure of similarity in a nearest neighbor classifier. The key algorithmic problem here is that of finding pointwise correspondences between an image shape and a stored prototype shape. We introduce a new shape descriptor, the shape context, which makes this possible, using a simple and robust algorithm. We demonstrate that shape contexts greatly simplify recovery of correspondences between points of two given shapes. Once shapes are aligned, shape contexts are used to define a robust score for measuring shape similarity. We have used this score in a nearest-neighbor classifier for recognition of hand written digits as well as 3D objects, using exactly the same distance function.
Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition
Tchorz, Jürgen, Kleinschmidt, Michael, Kollmeier, Birger
For SNR-estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame, and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network is used to analyse AMS patterns generated from noisy speech and estimates the local SNR. Noise suppression is achieved by attenuating frequency channels according to their SNR. The noise suppression algorithm is evaluated in speakerindependent digit recognition experiments and compared to noise suppression by Spectral Subtraction. 1 Introduction One of the major problems in automatic speech recognition (ASR) systems is their lack of robustness in noise, which severely degrades their usefulness in many practical applications. Several proposals have been made to increase the robustness of ASR systems, e.g. by model compensation or more noise-robust feature extraction [1, 2]. Another method to increase robustness of ASR systems is to suppress the background noise before feature extraction. Classical approaches for single-channel noise suppression are Spectral Subtraction [3] and related schemes, e.g.