Goto

Collaborating Authors

 Genre


Sex with Support Vector Machines

Neural Information Processing Systems

Ming-Hsuan Yang University of Illinois at Urbana-Champaign Urbana, IL 61801 USA mhyang avision.ai.uiuc.edu Abstract Nonlinear Support Vector Machines (SVMs) are investigated for visual sex classification with low resolution "thumbnail" faces (21-by-12 pixels) processed from 1,755 images from the FERET face database. The performance of SVMs is shown to be superior to traditional pattern classifiers (Linear, Quadratic, Fisher Linear Discriminant, Nearest-Neighbor)as well as more modern techniques such as Radial Basis Function (RBF) classifiers and large ensemble RBF networks. Furthermore, the SVM performance (3.4% error) is currently the best result reported in the open literature. 1 Introduction In recent years, SVMs have been successfully applied to various tasks in computational face-processing.These include face detection [14], face pose discrimination [12] and face recognition [16]. Although facial sex classification has attracted much attention in the psychological literature [1, 4, 8, 15], relatively few computatinal learning methods have been proposed.


Color Opponency Constitutes a Sparse Representation for the Chromatic Structure of Natural Scenes

Neural Information Processing Systems

The human visual system encodes the chromatic signals conveyed by the three types of retinal cone photoreceptors in an opponent fashion. This color opponency has been shown to constitute an efficient encoding by spectral decorrelation of the receptor signals. We analyze the spatial and chromatic structure of natural scenes by decomposing the spectral images into a set of linear basis functions such that they constitute a representation with minimal redundancy. Independentcomponent analysis finds the basis functions that transforms the spatiochromatic data such that the outputs (activations) are statistically as independent as possible, i.e. least redundant. The resulting basis functions show strong opponency along an achromatic direction (luminance edges), along a blueyellow direction,and along a red-blue direction.


The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference

Neural Information Processing Systems

Our focus, however, is on the discovery of scene statistics which are useful for solving visual inference problems. For example, in related work [5] we have analyzed the statistics of filter responses on and off edges and hence derived effective edge detectors.


Active Learning for Parameter Estimation in Bayesian Networks

Neural Information Processing Systems

Bayesian networks are graphical representations of probability distributions. In virtually all of the work on learning these networks, the assumption is that we are presented with a data set consisting of randomly generated instances from the underlying distribution. In many situations, however, we also have the option of active learning, where we have the possibility of guiding the sampling process by querying for certain types of samples. This paper addresses the problem of estimating the parameters of Bayesian networks in an active learning setting. We provide a theoretical framework for this problem, and an algorithm that chooses which active learning queries to generate based on the model learned so far. We present experimental results showing that our active learning algorithm can significantly reduce the need for training data in many situations.


Constrained Independent Component Analysis

Neural Information Processing Systems

The paper presents a novel technique of constrained independent component analysis (CICA) to introduce constraints into the classical ICAand solve the constrained optimization problem by using Lagrange multiplier methods. This paper shows that CICA can be used to order the resulted independent components in a specific manner and normalize the demixing matrix in the signal separation procedure. It can systematically eliminate the ICA's indeterminacy on permutation and dilation. The experiments demonstrate the use of CICA in ordering of independent components while providing normalized demixing processes. Keywords: Independent component analysis, constrained independent componentanalysis, constrained optimization, Lagrange multiplier methods 1 Introduction Independent component analysis (ICA) is a technique to transform a multivariate randomsignal into a signal with components that are mutually independent in complete statistical sense [1]. There has been a growing interest in research for efficient realization of ICA neural networks (ICNNs).


Stagewise Processing in Error-correcting Codes and Image Restoration

Neural Information Processing Systems

Hidetoshi Nishimori Department of Physics, Tokyo Institute of Technology, Oh-Okayama, Meguro-ku, Tokyo 152-8551, Japan nishi@stat.phys.titech.ac.jp Abstract We introduce stagewise processing in error-correcting codes and image restoration, by extracting information from the former stage and using it selectively to improve the performance of the latter one. Both mean-field analysis using the cavity method and simulations showthat it has the advantage of being robust against uncertainties in hyperparameter estimation. 1 Introduction In error-correcting codes [1] and image restoration [2], the choice of the so-called hyperparameters is an important factor in determining their performances. In error correction, they determine the statistical significance given to the paritychecking termsand the received bits. Similarly in image restoration, they determine the statistical weights given to the prior knowledge and the received data. It was shown, by the use of inequalities, that the choice of the hyperparameters is optimal whenthere is a match between the source and model priors [3].


Occam's Razor

Neural Information Processing Systems

The Bayesian paradigm apparently only sometimes gives rise to Occam's Razor; at other times very large models perform well. We give simple examples of both kinds of behaviour. The two views are reconciled when measuring complexity of functions, rather than of the machinery used to implement them. We analyze the complexity of functions for some linear in the parameter models that are equivalent to Gaussian Processes, and always find Occam's Razor at work. 1 Introduction Occam's Razor is a well known principle of "parsimony of explanations" which is influential inscientific thinking in general and in problems of statistical inference in particular. In this paper we review its consequences for Bayesian statistical models, where its behaviour can be easily demonstrated and quantified.


Some New Bounds on the Generalization Error of Combined Classifiers

Neural Information Processing Systems

In this paper we develop the method of bounding the generalization error of a classifier in terms of its margin distribution which was introduced in the recent papers of Bartlett and Schapire, Freund, Bartlett and Lee. The theory of Gaussian and empirical processes allow us to prove the margin type inequalities for the most general functional classes, the complexity of the class being measured via the so called Gaussian complexity functions. Asa simple application of our results, we obtain the bounds of Schapire, Freund, Bartlett and Lee for the generalization error of boosting. Wealso substantially improve the results of Bartlett on bounding the generalization error of neural networks in terms of h -norms of the weights of neurons. Furthermore, under additional assumptions on the complexity of the class of hypotheses we provide some tighter bounds, which in the case of boosting improve the results of Schapire, Freund, Bartlett and Lee.


Finding the Key to a Synapse

Neural Information Processing Systems

Experimental data have shown that synapses are heterogeneous: different synapses respond with different sequences of amplitudes of postsynaptic responses to the same spike train. Neither the role of synaptic dynamics itself nor the role of the heterogeneity of synaptic dynamics for computations inneural circuits is well understood. We present in this article methods that make it feasible to compute for a given synapse with known synaptic parameters the spike train that is optimally fitted to the synapse, for example in the sense that it produces the largest sum of postsynaptic responses.To our surprise we find that most of these optimally fitted spike trains match common firing patterns of specific types of neurons that are discussed in the literature. 1 Introduction A large number of experimental studies have shown that biological synapses have an inherent dynamics,which controls how the pattern of amplitudes of postsynaptic responses depends on the temporal pattern of the incoming spike train. Various quantitative models have been proposed involving a small number of characteristic parameters, that allow us to predict the response of a given synapse to a given spike train once proper values for these characteristic synaptic parameters have been found. The analysis of this article is based on the model of [1], where three parameters U, F, D control the dynamics of a synapse and a fourth parameter A - which corresponds to the synaptic "weight" in static synapse models - scales the absolute sizes of the postsynaptic responses. The resulting model predicts theamplitude Ak for the kth spike in a spike train with interspike intervals (lSI's) .60


The Early Word Catches the Weights

Neural Information Processing Systems

The strong correlation between the frequency of words and their naming latency has been well documented. However, as early as 1973, the Age of Acquisition (AoA) of a word was alleged to be the actual variable of interest, but these studies seem to have been ignored in most of the literature. Recently,there has been a resurgence of interest in AoA. While some studies have shown that frequency has no effect when AoA is controlled for,more recent studies have found independent contributions of frequency and AoA. Connectionist models have repeatedly shown strong effects of frequency, but little attention has been paid to whether they can also show AoA effects. Indeed, several researchers have explicitly claimed that they cannot show AoA effects. In this work, we explore these claims using a simple feed forward neural network. We find a significant contributionof AoA to naming latency, as well as conditions under which frequency provides an independent contribution.