Contextual Modulation of Target Saliency

Neural Information Processing Systems

In real-world scenes, intrinsic object information is often degraded due to occlusion, low contrast, and poor resolution. In such situations, the object recognition problem based on intrinsic object representations alone is ill-posed. A more comprehensive representation of an object should also include contextual information [11,13].


Active Information Retrieval

Neural Information Processing Systems

In classical large-scale information retrieval systems, the system responds to a user-initiated query with a list of results ranked by relevance, and the user may further refine the query as needed. This process can result in a lengthy correspondence without conclusion. We propose an alternative active learning approach, in which the system responds to the user's initial query by successively probing the user for distinctions at multiple levels of abstraction. The system-initiated queries are optimized for speedy recovery, and the user is permitted to respond with multiple selections or to reject a query. In each case the information is unambiguously incorporated by the system, and the subsequent queries are adjusted to minimize the need for further exchange. The system-initiated queries are also subject to resource constraints on the amount of information that can be presented to the user per iteration.
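
A minimal sketch of the interaction loop this describes, under strong simplifying assumptions not made by the abstract: the user's target is a single document, each probe is a binary distinction, and answers are noiseless. The function names (choose_probes, update) and the greedy entropy-reduction criterion are illustrative choices, not the paper's actual optimization.

import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def choose_probes(posterior, attributes, budget):
    """Greedily pick up to `budget` attributes minimizing expected posterior entropy."""
    chosen = []
    for _ in range(budget):
        best, best_h = None, np.inf
        for a in range(attributes.shape[1]):
            if a in chosen:
                continue
            mask = attributes[:, a].astype(bool)
            p_yes = posterior[mask].sum()
            h = 0.0
            for m, p_branch in ((mask, p_yes), (~mask, 1 - p_yes)):
                if p_branch > 0:
                    q = posterior * m
                    h += p_branch * entropy(q / q.sum())
            if h < best_h:
                best, best_h = a, h
        chosen.append(best)
    return chosen

def update(posterior, attributes, probes, selected):
    """Zero out documents inconsistent with the user's (multi-)selection."""
    keep = np.ones(len(posterior), dtype=bool)
    for a in probes:
        keep &= attributes[:, a].astype(bool) == (a in selected)
    post = posterior * keep
    return post / post.sum()

# Toy run: 6 documents, 3 binary attributes, per-iteration budget of 2 probes.
attrs = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]])
post = np.full(6, 1 / 6)
probes = choose_probes(post, attrs, budget=2)
post = update(post, attrs, probes, selected={probes[0]})  # user affirms the first probe only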


Multiplicative Updates for Classification by Mixture Models

Neural Information Processing Systems

We investigate a learning algorithm for the classification of nonnegative data by mixture models. Multiplicative update rules are derived that directly optimize the performance of these models as classifiers. The update rules have a simple closed form and an intuitive appeal. Our algorithm retains the main virtues of the Expectation-Maximization (EM) algorithm--its guarantee of monotonic improvement, and its absence of tuning parameters--with the added advantage of optimizing a discriminative objective function. The algorithm reduces as a special case to the method of generalized iterative scaling for log-linear models. The learning rate of the algorithm is controlled by the sparseness of the training data. We use the method of nonnegative matrix factorization (NMF) to discover sparse distributed representations of the data. This form of feature selection greatly accelerates learning and makes the algorithm practical on large problems. Experiments show that discriminatively trained mixture models lead to much better classification than comparably sized models trained by EM.
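
The NMF step mentioned above is the multiplicative-update factorization commonly attributed to Lee and Seung; a minimal sketch is below. The paper's own discriminative updates for the mixture-model classifier are not reproduced here, only the feature-discovery stage it builds on, and the dimensions, iteration count, and epsilon are illustrative.

import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9):
    """Factor a nonnegative V (d x n) as W @ H with W: d x r and H: r x n."""
    rng = np.random.default_rng(0)
    d, n = V.shape
    W = rng.random((d, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H, holding W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W, holding H fixed
    return W, H

V = np.abs(np.random.default_rng(1).random((20, 50)))
W, H = nmf(V, r=5)
print(np.linalg.norm(V - W @ H))   # final reconstruction error; non-increasing across updates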


Using Vocabulary Knowledge in Bayesian Multinomial Estimation

Neural Information Processing Systems

Recent approaches have used uncertainty over the vocabulary of symbols in a multinomial distribution as a means of accounting for sparsity. We present a Bayesian approach that allows weak prior knowledge, in the form of a small set of approximate candidate vocabularies, to be used to dramatically improve the resulting estimates. We demonstrate these improvements in applications to text compression and estimating distributions over words in newsgroup data.
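
One plausible reading of "a small set of approximate candidate vocabularies" is a mixture over vocabularies: weight each candidate by its Dirichlet-multinomial evidence (under a uniform prior over candidates) and mix the per-vocabulary smoothed estimates. The sketch below illustrates that construction; the concentration parameter alpha and the uniform prior are assumptions here, and the paper's actual model may differ.

from math import lgamma, exp
from collections import Counter

def log_marginal(counts, vocab, alpha):
    """Dirichlet-multinomial evidence of the counts under one candidate vocabulary."""
    if any(w not in vocab for w in counts):
        return float("-inf")            # this vocabulary cannot explain the data
    n = sum(counts.values())
    k = len(vocab)
    return (lgamma(alpha * k) - lgamma(alpha * k + n)
            + sum(lgamma(alpha + counts.get(w, 0)) - lgamma(alpha) for w in vocab))

def predictive(data, vocabularies, alpha=0.5):
    counts = Counter(data)
    n = len(data)
    log_w = [log_marginal(counts, v, alpha) for v in vocabularies]
    m = max(log_w)
    weights = [exp(lw - m) for lw in log_w]   # posterior weights, uniform prior over candidates
    z = sum(weights)
    est = Counter()
    for weight, vocab in zip(weights, vocabularies):
        for w in vocab:
            est[w] += (weight / z) * (counts.get(w, 0) + alpha) / (n + alpha * len(vocab))
    return est

vocabs = [{"a", "b", "c"}, {"a", "b", "c", "d", "e"}]   # two approximate candidate vocabularies
print(predictive(list("aababc"), vocabs))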


Probabilistic principles in unsupervised learning of visual structure: human data and a model

Neural Information Processing Systems

To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow's criterion of "suspicious coincidence" (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow's criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites. These results shed light on the brain's strategies for unsupervised acquisition of structural information in vision.
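
For concreteness, Barlow's criterion as used here is just the ratio of the joint probability of two fragments to the product of their marginals; a toy estimate from co-occurrence counts (the counts below are made up) looks like this:

def suspicious_coincidence(n_ab, n_a, n_b, n_total):
    p_ab = n_ab / n_total
    p_a, p_b = n_a / n_total, n_b / n_total
    return p_ab / (p_a * p_b)   # values well above 1 suggest the pairing is not an accident

print(suspicious_coincidence(n_ab=40, n_a=50, n_b=60, n_total=200))   # = 2.67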


An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games

Neural Information Processing Systems

We describe a new algorithm for computing a Nash equilibrium in graphical games, a compact representation for multi-agent systems that we introduced in previous work. The algorithm is the first to compute equilibria both efficiently and exactly for a nontrivial class of graphical games.

1 Introduction

Seeking to replicate the representational and computational benefits that graphical models have provided to probabilistic inference, several recent works have introduced graph-theoretic frameworks for the study of multi-agent systems (La Mura 2000; Koller and Milch 2001; Kearns et al. 2001). In the simplest of these formalisms, each vertex represents a single agent, and the edges represent pairwise interaction between agents. As with many familiar network models, the macroscopic behavior of a large system is thus implicitly described by its local interactions, and the computational challenge is to extract the global states of interest. Classical game theory is typically used to model multi-agent interactions, and the global states of interest are thus the so-called Nash equilibria, in which no agent has a unilateral incentive to deviate.
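
The sketch below is not the paper's equilibrium-finding algorithm; it only illustrates the graphical-game representation referred to here (each player's payoff table is indexed by its own action and its neighbors' actions) and the Nash condition that no player can gain by deviating unilaterally, checked for a pure-strategy profile on a toy three-player chain.

from itertools import product

def is_pure_nash(neighbors, payoffs, profile, n_actions=2):
    for i, nbrs in neighbors.items():
        ctx = tuple(profile[j] for j in nbrs)          # neighbors' actions, held fixed
        current = payoffs[i][(profile[i],) + ctx]
        if any(payoffs[i][(a,) + ctx] > current for a in range(n_actions)):
            return False                               # profitable unilateral deviation exists
    return True

# Toy 3-player chain 0 - 1 - 2 with match-your-neighbors payoffs.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
payoffs = {
    0: {k: float(k[0] == k[1]) for k in product(range(2), repeat=2)},
    1: {k: float(k[0] == k[1] == k[2]) for k in product(range(2), repeat=3)},
    2: {k: float(k[0] == k[1]) for k in product(range(2), repeat=2)},
}
print(is_pure_nash(neighbors, payoffs, profile=[1, 1, 1]))   # True
print(is_pure_nash(neighbors, payoffs, profile=[1, 0, 1]))   # False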


The Concave-Convex Procedure (CCCP)

Neural Information Processing Systems

This paper describes a simple geometrical Concave-Convex Procedure (CCCP) for constructing discrete-time dynamical systems that are guaranteed to decrease almost any global optimization/energy function (see the technical conditions in Section 2). We prove that there is a relationship between CCCP and optimization techniques based on introducing auxiliary variables using Legendre transforms. We distinguish between Legendre min-max and Legendre minimization. In the former [6], the introduction of auxiliary variables converts the problem to a min-max problem whose goal is to find a saddle point. By contrast, in Legendre minimization [8], the problem remains a minimization one (which makes convergence easier to analyze).
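
A toy instance of the construction, in my own notation rather than anything specific from the paper: split E(x) = x^4 - 2x^2 into a convex part E_vex(x) = x^4 and a concave part E_cave(x) = -2x^2, then iterate by solving grad E_vex(x_next) = -grad E_cave(x_now), which here has the closed form x_next = x_now^(1/3). Each such step provably does not increase E.

def E(x):
    return x**4 - 2 * x**2

x = 0.2                                               # any nonzero starting point
for _ in range(10):
    x = abs(x) ** (1 / 3) * (1 if x > 0 else -1)      # closed-form CCCP step for this E
    print(f"x = {x:.4f}, E(x) = {E(x):.4f}")          # E decreases toward -1, reached at x = 1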


Face Recognition Using Kernel Methods

Neural Information Processing Systems

Principal Component Analysis and Fisher Linear Discriminant methods have demonstrated their success in face detection, recognition, and tracking. The representation in these subspace methods is based on second-order statistics of the image set, and does not address higher-order statistical dependencies such as the relationships among three or more pixels. Recently, Higher Order Statistics and Independent Component Analysis (ICA) have been used as informative low-dimensional representations for visual recognition. In this paper, we investigate the use of Kernel Principal Component Analysis and Kernel Fisher Linear Discriminant for learning low-dimensional representations for face recognition, which we call the Kernel Eigenface and Kernel Fisherface methods. While the Eigenface and Fisherface methods aim to find projection directions based on second-order correlations of samples, the Kernel Eigenface and Kernel Fisherface methods provide generalizations that take higher-order correlations into account.
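
A minimal kernel-PCA sketch in the spirit of the Kernel Eigenface idea, assuming an RBF kernel over vectorized images; the kernel choice, the gamma value, and the random stand-in data are illustrative and not taken from the paper.

import numpy as np

def kernel_pca(X, n_components=5, gamma=1e-3):
    """X: (n_samples, n_pixels) image matrix; returns (n_samples, n_components) features."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))   # RBF kernel matrix
    n = len(X)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one                       # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))    # scale eigenvectors
    return Kc @ alphas                                               # nonlinear projections

X = np.random.default_rng(0).random((30, 64 * 64))   # stand-in for 30 vectorized face images
features = kernel_pca(X)                              # low-dimensional nonlinear features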


Improvisation and Learning

Neural Information Processing Systems

This article presents a two-phase computational learning model and application. As a demonstration, a system called CHIME (Computer Human Interacting Musical Entity) has been built. In phase 1 of training, recurrent back-propagation trains the machine to reproduce three jazz melodies. The recurrent network is then expanded and further trained in phase 2 with a reinforcement learning algorithm and a critique produced by a set of basic rules for jazz improvisation.
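
Purely as an illustration of the two-phase structure, and emphatically not the CHIME architecture, the sketch below replaces the recurrent network with a tabular next-note model: phase 1 fits it to example melodies, and phase 2 nudges it with a reward from a toy rule-based critic that prefers in-scale notes.

import numpy as np

NOTES = 12
rng = np.random.default_rng(0)

# Phase 1: "reproduce the melodies" -- estimate next-note probabilities from the data.
melodies = [[0, 2, 4, 5, 7, 9, 11, 0], [0, 4, 7, 4, 0, 7, 4, 0]]
counts = np.ones((NOTES, NOTES))               # Laplace-smoothed transition counts
for mel in melodies:
    for a, b in zip(mel, mel[1:]):
        counts[a, b] += 1
policy = counts / counts.sum(axis=1, keepdims=True)

# Phase 2: reward-weighted adjustment driven by a rule-based critique.
scale = {0, 2, 4, 5, 7, 9, 11}
def critic(note):
    return 1.0 if note in scale else -1.0      # toy stand-in for the jazz rules

lr = 0.1
note = 0
for _ in range(200):
    nxt = rng.choice(NOTES, p=policy[note])
    policy[note, nxt] *= np.exp(lr * critic(nxt))   # up- or down-weight the chosen note
    policy[note] /= policy[note].sum()
    note = nxt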


Learning Discriminative Feature Transforms to Low Dimensions in Low Dimensions

Neural Information Processing Systems

The marriage of Renyi entropy with Parzen density estimation has been shown to be a viable tool for learning discriminative feature transforms. However, it suffers from computational complexity proportional to the square of the number of training samples, which sets a practical limit on the use of large databases. We suggest an immediate divorce of the two methods and a remarriage of Renyi entropy with a semi-parametric density estimation method, such as a Gaussian Mixture Model (GMM). This allows all of the computation to take place in the low-dimensional target space and reduces the computational complexity to be proportional to the square of the number of components in the mixtures. Furthermore, a convenient extension to Hidden Markov Models, as commonly used in speech recognition, becomes possible.
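
The complexity claim can be made concrete with a standard identity: for a GMM p(y) = sum_i w_i N(y; mu_i, S_i), the quadratic Renyi entropy H2 = -log integral of p(y)^2 dy satisfies integral of p^2 = sum_{i,j} w_i w_j N(mu_i - mu_j; 0, S_i + S_j), so the cost is quadratic in the number of components rather than in the number of samples. The sketch below evaluates this for an illustrative two-component mixture; whether the paper uses exactly this entropy estimator is an assumption here.

import numpy as np

def gauss(d, S):
    """Zero-mean Gaussian density with covariance S, evaluated at the difference vector d."""
    k = len(d)
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt((2 * np.pi) ** k * np.linalg.det(S))

def renyi2_entropy_gmm(weights, means, covs):
    total = 0.0
    for wi, mi, Si in zip(weights, means, covs):
        for wj, mj, Sj in zip(weights, means, covs):   # O(C^2) in the number of components
            total += wi * wj * gauss(mi - mj, Si + Sj)
    return -np.log(total)

w = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [2.0, 0.0]])
cov = np.array([np.eye(2), np.eye(2)])
print(renyi2_entropy_gmm(w, mu, cov))   # quadratic Renyi entropy of a two-component GMM in 2-D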