Goto

Collaborating Authors

 Technology



Max-margin classification of incomplete data

Neural Information Processing Systems

We consider the problem of learning classifiers for structurally incomplete data, where some objects have a subset of features inherently absent due to complex relationships between the features. The common approach for handling missingfeatures is to begin with a preprocessing phase that completes the missing features, and then use a standard classification procedure. In this paper we show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. Weformulate this task using a geometrically-inspired objective function, and discuss two optimization approaches: The linearly separable case is written as a set of convex feasibility problems, and the non-separable case has a non-convex objective that we optimize iteratively. By avoiding the pre-processing phase in which the data is completed, these approaches offer considerable computational savings. More importantly, we show that by elegantly handlingcomplex patterns of missing values, our approach is both competitive with other methods when the values are missing at random and outperforms them when the missing values have nontrivial structure. We demonstrate our results on two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images.


Automated Hierarchy Discovery for Planning in Partially Observable Environments

Neural Information Processing Systems

Planning in partially observable domains is a notoriously difficult problem. However, inmany real-world scenarios, planning can be simplified by decomposing the task into a hierarchy of smaller planning problems. Several approaches have been proposed to optimize a policy that decomposes according to a hierarchy specified a priori. In this paper, we investigate the problem of automatically discovering the hierarchy. More precisely, we frame the optimization of a hierarchical policy as a non-convex optimization problem that can be solved with general nonlinear solvers, a mixed-integer nonlinear approximation or a form of bounded hierarchical policyiteration. By encoding the hierarchical structure as variables of the optimization problem, we can automatically discover a hierarchy. Our method is flexible enough to allow any parts of the hierarchy to be specified based on prior knowledge while letting the optimization discover the unknown parts. It can also discover hierarchical policies, including recursive policies, that are more compact (potentially infinitely fewer parameters) and often easier to understand given the decomposition induced by the hierarchy.



Conditional mean field

Neural Information Processing Systems

Despite all the attention paid to variational methods based on sum-product message passing(loopy belief propagation, tree-reweighted sum-product), these methods are still bound to inference on a small set of probabilistic models. Mean field approximations have been applied to a broader set of problems, but the solutions are often poor. We propose a new class of conditionally-specified variational approximations basedon mean field theory. While not usable on their own, combined with sequential Monte Carlo they produce guaranteed improvements over conventional mean field. Moreover, experiments on a well-studied problem-- inferring the stable configurations of the Ising spin glass--show that the solutions can be significantly better than those obtained using sum-product-based methods.


Similarity by Composition

Neural Information Processing Systems

We propose a new approach for measuring similarity between two signals, which is applicable to many machine learning tasks, and to many signal types.


Dirichlet-Enhanced Spam Filtering based on Biased Samples

Neural Information Processing Systems

We study a setting that is motivated by the problem of filtering spam messages for many users. Each user receives messages according to an individual, unknown distribution, reflected only in the unlabeled inbox. The spam filter for a user is required to perform well with respect to this distribution. Labeled messages from publicly available sources can be utilized, but they are governed by a distinct distribution, notadequately representing most inboxes. We devise a method that minimizes a loss function with respect to a user's personal distribution based on the available biased sample. A nonparametric hierarchical Bayesian model furthermore generalizesacross users by learning a common prior which is imposed on new email accounts. Empirically, we observe that bias-corrected learning outperforms naivereliance on the assumption of independent and identically distributed data; Dirichlet-enhanced generalization across users outperforms a single ("one size fits all") filter as well as independent filters for all users.


Analysis of Representations for Domain Adaptation

Neural Information Processing Systems

Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. In many situations, though, we have labeled training data for a source domain, and we wish to learn a classifier which performs well on a target domain with a different distribution. Under what conditions can we adapt a classifier trained on the source domain for use in the target domain? Intuitively, a good feature representation is a crucial factor in the success of domain adaptation. We formalize this intuition theoretically with a generalization bound for domain adaption. Our theory illustrates the tradeoffs inherent indesigning a representation for domain adaptation and gives a new justification for a recently proposed model. It also points toward a promising new model for domain adaptation: one which explicitly minimizes the difference between the source and target domains, while at the same time maximizing the margin of the training set.


A selective attention multi--chip system with dynamic synapses and spiking neurons

Neural Information Processing Systems

Selective attention is the strategy used by biological sensory systems to solve the problem of limited parallel processing capacity: salient subregions of the input stimuliare serially processed, while non-salient regions are suppressed. We present an mixed mode analog/digital Very Large Scale Integration implementation ofa building block for a multi-chip neuromorphic hardware model of selective attention. We describe the chip's architecture and its behavior, when its is part of a multi-chip system with a spiking retina as input, and show how it can be used to implement in real-time flexible models of bottom-up attention.


AdaBoost is Consistent

Neural Information Processing Systems

The risk, or probability of error, of the classifier produced by the AdaBoost algorithm isinvestigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency.