Goto

Collaborating Authors

 Learning Graphical Models


Domain Adaptation with Multiple Sources

Neural Information Processing Systems

This paper presents a theoretical analysis of the problem of adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most \epsilon are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most \epsilon with respect to *any* target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3\epsilon. Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset.


Multiscale Random Fields with Application to Contour Grouping

Neural Information Processing Systems

We introduce a new interpretation of multiscale random fields (MSRFs) that admits efficient optimization in the framework of regular (single level) random fields (RFs). It is based on a new operator, called append, that combines sets of random variables (RVs) to single RVs. We assume that a MSRF can be decomposed into disjoint trees that link RVs at different pyramid levels. The append operator is then applied to map RVs in each tree structure to a single RV. We demonstrate the usefulness of the proposed approach on a challenging task involving grouping contours of target shapes in images. MSRFs provide a natural representation of multiscale contour models, which are needed in order to cope with unstable contour decompositions. The append operator allows us to find optimal image labels using the classical framework of relaxation labeling, Alternative methods like Markov Chain Monte Carlo (MCMC) could also be used.


Improved Moves for Truncated Convex Models

Neural Information Processing Systems

We consider the problem of obtaining the approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose an improved st-mincut based move making algorithm. Unlike previous move making approaches, which either provide a loose bound or no bound on the quality of the solution (in terms of the corresponding Gibbs energy), our algorithm achieves the same guarantees as the standard linear programming (LP) relaxation. Compared to previous approaches based on the LP relaxation, e.g. interior-point algorithms or tree-reweighted message passing (TRW), our method is faster as it uses only the efficient st-mincut algorithm in its design. Furthermore, it directly provides us with a primal solution (unlike TRW and other related methods which attempt to solve the dual of the LP). We demonstrate the effectiveness of the proposed approach on both synthetic and standard real data problems. Our analysis also opens up an interesting question regarding the relationship between move making algorithms (such as $\alpha$-expansion and the algorithms presented in this paper) and the randomized rounding schemes used with convex relaxations. We believe that further explorations in this direction would help design efficient algorithms for more complex relaxations.



On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

Neural Information Processing Systems

We provide sharp bounds for Rademacher and Gaussian complexities of (constrained) linear classes. These bounds make short work of providing a number of corollaries including: risk bounds for linear prediction (including settings where the weight vectors are constrained by either $L_2$ or $L_1$ constraints), margin bounds (including both $L_2$ and $L_1$ margins, along with more general notions based on relative entropy), a proof of the PAC-Bayes theorem, and $L_2$ covering numbers (with $L_p$ norm constraints and relative entropy constraints). In addition to providing a unified analysis, the results herein provide some of the sharpest risk and margin bounds (improving upon a number of previous results). Interestingly, our results show that the uniform convergence rates of empirical risk minimization algorithms tightly match the regret bounds of online learning algorithms for linear prediction (up to a constant factor of 2).


Fast Prediction on a Tree

Neural Information Processing Systems

Given an $n$-vertex weighted tree with structural diameter $S$ and a subset of $m$ vertices, we present a technique to compute a corresponding $m \times m$ Gram matrix of the pseudoinverse of the graph Laplacian in $O(n+ m^2 + m S)$ time. We discuss the application of this technique to fast label prediction on a generic graph. We approximate the graph with a spanning tree and then we predict with the kernel perceptron. We address the approximation of the graph with either a minimum spanning tree or a shortest path tree. The fast computation of the pseudoinverse enables us to address prediction problems on large graphs. To this end we present experiments on two web-spam classification tasks, one of which includes a graph with 400,000 nodes and more than 10,000,000 edges. The results indicate that the accuracy of our technique is competitive with previous methods using the full graph information.


Shape-Based Object Localization for Descriptive Classification

Neural Information Processing Systems

Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested inmore refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objectsthat exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering.


Unifying the Sensory and Motor Components of Sensorimotor Adaptation

Neural Information Processing Systems

Adaptation of visually guided reaching movements in novel visuomotor environments (e.g.wearing prism goggles) comprises not only motor adaptation but also substantial sensory adaptation, corresponding to shifts in the perceived spatial location of visual and proprioceptive cues. Previous computational modelsof the sensory component of visuomotor adaptation have assumed that it is driven purely by the discrepancy introduced between visual andproprioceptive estimates of hand position and is independent of any motor component of adaptation. We instead propose a unified model in which sensory and motor adaptation are jointly driven by optimal Bayesian estimation of the sensory and motor contributions to perceived errors. Our model is able to account for patterns of performance errors during visuomotor adaptationas well as the subsequent perceptual aftereffects. This unified model also makes the surprising prediction that force field adaptation willelicit similar perceptual shifts, even though there is never any discrepancy between visual and proprioceptive observations. We confirm this prediction with an experiment.


Support Vector Machines with a Reject Option

Neural Information Processing Systems

We consider the problem of binary classification where the classifier may abstain instead of classifying each observation. The Bayes decision rule for this setup, known as Chow's rule, is defined by two thresholds on posterior probabilities. From simple desiderata, namely the consistency and the sparsity of the classifier, we derive the double hinge loss function that focuses on estimating conditional probabilities only in the vicinity of the threshold points of the optimal decision rule. We show that, for suitable kernel machines, our approach is universally consistent. We cast the problem of minimizing the double hinge loss as a quadratic program akin to the standard SVM optimization problem and propose an active set method to solve it efficiently. We finally provide preliminary experimental results illustrating the interest of our constructive approach to devising loss functions.


An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering

Neural Information Processing Systems

We propose an efficient sequential Monte Carlo inference scheme for the recently proposed coalescent clustering model (Teh et al, 2008). Our algorithm has a quadratic runtime while those in (Teh et al, 2008) is cubic. In experiments, we were surprised to find that in addition to being more efficient, it is also a better sequential Monte Carlo sampler than the best in (Teh et al, 2008), when measured in terms of variance of estimated likelihood and effective sample size.