
Collaborating Authors

 Banff


Graph Laplacian Regularization for Large-Scale Semidefinite Programming

Neural Information Processing Systems

In many areas of science and engineering, the problem arises of how to discover low dimensional representations of high dimensional data. Recently, a number of researchers have converged on common solutions to this problem using methods from convex optimization. In particular, many results have been obtained by constructing semidefinite programs (SDPs) with low rank solutions. While the rank of matrix variables in SDPs cannot be directly constrained, it has been observed that low rank solutions emerge naturally by computing high variance or maximal trace solutions that respect local distance constraints. In this paper, we show how to solve very large problems of this type by a matrix factorization that leads to much smaller SDPs than those previously studied. The matrix factorization is derived by expanding the solution of the original problem in terms of the bottom eigenvectors of a graph Laplacian. The smaller SDPs obtained from this matrix factorization yield very good approximations to solutions of the original problem. Moreover, these approximations can be further refined by conjugate gradient descent. We illustrate the approach on localization in large scale sensor networks, where optimizations involving tens of thousands of nodes can be solved in just a few minutes.
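The factorization step can be illustrated with a minimal sketch, assuming an unnormalized Laplacian L = D - W built from a symmetrized k-nearest-neighbor graph (the graph construction, the function names, and the choice of m are illustrative assumptions, not the paper's code). If Q holds the bottom m eigenvectors of L, then any positive semidefinite m x m matrix Y yields a positive semidefinite n x n Gram matrix K = Q Y Q^T of rank at most m, so an SDP over Y can stand in for the much larger SDP over K. The SDP solve and the conjugate gradient refinement are not shown.

```python
import numpy as np

def knn_laplacian(X, k):
    """Unnormalized Laplacian L = D - W of a symmetrized k-nearest-neighbor graph (assumed construction)."""
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dists[i])
        for j in order[order != i][:k]:   # k nearest points other than i itself
            W[i, j] = W[j, i] = 1.0
    return np.diag(W.sum(axis=1)) - W

def bottom_eigenvectors(L, m):
    """The m eigenvectors of L with the smallest eigenvalues (the smoothest functions on the graph)."""
    _, vecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return vecs[:, :m]

def expand_gram(Q, Y):
    """Lift a small m x m matrix Y to the full n x n Gram matrix K = Q Y Q^T."""
    return Q @ Y @ Q.T
```

Because Q has orthonormal columns, K is positive semidefinite exactly when Y is, and rank(K) = rank(Y); the number of SDP variables drops from O(n^2) to O(m^2).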


Prediction on a Graph with a Perceptron

Neural Information Processing Systems

We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph (the graph diameter and the cut size of a partitioning of the graph), which are only indirectly dependent on the size of the graph. We also demonstrate that such bounds are impossible for the graph geodesic nearest neighbors algorithm.
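Prediction on a graph with a perceptron can be sketched as a dual-form (kernel) perceptron over the vertices. This is a minimal sketch, assuming the kernel is the pseudo-inverse of the graph Laplacian plus a constant `bias` term; both are our assumptions, and the paper's norm-stabilizing transformation is not reproduced. Under the pseudo-inverse, the squared norm of a balanced ±1 labeling u on an unweighted graph is u^T L u, which is four times the number of cut edges; this is how the cut size enters a perceptron mistake bound. The names `graph_kernel`, `predict`, and `online_perceptron` are hypothetical.

```python
import numpy as np

def graph_kernel(W, bias=1.0):
    """Kernel on the vertices: pseudo-inverse of L = D - W plus a constant (assumed) bias term."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.pinv(L) + bias * np.ones_like(L)

def predict(K, alpha, v):
    """Sign of f(v) = sum_u alpha[u] * K[u, v], with ties predicted as +1."""
    return 1 if alpha @ K[:, v] >= 0 else -1

def online_perceptron(K, labels, order, epochs=1):
    """Dual-form perceptron over the vertices in `order`; returns (alpha, total mistakes)."""
    alpha = np.zeros(K.shape[0])
    mistakes = 0
    for _ in range(epochs):
        for v in order:
            if predict(K, alpha, v) != labels[v]:
                alpha[v] += labels[v]   # update only on a mistake
                mistakes += 1
    return alpha, mistakes
```

For example, on a path of ten vertices labeled +1 on one half and -1 on the other (a single cut edge), the perceptron makes only a handful of mistakes before it labels every vertex correctly.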


Query-time Entity Resolution

Journal of Artificial Intelligence Research

Entity resolution is the problem of reconciling database references corresponding to the same real-world entities. Given the abundance of publicly available databases that have unresolved entities, we motivate the problem of query-time entity resolution: quick and accurate resolution for answering queries over such 'unclean' databases at query time. Since collective entity resolution approaches (where related references are resolved jointly) have been shown to be more accurate than independent attribute-based resolution for off-line entity resolution, we focus on developing new algorithms for collective resolution for answering entity resolution queries at query time. For this purpose, we first formally show that, for collective resolution, precision and recall for individual entities follow a geometric progression as neighbors at increasing distances are considered. Unfolding this progression leads naturally to a two-stage 'expand and resolve' query processing strategy. In this strategy, we first extract the related records for a query using two novel expansion operators, and then resolve the extracted records collectively. We then show how the same strategy can be adapted for query-time entity resolution by identifying and resolving only those database references that are the most helpful for processing the query. We validate our approach on two large real-world publication databases where we show the usefulness of collective resolution and at the same time demonstrate the need for adaptive strategies for query processing. We then show how the same queries can be answered in real time using our adaptive approach while preserving the gains of collective resolution. In addition to experiments on real datasets, we use synthetically generated data to empirically demonstrate the validity of the performance trends predicted by our analysis of collective entity resolution over a wide range of structural characteristics in the data.
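The two-stage 'expand and resolve' idea can be sketched on a toy author-reference graph. This is a minimal sketch under loudly stated assumptions: `expand` is a plain breadth-first search to a fixed depth (not the paper's two expansion operators), `compatible` is a hypothetical name matcher (same surname and first initial), and `resolve` is a simplified single-pass relational step that merges two compatible references when some of their co-occurring references also look alike. It is not the paper's collective resolution algorithm.

```python
from collections import deque

def compatible(a, b):
    """Hypothetical name matcher: same surname and same first initial."""
    fa, fb = a.split(), b.split()
    return fa[-1] == fb[-1] and fa[0][0] == fb[0][0]

def expand(query, names, links, depth):
    """Breadth-first search from references matching `query`, up to `depth` co-occurrence hops."""
    seeds = [r for r, nm in names.items() if compatible(nm, query)]
    dist = {r: 0 for r in seeds}
    queue = deque(seeds)
    while queue:
        r = queue.popleft()
        if dist[r] == depth:
            continue
        for nb in links.get(r, ()):
            if nb not in dist:
                dist[nb] = dist[r] + 1
                queue.append(nb)
    return set(dist)

def resolve(refs, names, links):
    """Union-find over the expanded references; returns clusters as a list of sets."""
    parent = {r: r for r in refs}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]   # path halving
            r = parent[r]
        return r

    ordered = sorted(refs)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if not compatible(names[a], names[b]):
                continue
            # Relational evidence: some co-occurring references of a and b look alike.
            na = [n for n in links.get(a, ()) if n in parent]
            nb = [n for n in links.get(b, ()) if n in parent]
            if any(compatible(names[x], names[y]) for x in na for y in nb):
                parent[find(a)] = find(b)
    clusters = {}
    for r in ordered:
        clusters.setdefault(find(r), set()).add(r)
    return list(clusters.values())
```

In a toy database where "J. Smith" and "John Smith" both co-author with "A. Lee" while another "J. Smith" co-authors with "B. Chen", the sketch merges the first two Smith references and the two Lee references, and leaves the third Smith reference separate.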


Probabilistic Planning via Heuristic Forward Search and Weighted Model Counting

Journal of Artificial Intelligence Research

We present a new algorithm for probabilistic planning with no observability. Our algorithm, called Probabilistic-FF, extends the heuristic forward-search machinery of Conformant-FF to problems with probabilistic uncertainty about both the initial state and action effects. Specifically, Probabilistic-FF combines Conformant-FF's techniques with a powerful machinery for weighted model counting in (weighted) CNFs, serving to elegantly define both the search space and the heuristic function. Our evaluation shows that Probabilistic-FF scales well across a range of probabilistic domains, constituting an improvement of several orders of magnitude over previous results in this area. We use a problematic case to point out the main open issue to be addressed by further research.
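Weighted model counting, the quantity Probabilistic-FF relies on, sums the weights of all satisfying assignments of a CNF, where an assignment's weight is the product of its literal weights. The sketch below is a brute-force counter that enumerates all assignments, so it is exponential and only illustrative; the actual system uses a dedicated counter, which is not reproduced here. The DIMACS-style clause encoding and the `weight(v, value)` callback are our assumptions. With weights set to probabilities (for instance, an uncertain initial fact true with probability 0.7 and an action effect that succeeds with probability 0.9), the count is the probability that the CNF holds.

```python
from itertools import product

def weighted_model_count(clauses, n_vars, weight):
    """Sum, over all satisfying assignments, of the product of the variables' weights.

    clauses: list of clauses, each a list of nonzero ints (DIMACS style: v or -v).
    weight(v, value): weight of variable v (1-based) taking truth value `value`.
    """
    total = 0.0
    for assignment in product([False, True], repeat=n_vars):
        # A literal l is satisfied when variable |l| has the truth value (l > 0).
        if all(any(assignment[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            w = 1.0
            for v, val in enumerate(assignment, start=1):
                w *= weight(v, val)
            total += w
    return total
```

With P(x1) = 0.7 and P(x2) = 0.9, the CNF `[[1], [2]]` has weighted count 0.63 and `[[1, 2]]` has weighted count 1 - 0.3 * 0.1 = 0.97.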


An Algebraic Graphical Model for Decision with Uncertainties, Feasibilities, and Utilities

Journal of Artificial Intelligence Research

Numerous formalisms and dedicated algorithms have been designed over the last decades to model and solve decision making problems. Some formalisms, such as constraint networks, can express "simple" decision problems, while others are designed to take into account uncertainties, infeasible decisions, and utilities. Even in a single formalism, several variants are often proposed to model different types of uncertainty (probability, possibility...) or utility (additive or not). In this article, we introduce an algebraic graphical model that encompasses a large number of such formalisms: (1) we first adapt previous structures from Friedman, Chu and Halpern for representing uncertainty, utility, and expected utility in order to deal with generic forms of sequential decision making; (2) on these structures, we then introduce composite graphical models that express information via variables linked by "local" functions, thanks to conditional independence; (3) on these graphical models, we finally define a simple class of queries which can represent various scenarios in terms of observabilities and controllabilities. A natural decision-tree semantics for such queries is complemented by an equivalent operational semantics, which induces generic algorithms. The proposed framework, called the Plausibility-Feasibility-Utility (PFU) framework, not only provides a better understanding of the links between existing formalisms, but also covers as yet unpublished frameworks (such as possibilistic influence diagrams) and unifies formalisms such as quantified boolean formulas and influence diagrams. Our backtrack and variable elimination generic algorithms are a first step towards unified algorithms.
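The decision-tree semantics of a query, a sequence of variable eliminations applied to combined local functions, can be sketched with a generic evaluator whose operators are parameters. This is a minimal sketch under our own assumptions: it enumerates the whole tree (exponential, unlike the paper's backtrack and variable elimination algorithms), and `evaluate`, its argument layout, and the toy examples are hypothetical. The same function computes a maximum expected utility (max over a decision, sum over a chance variable, product of plausibility and utility) and decides a quantified boolean formula (all/any over booleans, conjunction).

```python
from functools import reduce

def evaluate(query, domains, local_fns, combine, assignment=None):
    """Decision-tree semantics of a sequence of eliminations.

    query: list of (eliminate, variable) pairs, outermost first; `eliminate` maps an
           iterable of values to one value (e.g. max, sum, all, any).
    local_fns: functions of the full assignment dict, merged pairwise with `combine`.
    """
    assignment = dict(assignment or {})
    if not query:
        # Leaf of the decision tree: combine the local functions on the full assignment.
        return reduce(combine, (f(assignment) for f in local_fns))
    (eliminate, var), rest = query[0], query[1:]
    return eliminate(
        evaluate(rest, domains, local_fns, combine, {**assignment, var: val})
        for val in domains[var]
    )
```

For example, with a decision d in {umbrella, none}, a chance variable c in {rain, dry} with P(rain) = 0.3, and utilities 70/80 for umbrella and 0/100 for none, the query `[(max, "d"), (sum, "c")]` with multiplication as `combine` returns the maximum expected utility 77. The query `[(all, "x"), (any, "y")]` over booleans with the local function x == y returns True, and swapping the two quantifiers returns False.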


Distance Metric Learning for Large Margin Nearest Neighbor Classification

Neural Information Processing Systems

We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification; for example, they achieve a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.
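The large margin objective can be made concrete by evaluating it for a fixed matrix M. This is a minimal sketch, assuming the common form of the loss: a pull term summing squared Mahalanobis distances d_M(x, y) = (x - y)^T M (x - y) to each point's target neighbors (its k nearest same-class points under Euclidean distance), plus c times a hinge push term [1 + d_M(x_i, x_j) - d_M(x_i, x_l)]_+ for every differently labeled point x_l. Only the loss is evaluated; the semidefinite program that minimizes it over M ⪰ 0 is not shown, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(M, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

def target_neighbors(X, labels, k):
    """For each point, the indices of its k nearest same-class points (Euclidean)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    targets = []
    for i in range(len(X)):
        same = [int(j) for j in np.argsort(D[i]) if j != i and labels[j] == labels[i]]
        targets.append(same[:k])
    return targets

def lmnn_loss(M, X, labels, targets, c=1.0):
    """Pull term over target neighbors plus c times the hinge push term over impostors."""
    pull, push = 0.0, 0.0
    for i, neigh in enumerate(targets):
        for j in neigh:
            d_ij = mahalanobis_sq(M, X[i], X[j])
            pull += d_ij
            for l in range(len(X)):
                if labels[l] != labels[i]:
                    # A differently labeled point must be at least 1 unit farther than x_j.
                    push += max(0.0, 1.0 + d_ij - mahalanobis_sq(M, X[i], X[l]))
    return pull + c * push
```

On four points where the first coordinate separates the classes and the second is noise, M = I gives loss 4.0, while M = diag(1, 0) gives 0.0: it collapses the noise direction and keeps the classes more than one unit apart. Shrinking M toward zero is penalized by the push term, which is what prevents the trivial solution.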


A Probabilistic Approach for Optimizing Spectral Clustering

Neural Information Processing Systems

Spectral clustering has been applied successfully to both data clustering and semi-supervised learning. However, most spectral clustering algorithms cannot handle multi-class clustering problems directly, and additional strategies are needed to extend them to the multi-class case. Furthermore, most spectral clustering algorithms employ hard cluster membership, which makes them prone to getting trapped in local optima. In this paper, we present a new spectral clustering algorithm, named "Soft Cut". It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets show promising performance for the proposed clustering algorithm.
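Soft membership can be illustrated by evaluating a normalized-cut-style objective on a membership matrix Z (n x K, nonnegative rows summing to 1). This is a minimal sketch under our own assumption: the objective is the soft normalized association, the sum over k of (z_k^T W z_k) / (d^T z_k), where d holds the vertex degrees. It is not necessarily the exact Soft Cut objective, and the bound optimization algorithm is not shown. For hard (0/1) memberships, this quantity equals K minus the usual multiway normalized cut, the sum over k of cut(A_k) / vol(A_k).

```python
import numpy as np

def soft_normalized_association(W, Z):
    """sum_k (z_k^T W z_k) / (d^T z_k) for a membership matrix Z (n x K, rows sum to 1)."""
    d = W.sum(axis=1)
    assoc = np.einsum("ik,ij,jk->k", Z, W, Z)   # z_k^T W z_k for every cluster k
    vol = d @ Z                                 # soft volume d^T z_k of every cluster
    return float(np.sum(assoc / vol))

def normalized_cut(W, Z):
    """Multiway normalized cut sum_k (z_k^T L z_k) / (d^T z_k), with L = D - W."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    return float(np.sum(np.einsum("ik,ij,jk->k", Z, L, Z) / (d @ Z)))
```

On two triangles joined by a single edge, the hard split into the two triangles has normalized cut 2/7 and association 2 - 2/7, while the uniform soft membership (0.5 everywhere) scores only 1.0. The objective therefore still prefers the natural partition even though soft memberships are allowed.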