Goto

Collaborating Authors

 Mathematical & Statistical Methods


Determinantal point processes for machine learning

arXiv.org Machine Learning

Determinantal point processes (DPPs) are elegant probabilistic models of repulsion that arise in quantum physics and random matrix theory. In contrast to traditional structured models like Markov random fields, which become intractable and hard to approximate in the presence of negative correlations, DPPs offer efficient and exact algorithms for sampling, marginalization, conditioning, and other inference tasks. We provide a gentle introduction to DPPs, focusing on the intuitions, algorithms, and extensions that are most relevant to the machine learning community, and show how DPPs can be applied to real-world applications like finding diverse sets of high-quality search results, building informative summaries by selecting diverse sentences from documents, modeling non-overlapping human poses in images or video, and automatically building timelines of important news stories.


Interpreting prediction markets: a stochastic approach

Neural Information Processing Systems

We strengthen recent connections between prediction markets and learning byshowing that a natural class of market makers can be understood as performing stochastic mirror descent when trader demands are sequentially drawnfrom a fixed distribution. This provides new insights into how market prices (and price paths) may be interpreted as a summary of the market's belief distribution by relating them to the optimization problem being solved. In particular, we show that under certain conditions the stationary pointof the stochastic process of prices generated by the market is equal to the market's Walrasian equilibrium of classic market analysis. Together, these results suggest how traditional market making mechanisms might be replaced with general purpose learning algorithms while still retaining guaranteesabout their behaviour.


A Stochastic Gradient Method with an Exponential Convergence _Rate for Finite Training Sets

Neural Information Processing Systems

We propose a new stochastic gradient method for optimizing the sum of
 a finite set of smooth functions, where the sum is strongly convex.
 While standard stochastic gradient methods
 converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence 
rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard
 algorithms, both in terms of optimizing the training error and reducing the test error quickly.


Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination

Neural Information Processing Systems

Hypothesis testing on signals defined on surfaces (such as the cortical surface) is a fundamental component of a variety of studies in Neuroscience. The goal here is to identify regions that exhibit changes as a function of the clinical condition under study. As the clinical questions of interest move towards identifying very early signs of diseases, the corresponding statistical differences at the group level invariably become weaker and increasingly hard to identify. Indeed, after a multiple comparisons correction is adopted (to account for correlated statistical tests over all surface points), very few regions may survive. In contrast to hypothesis tests on point-wise measurements, in this paper, we make the case for performing statistical analysis on multi-scale shape descriptors that characterize the local topological context of the signal around each surface vertex. Our descriptors are based on recent results from harmonic analysis, that show how wavelet theory extends to non-Euclidean settings (i.e., irregular weighted graphs). We provide strong evidence that these descriptors successfully pick up group-wise differences, where traditional methods either fail or yield unsatisfactory results. Other than this primary application, we show how the framework allows performing cortical surface smoothing in the native space without mappint to a unit sphere.


Factoring nonnegative matrices with linear programs

Neural Information Processing Systems

This paper describes a new approach for computing nonnegative matrix factorizations (NMFs) with linear programming. The key idea is a data-driven model for the factorization, in which the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X = CX and some linear constraints. The matrix C selects features, which are then used to compute a low-rank NMF of X. A theoretical analysis demonstrates that this approach has the same type of guarantees as the recent NMF algorithm of Arora et al.~(2012). In contrast with this earlier work, the proposed method has (1) better noise tolerance, (2) extends to more general noise models, and (3) leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation of the new algorithm can factor a multi-Gigabyte matrix in a matter of minutes.


The Lovász ϑ function, SVMs and finding large dense subgraphs

Neural Information Processing Systems

The Lovasz $\theta$ function of a graph, is a fundamental tool in combinatorial optimization and approximation algorithms. Computing $\theta$ involves solving a SDP and is extremely expensive even for moderately sized graphs. In this paper we establish that the Lovasz $\theta$ function is equivalent to a kernel learning problem related to one class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call $SVM-\theta$ graphs, on which the Lovasz $\theta$ function can be approximated well by a one-class SVM. This leads to a novel use of SVM techniques to solve algorithmic problems in large graphs e.g. identifying a planted clique of size $\Theta({\sqrt{n}})$ in a random graph $G(n,\frac{1}{2})$. A classic approach for this problem involves computing the $\theta$ function, however it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of $SVM-\theta$ graph, and as a consequence a SVM based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. Further, we introduce the notion of a ''common orthogonal labeling'' which extends the notion of a ''orthogonal labelling of a single graph (used in defining the $\theta$ function) to multiple graphs. The problem of finding the optimal common orthogonal labelling is cast as a Multiple Kernel Learning problem and is used to identify a large common dense region in multiple graphs. The proposed algorithm achieves an order of magnitude scalability compared to the state of the art.


Newton-Like Methods for Sparse Inverse Covariance Estimation

Neural Information Processing Systems

We propose two classes of second-order optimization methods for solving the sparse inverse covariance estimation problem. The first approach, which we call the Newton-LASSO method, minimizes a piecewise quadratic model of the objective function at every iteration to generate a step. We employ the fast iterative shrinkage thresholding method (FISTA) to solve this subproblem. The second approach, which we call the Orthant-Based Newton method, is a two-phase algorithm that first identifies an orthant face and then minimizes a smooth quadratic approximation of the objective function using the conjugate gradient method. These methods exploit the structure of the Hessian to efficiently compute the search direction and to avoid explicitly storing the Hessian. We show that quasi-Newton methods are also effective in this context, and describe a limited memory BFGS variant of the orthant-based Newton method. We present numerical results that suggest that all the techniques described in this paper have attractive properties and constitute useful tools for solving the sparse inverse covariance estimation problem. Comparisons with the method implemented in the QUIC software package are presented.


Spectral Estimation of Conditional Random Graph Models for Large-Scale Network Data

arXiv.org Machine Learning

Generative models for graphs have been typically committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are either only suitable for characterizing some particular network properties (such as degree distribution or clustering coefficient), or they are aimed at estimating joint probability distributions, which is often intractable in large-scale networks. In this paper, we first propose a novel network statistic, based on the Laplacian spectrum of graphs, which allows to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random graph model, switching the focus from the estimation of joint probability distributions to a more tractable conditional estimation setting. After analyzing the dependence structure characterizing Fiedler random graphs, we evaluate them experimentally in edge prediction over several real-world networks, showing that they allow to reach a much higher prediction accuracy than various alternative statistical models.


An example illustrating the imprecision of the efficient approach for diagnosis of Petri nets via integer linear programming

arXiv.org Artificial Intelligence

This document demonstrates that the efficient approach for diagnosis of Petri nets via integer linear programming may be unable to detect a fault even if the system is diagnosable.


Intelligent Computation of Reachability Sets for Space Missions

AAAI Conferences

This paper introduces a new technique for intelligently exploring the reachability set of a spacecraft: the set of trajectories from a given initial condition that are possible under a specified range of control actions. The high dimension of this problem and the nonlinear nature of gravitational interactions make the geometry of these sets complicated, hard to compute, and all but impossible to visualize. Currently, exploration of a problem’s state space is done heuristically, based on previously identified solutions. This potentially misses out on improved mission design solutions that are not close to previous approaches. The goal of the work described here is to map out reachability sets automatically. This would not only aid human mission planners, but also allow a spacecraft to determine its own course without input from Earth-based controllers. Brute-force approaches to this are computationally prohibitive, so one must focus the effort on regions that are of interest: where neighboring trajectories diverge quickly, for instance, or come close to a body that the spacecraft is orbiting. In this paper, we focus on the first of those two criteria; the goal is to identify regions in the system’s state space where small changes have large effects— or vice versa—and concentrate the computational mesh accordingly.