Goto

Collaborating Authors

 Mathematical & Statistical Methods


A unified framework for spectral clustering in sparse graphs

arXiv.org Machine Learning

One of the most natural tasks in graph theory is community detection, i.e., the identification of similarity groups on a given network. Practically, for an unweighted and undirected graph G(V, E) with V n nodes and E edges, community detection consists in finding a non-overlapping partition of the nodes that identifies underlying communities in a completely unsupervised manner. There is no unique definition of a community, but a general criterion is to impose that nodes in the same community have more interconnections than nodes in different communities, as a consequence of the stronger affinity among members of the same community [17]. There exist many ways of formalizing this intuition, some of them under the form of a cost function to minimize, such as the MinCut, RatioCut, and NormalizedCut costs [53]. The resulting optimizations are however NPhard problems and, as a consequence, many algorithms consist in retrieving relaxed continuous solutions of the problem.


Minimal Variance Sampling in Stochastic Gradient Boosting

Neural Information Processing Systems

Stochastic Gradient Boosting (SGB) is a widely used approach to regularization of boosting models based on decision trees. It was shown that, in many cases, random sampling at each iteration can lead to better generalization performance of the model and can also decrease the learning time. Different sampling approaches were proposed, where probabilities are not uniform, and it is not currently clear which approach is the most effective. In this paper, we formulate the problem of randomization in SGB in terms of optimization of sampling probabilities to maximize the estimation accuracy of split scoring used to train decision trees.This optimization problem has a closed-form nearly optimal solution, and it leads to a new sampling technique, which we call Minimal Variance Sampling (MVS).The method both decreases the number of examples needed for each iteration of boosting and increases the quality of the model significantly as compared to the state-of-the art sampling methods. The superiority of the algorithm was confirmed by introducing MVS as a new default option for subsampling in CatBoost, a gradient boosting library achieving state-of-the-art quality on various machine learning tasks. Papers published at the Neural Information Processing Systems Conference.


Random Quadratic Forms with Dependence: Applications to Restricted Isometry and Beyond

Neural Information Processing Systems

Several important families of computational and statistical results in machine learning and randomized algorithms rely on uniform bounds on quadratic forms of random vectors or matrices. Such results include the Johnson-Lindenstrauss (J-L) Lemma, the Restricted Isometry Property (RIP), randomized sketching algorithms, and approximate linear algebra. The existing results critically depend on statistical independence, e.g., independent entries for random vectors, independent rows for random matrices, etc., which prevent their usage in dependent or adaptive modeling settings. In this paper, we show that such independence is in fact not needed for such results which continue to hold under fairly general dependence structures. In particular, we present uniform bounds on random quadratic forms of stochastic processes which are conditionally independent and sub-Gaussian given another (latent) process.


Fast and Accurate Stochastic Gradient Estimation

Neural Information Processing Systems

Stochastic Gradient Descent or SGD is the most popular optimization algorithm for large-scale problems. SGD estimates the gradient by uniform sampling with sample size one. There have been several other works that suggest faster epoch-wise convergence by using weighted non-uniform sampling for better gradient estimates. Unfortunately, the per-iteration cost of maintaining this adaptive distribution for gradient estimation is more than calculating the full gradient itself, which we call the chicken-and-the-egg loop. As a result, the false impression of faster convergence in iterations, in reality, leads to slower convergence in time.


Robustness Verification of Tree-based Models

Neural Information Processing Systems

We study the robustness verification problem of tree based models, including random forest (RF) and gradient boosted decision tree (GBDT). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound of it. Existing approaches cast this verification problem into a mixed integer linear programming (MILP) problem, which finds the minimal adversarial distortion in exponential time so is impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles the verification problem can be cast as a max-clique problem on a multi-partite boxicity graph.


Understanding the Role of Momentum in Stochastic Gradient Methods

Neural Information Processing Systems

The use of momentum in stochastic gradient methods has become a widespread practice in machine learning. Different variants of momentum, including heavy-ball momentum, Nesterov's accelerated gradient (NAG), and quasi-hyperbolic momentum (QHM), have demonstrated success on various tasks. Despite these empirical successes, there is a lack of clear understanding of how the momentum parameters affect convergence and various performance measures of different algorithms. In this paper, we use the general formulation of QHM to give a unified analysis of several popular algorithms, covering their asymptotic convergence conditions, stability regions, and properties of their stationary distributions. In addition, by combining the results on convergence rates and stationary distributions, we obtain sometimes counter-intuitive practical guidelines for setting the learning rate and momentum parameters.


(Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs

Neural Information Processing Systems

We consider the graph matching/similarity problem of determining how similar two given graphs $G_0,G_1$ are and recovering the permutation $\pi$ on the vertices of $G_1$ that minimizes the symmetric difference between the edges of $G_0$ and $\pi(G_1)$. Graph matching/similarity has applications for pattern matching, vision, social network anonymization, malware analysis, and more. We give the first efficient algorithms proven to succeed in the correlated Erdös-Rényi model (Pedarsani and Grossglauser, 2011). Specifically, we give a polynomial time algorithm for the graph similarity/hypothesis testing task which works for every constant level of correlation between the two graphs that can be arbitrarily close to zero. We also give a quasi-polynomial ($n {O(\log n)}$ time) algorithm for the graph matching task of recovering the permutation minimizing the symmetric difference in this model. This is the first algorithm to do so without requiring as additional input a seed'' of the values of the ground truth permutation on at least $n {\Omega(1)}$ vertices.


A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Neural Information Processing Systems

We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics. The algorithm consists of stages that use a stochastic version of Nesterov's method with a specific restart and parameters selected to achieve the fastest reduction in the bias-variance terms in the convergence rate bounds. Papers published at the Neural Information Processing Systems Conference.


Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction

Neural Information Processing Systems

Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) algorithms have received increasing attention in both theory and practice. In this paper, we propose a Stochastic Recursive Variance-Reduced gradient HMC (SRVR-HMC) algorithm. It makes use of a semi-stochastic gradient estimator that recursively accumulates the gradient information to reduce the variance of the stochastic gradient. We provide a convergence analysis of SRVR-HMC for sampling from a class of non-log-concave distributions and show that SRVR-HMC converges faster than all existing HMC-type algorithms based on underdamped Langevin dynamics. Thorough experiments on synthetic and real-world datasets validate our theory and demonstrate the superiority of SRVR-HMC.


Learning Erdos-Renyi Random Graphs via Edge Detecting Queries

Neural Information Processing Systems

In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $\Omega( \min\{ k 2 \log n, n 2\})$ tests (even when a small probability of error is allowed), we show that learning an Erd\H{o}s-R\'enyi random graph with an average of $\kbar$ edges is much easier; namely, one can attain asymptotically vanishing error probability with only $O(\kbar \log n)$ tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of $O(\kbar \log 2 \kbar \kbar \log n)$. Papers published at the Neural Information Processing Systems Conference.