Collaborating Authors

 Sarkar, Purnamrita


Higher-Order Correct Multiplier Bootstraps for Count Functionals of Networks

arXiv.org Machine Learning

Subgraph counts play a central role in both graph limit theory and network data analysis. In recent years, substantial progress has been made on uncertainty quantification for these functionals; several procedures are now known to be consistent for the problem. In this paper, we propose a new class of multiplier bootstraps for count functionals. We show that a bootstrap procedure with multiplicative weights exhibits higher-order correctness under appropriate sparsity conditions. Since this bootstrap is computationally expensive, we propose linear and quadratic approximations to the multiplier bootstrap, which correspond to the first and second-order Hájek projections of an approximating U-statistic, respectively. We show that the quadratic bootstrap procedure achieves higher-order correctness under conditions analogous to those for the multiplicative bootstrap, while having much better computational properties. We complement our theoretical results with a simulation study and verify that our procedure offers state-of-the-art performance for several functionals.
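To make the idea concrete, here is a minimal sketch of a multiplier bootstrap for one count functional, the triangle count. The node-level Gaussian multipliers and the edge reweighting below are illustrative assumptions, not the paper's exact higher-order-correct construction:

```python
import numpy as np

def triangle_count(A):
    # Number of triangles in a simple undirected graph: tr(A^3) / 6.
    return np.trace(A @ A @ A) / 6.0

def multiplier_bootstrap_triangles(A, B=500, seed=None):
    """Generic multiplier bootstrap for the triangle count (illustrative).

    Each replicate reweights edge (i, j) by w_i * w_j, where the node
    multipliers w satisfy E[w] = 1 and Var(w) = 1.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    stats = np.empty(B)
    for b in range(B):
        w = 1.0 + rng.standard_normal(n)   # mean-1, variance-1 multipliers
        Aw = A * np.outer(w, w)            # reweighted adjacency matrix
        stats[b] = np.trace(Aw @ Aw @ Aw) / 6.0
    return stats

# Usage: bootstrap quantiles around the observed triangle count.
rng = np.random.default_rng(0)
A = (rng.random((80, 80)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T             # symmetrize, zero diagonal
boot = multiplier_bootstrap_triangles(A, seed=1)
print(triangle_count(A), np.quantile(boot, [0.025, 0.975]))
```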


A Unified Framework for Tuning Hyperparameters in Clustering Problems

arXiv.org Machine Learning

Selecting hyperparameters for unsupervised learning problems is difficult in general due to the lack of ground truth for validation. Yet the issue is pervasive in machine learning, especially in clustering problems, with examples including the Lagrange multipliers of penalty terms in semidefinite programming (SDP) relaxations and the bandwidths used to construct kernel similarity matrices for spectral clustering. Despite this, there are few provable algorithms for tuning these hyperparameters. In this paper, we provide a unified framework with provable guarantees for this class of problems. We demonstrate our method on two distinct models. First, we show how to tune the hyperparameters in widely used SDP algorithms for community detection in networks; in this case, our method can also be used for model selection. Second, we show that the same framework works for choosing the bandwidth of the kernel similarity matrix in spectral clustering for sub-Gaussian mixtures under suitable model specification. In a variety of simulation experiments, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings.
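As a rough sketch of how such tuning can be automated, the loop below grid-searches the kernel bandwidth for spectral clustering; the eigengap criterion used here is a hypothetical stand-in for the paper's provable selection rule:

```python
import numpy as np
from scipy.spatial.distance import cdist

def kernel_matrix(X, bandwidth):
    # Gaussian kernel similarity K_ij = exp(-||x_i - x_j||^2 / (2 h^2)).
    return np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * bandwidth ** 2))

def eigengap_score(K, r):
    # Hypothetical proxy criterion: the gap between the r-th and (r+1)-th
    # largest eigenvalues of the similarity matrix.
    vals = np.sort(np.linalg.eigvalsh(K))[::-1]
    return vals[r - 1] - vals[r]

def tune_bandwidth(X, r, grid):
    scores = [eigengap_score(kernel_matrix(X, h), r) for h in grid]
    return grid[int(np.argmax(scores))]

# Usage on a two-component mixture.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
print(tune_bandwidth(X, r=2, grid=np.geomspace(0.1, 10.0, 20)))
```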


Overlapping Clustering Models, and One (class) SVM to Bind Them All

Neural Information Processing Systems

People belong to multiple communities, words belong to multiple topics, and books cover multiple genres; overlapping clusters are commonplace. Many existing overlapping clustering methods model each person (or word, or book) as a non-negative weighted combination of "exemplars" who belong solely to one community, with some small noise. Geometrically, each person is a point on a cone whose corners are these exemplars. This basic form encompasses the widely used Mixed Membership Stochastic Blockmodel of networks and its degree-corrected variants, as well as topic models such as LDA. We show that a simple one-class SVM yields provably consistent parameter inference for all such models, and scales to large datasets. Experimental results on several simulated and real datasets show our algorithm (called SVM-cone) is both accurate and scalable.
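A minimal sketch of the core SVM-cone step, assuming a symmetric adjacency (or similarity) matrix with no isolated nodes: rows of the normalized spectral embedding lie on a cone, and a linear one-class SVM's supporting hyperplane touches that cone near the exemplar corners. The scaling and thresholding details in the paper are omitted here:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.cluster import KMeans

def find_cone_corners(A, k, nu=0.1):
    # Top-k spectral embedding of the symmetric matrix A.
    vals, vecs = np.linalg.eigh(A)
    V = vecs[:, np.argsort(-np.abs(vals))[:k]]
    # Project rows to the unit sphere (assumes nonzero rows) so a linear
    # one-class SVM can locate the boundary of the cone.
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    svm = OneClassSVM(kernel="linear", nu=nu).fit(Vn)
    support = Vn[svm.support_]           # points on the supporting boundary
    # Group the support vectors into k approximate corners (exemplars).
    return KMeans(n_clusters=k, n_init=10).fit(support).cluster_centers_
```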


Mean Field for the Stochastic Blockmodel: Optimization Landscape and Convergence Issues

Neural Information Processing Systems

Variational approximation has recently been widely used in large-scale Bayesian inference; its simplest form imposes a mean field assumption to approximate complicated latent structures. Despite the computational scalability of mean field, theoretical studies of its loss surface and of the convergence behavior of the iterative updates used to optimize the loss are far from complete. In this paper, we focus on the problem of community detection for a simple two-class Stochastic Blockmodel (SBM). Using batch coordinate ascent variational inference (BCAVI) updates, we give a complete characterization of all the critical points and show different convergence behaviors with respect to initialization. When the parameters are known, we show that a significant proportion of random initializations converge to the ground truth. On the other hand, when the parameters themselves need to be estimated, a random initialization converges to an uninformative local optimum.
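A simplified sketch of the batch updates analyzed here, for the two-class SBM with known within- and between-community probabilities p and q; the exact parametrization in the paper may differ:

```python
import numpy as np

def bcavi_two_class(A, p, q, iters=50, seed=None):
    """Batch mean-field updates for a two-class SBM (simplified sketch).

    psi[i] is the variational probability that node i belongs to class 1;
    all coordinates are updated simultaneously from the previous iterate.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    psi = rng.random(n)                    # random initialization
    lp = np.log(p / q)                     # log-odds contribution of an edge
    lq = np.log((1 - p) / (1 - q))         # log-odds contribution of a non-edge
    for _ in range(iters):
        m = 2.0 * psi - 1.0                # "magnetization" in [-1, 1]
        s = lp * (A @ m) + lq * ((1.0 - A - np.eye(n)) @ m)
        psi = 1.0 / (1.0 + np.exp(-s))     # batch sigmoid update
    return psi
```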


Hierarchical community detection by recursive bi-partitioning

arXiv.org Machine Learning

The problem of community detection in networks is usually formulated as finding a single partition of the network into some "correct" number of communities. We argue that it is more interpretable, and in some regimes more accurate, to construct a hierarchical tree of communities instead. This can be done with a simple top-down recursive bi-partitioning algorithm: starting from a single community, repeatedly split the nodes into two communities by spectral clustering until a stopping rule suggests there are no further communities. Such an algorithm is model-free, extremely fast, and requires no tuning other than selecting a stopping rule. We show that there are regimes where it outperforms $K$-way spectral clustering, and we propose a natural model for analyzing the algorithm's theoretical performance, the binary tree stochastic block model. Under this model, we prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. We also apply the algorithm to a dataset of statistics papers to construct a hierarchical tree of statistical research communities.
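The algorithm itself fits in a few lines. The sketch below splits by the sign of the second eigenvector of the adjacency matrix and uses a minimum-size stopping rule as a placeholder for the principled rules studied in the paper:

```python
import numpy as np

def spectral_split(A):
    # Bi-partition by the sign of the eigenvector associated with the
    # second largest eigenvalue of the adjacency matrix.
    vals, vecs = np.linalg.eigh(A)
    v2 = vecs[:, np.argsort(-vals)[1]]
    return v2 >= 0

def recursive_bipartition(A, nodes=None, stop=lambda A: A.shape[0] < 20):
    """Top-down hierarchical community detection (sketch).

    Returns a nested tuple of node-index arrays representing the
    community tree; `stop` is a placeholder stopping rule.
    """
    if nodes is None:
        nodes = np.arange(A.shape[0])
    if stop(A):
        return nodes
    left = spectral_split(A)
    if left.all() or (~left).all():        # degenerate split: stop here
        return nodes
    return (recursive_bipartition(A[np.ix_(left, left)], nodes[left], stop),
            recursive_bipartition(A[np.ix_(~left, ~left)], nodes[~left], stop))
```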


Overlapping Clustering Models, and One (class) SVM to Bind Them All

arXiv.org Machine Learning

People belong to multiple communities, words belong to multiple topics, and books cover multiple genres; overlapping clusters are commonplace. Many existing overlapping clustering methods model each person (or word, or book) as a non-negative weighted combination of "exemplars" who belong solely to one community, with some small noise. Geometrically, each person is a point on a cone whose corners are these exemplars. This basic form encompasses the widely used Mixed Membership Stochastic Blockmodel of networks (Airoldi et al., 2008) and its degree-corrected variants (Karrer et al., 2011; Jin et al., 2017), as well as topic models such as LDA (Blei et al., 2003). We show that a simple one-class SVM yields provably consistent parameter inference for all such models, and scales to large datasets. Experimental results on several simulated and real datasets show our algorithm (called SVM-cone) is both accurate and scalable.


Provable Estimation of the Number of Blocks in Block Models

arXiv.org Machine Learning

Community detection is a fundamental unsupervised learning problem for unlabeled networks with a broad range of applications. Many community detection algorithms assume that the number of clusters $r$ is known a priori. In this paper, we propose an approach based on semidefinite relaxations which, unlike many existing convex relaxation methods, does not require prior knowledge of model parameters; it recovers both the number of clusters and the clustering matrix exactly under a broad parameter regime, with probability tending to one. On a variety of simulated and real data experiments, we show that the proposed method often outperforms state-of-the-art techniques for estimating the number of clusters.
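For illustration, a generic SDP from this family can be written in a few lines of cvxpy; the program below, with its Lagrange-multiplier penalty lam, is a sketch of the general shape rather than the paper's exact relaxation:

```python
import numpy as np
import cvxpy as cp

def sdp_cluster_matrix(A, lam):
    # Maximize <A, X> - lam * sum(X) over PSD matrices with unit diagonal
    # and nonnegative entries (a generic relaxation; the paper's exact
    # constraints differ).
    n = A.shape[0]
    X = cp.Variable((n, n), PSD=True)
    problem = cp.Problem(cp.Maximize(cp.trace(A @ X) - lam * cp.sum(X)),
                         [cp.diag(X) == 1, X >= 0])
    problem.solve()
    return X.value

def estimate_num_blocks(Xhat, tol=0.5):
    # The solution is approximately block-constant and low-rank, so the
    # number of clusters can be read off its large eigenvalues.
    vals = np.linalg.eigvalsh(Xhat)
    return int(np.sum(vals > tol * vals.max()))
```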


Convergence of Gradient EM on Multi-component Mixture of Gaussians

Neural Information Processing Systems

In this paper, we study convergence properties of the gradient variant of the Expectation-Maximization algorithm (Lange, 1995) for Gaussian Mixture Models with an arbitrary number of clusters and arbitrary mixing coefficients. We derive a convergence rate that depends on the mixing coefficients, the minimum and maximum pairwise distances between the true centers, the dimensionality, and the number of components, and we obtain a near-optimal local contraction radius. While notable recent works derive local convergence rates for EM for mixtures of two symmetric Gaussians, the more general case requires structurally different and non-trivial arguments. We use recent tools from learning theory and empirical processes to achieve our theoretical results.
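One gradient-EM iteration in the simplest setting (identity covariances, equal weights; the paper handles general mixing coefficients) looks as follows; it replaces the full M-step with a gradient ascent step on the EM surrogate:

```python
import numpy as np

def gradient_em_step(X, mu, step=1.0):
    """One gradient-EM update for a K-component Gaussian mixture with
    identity covariances and equal weights (simplified sketch)."""
    # E-step: soft responsibilities under the current centers mu (K, d).
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (n, K)
    logits = -0.5 * d2
    logits -= logits.max(axis=1, keepdims=True)            # for stability
    R = np.exp(logits)
    R /= R.sum(axis=1, keepdims=True)
    # Gradient of the surrogate Q wrt mu_k: sum_i R_ik (x_i - mu_k).
    grad = R.T @ X - R.sum(axis=0)[:, None] * mu
    return mu + step * grad / X.shape[0]

# Usage: iterate from a random initialization.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, (100, 2)) for c in (-4.0, 0.0, 4.0)])
mu = X[rng.choice(len(X), 3, replace=False)]
for _ in range(200):
    mu = gradient_em_step(X, mu)
print(np.round(mu, 2))
```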


On clustering network-valued data

Neural Information Processing Systems

Community detection, which focuses on clustering nodes or detecting communities within (mostly) a single network, is a problem of considerable practical interest that has received a great deal of attention in the research community. While clustering within a network is important, there is an emerging need to cluster multiple networks, largely motivated by the routine collection of network data generated from potentially different populations. These networks may or may not have node correspondence. When node correspondence is present, we cluster networks by summarizing each network with its graphon estimate; when it is absent, we propose a novel solution that associates a computationally feasible feature vector to each network, based on traces of powers of the adjacency matrix. We illustrate our methods on both simulated and real data sets, and we provide theoretical justifications in terms of consistency.
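A minimal sketch of the node-correspondence-free pipeline: build a moment-style feature vector from traces of powers of each adjacency matrix, then cluster the feature vectors. The exact normalization in the paper may differ:

```python
import numpy as np
from sklearn.cluster import KMeans

def trace_features(A, kmax=6):
    # Normalized traces tr(A^k) / n^(k/2) for k = 3, ..., kmax.
    n = A.shape[0]
    P = A @ A
    feats = []
    for k in range(3, kmax + 1):
        P = P @ A                          # P = A^k
        feats.append(np.trace(P) / n ** (k / 2.0))
    return np.array(feats)

def cluster_networks(graphs, n_clusters, kmax=6):
    # Stack one feature vector per network and run k-means.
    F = np.vstack([trace_features(A, kmax) for A in graphs])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(F)
```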