Collaborating Authors

 Sarkar, Purnamrita


Higher-Order Correct Multiplier Bootstraps for Count Functionals of Networks

arXiv.org Machine Learning

Subgraph counts play a central role in both graph limit theory and network data analysis. In recent years, substantial progress has been made on uncertainty quantification for these functionals; several procedures are now known to be consistent for the problem. In this paper, we propose a new class of multiplier bootstraps for count functionals. We show that a bootstrap procedure with multiplicative weights exhibits higher-order correctness under appropriate sparsity conditions. Since this bootstrap is computationally expensive, we propose linear and quadratic approximations to the multiplier bootstrap, which correspond to the first and second-order Hájek projections of an approximating U-statistic, respectively. We show that the quadratic bootstrap procedure achieves higher-order correctness under conditions analogous to those for the multiplicative bootstrap, while having much better computational properties. We complement our theoretical results with a simulation study and verify that our procedure offers state-of-the-art performance for several functionals.
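To make the idea concrete, here is a minimal sketch of a multiplier bootstrap for one count functional, the triangle count. The node-level Gaussian multipliers and the edge reweighting below are illustrative assumptions, not the paper's exact higher-order-correct construction:

```python
import numpy as np

def triangle_count(A):
    # Number of triangles in a simple undirected graph: tr(A^3) / 6.
    return np.trace(A @ A @ A) / 6.0

def multiplier_bootstrap_triangles(A, B=500, seed=None):
    """Generic multiplier bootstrap for the triangle count (illustrative).

    Each replicate reweights edge (i, j) by w_i * w_j, where the node
    multipliers w satisfy E[w] = 1 and Var(w) = 1.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    stats = np.empty(B)
    for b in range(B):
        w = 1.0 + rng.standard_normal(n)   # mean-1, variance-1 multipliers
        Aw = A * np.outer(w, w)            # reweighted adjacency matrix
        stats[b] = np.trace(Aw @ Aw @ Aw) / 6.0
    return stats

# Usage: bootstrap quantiles around the observed triangle count.
rng = np.random.default_rng(0)
A = (rng.random((80, 80)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T             # symmetrize, zero diagonal
boot = multiplier_bootstrap_triangles(A, seed=1)
print(triangle_count(A), np.quantile(boot, [0.025, 0.975]))
```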


A Unified Framework for Tuning Hyperparameters in Clustering Problems

arXiv.org Machine Learning

Selecting hyperparameters for unsupervised learning problems is difficult in general due to the lack of ground truth for validation. Yet the issue is pervasive in machine learning, especially in clustering problems, with examples including the Lagrange multipliers of penalty terms in semidefinite programming (SDP) relaxations and the bandwidths used to construct kernel similarity matrices for spectral clustering. Despite this, there are few provable algorithms for tuning these hyperparameters. In this paper, we provide a unified framework with provable guarantees for this class of problems. We demonstrate our method on two distinct models. First, we show how to tune the hyperparameters in widely used SDP algorithms for community detection in networks; in this case, our method can also be used for model selection. Second, we show that the same framework works for choosing the bandwidth of the kernel similarity matrix in spectral clustering for sub-Gaussian mixtures under suitable model specification. In a variety of simulation experiments, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings.
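As a rough sketch of how such tuning can be automated, the loop below grid-searches the kernel bandwidth for spectral clustering; the eigengap criterion used here is a hypothetical stand-in for the paper's provable selection rule:

```python
import numpy as np
from scipy.spatial.distance import cdist

def kernel_matrix(X, bandwidth):
    # Gaussian kernel similarity K_ij = exp(-||x_i - x_j||^2 / (2 h^2)).
    return np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * bandwidth ** 2))

def eigengap_score(K, r):
    # Hypothetical proxy criterion: the gap between the r-th and (r+1)-th
    # largest eigenvalues of the similarity matrix.
    vals = np.sort(np.linalg.eigvalsh(K))[::-1]
    return vals[r - 1] - vals[r]

def tune_bandwidth(X, r, grid):
    scores = [eigengap_score(kernel_matrix(X, h), r) for h in grid]
    return grid[int(np.argmax(scores))]

# Usage on a two-component mixture.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
print(tune_bandwidth(X, r=2, grid=np.geomspace(0.1, 10.0, 20)))
```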


Overlapping Clustering Models, and One (class) SVM to Bind Them All

Neural Information Processing Systems

People belong to multiple communities, words belong to multiple topics, and books cover multiple genres; overlapping clusters are commonplace. Many existing overlapping clustering methods model each person (or word, or book) as a non-negative weighted combination of "exemplars" who belong solely to one community, with some small noise. Geometrically, each person is a point on a cone whose corners are these exemplars. This basic form encompasses the widely used Mixed Membership Stochastic Blockmodel of networks and its degree-corrected variants, as well as topic models such as LDA. We show that a simple one-class SVM yields provably consistent parameter inference for all such models, and scales to large datasets. Experimental results on several simulated and real datasets show our algorithm (called SVM-cone) is both accurate and scalable.
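A minimal sketch of the core SVM-cone step, assuming a symmetric adjacency (or similarity) matrix with no isolated nodes: rows of the normalized spectral embedding lie on a cone, and a linear one-class SVM's supporting hyperplane touches that cone near the exemplar corners. The scaling and thresholding details in the paper are omitted here:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.cluster import KMeans

def find_cone_corners(A, k, nu=0.1):
    # Top-k spectral embedding of the symmetric matrix A.
    vals, vecs = np.linalg.eigh(A)
    V = vecs[:, np.argsort(-np.abs(vals))[:k]]
    # Project rows to the unit sphere (assumes nonzero rows) so a linear
    # one-class SVM can locate the boundary of the cone.
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    svm = OneClassSVM(kernel="linear", nu=nu).fit(Vn)
    support = Vn[svm.support_]           # points on the supporting boundary
    # Group the support vectors into k approximate corners (exemplars).
    return KMeans(n_clusters=k, n_init=10).fit(support).cluster_centers_
```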


Mean Field for the Stochastic Blockmodel: Optimization Landscape and Convergence Issues

Neural Information Processing Systems

Variational approximation has recently been widely used in large-scale Bayesian inference; its simplest form imposes a mean field assumption to approximate complicated latent structures. Despite the computational scalability of mean field, theoretical studies of its loss surface and of the convergence behavior of the iterative updates used to optimize the loss are far from complete. In this paper, we focus on the problem of community detection for a simple two-class Stochastic Blockmodel (SBM). Using batch coordinate ascent variational inference (BCAVI) updates, we give a complete characterization of all the critical points and show different convergence behaviors with respect to initialization. When the parameters are known, we show that a significant proportion of random initializations converge to the ground truth. On the other hand, when the parameters themselves need to be estimated, a random initialization converges to an uninformative local optimum.
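A simplified sketch of the batch updates analyzed here, for the two-class SBM with known within- and between-community probabilities p and q; the exact parametrization in the paper may differ:

```python
import numpy as np

def bcavi_two_class(A, p, q, iters=50, seed=None):
    """Batch mean-field updates for a two-class SBM (simplified sketch).

    psi[i] is the variational probability that node i belongs to class 1;
    all coordinates are updated simultaneously from the previous iterate.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    psi = rng.random(n)                    # random initialization
    lp = np.log(p / q)                     # log-odds contribution of an edge
    lq = np.log((1 - p) / (1 - q))         # log-odds contribution of a non-edge
    for _ in range(iters):
        m = 2.0 * psi - 1.0                # "magnetization" in [-1, 1]
        s = lp * (A @ m) + lq * ((1.0 - A - np.eye(n)) @ m)
        psi = 1.0 / (1.0 + np.exp(-s))     # batch sigmoid update
    return psi
```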


Hierarchical community detection by recursive bi-partitioning

arXiv.org Machine Learning

The problem of community detection in networks is usually formulated as finding a single partition of the network into some "correct" number of communities. We argue that it is more interpretable, and in some regimes more accurate, to construct a hierarchical tree of communities instead. This can be done with a simple top-down recursive bi-partitioning algorithm: starting from a single community, repeatedly split the nodes into two communities by spectral clustering until a stopping rule suggests there are no further communities. Such an algorithm is model-free, extremely fast, and requires no tuning other than selecting a stopping rule. We show that there are regimes where it outperforms $K$-way spectral clustering, and we propose a natural model for analyzing the algorithm's theoretical performance, the binary tree stochastic block model. Under this model, we prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. We also apply the algorithm to a dataset of statistics papers to construct a hierarchical tree of statistical research communities.
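The algorithm itself fits in a few lines. The sketch below splits by the sign of the second eigenvector of the adjacency matrix and uses a minimum-size stopping rule as a placeholder for the principled rules studied in the paper:

```python
import numpy as np

def spectral_split(A):
    # Bi-partition by the sign of the eigenvector associated with the
    # second largest eigenvalue of the adjacency matrix.
    vals, vecs = np.linalg.eigh(A)
    v2 = vecs[:, np.argsort(-vals)[1]]
    return v2 >= 0

def recursive_bipartition(A, nodes=None, stop=lambda A: A.shape[0] < 20):
    """Top-down hierarchical community detection (sketch).

    Returns a nested tuple of node-index arrays representing the
    community tree; `stop` is a placeholder stopping rule.
    """
    if nodes is None:
        nodes = np.arange(A.shape[0])
    if stop(A):
        return nodes
    left = spectral_split(A)
    if left.all() or (~left).all():        # degenerate split: stop here
        return nodes
    return (recursive_bipartition(A[np.ix_(left, left)], nodes[left], stop),
            recursive_bipartition(A[np.ix_(~left, ~left)], nodes[~left], stop))
```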


Overlapping Clustering Models, and One (class) SVM to Bind Them All

arXiv.org Machine Learning

People belong to multiple communities, words belong to multiple topics, and books cover multiple genres; overlapping clusters are commonplace. Many existing overlapping clustering methods model each person (or word, or book) as a non-negative weighted combination of "exemplars" who belong solely to one community, with some small noise. Geometrically, each person is a point on a cone whose corners are these exemplars. This basic form encompasses the widely used Mixed Membership Stochastic Blockmodel of networks (Airoldi et al., 2008) and its degree-corrected variants (Karrer et al., 2011; Jin et al., 2017), as well as topic models such as LDA (Blei et al., 2003). We show that a simple one-class SVM yields provably consistent parameter inference for all such models, and scales to large datasets. Experimental results on several simulated and real datasets show our algorithm (called SVM-cone) is both accurate and scalable.


Provable Estimation of the Number of Blocks in Block Models

arXiv.org Machine Learning

Community detection is a fundamental unsupervised learning problem for unlabeled networks with a broad range of applications. Many community detection algorithms assume that the number of clusters $r$ is known a priori. In this paper, we propose an approach based on semidefinite relaxations which, unlike many existing convex relaxation methods, does not require prior knowledge of model parameters; it recovers both the number of clusters and the clustering matrix exactly under a broad parameter regime, with probability tending to one. On a variety of simulated and real data experiments, we show that the proposed method often outperforms state-of-the-art techniques for estimating the number of clusters.
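For illustration, a generic SDP from this family can be written in a few lines of cvxpy; the program below, with its Lagrange-multiplier penalty lam, is a sketch of the general shape rather than the paper's exact relaxation:

```python
import numpy as np
import cvxpy as cp

def sdp_cluster_matrix(A, lam):
    # Maximize <A, X> - lam * sum(X) over PSD matrices with unit diagonal
    # and nonnegative entries (a generic relaxation; the paper's exact
    # constraints differ).
    n = A.shape[0]
    X = cp.Variable((n, n), PSD=True)
    problem = cp.Problem(cp.Maximize(cp.trace(A @ X) - lam * cp.sum(X)),
                         [cp.diag(X) == 1, X >= 0])
    problem.solve()
    return X.value

def estimate_num_blocks(Xhat, tol=0.5):
    # The solution is approximately block-constant and low-rank, so the
    # number of clusters can be read off its large eigenvalues.
    vals = np.linalg.eigvalsh(Xhat)
    return int(np.sum(vals > tol * vals.max()))
```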


Convergence of Gradient EM on Multi-component Mixture of Gaussians

Neural Information Processing Systems

In this paper, we study convergence properties of the gradient variant of the Expectation-Maximization algorithm (Lange, 1995) for Gaussian Mixture Models with an arbitrary number of clusters and arbitrary mixing coefficients. We derive a convergence rate that depends on the mixing coefficients, the minimum and maximum pairwise distances between the true centers, the dimensionality, and the number of components, and we obtain a near-optimal local contraction radius. While notable recent works derive local convergence rates for EM for mixtures of two symmetric Gaussians, the more general case requires structurally different and non-trivial arguments. We use recent tools from learning theory and empirical processes to achieve our theoretical results.
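One gradient-EM iteration in the simplest setting (identity covariances, equal weights; the paper handles general mixing coefficients) looks as follows; it replaces the full M-step with a gradient ascent step on the EM surrogate:

```python
import numpy as np

def gradient_em_step(X, mu, step=1.0):
    """One gradient-EM update for a K-component Gaussian mixture with
    identity covariances and equal weights (simplified sketch)."""
    # E-step: soft responsibilities under the current centers mu (K, d).
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (n, K)
    logits = -0.5 * d2
    logits -= logits.max(axis=1, keepdims=True)            # for stability
    R = np.exp(logits)
    R /= R.sum(axis=1, keepdims=True)
    # Gradient of the surrogate Q wrt mu_k: sum_i R_ik (x_i - mu_k).
    grad = R.T @ X - R.sum(axis=0)[:, None] * mu
    return mu + step * grad / X.shape[0]

# Usage: iterate from a random initialization.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, (100, 2)) for c in (-4.0, 0.0, 4.0)])
mu = X[rng.choice(len(X), 3, replace=False)]
for _ in range(200):
    mu = gradient_em_step(X, mu)
print(np.round(mu, 2))
```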


On clustering network-valued data

Neural Information Processing Systems

Community detection, which focuses on clustering nodes or detecting communities within (mostly) a single network, is a problem of considerable practical interest that has received a great deal of attention in the research community. While clustering within a network is important, there is an emerging need to cluster multiple networks, largely motivated by the routine collection of network data generated from potentially different populations. These networks may or may not have node correspondence. When node correspondence is present, we cluster networks by summarizing each network with its graphon estimate; when it is absent, we propose a novel solution that associates a computationally feasible feature vector to each network, based on traces of powers of the adjacency matrix. We illustrate our methods on both simulated and real data sets, and we provide theoretical justifications in terms of consistency.
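A minimal sketch of the node-correspondence-free pipeline: build a moment-style feature vector from traces of powers of each adjacency matrix, then cluster the feature vectors. The exact normalization in the paper may differ:

```python
import numpy as np
from sklearn.cluster import KMeans

def trace_features(A, kmax=6):
    # Normalized traces tr(A^k) / n^(k/2) for k = 3, ..., kmax.
    n = A.shape[0]
    P = A @ A
    feats = []
    for k in range(3, kmax + 1):
        P = P @ A                          # P = A^k
        feats.append(np.trace(P) / n ** (k / 2.0))
    return np.array(feats)

def cluster_networks(graphs, n_clusters, kmax=6):
    # Stack one feature vector per network and run k-means.
    F = np.vstack([trace_features(A, kmax) for A in graphs])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(F)
```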