AITopics

Twenty-Third International Joint Conference on Artificial Intelligence

Genre: Research Report > New Finding (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Kent, Brian P., Rinaldo, Alessandro, Verstynen, Timothy

DeBaCl: A Python Package for Interactive DEnsity-BAsed CLustering

arXiv.org Machine LearningJul-30-2013

The level set tree approach of Hartigan (1975) provides a probabilistically based and highly interpretable encoding of the clustering behavior of a dataset. By representing the hierarchy of data modes as a dendrogram of the level sets of a density estimator, this approach offers many advantages for exploratory analysis and clustering, especially for complex and high-dimensional data. Several R packages exist for level set tree estimation, but their practical usefulness is limited by computational inefficiency, absence of interactive graphical capabilities and, from a theoretical perspective, reliance on asymptotic approximations. To make it easier for practitioners to capture the advantages of level set trees, we have written the Python package DeBaCl for DEnsity-BAsed CLustering. In this article we illustrate how DeBaCl's level set tree estimates can be used for difficult clustering tasks and interactive graphical data analysis. The package is intended to promote the practical use of level set trees through improvements in computational efficiency and a high degree of user customization. In addition, the flexible algorithms implemented in DeBaCl enjoy finite sample accuracy, as demonstrated in recent literature on density clustering. Finally, we show the level set tree framework can be easily extended to deal with functional data. Keywords: density-based clustering, level set tree, Python, interactive graphics, functional data analysis.

algorithm, debacl, python package, (15 more...)

1307.8136

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.64)

Industry: Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Balakrishnan, Sivaraman, Narayanan, Srivatsan, Rinaldo, Alessandro, Singh, Aarti, Wasserman, Larry

Cluster Trees on Manifolds

arXiv.org Machine LearningJul-24-2013

In this paper we investigate the problem of estimating the cluster tree for a density $f$ supported on or near a smooth $d$-dimensional manifold $M$ isometrically embedded in $\mathbb{R}^D$. We analyze a modified version of a $k$-nearest neighbor based algorithm recently proposed by Chaudhuri and Dasgupta. The main results of this paper show that under mild assumptions on $f$ and $M$, we obtain rates of convergence that depend on $d$ only but not on the ambient dimension $D$. We also show that similar (albeit non-algorithmic) results can be obtained for kernel density estimators. We sketch a construction of a sample complexity lower bound instance for a natural class of manifold oblivious clustering algorithms. We further briefly consider the known manifold case and show that in this case a spatially adaptive algorithm achieves better rates.

algorithm, artificial intelligence, machine learning, (17 more...)

1307.6515

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

arXiv.org Machine LearningJul-22-2013

Performance comparison of State-of-the-art Missing Value Imputation Algorithms on Some Bench mark Datasets

Kumar, M. Naresh

The presence of missing values influences the selection of appropriate set of attributes that render degradation in classification accuracies of the classifiers. Missing values are a common problem in almost all real world data sets [1] used in knowledge discovery and data mining(KDD) applications. Specifically they are more frequent in clinical databases [2, 3, 4] and temporal climate databases [5, 6]. Their presence would greatly affect the performance of classifiers [7]. The missing values in the databases may arise due various reasons such as the value being lost (erased or deleted) or not recorded, incorrect measurements, equipment errors, or possibly due to an expert not attaching any importance to a particular procedure. The incomplete data can be identified by looking for null values in the data set. However, this is not always true, since missing values can appear in the form of outliers or even wrong data (i.e.

accuracy, data mining, machine learning, (13 more...)

1307.5599

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (0.69)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Tan, Kean Ming, Witten, Daniela, Shojaie, Ali

The Cluster Graphical Lasso for improved estimation of Gaussian graphical models

arXiv.org Machine LearningJul-19-2013

We consider the task of estimating a Gaussian graphical model in the high-dimensional setting. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to an l1 penalty, is a well-studied approach for this task. We begin by introducing a surprising connection between the graphical lasso and hierarchical clustering: the graphical lasso in effect performs a two-step procedure, in which (1) single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then (2) an l1-penalized log likelihood is maximized on the subset of variables within each connected component. In other words, the graphical lasso determines the connected components of the estimated network via single linkage clustering. Unfortunately, single linkage clustering is known to perform poorly in certain settings. Therefore, we propose the cluster graphical lasso, which involves clustering the features using an alternative to single linkage clustering, and then performing the graphical lasso on the subset of variables within each cluster. We establish model selection consistency for this technique, and demonstrate its improved performance relative to the graphical lasso in a simulation study, as well as in applications to an equities data set, a university webpage data set, and a gene expression data set.

artificial intelligence, graphical lasso, machine learning, (17 more...)

1307.5339

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Trading (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Shiokawa, Hiroaki (Nippon Telegraph and Telephone Corporation) | Fujiwara, Yasuhiro (Nippon Telegraph and Telephone Corporation) | Onizuka, Makoto (Nippon Telegraph and Telephone Corporation)

Fast Algorithm for Modularity-Based Graph Clustering

In AI and Web communities, modularity-based graph clustering algorithms are being applied to various applications. However, existing algorithms are not applied to large graphs because they have to scan all vertices/edges iteratively. The goal of this paper is to efficiently compute clusters with high modularity from extremely large graphs with more than a few billion edges. The heart of our solution is to compute clusters by incrementally pruning unnecessary vertices/edges and optimizing the order of vertex selections. Our experiments show that our proposal outperforms all other modularity-based algorithms in terms of computation time, and it finds clusters with high modularity.

fast algorithm, modularity-based graph clustering

Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Towards Cohesive Anomaly Mining

Xiong, Yun (Fudan University) | Zhu, Yangyong (Fudan University) | Yu, Philip S. (University of Illinois at Chicago) | Pei, Jian (Simon Fraser University)

In some applications, such as bioinformatics, social network analysis, and computational criminology, it is desirable to find compact clusters formed by a (very) small portion of objects in a large data set. Since such clusters are comprised of a small number of objects, they are extraordinary and anomalous with respect to the entire data set. This specific type of clustering task cannot be solved well by the conventional clustering methods since generally those methods try to assign most of the data objects into clusters. In this paper, we model this novel and application-inspired task as the problem of mining cohesive anomalies. We propose a general framework and a principled approach to tackle the problem. The experimental results on both synthetic and real data sets verify the effectiveness and efficiency of our approach.

artificial intelligence, cohesive anomaly mining, social media, (1 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Communications > Social Media (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Spectral Rotation versus K-Means in Spectral Clustering

Huang, Jin (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)

Spectral clustering has been a popular data clustering algorithm. This category of approaches often resort to other clustering methods, such as K-Means, to get the final cluster. The potential flaw of such common practice is that the obtained relaxed continuous spectral solution could severely deviate from the true discrete solution. In this paper, we propose to impose an additional orthonormal constraint to better approximate the optimal continuous solution to the graph cut objective functions. Such a method, called spectral rotation in literature, optimizes the spectral clustering objective functions better than K -Means, and improves the clustering accuracy. We would provide efficient algorithm to solve the new problem rigorously, which is not significantly more costly than K-Means. We also establish the connection between our method andK-Means to provide theoretical motivation of our method. Experimental results show that our algorithm consistently reaches better cut and meanwhile outperforms in clustering metrics than classic spectral clustering methods.

artificial intelligence, k-means, machine learning, (15 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.05)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Texas > Tarrant County > Arlington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Guo, Yuhong (Temple University)

Convex Subspace Representation Learning from Multi-View Data

Learning from multi-view data is important in many applications. In this paper, we propose a novel convex subspace representation learning method for unsupervised multi-view clustering. We ﬁrst formulate the subspace learning with multiple views as a joint optimization problem with a common subspace representation matrix and a group sparsity inducing norm. By exploiting the properties of dual norms, we then show a convex min-max dual formulation with a sparsity inducing trace norm can be obtained. We develop a proximal bundle optimization algorithm to globally solve the min-max optimization problem. Our empirical study shows the proposed subspace representation learning method can effectively facilitate multi-view clustering and induce superior clustering results than alternative multi-view clustering methods.

artificial intelligence, machine learning, representation, (15 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

North America > United States > Wisconsin (0.05)
North America > United States > Texas (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Formalizing Hierarchical Clustering as Integer Linear Programming

Gilpin, Sean (University of California, Davis) | Nijssen, Siegried (Katholieke Universiteit Leuven) | Davidson, Ian (University of California, Davis)

Hierarchical clustering is typically implemented as a greedy heuristic algorithm with no explicit objective function. In this work we formalize hierarchical clustering as an integer linear programming (ILP) problem with a natural objective function and the dendrogram properties enforced as linear constraints. Though exact solvers exists for ILP we show that a simple randomized algorithm and a linear programming (LP) relaxation can be used to provide approximate solutions faster. Formalizing hierarchical clustering also has the benefit that relaxing the constraints can produce novel problem variations such as overlapping clusterings. Our experiments show that our formulation is capable of outperforming standard agglomerative clustering algorithms in a variety of settings, including traditional hierarchical clustering as well as learning overlapping clusterings.

artificial intelligence, hierarchy, machine learning, (14 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

North America > United States > California > Yolo County > Davis (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Kansas (0.04)
(2 more...)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)