AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Improved Clustering with Augmented k-means

Howe, J. Andrew

arXiv.org Machine Learning, May-22-2017

Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering, k-means is a very important algorithm for this task. In this technical report, we develop a new k-means variant called Augmented k-means, which is a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the cluster membership probabilities are used to control the subsequent re-estimation of cluster means. Observations that cannot be firmly assigned to a cluster are excluded from the re-estimation step. This can be valuable when the data exhibit many characteristics of real datasets, such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by more accurately classifying observations into known clusters and/or converging in fewer iterations. We demonstrate this on both simulated and real datasets. Our algorithm is implemented in Python and will be available with this report.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1705.07592

Country:

North America > United States > Illinois (0.14)
North America > United States > California (0.14)
Asia > Middle East > Saudi Arabia (0.14)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
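
The loop described in the Augmented k-means abstract above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the fixed probability threshold used to decide which observations are "firmly identified", the plain gradient-descent softmax regression, the caller-supplied initial centers, and all function and parameter names are assumptions made for this sketch.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_softmax(X, labels, k, steps=300, lr=0.1):
    """Multinomial logistic regression fitted by batch gradient descent.
    steps and lr are illustrative defaults suited to small, low-scale data."""
    d = len(X[0])
    W = [[0.0] * (d + 1) for _ in range(k)]  # last entry of each row is the bias
    for _ in range(steps):
        grad = [[0.0] * (d + 1) for _ in range(k)]
        for x, y in zip(X, labels):
            xb = list(x) + [1.0]
            p = softmax([sum(w * v for w, v in zip(W[c], xb)) for c in range(k)])
            for c in range(k):
                err = p[c] - (1.0 if c == y else 0.0)
                for j in range(d + 1):
                    grad[c][j] += err * xb[j]
        for c in range(k):
            for j in range(d + 1):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W

def predict_proba(W, x):
    xb = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in W])

def nearest(x, centers):
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))

def augmented_kmeans(X, centers, threshold=0.8, max_iter=20):
    """k-means in which each mean is re-estimated only from points whose
    predicted membership probability reaches `threshold` (an assumed rule)."""
    k = len(centers)
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(x, centers) for x in X]
        if new_labels == labels:  # assignments stable: stop
            break
        labels = new_labels
        W = fit_softmax(X, labels, k)  # predict the current cluster labels
        for c in range(k):
            firm = [x for x, y in zip(X, labels)
                    if y == c and predict_proba(W, x)[c] >= threshold]
            if firm:  # keep the old mean if no point is firmly assigned
                centers[c] = [sum(col) / len(firm) for col in zip(*firm)]
    return labels, centers
```

For example, `augmented_kmeans([(0, 0), (0, 1), (1, 0), (1.8, 1.8), (4, 4), (4, 5), (5, 4)], [(0, 0), (4, 4)])` separates the two well-separated groups into labels 0 and 1; a borderline point such as (1.8, 1.8) still receives a label but may be left out of the mean update if its predicted probability falls below the threshold.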


Data Filtering for Cluster Analysis by $\ell_0$-Norm Regularization

Cristofari, Andrea

arXiv.org Machine Learning, May-22-2017

A data filtering method for cluster analysis is proposed, based on minimizing a least squares function with a weighted $\ell_0$-norm penalty. To overcome the discontinuity of the objective function, smooth non-convex functions are employed to approximate the $\ell_0$-norm. The convergence of the global minimum points of the approximating problems towards global minimum points of the original problem is stated. The proposed method also exploits a suitable technique to choose the penalty parameter. Finally, numerical results on synthetic and real data sets are provided, showing how some existing clustering methods can benefit from the proposed filtering strategy.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1007/s11590-017-1152-7

1607.08756

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Graphons, mergeons, and so on!

Eldridge, Justin, Belkin, Mikhail, Wang, Yusu

arXiv.org Machine Learning, May-22-2017

In this work we develop a theory of hierarchical clustering for graphs. Our modeling assumption is that graphs are sampled from a graphon, which is a powerful and general model for generating graphs and analyzing large networks. Graphons are a far richer class of graph models than stochastic blockmodels, the primary setting for recent progress in the statistical theory of graph clustering. We define what it means for an algorithm to produce the "correct" clustering, give sufficient conditions in which a method is statistically consistent, and provide an explicit algorithm satisfying these properties.

artificial intelligence, cluster tree, machine learning, (16 more...)

arXiv.org Machine Learning

1607.01718

Country: North America > United States (1.00)

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Sports (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)


Co-clustering through Optimal Transport

Laclau, Charlotte, Redko, Ievgen, Matei, Basarab, Bennani, Younès, Brault, Vincent

arXiv.org Machine Learning, May-19-2017

In this paper, we present a novel method for co-clustering, an unsupervised learning approach that aims at discovering homogeneous groups of data instances and features by grouping them simultaneously. The proposed method uses the entropy-regularized optimal transport between empirical measures defined on data instances and features in order to obtain an estimated joint probability density function represented by the optimal coupling matrix. This matrix is further factorized to obtain the induced row and column partitions using a multiscale representations approach. To justify our method theoretically, we show how the solution of the regularized optimal transport can be seen from the variational inference perspective, thus motivating its use for co-clustering. The algorithm derived for the proposed method and its kernelized version based on the notion of Gromov-Wasserstein distance are fast, accurate, and can automatically determine the number of both row and column clusters. These features are vividly demonstrated through extensive experimental evaluations.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1705.06189

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > Promising Solution (0.48)

Industry: Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
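
The entropy-regularized optimal transport step mentioned in the co-clustering abstract above can be illustrated with Sinkhorn iterations, a standard solver for this problem. The sketch below covers only the computation of a coupling matrix; the cost matrix, the regularization strength `reg`, and the function name are assumptions for illustration, and the factorization of the coupling into row and column partitions described in the abstract is not shown.

```python
import math

def sinkhorn(cost, a, b, reg=0.5, n_iter=500):
    """Entropy-regularized OT coupling between histograms a (rows) and b (columns).

    cost[i][j] is a hypothetical cost of matching row item i to column item j.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # alternately rescale so that row sums match a and column sums match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

With uniform marginals over three rows and two columns, the returned matrix has row sums close to 1/3 and column sums close to 1/2, and it places more mass on low-cost pairs. A smaller `reg` gives a sharper coupling but typically needs more iterations.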


Discovering the Graph Structure in the Clustering Results

Bauman, Evgeny, Bauman, Konstantin

arXiv.org Machine Learning, May-18-2017

In a standard cluster analysis, such as k-means, in addition to cluster locations and the distances between them, it is important to know whether clusters are connected or well separated from each other. The main focus of this paper is discovering the relations between the resulting clusters. We propose a new method, based on pairwise overlapping k-means clustering, that provides the graph structure of cluster relations in addition to the cluster means. The proposed method has a set of parameters that can be tuned in order to control the sensitivity of the model and the desired relative size of the pairwise overlapping interval between means of two adjacent clusters, i.e., the level of overlapping. We present the exact formula for calculating that parameter. The empirical study presented in the paper demonstrates that our approach not only works well on toy data but also complements standard clustering results with a reasonable graph structure on real datasets, such as financial indices and restaurants.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1705.06753

Country:

Europe (0.69)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Consumer Products & Services > Restaurants (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Identification and Off-Policy Learning of Multiple Objectives Using Adaptive Clustering

Karimpanal, Thommen George, Wilhelm, Erik

arXiv.org Artificial Intelligence, May-17-2017

In this work, we present a methodology that enables an agent to make efficient use of its exploratory actions by autonomously identifying possible objectives in its environment and learning them in parallel. The identification of objectives is achieved using an online and unsupervised adaptive clustering algorithm. The identified objectives are learned (at least partially) in parallel using Q-learning. Using a simulated agent and environment, it is shown that the converged or partially converged value function weights resulting from off-policy learning can be used to accumulate knowledge about multiple objectives without any additional exploration. We claim that the proposed approach could be useful in scenarios where the objectives are initially unknown or in real world scenarios where exploration is typically a time and energy intensive process. The implications and possible extensions of this work are also briefly discussed.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.neucom.2017.04.074

1705.06342

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Singapore (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
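
The idea in the abstract above of learning several objectives in parallel from the same exploratory experience can be sketched with off-policy Q-learning. This is a simplified illustration under stated assumptions: it uses tabular Q-values rather than the value function weights mentioned in the abstract, the objectives are given as hypothetical goal states instead of being discovered by adaptive clustering, and the five-state corridor environment and all names are invented for the example.

```python
import random

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One-step Q-learning update (off-policy: bootstraps from the greedy value)."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def learn_objectives(goals, n_states=5, steps=5000, seed=0):
    """Learn one Q-table per goal from a single random walk on a corridor."""
    rng = random.Random(seed)
    tables = {g: [[0.0, 0.0] for _ in range(n_states)] for g in goals}
    s = rng.randrange(n_states)
    for _ in range(steps):
        a = rng.randrange(2)  # behaviour policy: 0 = left, 1 = right, chosen at random
        s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
        # The same transition updates every objective's table,
        # so no objective needs its own exploration.
        for g, Q in tables.items():
            r = 1.0 if s_next == g else 0.0  # assumed reward: 1 on reaching the goal state
            q_update(Q, s, a, r, s_next)
        s = s_next
    return tables
```

Calling `learn_objectives(goals=[0, 4])` produces two tables from one stream of transitions: in the middle state, the table for goal 4 prefers moving right and the table for goal 0 prefers moving left.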


Uniform Hypergraph Partitioning: Provable Tensor Methods and Sampling Techniques

Ghoshdastidar, Debarghya, Dukkipati, Ambedkar

arXiv.org Machine Learning, May-17-2017

In a series of recent works, we have generalised the consistency results in the stochastic block model literature to the case of uniform and non-uniform hypergraphs. The present paper continues the same line of study, where we focus on partitioning weighted uniform hypergraphs, a problem often encountered in computer vision. This work is motivated by two issues that arise when a hypergraph partitioning approach is used to tackle computer vision problems: (i) The uniform hypergraphs constructed for higher-order learning contain all edges, but most have negligible weights. Thus, the adjacency tensor is nearly sparse, and yet not binary. (ii) A more serious concern is that standard partitioning algorithms need to compute all edge weights, which is computationally expensive for hypergraphs. This is usually resolved in practice by merging the clustering algorithm with a tensor sampling strategy, an approach that is yet to be analysed rigorously. We build on our earlier work on partitioning dense unweighted uniform hypergraphs (Ghoshdastidar and Dukkipati, ICML, 2015), and address the aforementioned issues by proposing provable and efficient partitioning algorithms. Our analysis justifies the empirical success of practical sampling techniques. We also complement our theoretical findings with an elaborate empirical comparison of various hypergraph partitioning schemes.

artificial intelligence, hypergraph, machine learning, (16 more...)

arXiv.org Machine Learning

1602.06516

Country: Asia (0.45)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)


Cluster Validation In Unsupervised Machine Learning

#artificialintelligence, May-16-2017, 23:15:33 GMT

Just look at it: it not only gives you a summary of all the specified validation measures across different clustering algorithms and numbers of inspected clusters, but also lists the pairs of algorithm and cluster count that performed best with regard to a given validation metric.

algorithm, artificial intelligence, machine learning, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Identifying Original Projects in App Inventor

Mustafaraj, Eni (Wellesley College) | Turbak, Franklyn (Wellesley College) | Svanberg, Maja (Wellesley College)

AAAI Conferences, May-16-2017

Millions of users use online, open-ended blocks programming environments like App Inventor to learn how to program and to build personally meaningful programs and apps. As part of understanding the computational thinking concepts being learned by these users, we want to distinguish original projects that they create from unoriginal ones that arise from learning activities like tutorials and exercises. Given all the projects of students taking an App Inventor course, we describe how to automatically classify them as original vs. unoriginal using a hierarchical clustering technique. Although our current analysis focuses only on a small group of users (16 students taking a course at our institution) and their 902 projects, our findings establish a foundation for extending this analysis to larger groups of users.

artificial intelligence, identifying original project, machine learning, (1 more...)

AAAI Conferences

The Thirtieth International Flairs Conference

Genre: Instructional Material (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)


Locally linear representation for image clustering

Zhen, Liangli, Yi, Zhang, Peng, Xi, Peng, Dezhong

arXiv.org Machine Learning, May-16-2017

Constructing a similarity graph is a key step in graph-oriented subspace learning and clustering. In a similarity graph, each vertex denotes a data point and the edge weight represents the similarity between two points. There are two popular schemes for constructing a similarity graph: the pairwise distance based scheme and the linear representation based scheme. Most existing works involve only one of these schemes and suffer from some limitations. Specifically, pairwise distance based methods are more sensitive to noise and outliers than linear representation based methods. On the other hand, linear representation based algorithms may wrongly select inter-subspace points to represent a point, which degrades performance. In this paper, we propose an algorithm, called Locally Linear Representation (LLR), which integrates pairwise distance with linear representation to address these problems. The proposed algorithm can automatically encode each data point over a set of points that not only represent the target point with less residual error, but are also close to the point in Euclidean space. The experimental results show that our approach is promising in subspace learning and subspace clustering.

artificial intelligence, machine learning, similarity graph, (14 more...)

arXiv.org Machine Learning

1304.6487

Country: Asia > China (0.16)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.70)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)
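
As a point of reference for the abstract above, the pairwise distance based scheme for building a similarity graph can be sketched as a k-nearest-neighbour graph with Gaussian edge weights. This illustrates only that baseline scheme, not the LLR algorithm itself; the choice of a Gaussian kernel, the parameters `k` and `sigma`, and the function name are assumptions for the example.

```python
import math

def knn_similarity_graph(points, k=2, sigma=1.0):
    """Symmetric weight matrix: each vertex is a data point, and each edge
    weight is a Gaussian similarity to one of its k nearest neighbours."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # squared Euclidean distances from point i to every other point
        d2 = [(sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
              for j in range(n) if j != i]
        for dist2, j in sorted(d2)[:k]:
            w = math.exp(-dist2 / (2 * sigma ** 2))
            W[i][j] = W[j][i] = w  # keep the graph undirected
    return W
```

For the points (0, 0), (1, 0), (10, 0), (11, 0) with `k=1`, the graph links the first two points and the last two points, and leaves the two pairs disconnected. Because the weights depend only on distance, a single outlier close to a cluster would be linked to it, which is the sensitivity the abstract refers to.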
