AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Gotta catch them all

#artificialintelligenceAug-21-2016, 21:55:16 GMT

When data becomes high-dimensional, the inherent relational structure between the variables can sometimes become unclear or indistinct. One, might want to find clusters for numerous amounts of reasons – me, I want to use it to better understand my childhood. To be more specific, I will be using clustering to highlight different groupings of pokemon. The results of this analysis can then retrospectively be applied to a younger me having to choose which pokemon I catch and keep, or perhaps which I must rather use in battle to gain experience points. The clusters should help me identify groupings of pokemons that assimilate with my style of play, be it catching pokemon who are specialist of their type, strong attackers, survivalist who have good defensive capabilities or pokemon who have the potential to become great as soon as they evolve.

artificial intelligence, machine learning, pokémon, (18 more...)

#artificialintelligence

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

Add feedback

An initial investigation: K-Means and Bisecting K-Means Algorithms for Clustering

#artificialintelligenceAug-21-2016, 13:27:47 GMT

Clustering is a class of Machine Learning Algorithms that looks to determine for clusters that represent similarity between groups of related data they each hold. While it is technically an Unsupervised type algorithm, in that it does not predict for a target variable, its application results in taking data that you might hypothesize has clusters that can categorize groups of related data, and forming clusters that represent data that have similarities. Thus, the effect of Clustering Algorithms could be viewed with the same effect as that of Classification Algorithms (a class type Supervised Algorithm). There are of course a number of type clustering algorithms, one being the K-Means Clustering Algorithm. The algorithm is shown as below.

algorithm, artificial intelligence, machine learning, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Iterative Views Agreement: An Iterative Low-Rank based Structured Optimization Method to Multi-View Spectral Clustering

Wang, Yang, Zhang, Wenjie, Wu, Lin, Lin, Xuemin, Fang, Meng, Pan, Shirui

arXiv.org Machine LearningAug-19-2016

Multi-view spectral clustering, which aims at yielding an agreement or consensus data objects grouping across multi-views with their graph laplacian matrices, is a fundamental clustering problem. Among the existing methods, Low-Rank Representation (LRR) based method is quite superior in terms of its effectiveness, intuitiveness and robustness to noise corruptions. However, it aggressively tries to learn a common low-dimensional subspace for multi-view data, while inattentively ignoring the local manifold structure in each view, which is critically important to the spectral clustering; worse still, the low-rank minimization is enforced to achieve the data correlation consensus among all views, failing to flexibly preserve the local manifold structure for each view. In this paper, 1) we propose a multi-graph laplacian regularized LRR with each graph laplacian corresponding to one view to characterize its local manifold structure. 2) Instead of directly enforcing the low-rank minimization among all views for correlation consensus, we separately impose low-rank constraint on each view, coupled with a mutual structural consensus constraint, where it is able to not only well preserve the local manifold structure but also serve as a constraint for that from other views, which iteratively makes the views more agreeable. Extensive experiments on real-world multi-view data sets demonstrate its superiority.

artificial intelligence, machine learning, spectral, (19 more...)

arXiv.org Machine Learning

1608.0556

Country: Oceania > Australia (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

Local Network Community Detection with Continuous Optimization of Conductance and Weighted Kernel K-Means

van Laarhoven, Twan, Marchiori, Elena

arXiv.org Machine LearningAug-17-2016

Local network community detection is the task of finding a single community of nodes concentrated around few given seed nodes in a localized way. Conductance is a popular objective function used in many algorithms for local community detection. This paper studies a continuous relaxation of conductance. We show that continuous optimization of this objective still leads to discrete communities. We investigate the relation of conductance with weighted kernel k-means for a single community, which leads to the introduction of a new objective function, $\sigma$-conductance. Conductance is obtained by setting $\sigma$ to $0$. Two algorithms, EMc and PGDc, are proposed to locally optimize $\sigma$-conductance and automatically tune the parameter $\sigma$. They are based on expectation maximization and projected gradient descent, respectively. We prove locality and give performance guarantees for EMc and PGDc for a class of dense and well separated communities centered around the seeds. Experiments are conducted on networks with ground-truth communities, comparing to state-of-the-art graph diffusion algorithms for conductance optimization. On large graphs, results indicate that EMc and PGDc stay localized and produce communities most similar to the ground, while graph diffusion algorithms generate large communities of lower quality.

conductance, data mining, machine learning, (14 more...)

arXiv.org Machine Learning

1601.05775

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.67)
Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Lecture Notes on Spectral Graph Methods

Mahoney, Michael W.

arXiv.org Machine LearningAug-16-2016

These are lecture notes that are based on the lectures from a class I taught on the topic of Spectral Graph Methods at UC Berkeley during the Spring 2015 semester.

optimization problem, quality-of-approximation guarantee, survey article, (22 more...)

arXiv.org Machine Learning

1608.04845

Country: North America > United States > California (0.13)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Energy > Oil & Gas (0.92)
Education (0.87)
Health & Medicine (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

Add feedback

Consistency constraints for overlapping data clustering

Culbertson, Jared, Guralnik, Dan P., Hansen, Jakob, Stiller, Peter F.

arXiv.org Machine LearningAug-15-2016

We examine overlapping clustering schemes with functorial constraints, in the spirit of Carlsson--Memoli. This avoids issues arising from the chaining required by partition-based methods. Our principal result shows that any clustering functor is naturally constrained to refine single-linkage clusters and be refined by maximal-linkage clusters. We work in the context of metric spaces with non-expansive maps, which is appropriate for modeling data processing which does not increase information content.

artificial intelligence, machine learning, metric space, (18 more...)

arXiv.org Machine Learning

1608.04331

Country: North America > United States (0.46)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)

Add feedback

Incremental Method for Spectral Clustering of Increasing Orders

Chen, Pin-Yu, Zhang, Baichuan, Hasan, Mohammad Al, Hero, Alfred O.

arXiv.org Machine LearningAug-13-2016

The smallest eigenvalues and the associated eigenvectors (i.e., eigenpairs) of a graph Laplacian matrix have been widely used for spectral clustering and community detection. However, in real-life applications the number of clusters or communities (say, $K$) is generally unknown a-priori. Consequently, the majority of the existing methods either choose $K$ heuristically or they repeat the clustering method with different choices of $K$ and accept the best clustering result. The first option, more often, yields suboptimal result, while the second option is computationally expensive. In this work, we propose an incremental method for constructing the eigenspectrum of the graph Laplacian matrix. This method leverages the eigenstructure of graph Laplacian matrix to obtain the $K$-th eigenpairs of the Laplacian matrix given a collection of all the $K-1$ smallest eigenpairs. Our proposed method adapts the Laplacian matrix such that the batch eigenvalue decomposition problem transforms into an efficient sequential leading eigenpair computation problem. As a practical application, we consider user-guided spectral clustering. Specifically, we demonstrate that users can utilize the proposed incremental method for effective eigenpair computation and determining the desired number of clusters based on multiple clustering metrics.

data mining, eigenpair, machine learning, (15 more...)

arXiv.org Machine Learning

1512.07349

Country: North America > United States (0.94)

Genre: Research Report (0.50)

Industry: Energy > Power Industry (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Mini-Batch Spectral Clustering

Han, Yufei, Filippone, Maurizio

arXiv.org Machine LearningAug-12-2016

The cost of computing the spectrum of Laplacian matrices hinders the application of spectral clustering to large data sets. While approximations recover computational tractability, they can potentially affect clustering performance. This paper proposes a practical approach to learn spectral clustering based on adaptive stochastic gradient optimization. Crucially, the proposed approach recovers the exact spectrum of Laplacian matrices in the limit of the iterations, and the cost of each iteration is linear in the number of samples. Extensive experimental validation on data sets with up to half a million samples demonstrate its scalability and its ability to outperform state-of-the-art approximate methods to learn spectral clustering for a given computational budget.

artificial intelligence, machine learning, spectral, (15 more...)

arXiv.org Machine Learning

1607.02024

Country: North America > United States (0.94)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback

Community Detection in Political Twitter Networks using Nonnegative Matrix Factorization Methods

Ozer, Mert, Kim, Nyunsu, Davulcu, Hasan

arXiv.org Machine LearningAug-5-2016

Community detection is a fundamental task in social network analysis. In this paper, first we develop an endorsement filtered user connectivity network by utilizing Heider's structural balance theory and certain Twitter triad patterns. Next, we develop three Nonnegative Matrix Factorization frameworks to investigate the contributions of different types of user connectivity and content information in community detection. We show that user content and endorsement filtered connectivity information are complementary to each other in clustering politically motivated users into pure political communities. Word usage is the strongest indicator of users' political orientation among all content categories. Incorporating user-word matrix and word similarity regularizer provides the missing link in connectivity only methods which suffer from detection of artificially large number of clusters for Twitter networks.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1608.01771

Country: North America > United States > New York (0.14)

Genre: Research Report (0.50)

Industry:

Information Technology > Services (1.00)
Government (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Add feedback

A balanced k-means algorithm for weighted point sets

Borgwardt, Steffen, Brieden, Andreas, Gritzmann, Peter

arXiv.org Machine LearningAug-3-2016

The classical $k$-means algorithm for partitioning $n$ points in $\mathbb{R}^d$ into $k$ clusters is one of the most popular and widely spread clustering methods. The need to respect prescribed lower bounds on the cluster sizes has been observed in many scientific and business applications. In this paper, we present and analyze a generalization of $k$-means that is capable of handling weighted point sets and prescribed lower and upper bounds on the cluster sizes. We call it weight-balanced $k$-means. The key difference to existing models lies in the ability to handle the combination of weighted point sets with prescribed bounds on the cluster sizes. This imposes the need to perform partial membership clustering, and leads to significant differences. For example, while finite termination of all $k$-means variants for unweighted point sets is a simple consequence of the existence of only finitely many partitions of a given set of points, the situation is more involved for weighted point sets, as there are infinitely many partial membership clusterings. Using polyhedral theory, we show that the number of iterations of weight-balanced $k$-means is bounded above by $n^{O(dk)}$, so in particular it is polynomial for fixed $k$ and $d$. This is similar to the known worst-case upper bound for classical $k$-means for unweighted point sets and unrestricted cluster sizes, despite the much more general framework. We conclude with the discussion of some additional favorable properties of our method.

artificial intelligence, machine learning, power diagram, (18 more...)

arXiv.org Machine Learning

1308.4004

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback