AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Fitting a Simplicial Complex using a Variation of k-means

Beben, Piotr

arXiv.org Machine Learning, Aug-2-2016

We give a simple and effective two stage algorithm for approximating a point cloud $\mathcal{S}\subset\mathbb{R}^m$ by a simplicial complex $K$. The first stage is an iterative fitting procedure that generalizes k-means clustering, while the second stage involves deleting redundant simplices. A form of dimension reduction of $\mathcal{S}$ is obtained as a consequence.

artificial intelligence, iteration, machine learning, (18 more...)

arXiv.org Machine Learning

1607.03849

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)


Symmetry-free SDP Relaxations for Affine Subspace Clustering

Silvestri, Francesco, Reinelt, Gerhard, Schnörr, Christoph

arXiv.org Machine Learning, Jul-25-2016

We consider clustering problems where the goal is to determine an optimal partition of a given point set in Euclidean space in terms of a collection of affine subspaces. While there is vast literature on heuristics for this kind of problem, such approaches are known to be susceptible to poor initializations and getting trapped in bad local optima. We alleviate these issues by introducing a semidefinite relaxation based on Lasserre's method of moments. While a similar approach is known for classical Euclidean clustering problems, a generalization to our more general subspace scenario is not straightforward, due to the high symmetry of the objective function that weakens any convex relaxation. We therefore introduce a new mechanism for symmetry breaking based on covering the feasible region with polytopes. Additionally, we introduce and analyze a deterministic rounding heuristic.

artificial intelligence, clustering, machine learning, (19 more...)

arXiv.org Machine Learning

1607.07387

Country: Europe > Germany (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)


KMeans Clustering Implementation with TensorFlow and Performance Comparison with SkLearn KMeans - Deep Cognition Labs

#artificialintelligence, Jul-21-2016, 01:51:32 GMT

This post describes an implementation of the K-Means clustering algorithm using TensorFlow. I have tested the code with GPU-accelerated TensorFlow (Nvidia GTX 1080 Founders Edition), and for large datasets it seems to be 2-3 times faster than the CPU-based sklearn KMeans implementation, depending on the number of samples.

artificial intelligence, machine learning, tensorflow and performance comparison, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.83)


Admissible Hierarchical Clustering Methods and Algorithms for Asymmetric Networks

Carlsson, Gunnar, Mémoli, Facundo, Ribeiro, Alejandro, Segarra, Santiago

arXiv.org Machine Learning, Jul-21-2016

This paper characterizes hierarchical clustering methods that abide by two previously introduced axioms -- thus, denominated admissible methods -- and proposes tractable algorithms for their implementation. We leverage the fact that, for asymmetric networks, every admissible method must be contained between reciprocal and nonreciprocal clustering, and describe three families of intermediate methods. Grafting methods exchange branches between dendrograms generated by different admissible methods. The convex combination family combines admissible methods through a convex operation in the space of dendrograms, and thirdly, the semi-reciprocal family clusters nodes that are related by strong cyclic influences in the network. Algorithms for the computation of hierarchical clusters generated by reciprocal and nonreciprocal clustering as well as the grafting, convex combination, and semi-reciprocal families are derived using matrix operations in a dioid algebra. Finally, the introduced clustering methods and algorithms are exemplified through their application to a network describing the interrelation between sectors of the United States (U.S.) economy.

banking & finance, dissimilarity, us government, (19 more...)

arXiv.org Machine Learning

1607.06335

Country:

North America > United States (1.00)
Europe (0.14)

Genre: Research Report (0.40)

Industry:

Banking & Finance (1.00)
Energy > Oil & Gas (0.93)
Government > Regional Government > North America Government > United States Government (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Hierarchical Clustering of Asymmetric Networks

Carlsson, Gunnar, Mémoli, Facundo, Ribeiro, Alejandro, Segarra, Santiago

arXiv.org Machine Learning, Jul-21-2016

This paper considers networks where relationships between nodes are represented by directed dissimilarities. The goal is to study methods that, based on the dissimilarity structure, output hierarchical clusters, i.e., a family of nested partitions indexed by a connectivity parameter. Our construction of hierarchical clustering methods is built around the concept of admissible methods, which are those that abide by the axioms of value (nodes in a network with two nodes are clustered together at the maximum of the two dissimilarities between them) and transformation (when dissimilarities are reduced, the network may become more clustered but not less). Two particular methods, termed reciprocal and nonreciprocal clustering, are shown to provide upper and lower bounds in the space of admissible methods. Furthermore, alternative clustering methodologies and axioms are considered. In particular, modifying the axiom of value such that clustering in two-node networks occurs at the minimum of the two dissimilarities entails the existence of a unique admissible clustering method.
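A minimal sketch of the two bounding methods, under assumptions the abstract does not spell out: reciprocal clustering is taken to merge two nodes at the smallest level where a chain joins them using links whose dissimilarity is below that level in both directions, and nonreciprocal clustering at the smallest level where directed chains exist in each direction (possibly through different nodes). The function names and the sample matrix are hypothetical, and the plain-Python loop stands in for the matrix operations in a (min, max) dioid algebra.

```python
from itertools import product

def minmax_closure(cost):
    """All-pairs min-max chain cost (Floyd-Warshall in the (min, max) dioid)."""
    n = len(cost)
    u = [row[:] for row in cost]
    # product varies its first index slowest, so k is the outer loop.
    for k, i, j in product(range(n), repeat=3):
        u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u

def reciprocal(A):
    """Symmetrize each link with max, then take min-max chains."""
    n = len(A)
    sym = [[max(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    return minmax_closure(sym)

def nonreciprocal(A):
    """Take directed min-max chains first, then symmetrize with max."""
    n = len(A)
    d = minmax_closure(A)
    return [[max(d[i][j], d[j][i]) for j in range(n)] for i in range(n)]

# Hypothetical directed dissimilarities A[i][j] from node i to node j.
A = [[0, 1, 5],
     [2, 0, 1],
     [1, 4, 0]]
print(reciprocal(A))     # merge levels under reciprocal clustering
print(nonreciprocal(A))  # merge levels under nonreciprocal clustering
```

On a two-node network both functions return the maximum of the two dissimilarities, which is what the axiom of value requires. On larger networks the nonreciprocal levels are never above the reciprocal ones, consistent with the lower and upper bounds described above.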

artificial intelligence, dissimilarity, machine learning, (16 more...)

arXiv.org Machine Learning

1607.06294

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Enhance Machine Learning with Standardizing, Binning, Reducing

#artificialintelligence, Jul-20-2016, 18:51:04 GMT

Now let's run the NbClust algorithm to estimate the ideal number of clusters (Figure 4). Figure 4. Number of clusters chosen by 26 indices, post standardization.

artificial intelligence, machine learning, standardization, (16 more...)

#artificialintelligence

Country: Europe > Italy (0.05)

Genre: Research Report (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.59)


A demo of K-Means clustering on the handwritten digits data -- scikit-learn 0.17.1 documentation

#artificialintelligence, Jul-14-2016, 23:55:35 GMT

In this example we compare the various initialization strategies for K-means in terms of runtime and quality of the results. As the ground truth is known here, we also apply different cluster quality metrics to judge the goodness of fit of the cluster labels to the ground truth.
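To make the idea of comparing initialization strategies concrete, here is a small standard-library sketch. It is not the scikit-learn demo itself: the function names, the toy points, and the choice of random seeding versus k-means++ seeding are assumptions made for illustration, and quality is measured only by inertia (the within-cluster sum of squared distances).

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_random(points, k, rng):
    """Pick k distinct data points uniformly at random."""
    return rng.sample(points, k)

def init_kmeanspp(points, k, rng):
    """k-means++ seeding: pick each new center with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def lloyd(points, centers, n_iter=20):
    """Lloyd iterations; returns centers, labels, and inertia per iteration."""
    history = []
    for _ in range(n_iter):
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        history.append(sum(sq_dist(p, centers[l]) for p, l in zip(points, labels)))
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        centers = new_centers
    return centers, labels, history

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
rng = random.Random(0)
for name, init in [("random", init_random), ("k-means++", init_kmeanspp)]:
    _, _, history = lloyd(points, init(points, 2, rng))
    print(name, "final inertia:", history[-1])
```

Inertia never increases from one Lloyd iteration to the next, so the two strategies differ only in where they start and therefore in which local optimum they reach and how fast.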

artificial intelligence, handwritten digit data, machine learning, (5 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)


Algorithms for Generalized Cluster-wise Linear Regression

Park, Young Woong, Jiang, Yan, Klabjan, Diego, Williams, Loren

arXiv.org Machine Learning, Jul-11-2016

Cluster-wise linear regression (CLR), a clustering problem intertwined with regression, is to find clusters of entities such that the overall sum of squared errors from regressions performed over these clusters is minimized, where each cluster may have different variances. We generalize the CLR problem by allowing each entity to have more than one observation, and refer to it as generalized CLR. We propose an exact mathematical programming based approach relying on column generation, a column generation based heuristic algorithm that clusters predefined groups of entities, a metaheuristic genetic algorithm with adapted Lloyd's algorithm for K-means clustering, a two-stage approach, and a modified algorithm of Späth (1979) for solving generalized CLR. We examine the performance of our algorithms on a stock keeping unit (SKU) clustering problem employed in forecasting halo and cannibalization effects in promotions using real-world retail data from a large supermarket chain. In the SKU clustering problem, the retailer needs to cluster SKUs based on their seasonal effects in response to promotions. The seasonal effects are the results of regressions with predictors being promotion mechanisms and seasonal dummies performed over clusters generated. We compare the performance of all proposed algorithms for the SKU problem with real-world and synthetic data.
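The basic CLR objective can be illustrated with a simple alternating heuristic in the spirit of Lloyd's algorithm: fit one regression per cluster, then move each observation to the cluster whose line fits it best. This is only a sketch under stated assumptions. It is not any of the paper's algorithms, it uses one predictor and one observation per entity, and all names and data are hypothetical.

```python
def fit_line(pts):
    """Ordinary least squares y = a*x + b; None if the fit is degenerate."""
    n = len(pts)
    if n < 2:
        return None
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        return None
    a = sum((x - mx) * (y - my) for x, y in pts) / sxx
    return a, my - a * mx

def residual2(line, pt):
    a, b = line
    return (pt[1] - (a * pt[0] + b)) ** 2

def clusterwise_regression(pts, labels, k, n_iter=20):
    """Alternate between refitting lines and reassigning points.
    Returns the lines, the labels, and the total squared error per iteration."""
    lines = [(0.0, 0.0)] * k
    history = []
    for _ in range(n_iter):
        for c in range(k):
            fitted = fit_line([p for p, l in zip(pts, labels) if l == c])
            if fitted is not None:  # keep the previous line on a degenerate fit
                lines[c] = fitted
        history.append(sum(residual2(lines[l], p) for p, l in zip(pts, labels)))
        labels = [min(range(k), key=lambda c: residual2(lines[c], p)) for p in pts]
    return lines, labels, history

# Hypothetical data: five points on y = 2x + 1 and five on y = -x + 20.
pts = [(x, 2 * x + 1) for x in range(5)] + [(x, -x + 20) for x in range(5)]
start = [i % 2 for i in range(len(pts))]  # deliberately poor initial labels
lines, labels, history = clusterwise_regression(pts, start, k=2)
print(history[0], history[-1])
```

Both steps can only lower or preserve the total squared error, so the history is non-increasing, but like k-means the procedure may stop at a local optimum that depends on the initial labels.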

algorithm, artificial intelligence, machine learning, (12 more...)

arXiv.org Machine Learning

doi: 10.1287/ijoc.2016.0729

1607.01417

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Industry:

Retail (0.88)
Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Document Clustering Games in Static and Dynamic Scenarios

Tripodi, Rocco, Pelillo, Marcello

arXiv.org Artificial Intelligence, Jul-8-2016

In this work we propose a game theoretic model for document clustering. Each document to be clustered is represented as a player and each cluster as a strategy. The players receive a reward from interacting with other players, which they try to maximize by choosing their best strategies. The geometry of the data is modeled with a weighted graph that encodes the pairwise similarity among documents, so that similar players are constrained to choose similar strategies, updating their strategy preferences at each iteration of the games. We used different approaches to find the prototypical elements of the clusters and with this information we divided the players into two disjoint sets, one collecting players with a definite strategy and the other one collecting players that try to learn from others the correct strategy to play. The latter set of players can be considered as new data points that have to be clustered according to previous information. This representation is useful in scenarios in which the data are streamed continuously. The evaluation of the system was conducted on 13 document datasets using different settings. It shows that the proposed method performs well compared to different document clustering algorithms.
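One way to picture the player and strategy framing is a discrete replicator-style update, sketched below. The abstract does not state the update rule, so the rule, the payoff definition (similarity-weighted agreement with neighbors), the function name, and the toy similarity matrix are all assumptions made for illustration. Players with a definite strategy are passed in as fixed seeds and the remaining players learn from them.

```python
def cluster_by_replicator(W, seeds, k, n_iter=50):
    """W[i][j]: similarity between documents i and j.
    seeds: {document index: cluster index} for players with a fixed strategy.
    Returns one cluster index per document."""
    n = len(W)
    # Undecided players start with a uniform mixed strategy over the k clusters.
    x = [[1.0 / k] * k for _ in range(n)]
    for i, c in seeds.items():
        x[i] = [1.0 if s == c else 0.0 for s in range(k)]
    for _ in range(n_iter):
        new_x = []
        for i in range(n):
            if i in seeds:
                new_x.append(x[i])
                continue
            # Payoff of strategy s: similarity-weighted support from other players.
            payoff = [sum(W[i][j] * x[j][s] for j in range(n) if j != i)
                      for s in range(k)]
            weighted = [x[i][s] * payoff[s] for s in range(k)]
            total = sum(weighted)
            new_x.append([w / total for w in weighted] if total > 0 else x[i])
        x = new_x
    return [max(range(k), key=lambda s: x[i][s]) for i in range(n)]

# Hypothetical similarities: document 2 resembles 0, document 3 resembles 1.
W = [[0.0, 0.1, 0.9, 0.1],
     [0.1, 0.0, 0.1, 0.9],
     [0.9, 0.1, 0.0, 0.2],
     [0.1, 0.9, 0.2, 0.0]]
print(cluster_by_replicator(W, seeds={0: 0, 1: 1}, k=2))
```

Because seeded players never change strategy, new documents can be appended as undecided players and clustered against the existing labels, which matches the streaming scenario described above.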

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-319-53375-9_2

1607.02436

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Efficient Estimation in the Tails of Gaussian Copulas

Nagaraj, Kalyani, Xu, Jie, Pasupathy, Raghu, Ghosh, Soumyadip

arXiv.org Machine Learning, Jul-5-2016

We consider the question of efficient estimation in the tails of Gaussian copulas. Our special focus is estimating expectations over multi-dimensional constrained sets that have a small implied measure under the Gaussian copula. We propose three estimators, all of which rely on a simple idea: identify certain dominating point(s) of the feasible set, and appropriately shift and scale an exponential distribution for subsequent use within an importance sampling measure. As we show, the efficiency of such estimators depends crucially on the local structure of the feasible set around the dominating points. The first of our proposed estimators $\estOpt$ is the "full-information" estimator that actively exploits such local structure to achieve bounded relative error in Gaussian settings. The second and third estimators $\estExp$, $\estLap$ are "partial-information" estimators, for use when complete information about the constraint set is not available; they do not exhibit bounded relative error but are shown to achieve polynomial efficiency. We provide sharp asymptotics for all three estimators. For the NORTA setting where no ready information about the dominating points or the feasible set structure is assumed, we construct a multinomial mixture of the partial-information estimator $\estLap$ resulting in a fourth estimator $\estNt$ with polynomial efficiency, and implementable through the ecoNORTA algorithm. Numerical results on various example problems are remarkable, and consistent with theory.
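The shift-and-scale idea can be shown in its simplest one-dimensional form: estimating P(Z > a) for a standard normal Z by sampling from an exponential distribution shifted to start at a. This is a textbook-style sketch, not one of the paper's estimators. Treating a as the dominating point and using a as the exponential rate are assumptions made for illustration, as are the function names.

```python
import math
import random

def tail_prob_is(a, n, rng):
    """Importance-sampling estimate of P(Z > a), Z ~ N(0, 1), a > 0.
    Proposal: X = a + Y with Y ~ Exponential(rate=a), so every sample is in the tail."""
    total = 0.0
    for _ in range(n):
        x = a + rng.expovariate(a)
        target = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)  # N(0, 1) density
        proposal = a * math.exp(-a * (x - a))                     # shifted exponential density
        total += target / proposal                                # likelihood ratio
    return total / n

def tail_prob_exact(a):
    """Closed-form reference value via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))

rng = random.Random(0)
a = 4.0
print("importance sampling:", tail_prob_is(a, 20000, rng))
print("exact:              ", tail_prob_exact(a))
```

Naive Monte Carlo would need millions of draws to see even a handful of samples beyond a = 4, whereas here every draw lands in the tail and the likelihood ratio varies little, which is why the estimate is stable with far fewer samples.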

artificial intelligence, machine learning, modeling & simulation, (19 more...)

arXiv.org Machine Learning

1607.01375

Country:

North America > United States (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Modeling & Simulation (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.45)
