AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Constrained Clustering and Multiple Kernel Learning without Pairwise Constraint Relaxation

Boecking, Benedikt, Jeanselme, Vincent, Dubrawski, Artur

arXiv.org Machine LearningMar-23-2022

Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete constraints to a continuous domain to ease optimization when learning kernels or metrics can harm generalization, as information which only encodes linkage is transformed to informing distances. We introduce a new constrained clustering algorithm that jointly clusters data and learns a kernel in accordance with the available pairwise constraints. To generalize well, our method is designed to maximize constraint satisfaction without relaxing pairwise constraints to a continuous domain where they inform distances. We show that the proposed method outperforms existing approaches on a large number of diverse publicly available datasets, and we discuss how our method can scale to handling large data.

algorithm, constraint, kernel, (15 more...)

arXiv.org Machine Learning

2203.12546

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.88)

Add feedback

Adaptative clustering by minimization of the mixing entropy criterion

Dumont, Thierry

arXiv.org Machine LearningMar-22-2022

We present a clustering method and provide a theoretical analysis and an explanation to a phenomenon encountered in the applied statistical literature since the 1990's. This phenomenon is the natural adaptability of the order when using a clustering method derived from the famous EM algorithm. We define a new statistic, the relative entropic order, that represents the number of clumps in the target distribution. We prove in particular that the empirical version of this relative entropic order is consistent. Our approach is easy to implement and has a high potential of applications. Perspectives of this works are algorithmic and theoretical, with possible natural extensions to various cases such as dependent or multidimensional data.

algorithm, entropy, mixing entropy, (16 more...)

arXiv.org Machine Learning

2203.11517

Country:

Europe > France > Île-de-France > Hauts-de-Seine > Nanterre (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Unsupervised machine learning approaches to the $q$-state Potts model

Tirelli, Andrea, Carvalho, Danyella O., Oliveira, Lucas A., Lima, J. P., Costa, Natanael C., Santos, Raimundo R. dos

arXiv.org Artificial IntelligenceMar-18-2022

In this paper with study phase transitions of the $q$-state Potts model, through a number of unsupervised machine learning techniques, namely Principal Component Analysis (PCA), $k$-means clustering, Uniform Manifold Approximation and Projection (UMAP), and Topological Data Analysis (TDA). Even though in all cases we are able to retrieve the correct critical temperatures $T_c(q)$, for $q = 3, 4$ and $5$, results show that non-linear methods as UMAP and TDA are less dependent on finite size effects, while still being able to distinguish between first and second order phase transitions. This study may be considered as a benchmark for the use of different unsupervised machine learning algorithms in the investigation of phase transitions.

artificial intelligence, machine learning, transition, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1140/epjb/s10051-022-00453-3

2112.06735

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Hierarchical Clustering and Matrix Completion for the Reconstruction of World Input-Output Tables

Metulini, Rodolfo, Gnecco, Giorgio, Biancalani, Francesco, Riccaboni, Massimo

arXiv.org Machine LearningMar-16-2022

World Input-Output (I/O) matrices provide the networks of within- and cross-country economic relations. In the context of I/O analysis, the methodology adopted by national statistical offices in data collection raises the issue of obtaining reliable data in a timely fashion and it makes the reconstruction of (part of) the I/O matrices of particular interest. In this work, we propose a method combining hierarchical clustering and Matrix Completion (MC) with a LASSO-like nuclear norm penalty, to impute missing entries of a partially unknown I/O matrix. Through simulations based on synthetic matrices we study the effectiveness of the proposed method to predict missing values from both previous years data and current data related to countries similar to the one for which current data are obscured. To show the usefulness of our method, an application based on World Input-Output Database (WIOD) tables - which are an example of industry-by-industry I/O tables - is provided. Strong similarities in structure between WIOD and other I/O tables are also found, which make the proposed approach easily generalizable to them.

application, matrix, submatrix, (12 more...)

arXiv.org Machine Learning

2203.08819

Country:

Europe > Italy (0.07)
Europe > France (0.04)
Europe > Spain (0.04)
(23 more...)

Genre: Research Report (0.50)

Industry:

Energy (0.67)
Banking & Finance > Economy (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Natural Hierarchical Cluster Analysis by Nearest Neighbors with Near-Linear Time Complexity

Gokcesu, Kaan, Gokcesu, Hakan

arXiv.org Machine LearningMar-15-2022

We propose a nearest neighbor based clustering algorithm that results in a naturally defined hierarchy of clusters. In contrast to the agglomerative and divisive hierarchical clustering algorithms, our approach is not dependent on the iterative working of the algorithm, in the sense that the partitions of the hierarchical clusters are purely defined in accordance with the input dataset. Our method is a universal hierarchical clustering approach since it can be implemented as bottom up or top down versions, both of which result in the same clustering. We show that for certain types of datasets, our algorithm has near-linear time and space complexity.

algorithm, hierarchy, nearest neighbor, (12 more...)

arXiv.org Machine Learning

2203.08027

Country: Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Geometric reconstructions of density based clusterings

Garcia-Pulido, A. L., Samardzhiev, K. P.

arXiv.org Machine LearningMar-14-2022

DBSCAN* and HDBSCAN* are well established density based clustering algorithms. However, obtaining the clusters of very large datasets is infeasible, limiting their use in real world applications. By exploiting the geometry of Euclidean space, we prove that it is possible to systematically construct the DBSCAN* and HDBSCAN* clusters of a finite $X\subset \mathbb{R}^n$ from specific subsets of $X$. We are able to control the size of these subsets and therefore our results make it possible to cluster very large datasets. To illustrate our theory, we cluster the Microsoft Building Footprint Database of the US, which is not possible using the standard implementations.

artificial intelligence, iteration, machine learning, (16 more...)

arXiv.org Machine Learning

2203.0802

Country:

Europe > United Kingdom (0.28)
North America > United States > Utah (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Workflow (1.00)
Research Report (0.70)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Generalized Spectral Clustering for Directed and Undirected Graphs

Sevi, Harry, Jonckheere, Matthieu, Kalogeratos, Argyris

arXiv.org Machine LearningMar-14-2022

Spectral clustering is a popular approach for clustering undirected graphs, but its extension to directed graphs (digraphs) is much more challenging. A typical workaround is to naively symmetrize the adjacency matrix of the directed graph, which can however lead to discarding valuable information carried by edge directionality. In this paper, we present a generalized spectral clustering framework that can address both directed and undirected graphs. Our approach is based on the spectral relaxation of a new functional that we introduce as the generalized Dirichlet energy of a graph function, with respect to an arbitrary positive regularizing measure on the graph edges. We also propose a practical parametrization of the regularizing measure constructed from the iterated powers of the natural random walk on the graph. We present theoretical arguments to explain the efficiency of our framework in the challenging setting of unbalanced classes. Experiments using directed K-NN graphs constructed from real datasets show that our graph partitioning method performs consistently well in all cases, while outperforming existing approaches in most of them.

generalized spectral clustering, graph, random walk, (13 more...)

arXiv.org Machine Learning

2203.03221

Country:

Europe > Montenegro (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Data Science > Data Mining (0.88)

Add feedback

Cluster Assignment in Multi-Agent Systems

Sharf, Miel, Zelazo, Daniel

arXiv.org Artificial IntelligenceMar-13-2022

Abstract--We study cluster assignment in multi-agent networks. The process of reaching an agreement between agents is In this work we focus on homogeneous networks, that is one of the fundamental tasks for a multi-agent system (MAS). The problem we aim to solve is how to design graphs computation [1], robotics [2], biochemical systems [3], and that ensure the networked system will converge to a prescribed sensor networks [4]. A natural extension to the agreement cluster configuration, i.e., specifying the number of clusters problem is the cluster agreement problem, which seeks to and the number of agents within each cluster. Employing tools drive agents into groups. All the agents within the same group from group theory, we show that it is possible to design an should then reach an agreement.

artificial intelligence, graph, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.23919/ASCC56756.2022.9828091

2203.06642

Country:

North America > United States > New York (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback

From graph cuts to isoperimetric inequalities: Convergence rates of Cheeger cuts on data clouds

Trillos, Nicolas Garcia, Murray, Ryan, Thorpe, Matthew

arXiv.org Machine LearningMar-11-2022

In this work we study statistical properties of graph-based clustering algorithms that rely on the optimization of balanced graph cuts, the main example being the optimization of Cheeger cuts. We consider proximity graphs built from data sampled from an underlying distribution supported on a generic smooth compact manifold $M$. In this setting, we obtain high probability convergence rates for both the Cheeger constant and the associated Cheeger cuts towards their continuum counterparts. The key technical tools are careful estimates of interpolation operators which lift empirical Cheeger cuts to the continuum, as well as continuum stability estimates for isoperimetric problems. To our knowledge the quantitative estimates obtained here are the first of their kind.

cheeger, inequality, minimizer, (15 more...)

arXiv.org Machine Learning

2004.09304

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Indiana (0.04)
(9 more...)

Genre:

Instructional Material > Course Syllabus & Notes (0.67)
Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Sparse Subspace Clustering for Concept Discovery (SSCCD)

Vielhaben, Johanna, Blücher, Stefan, Strodthoff, Nils

arXiv.org Machine LearningMar-11-2022

Concepts are key building blocks of higher level human understanding. Explainable AI (XAI) methods have shown tremendous progress in recent years, however, local attribution methods do not allow to identify coherent model behavior across samples and therefore miss this essential component. In this work, we study concept-based explanations and put forward a new definition of concepts as low-dimensional subspaces of hidden feature layers. We novelly apply sparse subspace clustering to discover these concept subspaces. Moving forward, we derive insights from concept subspaces in terms of localized input (concept) maps, show how to quantify concept relevances and lastly, evaluate similarities and transferability between concepts. We empirically demonstrate the soundness of the proposed Sparse Subspace Clustering for Concept Discovery (SSCCD) method for a variety of different image classification tasks. This approach allows for deeper insights into the actual model behavior that would remain hidden from conventional input-level heatmaps.

concept relevance, feature space, subspace, (14 more...)

arXiv.org Machine Learning

2203.06043

Country:

Europe > Germany > Berlin (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback