AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

NBA2Vec: Dense feature representations of NBA players

Guan, Webster, Javed, Nauman, Lu, Peter

arXiv.org Artificial IntelligenceFeb-26-2023

Understanding a player's performance in a basketball game requires an evaluation of the player in the context of their teammates and the opposing lineup. Here, we present NBA2Vec, a neural network model based on Word2Vec which extracts dense feature representations of each player by predicting play outcomes without the use of hand-crafted heuristics or aggregate statistical measures. Specifically, our model aimed to predict the outcome of a possession given both the offensive and defensive players on the court. By training on over 3.5 million plays involving 1551 distinct players, our model was able to achieve a 0.3 K-L divergence with respect to the empirical play-by-play distribution. The resulting embedding space is consistent with general classifications of player position and style, and the embedding dimensions correlated at a significant level with traditional box score metrics. Finally, we demonstrate that NBA2Vec accurately predicts the outcomes to various 2017 NBA Playoffs series, and shows potential in determining optimal lineup match-ups. Future applications of NBA2Vec embeddings to characterize players' style may revolutionize predictive models for player acquisition and coaching decisions that maximize team success.

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Artificial Intelligence

2302.13386

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > Italy (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Basketball (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Changes in Commuter Behavior from COVID-19 Lockdowns in the Atlanta Metropolitan Area

Santanam, Tejas, Trasatti, Anthony, Zhang, Hanyu, Riley, Connor, Van Hentenryck, Pascal, Krishnan, Ramayya

arXiv.org Artificial IntelligenceFeb-26-2023

This paper analyzes the impact of COVID-19 related lockdowns in the Atlanta, Georgia metropolitan area by examining commuter patterns in three periods: prior to, during, and after the pandemic lockdown. A cellular phone location dataset is utilized in a novel pipeline to infer the home and work locations of thousands of users from the Density-based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The coordinates derived from the clustering are put through a reverse geocoding process from which word embeddings are extracted in order to categorize the industry of each work place based on the workplace name and Point of Interest (POI) mapping. Frequencies of commute from home locations to work locations are analyzed in and across all three time periods. Public health and economic factors are discussed to explain potential reasons for the observed changes in commuter patterns.

artificial intelligence, machine learning, work location, (19 more...)

arXiv.org Artificial Intelligence

2302.13512

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.24)
North America > United States > Texas > Harris County > Houston (0.14)
North America > Mexico (0.04)
(2 more...)

Genre:

Research Report (0.90)
Overview (0.54)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Geographic Information Systems (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

A parameter-free graph reduction for spectral clustering and SpectralNet

Alshammari, Mashaan, Stavrakakis, John, Takatsuka, Masahiro

arXiv.org Artificial IntelligenceFeb-25-2023

Graph-based clustering methods like spectral clustering and SpectralNet are very efficient in detecting clusters of non-convex shapes. Unlike the popular $k$-means, graph-based clustering methods do not assume that each cluster has a single mean. However, these methods need a graph where vertices in the same cluster are connected by edges of large weights. To achieve this goal, many studies have proposed graph reduction methods with parameters. Unfortunately, these parameters have to be tuned for every dataset. We introduce a graph reduction method that does not require any parameters. First, the distances from every point $p$ to its neighbors are filtered using an adaptive threshold to only keep neighbors with similar surrounding density. Second, the similarities with close neighbors are computed and only high similarities are kept. The edges that survive these two filtering steps form the constructed graph that was passed to spectral clustering and SpectralNet. The experiments showed that our method provides a stable alternative, where other methods performance fluctuated according to the setting of their parameters.

artificial intelligence, graph, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.array.2022.100192

2302.13165

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
(6 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Add feedback

Random projection tree similarity metric for SpectralNet

Alshammari, Mashaan, Stavrakakis, John, Ahmed, Adel F., Takatsuka, Masahiro

arXiv.org Artificial IntelligenceFeb-25-2023

SpectralNet is a graph clustering method that uses neural network to find an embedding that separates the data. So far it was only used with $k$-nn graphs, which are usually constructed using a distance metric (e.g., Euclidean distance). $k$-nn graphs restrict the points to have a fixed number of neighbors regardless of the local statistics around them. We proposed a new SpectralNet similarity metric based on random projection trees (rpTrees). Our experiments revealed that SpectralNet produces better clustering accuracy using rpTree similarity metric compared to $k$-nn graph with a distance metric. Also, we found out that rpTree parameters do not affect the clustering accuracy. These parameters include the leaf size and the selection of projection direction. It is computationally efficient to keep the leaf size in order of $\log(n)$, and project the points onto a random direction instead of trying to find the direction with the maximum dispersion.

artificial intelligence, graph, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.array.2022.100274

2302.13168

Country:

Asia > Middle East > Saudi Arabia > Eastern Province > Dhahran (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

Semi-supervised Clustering with Two Types of Background Knowledge: Fusing Pairwise Constraints and Monotonicity Constraints

González-Almagro, Germán, Suárez, Juan Luis, Sánchez-Bermejo, Pablo, Cano, José-Ramón, García, Salvador

arXiv.org Artificial IntelligenceFeb-25-2023

This study addresses the problem of performing clustering in the presence of two types of background knowledge: pairwise constraints and monotonicity constraints. To achieve this, the formal framework to perform clustering under monotonicity constraints is, firstly, defined, resulting in a specific distance measure. Pairwise constraints are integrated afterwards by designing an objective function which combines the proposed distance measure and a pairwise constraint-based penalty term, in order to fuse both types of information. This objective function can be optimized with an EM optimization scheme. The proposed method serves as the first approach to the problem it addresses, as it is the first method designed to work with the two types of background knowledge mentioned above. Our proposal is tested in a variety of benchmark datasets and in a real-world case of study.

constraint, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2302.1406

Country:

North America > United States > Wisconsin (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Europe > Spain > Andalusia > Jaén Province > Jaén (0.04)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization

Singh, Prachi, Kaul, Amrit, Ganapathy, Sriram

arXiv.org Artificial IntelligenceFeb-24-2023

Conventional methods for speaker diarization involve windowing an audio file into short segments to extract speaker embeddings, followed by an unsupervised clustering of the embeddings. This multi-step approach generates speaker assignments for each segment. In this paper, we propose a novel Supervised HierArchical gRaph Clustering algorithm (SHARC) for speaker diarization where we introduce a hierarchical structure using Graph Neural Network (GNN) to perform supervised clustering. The supervision allows the model to update the representations and directly improve the clustering performance, thus enabling a single-step approach for diarization. In the proposed work, the input segment embeddings are treated as nodes of a graph with the edge weights corresponding to the similarity scores between the nodes. We also propose an approach to jointly update the embedding extractor and the GNN model to perform end-to-end speaker diarization (E2E-SHARC). During inference, the hierarchical clustering is performed using node densities and edge existence probabilities to merge the segments until convergence. In the diarization experiments, we illustrate that the proposed E2E-SHARC approach achieves 53% and 44% relative improvements over the baseline systems on benchmark datasets like AMI and Voxconverse, respectively.

artificial intelligence, diarization, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2302.12716

Country:

Europe > Czechia > South Moravian Region > Brno (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression

Qin, Yuchao, van der Schaar, Mihaela, Lee, Changhee

arXiv.org Artificial IntelligenceFeb-24-2023

Clustering time-series data in healthcare is crucial for clinical phenotyping to understand patients' disease progression patterns and to design treatment guidelines tailored to homogeneous patient subgroups. While rich temporal dynamics enable the discovery of potential clusters beyond static correlations, two major challenges remain outstanding: i) discovery of predictive patterns from many potential temporal correlations in the multi-variate time-series data and ii) association of individual temporal patterns to the target label distribution that best characterizes the underlying clinical progression. To address such challenges, we develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data. We introduce an efficient representation learning approach in frequency domain that can encode variable-length, irregularly-sampled time-series into a unified representation space, which is then applied to identify various temporal patterns that potentially contribute to the target label using a new notion of path-based similarity. Throughout the experiments on synthetic and real-world datasets, we show that T-Phenotype achieves the best phenotype discovery performance over all the evaluated baselines. We further demonstrate the utility of T-Phenotype by uncovering clinically meaningful patient subgroups characterized by unique temporal patterns.

artificial intelligence, machine learning, t-phenotype, (17 more...)

arXiv.org Artificial Intelligence

2302.12619

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Texas (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > South Korea (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Epidemiology (1.00)
Health & Medicine > Diagnostic Medicine (0.92)
Health & Medicine > Health Care Technology (0.67)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Constraints Fusion-induced Symmetric Nonnegative Matrix Factorization Approach for Community Detection

Liu, Zhigang, Luo, Xin

arXiv.org Artificial IntelligenceFeb-23-2023

Community is a fundamental and critical characteristic of an undirected social network, making community detection be a vital yet thorny issue in network representation learning. A symmetric and non-negative matrix factorization (SNMF) model is frequently adopted to address this issue owing to its great interpretability and scalability. However, it adopts a single latent factor matrix to represent an undirected network for precisely representing its symmetry, which leads to loss of representation learning ability due to the reduced latent space. Motivated by this discovery, this paper proposes a novel Constraints Fusion-induced Symmetric Nonnegative Matrix Factorization (CFS) model that adopts three-fold ideas: a) Representing a target undirected network with multiple latent factor matrices, thus preserving its representation learning capacity; b) Incorporating a symmetry-regularizer that preserves the symmetry of the learnt low-rank approximation to the adjacency matrix into the loss function, thus making the resultant detector well-aware of the target network's symmetry; and c) Introducing a graph-regularizer that preserves local invariance of the network's intrinsic geometry, thus making the achieved detector well-aware of community structure within the target network. Extensively empirical studies on eight real-world social networks from industrial applications demonstrate that the proposed CFS model significantly outperforms state-of-the-art models in achieving highly-accurate community detection results.

data mining, evolutionary algorithm, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2302.12114

Country:

North America > United States (0.28)
Asia > China > Chongqing Province > Chongqing (0.05)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology (0.70)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(2 more...)

Add feedback

Ensemble Clustering via Co-association Matrix Self-enhancement

Jia, Yuheng, Tao, Sirui, Wang, Ran, Wang, Yongheng

arXiv.org Artificial IntelligenceFeb-23-2023

Ensemble clustering integrates a set of base clustering results to generate a stronger one. Existing methods usually rely on a co-association (CA) matrix that measures how many times two samples are grouped into the same cluster according to the base clusterings to achieve ensemble clustering. However, when the constructed CA matrix is of low quality, the performance will degrade. In this paper, we propose a simple yet effective CA matrix self-enhancement framework that can improve the CA matrix to achieve better clustering performance. Specifically, we first extract the high-confidence (HC) information from the base clusterings to form a sparse HC matrix. By propagating the highly-reliable information of the HC matrix to the CA matrix and complementing the HC matrix according to the CA matrix simultaneously, the proposed method generates an enhanced CA matrix for better clustering. Technically, the proposed model is formulated as a symmetric constrained convex optimization problem, which is efficiently solved by an alternating iterative algorithm with convergence and global optimum theoretically guaranteed. Extensive experimental comparisons with twelve state-of-the-art methods on eight benchmark datasets substantiate the effectiveness, flexibility and efficiency of the proposed model in ensemble clustering. The codes and datasets can be downloaded at https://github.com/Siritao/EC-CMS.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2205.05937

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (0.70)

Industry: Education (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Add feedback

Approximate spectral clustering with eigenvector selection and self-tuned $k$

Alshammari, Mashaan, Takatsuka, Masahiro

arXiv.org Artificial IntelligenceFeb-22-2023

The recently emerged spectral clustering surpasses conventional clustering methods by detecting clusters of any shape without the convexity assumption. Unfortunately, with a computational complexity of $O(n^3)$, it was infeasible for multiple real applications, where $n$ could be large. This stimulates researchers to propose the approximate spectral clustering (ASC). However, most of ASC methods assumed that the number of clusters $k$ was known. In practice, manual setting of $k$ could be subjective or time consuming. The proposed algorithm has two relevance metrics for estimating $k$ in two vital steps of ASC. One for selecting the eigenvectors spanning the embedding space, and the other to discover the number of clusters in that space. The algorithm used a growing neural gas (GNG) approximation, GNG is superior in preserving input data topology. The experimental setup demonstrates the efficiency of the proposed algorithm and its ability to compete with similar methods where $k$ was set manually.

artificial intelligence, eigenvector, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.patrec.2019.02.006

2302.11297

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback