AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Rethinking cluster-conditioned diffusion models

Adaloglou, Nikolas, Kaiser, Tim, Michels, Felix, Kollmann, Markus

arXiv.org Artificial IntelligenceMar-1-2024

We present a comprehensive experimental study on image-level conditioning for diffusion models using cluster assignments. We elucidate how individual components regarding image clustering impact image synthesis across three datasets. By combining recent advancements from image clustering and diffusion models, we show that, given the optimal cluster granularity with respect to image synthesis (visual groups), cluster-conditioning can achieve state-of-the-art FID (i.e. 1.67, 2.17 on CIFAR10 and CIFAR100 respectively), while attaining a strong training sample efficiency. Finally, we propose a novel method to derive an upper cluster bound that reduces the search space of the visual groups using solely feature-based clustering. Unlike existing approaches, we find no significant connection between clustering and cluster-conditional image generation. The code and cluster assignments will be released.

arxiv preprint arxiv, dataset, diffusion model, (14 more...)

arXiv.org Artificial Intelligence

2403.0057

Country:

North America > United States (0.04)
Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Attacks Against Mobility Prediction in 5G Networks

Atiiq, Syafiq Al, Yuan, Yachao, Gehrmann, Christian, Sternby, Jakob, Barriga, Luis

arXiv.org Artificial IntelligenceFeb-29-2024

The $5^{th}$ generation of mobile networks introduces a new Network Function (NF) that was not present in previous generations, namely the Network Data Analytics Function (NWDAF). Its primary objective is to provide advanced analytics services to various entities within the network and also towards external application services in the 5G ecosystem. One of the key use cases of NWDAF is mobility trajectory prediction, which aims to accurately support efficient mobility management of User Equipment (UE) in the network by allocating ``just in time'' necessary network resources. In this paper, we show that there are potential mobility attacks that can compromise the accuracy of these predictions. In a semi-realistic scenario with 10,000 subscribers, we demonstrate that an adversary equipped with the ability to hijack cellular mobile devices and clone them can significantly reduce the prediction accuracy from 75\% to 40\% using just 100 adversarial UEs. While a defense mechanism largely depends on the attack and the mobility types in a particular area, we prove that a basic KMeans clustering is effective in distinguishing legitimate and adversarial UEs.

accuracy, mobility, prediction, (17 more...)

arXiv.org Artificial Intelligence

2402.19319

Country:

Asia > India (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Networks (0.88)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Communications > Mobile (1.00)
(2 more...)

Add feedback

Graph Construction with Flexible Nodes for Traffic Demand Prediction

Hou, Jinyan, Liu, Shan, Zhang, Ya, Qin, Haotong

arXiv.org Artificial IntelligenceFeb-29-2024

Graph neural networks (GNNs) have been widely applied in traffic demand prediction, and transportation modes can be divided into station-based mode and free-floating traffic mode. Existing research in traffic graph construction primarily relies on map matching to construct graphs based on the road network. However, the complexity and inhomogeneity of data distribution in free-floating traffic demand forecasting make road network matching inflexible. To tackle these challenges, this paper introduces a novel graph construction method tailored to free-floating traffic mode. We propose a novel density-based clustering algorithm (HDPC-L) to determine the flexible positioning of nodes in the graph, overcoming the computational bottlenecks of traditional clustering algorithms and enabling effective handling of large-scale datasets. Furthermore, we extract valuable information from ridership data to initialize the edge weights of GNNs. Comprehensive experiments on two real-world datasets, the Shenzhen bike-sharing dataset and the Haikou ride-hailing dataset, show that the method significantly improves the performance of the model. On average, our models show an improvement in accuracy of around 25\% and 19.5\% on the two datasets. Additionally, it significantly enhances computational efficiency, reducing training time by approximately 12% and 32.5% on the two datasets. We make our code available at https://github.com/houjinyan/HDPC-L-ODInit.

dataset, graph, node, (14 more...)

arXiv.org Artificial Intelligence

2403.00276

Country:

Asia > China > Guangdong Province > Shenzhen (0.27)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
(5 more...)

Genre: Research Report > Promising Solution (0.46)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Telecommunications > Networks (0.92)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Impact of network topology on the performance of Decentralized Federated Learning

Palmieri, Luigi, Boldrini, Chiara, Valerio, Lorenzo, Passarella, Andrea, Conti, Marco

arXiv.org Artificial IntelligenceFeb-28-2024

Fully decentralized learning is gaining momentum for training AI models at the Internet's edge, addressing infrastructure challenges and privacy concerns. In a decentralized machine learning system, data is distributed across multiple nodes, with each node training a local model based on its respective dataset. The local models are then shared and combined to form a global model capable of making accurate predictions on new data. Our exploration focuses on how different types of network structures influence the spreading of knowledge - the process by which nodes incorporate insights gained from learning patterns in data available on other nodes across the network. Specifically, this study investigates the intricate interplay between network structure and learning performance using three network topologies and six data distribution methods. These methods consider different vertex properties, including degree centrality, betweenness centrality, and clustering coefficient, along with whether nodes exhibit high or low values of these metrics. Our findings underscore the significance of global centrality metrics (degree, betweenness) in correlating with learning performance, while local clustering proves less predictive. We highlight the challenges in transferring knowledge from peripheral to central nodes, attributed to a dilution effect during model aggregation. Additionally, we observe that central nodes exert a pull effect, facilitating the spread of knowledge. In examining degree distribution, hubs in Barabasi-Albert networks positively impact learning for central nodes but exacerbate dilution when knowledge originates from peripheral nodes. Finally, we demonstrate the formidable challenge of knowledge circulation outside of segregated communities.

accuracy, centrality, node, (14 more...)

arXiv.org Artificial Intelligence

2402.18606

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.37)

Add feedback

Spot the bot: Coarse-Grained Partition of Semantic Paths for Bots and Humans

Gromov, Vasilii A., Kogan, Alexandra S.

arXiv.org Artificial IntelligenceFeb-27-2024

Nowadays, technology is rapidly advancing: bots are writing comments, articles, and reviews. Due to this fact, it is crucial to know if the text was written by a human or by a bot. This paper focuses on comparing structures of the coarse-grained partitions of semantic paths for human-written and bot-generated texts. We compare the clusterizations of datasets of n-grams from literary texts and texts generated by several bots. The hypothesis is that the structures and clusterizations are different. Our research supports the hypothesis. As the semantic structure may be different for different languages, we investigate Russian, English, German, and Vietnamese languages.

bot, coarse-grained partition, vector, (12 more...)

arXiv.org Artificial Intelligence

2402.17392

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.66)

Industry: Information Technology > Services (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning

Zampierin, Luca, Hacene, Ghouthi Boukli, Nguyen, Bac, Ravanelli, Mirco

arXiv.org Artificial IntelligenceFeb-26-2024

Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To enhance its efficiency, previous works often leverage the use of compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this research domain by introducing SKILL, a novel method that conducts distillation across groups of layers instead of distilling individual arbitrarily selected layers within the teacher network. The identification of the layers to distill is achieved through a hierarchical clustering procedure applied to layer similarity measures. Extensive experiments demonstrate that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M parameters model class across several SUPERB tasks.

distillation, proc, representation, (15 more...)

arXiv.org Artificial Intelligence

2402.1683

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.62)

Add feedback

Self Supervised Correlation-based Permutations for Multi-View Clustering

Eisenberg, Ran, Svirsky, Jonathan, Lindenbaum, Ofir

arXiv.org Machine LearningFeb-26-2024

Fusing information from different modalities can enhance data analysis tasks, including clustering. However, existing multi-view clustering (MVC) solutions are limited to specific domains or rely on a suboptimal and computationally demanding two-stage procedure of representation and clustering. We propose an end-to-end deep learning-based MVC framework for general data (image, tabular, etc.). Our approach involves learning meaningful fused data representations with a novel permutation-based canonical correlation objective. Concurrently, we learn cluster assignments by identifying consistent pseudo-labels across multiple views. We demonstrate the effectiveness of our model using ten MVC benchmark datasets. Theoretically, we show that our model approximates the supervised linear discrimination analysis (LDA) representation. Additionally, we provide an error bound induced by false-pseudo label annotations.

lda, representation, self supervised correlation-based permutation, (12 more...)

arXiv.org Machine Learning

2402.16383

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Deep Contrastive Graph Learning with Clustering-Oriented Guidance

Chen, Mulin, Wang, Bocheng, Li, Xuelong

arXiv.org Artificial IntelligenceFeb-25-2024

Graph Convolutional Network (GCN) has exhibited remarkable potential in improving graph-based clustering. To handle the general clustering scenario without a prior graph, these models estimate an initial graph beforehand to apply GCN. Throughout the literature, we have witnessed that 1) most models focus on the initial graph while neglecting the original features. Therefore, the discriminability of the learned representation may be corrupted by a low-quality initial graph; 2) the training procedure lacks effective clustering guidance, which may lead to the incorporation of clustering-irrelevant information into the learned graph. To tackle these problems, the Deep Contrastive Graph Learning (DCGL) model is proposed for general data clustering. Specifically, we establish a pseudo-siamese network, which incorporates auto-encoder with GCN to emphasize both the graph structure and the original features. On this basis, feature-level contrastive learning is introduced to enhance the discriminative capacity, and the relationship between samples and centroids is employed as the clustering-oriented guidance. Afterward, a two-branch graph learning mechanism is designed to extract the local and global structural relationships, which are further embedded into a unified graph under the cluster-level contrastive guidance. Experimental results on several benchmark datasets demonstrate the superiority of DCGL against state-of-the-art algorithms.

contrastive learning, graph, learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1609/aaai.v38i10.29016

2402.16012

Country:

Asia > China > Shanxi Province (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

Add feedback

From Large to Small Datasets: Size Generalization for Clustering Algorithm Selection

Chatziafratis, Vaggos, Karmarkar, Ishani, Vitercik, Ellen

arXiv.org Machine LearningFeb-25-2024

In clustering algorithm selection, we are given a massive dataset and must efficiently select which clustering algorithm to use. We study this problem in a semi-supervised setting, with an unknown ground-truth clustering that we can only access through expensive oracle queries. Ideally, the clustering algorithm's output will be structurally close to the ground truth. We approach this problem by introducing a notion of size generalization for clustering algorithm accuracy. We identify conditions under which we can (1) subsample the massive clustering instance, (2) evaluate a set of candidate algorithms on the smaller instance, and (3) guarantee that the algorithm with the best accuracy on the small instance will have the best accuracy on the original big instance. We provide theoretical size generalization guarantees for three classic clustering algorithms: single-linkage, k-means++, and (a smoothed variant of) Gonzalez's k-centers heuristic. We validate our theoretical analysis with empirical results, observing that on real-world clustering instances, we can use a subsample of as little as 5% of the data to identify which algorithm is best on the full dataset.

algorithm, ground truth, singlelinkage, (16 more...)

arXiv.org Machine Learning

2402.14332

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Convergence Analysis of Blurring Mean Shift

Yamasaki, Ryoya, Tanaka, Toshiyuki

arXiv.org Artificial IntelligenceFeb-23-2024

Blurring mean shift (BMS) algorithm, a variant of the mean shift algorithm, is a kernel-based iterative method for data clustering, where data points are clustered according to their convergent points via iterative blurring. In this paper, we analyze convergence properties of the BMS algorithm by leveraging its interpretation as an optimization procedure, which is known but has been underutilized in existing convergence studies. Whereas existing results on convergence properties applicable to multi-dimensional data only cover the case where all the blurred data point sequences converge to a single point, this study provides a convergence guarantee even when those sequences can converge to multiple points, yielding multiple clusters. This study also shows that the convergence of the BMS algorithm is fast by further leveraging geometrical characterization of the convergent points.

algorithm, convergence, kernel, (14 more...)

arXiv.org Artificial Intelligence

2402.15146

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback