AITopics

arXiv.org Artificial IntelligenceAug-11-2024

Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering

Ling, Huaming, Bao, Chenglong, Song, Jiebo, Shi, Zuoqiang

In this paper, we introduce a Fast and Scalable Semi-supervised Multi-view Subspace Clustering (FSSMSC) method, a novel solution to the high computational complexity commonly found in existing approaches. FSSMSC features linear computational and space complexity relative to the size of the data. The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks. Unlike traditional methods that manage the anchor graph construction and the label propagation process separately, this paper proposes a unified optimization model that facilitates simultaneous learning of both. An effective alternating update algorithm with convergence guarantees is proposed to solve the unified optimization model. Additionally, the method employs the obtained anchor graph and landmarks' low-dimensional representations to deduce low-dimensional representations for raw data. Following this, a straightforward clustering approach is conducted on these low-dimensional representations to achieve the final clustering results. The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.

dataset, representation, subspace, (16 more...)

2408.05707

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

arXiv.org Artificial IntelligenceAug-10-2024

A Versatile Framework for Attributed Network Clustering via K-Nearest Neighbor Augmentation

Li, Yiran, Guo, Gongyao, Shi, Jieming, Yang, Renchi, Shen, Shiqi, Li, Qing, Luo, Jun

Attributed networks containing entity-specific information in node attributes are ubiquitous in modeling social networks, e-commerce, bioinformatics, etc. Their inherent network topology ranges from simple graphs to hypergraphs with high-order interactions and multiplex graphs with separate layers. An important graph mining task is node clustering, aiming to partition the nodes of an attributed network into k disjoint clusters such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. It is highly challenging to capture multi-hop connections via nodes or attributes for effective clustering on multiple types of attributed networks. In this paper, we first present AHCKA as an efficient approach to attributed hypergraph clustering (AHC). AHCKA includes a carefully-crafted K-nearest neighbor augmentation strategy for the optimized exploitation of attribute information on hypergraphs, a joint hypergraph random walk model to devise an effective AHC objective, and an efficient solver with speedup techniques for the objective optimization. The proposed techniques are extensible to various types of attributed networks, and thus, we develop ANCKA as a versatile attributed network clustering framework, capable of attributed graph clustering (AGC), attributed multiplex graph clustering (AMGC), and AHC. Moreover, we devise ANCKA with algorithmic designs tailored for GPU acceleration to boost efficiency. We have conducted extensive experiments to compare our methods with 19 competitors on 8 attributed hypergraphs, 16 competitors on 6 attributed graphs, and 16 competitors on 3 attributed multiplex graphs, all demonstrating the superb clustering quality and efficiency of our methods.

ancka, graph, hypergraph, (15 more...)

2408.05459

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.81)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceAug-10-2024

Image Clustering Algorithm Based on Self-Supervised Pretrained Models and Latent Feature Distribution Optimization

Zhu, Qiuyu, Hu, Liheng, Wang, Sijin

In the face of complex natural images, existing deep clustering algorithms fall significantly short in terms of clustering accuracy when compared to supervised classification methods, making them less practical. This paper introduces an image clustering algorithm based on self-supervised pretrained models and latent feature distribution optimization, substantially enhancing clustering performance. It is found that: (1) For complex natural images, we effectively enhance the discriminative power of latent features by leveraging self-supervised pretrained models and their fine-tuning, resulting in improved clustering performance. (2) In the latent feature space, by searching for k-nearest neighbor images for each training sample and shortening the distance between the training sample and its nearest neighbor, the discriminative power of latent features can be further enhanced, and clustering performance can be improved. (3) In the latent feature space, reducing the distance between sample features and the nearest predefined cluster centroids can optimize the distribution of latent features, therefore further improving clustering performance. Through experiments on multiple datasets, our approach outperforms the latest clustering algorithms and achieves state-of-the-art clustering results. When the number of categories in the datasets is small, such as CIFAR-10 and STL-10, and there are significant differences between categories, our clustering algorithm has similar accuracy to supervised methods without using pretrained models, slightly lower than supervised methods using pre-trained models. The code linked algorithm is https://github.com/LihengHu/semi.

algorithm, latent feature, loss function, (13 more...)

2408.0192

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Fragkedaki, Kleio, Jiang, Frank J., Johansson, Karl H., Mårtensson, Jonas

Pedestrian Motion Prediction Using Transformer-based Behavior Clustering and Data-Driven Reachability Analysis

In this work, we present a transformer-based framework for predicting future pedestrian states based on clustered historical trajectory data. In previous studies, researchers propose enhancing pedestrian trajectory predictions by using manually crafted labels to categorize pedestrian behaviors and intentions. However, these approaches often only capture a limited range of pedestrian behaviors and introduce human bias into the predictions. To alleviate the dependency on manually crafted labels, we utilize a transformer encoder coupled with hierarchical density-based clustering to automatically identify diverse behavior patterns, and use these clusters in data-driven reachability analysis. By using a transformer-based approach, we seek to enhance the representation of pedestrian trajectories and uncover characteristics or features that are subsequently used to group trajectories into different "behavior" clusters. We show that these behavior clusters can be used with data-driven reachability analysis, yielding an end-to-end data-driven approach to predicting the future motion of pedestrians. We train and evaluate our approach on a real pedestrian dataset, showcasing its effectiveness in forecasting pedestrian movements.

data-driven reachability analysis, reachability analysis, trajectory, (15 more...)

2408.1525

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (0.40)

Industry: Transportation > Ground > Road (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Unsupervised Episode Detection for Large-Scale News Events

Kargupta, Priyanka, Zhang, Yunyi, Jiao, Yizhu, Ouyang, Siru, Han, Jiawei

Episodic structures are inherently interpretable and adaptable to evolving large-scale key events. However, state-of-the-art automatic event detection methods overlook event episodes and, therefore, struggle with these crucial characteristics. This paper introduces a novel task, episode detection, aimed at identifying episodes from a news corpus containing key event articles. An episode describes a cohesive cluster of core entities (e.g., "protesters", "police") performing actions at a specific time and location. Furthermore, an episode is a significant part of a larger group of episodes under a particular key event. Automatically detecting episodes is challenging because, unlike key events and atomic actions, we cannot rely on explicit mentions of times and locations to distinguish between episodes or use semantic similarity to merge inconsistent episode co-references. To address these challenges, we introduce EpiMine, an unsupervised episode detection framework that (1) automatically identifies the most salient, key-event-relevant terms and segments, (2) determines candidate episodes in an article based on natural episodic partitions estimated through shifts in discriminative term combinations, and (3) refines and forms final episode clusters using large language model-based reasoning on the candidate episodes. We construct three diverse, real-world event datasets annotated at the episode level. EpiMine outperforms all baselines on these datasets by an average 59.2% increase across all metrics.

computational linguistic, key event, protester, (11 more...)

2408.04873

Country:

North America > Haiti (0.14)
Asia > China > Hong Kong (0.07)
North America > United States > Illinois (0.05)
(22 more...)

Genre: Research Report (0.82)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government (1.00)
Media > Television (0.94)
Leisure & Entertainment (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Oshima, Toshiyuki, Takagi, Kentaro, Nakata, Kouta

Clustering-friendly Representation Learning for Enhancing Salient Features

Recently, representation learning with contrastive learning algorithms has been successfully applied to challenging unlabeled datasets. However, these methods are unable to distinguish important features from unimportant ones under simply unsupervised settings, and definitions of importance vary according to the type of downstream task or analysis goal, such as the identification of objects or backgrounds. In this paper, we focus on unsupervised image clustering as the downstream task and propose a representation learning method that enhances features critical to the clustering task. We extend a clustering-friendly contrastive learning method and incorporate a contrastive analysis approach, which utilizes a reference dataset to separate important features from unimportant ones, into the design of loss functions. Conducting an experimental evaluation of image clustering for three datasets with characteristic backgrounds, we show that for all datasets, our method achieves higher clustering scores compared with conventional contrastive analysis and deep clustering methods.

clustering-friendly representation learning, dataset, representation, (11 more...)

doi: 10.1007/978-981-97-2242-6

2408.04891

Country: Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

Min, Wenwen, Wang, Zhen, Zhu, Fangfang, Xu, Taosheng, Wang, Shunfang

Single-cell RNA sequencing (scRNA-seq) data analysis is pivotal for understanding cellular heterogeneity. However, the high sparsity and complex noise patterns inherent in scRNA-seq data present significant challenges for traditional clustering methods. To address these issues, we propose a deep clustering method, Attention-Enhanced Structural Deep Embedding Graph Clustering (scASDC), which integrates multiple advanced modules to improve clustering accuracy and robustness.Our approach employs a multi-layer graph convolutional network (GCN) to capture high-order structural relationships between cells, termed as the graph autoencoder module. To mitigate the oversmoothing issue in GCNs, we introduce a ZINB-based autoencoder module that extracts content information from the data and learns latent representations of gene expression. These modules are further integrated through an attention fusion mechanism, ensuring effective combination of gene expression and structural information at each layer of the GCN. Additionally, a self-supervised learning module is incorporated to enhance the robustness of the learned embeddings. Extensive experiments demonstrate that scASDC outperforms existing state-of-the-art methods, providing a robust and effective solution for single-cell clustering tasks. Our method paves the way for more accurate and meaningful analysis of single-cell RNA sequencing data, contributing to better understanding of cellular heterogeneity and biological processes. All code and public datasets used in this paper are available at \url{https://github.com/wenwenmin/scASDC} and \url{https://zenodo.org/records/12814320}.

autoencoder module, information, module, (11 more...)

2408.05258

Country:

Asia > China > Yunnan Province > Kunming (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Pereira, Roberto, Mestre, Xavier

Statistical Framework for Clustering MU-MIMO Wireless via Second Order Statistics

arXiv.org Artificial IntelligenceAug-8-2024

This work explores the clustering of wireless users by examining the distances between their channel covariance matrices, which reside on the Riemannian manifold of positive definite matrices. Specifically, we consider an estimator of the Log-Euclidean distance between multiple sample covariance matrices (SCMs) consistent when the number of samples and the observation size grow unbounded at the same rate. Within the context of multi-user MIMO (MU-MIMO) wireless communication systems, we develop a statistical framework that allows to accurate predictions of the clustering algorithm's performance under realistic conditions. Specifically, we present a central limit theorem that establishes the asymptotic Gaussianity of the consistent estimator of the log-Euclidean distance computed over two sample covariance matrices.

covariance matrix, estimator, matrix, (14 more...)

2408.04484

Country: Europe > Spain (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

arXiv.org Artificial IntelligenceAug-8-2024

Self-Supervised Contrastive Graph Clustering Network via Structural Information Fusion

Ji, Xiaoyang, Zhou, Yuchen, Yang, Haofu, Xu, Shiyue, Li, Jiahao

Graph clustering, a classical task in graph learning, involves partitioning the nodes of a graph into distinct clusters. This task has applications in various real-world scenarios, such as anomaly detection, social network analysis, and community discovery. Current graph clustering methods commonly rely on module pre-training to obtain a reliable prior distribution for the model, which is then used as the optimization objective. However, these methods often overlook deeper supervised signals, leading to sub-optimal reliability of the prior distribution. To address this issue, we propose a novel deep graph clustering method called CGCN. Our approach introduces contrastive signals and deep structural information into the pre-training process. Specifically, CGCN utilizes a contrastive learning mechanism to foster information interoperability among multiple modules and allows the model to adaptively adjust the degree of information aggregation for different order structures. Our CGCN method has been experimentally validated on multiple real-world graph datasets, showcasing its ability to boost the dependability of prior clustering distributions acquired through pre-training. As a result, we observed notable enhancements in the performance of the model.

graph, information, representation, (16 more...)

doi: 10.1109/CSCWD61410.2024.10580852

2408.04339

Country: Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)