Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version)

Zhang, Lijun, Liu, Suyuan, Wang, Siwei, Yu, Shengju, Zhu, Xueling, Li, Miaomiao, Liu, Xinwang

arXiv.org Artificial Intelligence

Clustering is a fundamental task in unsupervised learning, but most existing methods heavily rely on hyperparameters such as the number of clusters or other sensitive settings, limiting their applicability in real-world scenarios. To address this long-standing challenge, we propose a novel and fully parameter-free clustering framework via Self-supervised Consensus Maximization, named SCMax. Our framework performs hierarchical agglomerative clustering and cluster evaluation in a single, integrated process. At each step of agglomeration, it creates a new, structure-aware data representation through a self-supervised learning task guided by the current clustering structure. We then introduce a nearest neighbor consensus score, which measures the agreement between the nearest neighbor-based merge decisions suggested by the original representation and the self-supervised one. The moment at which consensus maximization occurs can serve as a criterion for determining the optimal number of clusters. Extensive experiments on multiple datasets demonstrate that the proposed framework outperforms existing clustering approaches designed for scenarios with an unknown number of clusters.
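To make the consensus idea concrete, here is a minimal sketch of a nearest-neighbor agreement score between two representations of the same clusters. All names (`nearest_neighbor`, `consensus_score`) and the use of cluster centers are illustrative assumptions, not SCMax's actual implementation.

```python
import numpy as np

def nearest_neighbor(centers: np.ndarray) -> np.ndarray:
    # For each cluster center, return the index of its nearest *other* center,
    # i.e. the merge candidate that representation would suggest.
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cluster cannot merge with itself
    return d.argmin(axis=1)

def consensus_score(orig_centers: np.ndarray, ssl_centers: np.ndarray) -> float:
    # Fraction of clusters whose nearest-neighbor merge candidate agrees
    # between the original and the self-supervised representation.
    a = nearest_neighbor(orig_centers)
    b = nearest_neighbor(ssl_centers)
    return float((a == b).mean())
```

Under this reading, one would track the score while merging: when it peaks, the two views agree most strongly on the remaining structure, and the corresponding number of clusters is retained.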


DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

Neural Information Processing Systems

Existing dynamic graph benchmarks lack rich text attributes; therefore, they fall short in facilitating methodological advances in semantic modeling within dynamic graphs and in exploring the impact of text attributes on downstream tasks.


DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

Zhang, Jiasheng, Chen, Jialin, Yang, Menglin, Feng, Aosong, Liang, Shuang, Shao, Jie, Ying, Rex

arXiv.org Artificial Intelligence

Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios, where each node and edge are associated with text descriptions, and both the graph structure and text descriptions evolve over time. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs, which hinders the potential advancement in many research fields. To address this gap, we introduce Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs from diverse domains, with nodes and edges enriched by dynamically changing text attributes and categories. To facilitate the use of DTGB, we design standardized evaluation procedures based on four real-world use cases: future link prediction, destination node retrieval, edge classification, and textual relation generation. These tasks require models to understand both dynamic graph structures and natural language, highlighting the unique challenges posed by DyTAGs. Moreover, we conduct extensive benchmark experiments on DTGB, evaluating 7 popular dynamic graph learning algorithms and their variants of adapting to text attributes with LLM embeddings, along with 6 powerful large language models (LLMs). Our results show the limitations of existing models in handling DyTAGs. Our analysis also demonstrates the utility of DTGB in investigating the incorporation of structural and textual dynamics. The proposed DTGB fosters research on DyTAGs and their broad applications. It offers a comprehensive benchmark for evaluating and advancing models to handle the interplay between dynamic graph structures and natural language. The dataset and source code are available at https://github.com/zjs123/DTGB.
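To illustrate the evaluation setup for one of the four tasks, future link prediction, here is a minimal sketch of a chronological train/test split over a timestamped edge list. This is a generic illustration of the task definition, not DTGB's actual loading or evaluation API; the function and tuple layout are assumptions.

```python
from typing import List, Tuple

# (source node, destination node, timestamp); text attributes omitted for brevity
Edge = Tuple[int, int, float]

def temporal_split(edges: List[Edge], train_frac: float = 0.7):
    # Sort edges by timestamp and split chronologically, so a model is always
    # asked to predict links that occur strictly after those seen in training.
    ordered = sorted(edges, key=lambda e: e[2])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]
```

A model would then score candidate destination nodes for each test-set source at its timestamp, which is also the basis of the destination node retrieval task.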