AITopics

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.48)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Feb-17-2026, 05:17:11 GMT

a65d054a407f94c34ecfb598fb540a0d-Paper-Datasets_and_Benchmarks_Track.pdf

data mining, large language model, machine learning, (24 more...)

Country:

North America > United States > California (0.14)
Asia > China > Liaoning Province > Shenyang (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(7 more...)

Neural Information Processing Systems, Feb-10-2026, 05:43:48 GMT

37d00f567a18b478065f1a91b95622a0-Paper-Datasets_and_Benchmarks.pdf

dataset, node, representation, (16 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China > Hong Kong (0.04)

Industry: Information Technology (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
(2 more...)

Neural Information Processing Systems, Dec-24-2025, 15:03:21 GMT

A Comprehensive Study on Text-attributed Graphs: Benchmarking and Rethinking

Text-attributed graphs (TAGs) are prevalent in various real-world scenarios, where each node is associated with a text description. The cornerstone of representation learning on TAGs lies in the seamless integration of textual semantics within individual nodes and the topological connections across nodes. Recent advancements in pre-trained language models (PLMs) and graph neural networks (GNNs) have facilitated effective learning on TAGs, garnering increased research interest. However, the absence of meaningful benchmark datasets and standardized evaluation procedures for TAGs has impeded progress in this field. In this paper, we propose CS-TAG, a comprehensive and diverse collection of challenging benchmark datasets for TAGs. The CS-TAG datasets are notably large in scale and encompass a wide range of domains, spanning from citation networks to purchase graphs. In addition to building the datasets, we conduct extensive benchmark experiments over CS-TAG with various learning paradigms, including PLMs, GNNs, PLM-GNN co-training methods, and the proposed novel topological pre-training of language models. In a nutshell, we provide an overview of the CS-TAG datasets, describe standardized evaluation procedures, and present baseline experiments.

benchmarking and rethinking, comprehensive study, text-attributed graph, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Sacha, Mikołaj, Jafri, Hammad, Terzolo, Mattie, Sinha, Ayan, Rabinovich, Andrew

GraphMatch: Fusing Language and Graph Representations in a Dynamic Two-Sided Work Marketplace

arXiv.org Artificial Intelligence, Dec-3-2025

Recommending matches in a text-rich, dynamic two-sided marketplace presents unique challenges due to evolving content and interaction graphs. We introduce GraphMatch, a new large-scale recommendation framework that fuses pre-trained language models with graph neural networks to overcome these challenges. Unlike prior approaches centered on standalone models, GraphMatch is a comprehensive recipe built on powerful text encoders and GNNs working in tandem. It employs adversarial negative sampling alongside point-in-time subgraph training to learn representations that capture both the fine-grained semantics of evolving text and the time-sensitive structure of the graph. We evaluated extensively on interaction data from Upwork, a leading labor marketplace, at large scale, and discuss our approach towards low-latency inference suitable for real-time use. In our experiments, GraphMatch outperforms language-only and graph-only baselines on matching tasks while being efficient at runtime. These results demonstrate that unifying language and graph representations yields a highly effective solution to text-rich, dynamic two-sided recommendations, bridging the gap between powerful pretrained LMs and large-scale graphs in practice.

data mining, machine learning, natural language, (19 more...)

2512.02849

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

arXiv.org Artificial Intelligence, Dec-1-2025

Odin: Oriented Dual-module Integration for Text-rich Network Representation Learning

Hong, Kaifeng, Zhang, Yinglong, Hong, Xiaoying, Xia, Xuewen, Xu, Xing

Text-attributed graphs require models to effectively combine strong textual understanding with structurally informed reasoning. Existing approaches either rely on GNNs, which are limited by over-smoothing and hop-dependent diffusion, or employ Transformers that overlook graph topology and treat nodes as isolated sequences. We propose Odin (Oriented Dual-module INtegration), a new architecture that injects graph structure into Transformers at selected depths through an oriented dual-module mechanism. Unlike message-passing GNNs, Odin does not rely on multi-hop diffusion; instead, multi-hop structures are integrated at specific Transformer layers, yielding low-, mid-, and high-level structural abstraction aligned with the model's semantic hierarchy. Because aggregation operates on the global [CLS] representation, Odin fundamentally avoids over-smoothing and decouples structural abstraction from neighborhood size or graph topology. We further establish that Odin's expressive power strictly contains that of both pure Transformers and GNNs. To make the design efficient in large-scale or low-resource settings, we introduce Light Odin, a lightweight variant that preserves the same layer-aligned structural abstraction for faster training and inference. Experiments on multiple text-rich graph benchmarks show that Odin achieves state-of-the-art accuracy, while Light Odin delivers competitive performance with significantly reduced computational cost. Together, Odin and Light Odin form a unified, hop-free framework for principled structure-text integration. The source code of this model has been released at https://github.com/hongkaifeng/Odin.

large language model, machine learning, odin, (21 more...)

2511.21416

Country: Europe > Austria (0.28)

Genre: Research Report (0.63)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Nov-13-2025

GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs

Yang, Liangwei, Ma, Jing, Zhang, Jianguo, Liu, Zhiwei, Qiu, Jielin, Kokane, Shirley, Wang, Shiyu, Chen, Haolin, Murthy, Rithesh, Zhu, Ming, Wang, Huan, Yao, Weiran, Xiong, Caiming, Heinecke, Shelby

Graph neural networks (GNNs) on text-attributed graphs (TAGs) typically encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. However, the representation spaces of modern PLMs are highly non-linear and geometrically structured, where textual embeddings reside on curved semantic manifolds rather than flat Euclidean spaces. Linear aggregation on such manifolds inevitably distorts geometry and causes semantic drift, a phenomenon where aggregated representations deviate from the intrinsic manifold, losing semantic fidelity and expressive power. To quantitatively investigate this problem, this work introduces a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework to analyze how different aggregation mechanisms affect manifold structure. Building upon these insights, we propose Geodesic Aggregation, a manifold-aware mechanism that aggregates neighbor information along geodesics via log-exp mappings on the unit sphere, ensuring that representations remain faithful to the semantic manifold during message passing. We further develop GeoGNN, a practical instantiation that integrates spherical attention with manifold interpolation. Extensive experiments across four benchmark datasets and multiple text encoders show that GeoGNN substantially mitigates semantic drift and consistently outperforms strong baselines, establishing the importance of manifold-aware aggregation in text-attributed graph learning.
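As a rough illustration of what aggregating along geodesics via log and exp maps on the unit sphere can look like, here is a minimal pure-Python sketch. It is not taken from the GeoGNN paper: the function names (log_map, exp_map, geodesic_aggregate), the fixed neighbor weights, and the single tangent-space averaging step are all assumptions made for this example, and the paper's spherical attention is not modeled.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

def log_map(p, q, eps=1e-12):
    """Tangent vector at p pointing toward q; its length is the geodesic distance."""
    c = max(-1.0, min(1.0, dot(p, q)))           # clamp against rounding error
    theta = math.acos(c)                         # angle between p and q
    v = [qi - c * pi for pi, qi in zip(p, q)]    # component of q orthogonal to p
    n = norm(v)
    if n < eps:                                  # q coincides with p (or is antipodal)
        return [0.0] * len(p)
    return [theta * vi / n for vi in v]

def exp_map(p, u, eps=1e-12):
    """Walk from p along the tangent vector u and land back on the sphere."""
    n = norm(u)
    if n < eps:
        return list(p)
    return [math.cos(n) * pi + math.sin(n) * ui / n for pi, ui in zip(p, u)]

def geodesic_aggregate(center, neighbors, weights):
    """Hypothetical aggregation step: weighted sum in the tangent space at the center."""
    p = normalize(center)
    tangent = [0.0] * len(p)
    for q, w in zip(neighbors, weights):
        step = log_map(p, normalize(q))
        tangent = [t + w * s for t, s in zip(tangent, step)]
    return exp_map(p, tangent)

center = [1.0, 0.0, 0.0]
neighbor = [0.0, 1.0, 0.0]
on_sphere = geodesic_aggregate(center, [neighbor], [0.5])
linear = [0.5 * c + 0.5 * n for c, n in zip(center, neighbor)]
print(norm(on_sphere), norm(linear))
```

In this toy case the geodesic result keeps unit norm while the plain linear average falls inside the sphere, which is one simple way to picture the drift the abstract describes.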

data mining, machine learning, manifold, (21 more...)

2511.09042

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Data Science > Data Mining (0.94)
(2 more...)

Neural Information Processing Systems, Oct-10-2025, 12:19:02 GMT

DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

Therefore, they fall short in facilitating methodological advances in semantic modeling within dynamic graphs and exploring the impact of text attributes on downstream tasks.

dataset, oom 0, oom oom 0, (16 more...)

Country:

North America > United States > California (0.14)
Asia > China > Liaoning Province > Shenyang (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(7 more...)

Neural Information Processing Systems, Oct-8-2025, 11:04:17 GMT

A Comprehensive Study on Text-attributed Graphs: Benchmarking and Rethinking

Hao Yan

Recent advancements in pre-trained language models (PLMs) and graph neural networks (GNNs) have facilitated effective learning on TAGs, garnering increased research interest.

dataset, node, representation, (16 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China > Hong Kong (0.04)

Industry: Information Technology (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
(2 more...)

arXiv.org Artificial Intelligence, Oct-3-2025

Dynamic Bundling with Large Language Models for Zero-Shot Inference on Text-Attributed Graphs

Zhao, Yusheng, Zhang, Qixin, Luo, Xiao, Zhang, Weizhi, Xiao, Zhiping, Ju, Wei, Yu, Philip S., Zhang, Ming

Large language models (LLMs) have been used in many zero-shot learning problems, with their strong generalization ability. Recently, adopting LLMs in text-attributed graphs (TAGs) has drawn increasing attention. However, the adoption of LLMs faces two major challenges: limited information on graph structure and unreliable responses. LLMs struggle with text attributes isolated from the graph topology. Worse still, they yield unreliable predictions due to both information insufficiency and the inherent weakness of LLMs (e.g., hallucination). Towards this end, this paper proposes a novel method named Dynamic Text Bundling Supervision (DENSE) that queries LLMs with bundles of texts to obtain bundle-level labels and uses these labels to supervise graph neural networks. Specifically, we sample a set of bundles, each containing a set of nodes with corresponding texts of close proximity. We then query LLMs with the bundled texts to obtain the label of each bundle. Subsequently, the bundle labels are used to supervise the optimization of graph neural networks, and the bundles are further refined to exclude noisy items. To justify our design, we also provide theoretical analysis of the proposed method. Extensive experiments across ten datasets validate the effectiveness of the proposed method.

large language model, machine learning, natural language, (18 more...)

2505.17599

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)