Zhang, Ying
DiffGED: Computing Graph Edit Distance via Diffusion-based Graph Matching
Huang, Wei, Wang, Hanchen, Wen, Dong, Zhang, Wenjie, Zhang, Ying, Lin, Xuemin
The Graph Edit Distance (GED) problem, which aims to compute the minimum number of edit operations required to transform one graph into another, is a fundamental challenge in graph analysis with wide-ranging applications. However, due to its NP-hard nature, traditional A* approaches often suffer from scalability issues, making them computationally intractable for large graphs. Many recent deep learning frameworks address GED by formulating it as a regression task, which, while efficient, fails to recover the edit path -- a central interest in GED. Furthermore, recent hybrid approaches that combine deep learning with traditional methods to recover the edit path often yield poor solution quality. These methods also struggle to generate candidate solutions in parallel, resulting in increased running times. In this paper, we present a novel approach, DiffGED, that leverages a generative diffusion model to solve GED and recover the corresponding edit path. Specifically, we first generate multiple diverse node matching matrices in parallel through a diffusion-based graph matching model. Next, node mappings are extracted from each generated matching matrix in parallel, and each extracted node mapping can be simply transformed into an edit path. Benefiting from the generative diversity provided by the diffusion model, DiffGED is less likely to fall into locally sub-optimal solutions, thereby achieving superior overall solution quality close to the exact solution. Experimental results on real-world datasets demonstrate that DiffGED can generate multiple diverse edit paths with exceptionally high accuracy comparable to exact solutions while maintaining a running time shorter than most hybrid approaches.
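As an illustration of the decode-and-select step described above, here is a minimal Python sketch, assuming a trained diffusion model exposed through a hypothetical `sample_fn` that returns soft matching matrices; the Hungarian algorithm stands in for the paper's parallel mapping extraction, and graphs are simplified to label lists plus undirected edge sets.

```python
from scipy.optimize import linear_sum_assignment

def edit_cost(g1, g2, mapping):
    """Edit operations implied by a node mapping: relabels plus edge edits.
    Graphs are (labels, edges) pairs with edges as a set of frozensets;
    equal node counts are assumed for brevity."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    cost = sum(labels1[u] != labels2[v] for u, v in mapping.items())
    mapped = {frozenset((mapping[a], mapping[b])) for a, b in map(tuple, edges1)}
    return cost + len(mapped ^ edges2)  # edge deletions plus insertions

def best_mapping(sample_fn, g1, g2, k=16):
    """sample_fn (assumed interface to the trained diffusion model) returns
    k soft matching matrices of shape (n1, n2); each is decoded into a hard
    node mapping and the cheapest implied edit path wins."""
    best = (float("inf"), None)
    for m in sample_fn(g1, g2, num_samples=k):
        rows, cols = linear_sum_assignment(-m)   # maximize total match score
        mapping = dict(zip(rows, cols))
        best = min(best, (edit_cost(g1, g2, mapping), mapping), key=lambda t: t[0])
    return best
```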
Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine
Dou, Chengfeng, Zhang, Ying, Jin, Zhi, Jiao, Wenpin, Zhao, Haiyan, Zhao, Yongqiang, Tao, Zhengwei
Evidence-based medicine (EBM) plays a crucial role in the application of large language models (LLMs) in healthcare, as it provides reliable support for medical decision-making processes. Although it benefits from current retrieval-augmented generation (RAG) technologies, it still faces two significant challenges: the collection of dispersed evidence and the efficient organization of this evidence to support the complex queries necessary for EBM. To tackle these issues, we propose using LLMs to gather scattered evidence from multiple sources and present a knowledge hypergraph-based evidence management model to integrate this evidence while capturing its intricate relationships. Furthermore, to better support complex queries, we have developed an Importance-Driven Evidence Prioritization (IDEP) algorithm that uses the LLM to generate multiple evidence features, each with an associated importance score, which are then used to rank the evidence and produce the final retrieval results. Experimental results on six datasets demonstrate that our approach outperforms existing RAG techniques in application domains of interest to EBM, such as medical quizzing, hallucination detection, and decision support. Test sets and the constructed knowledge graph can be accessed at https://drive.google.com/file/d/1WJ9QTokK3MdkjEmwuFQxwH96j_Byawj_/view?usp=drive_link.
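A minimal sketch of the idea behind IDEP, assuming hypothetical `llm.extract_features` and `llm.covers` interfaces; the actual feature-generation and scoring prompts are not specified here.

```python
def idep_rank(llm, query, evidence_pool, top_k=5):
    """Hypothetical sketch of Importance-Driven Evidence Prioritization:
    the LLM proposes features of the query, each with an importance score,
    and evidence is ranked by the importance-weighted features it covers."""
    # e.g. [("drug interaction", 0.9), ("dosage in elderly", 0.6), ...]
    features = llm.extract_features(query)      # assumed interface

    def score(evidence):
        return sum(w for feat, w in features if llm.covers(evidence, feat))

    return sorted(evidence_pool, key=score, reverse=True)[:top_k]
```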
G-Boost: Boosting Private SLMs with General LLMs
Fan, Yijiang, Mao, Yuren, Lai, Longbin, Zhang, Ying, Qian, Zhengping, Gao, Yunjun
Due to limited computational resources, most Large Language Model (LLM) developers can only fine-tune Small Language Models (SLMs) on their own data. These private SLMs typically have limited effectiveness. To boost the performance of private SLMs, this paper proposes asking general LLMs for help. The general LLMs can be APIs or larger LLMs whose inference cost the developers can afford. Specifically, we propose the G-Boost framework, in which a private SLM adaptively performs collaborative inference with a general LLM under the guidance of a process reward. Experiments demonstrate that our framework can significantly boost the performance of private SLMs.
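The collaborative inference loop might look roughly as follows; `slm.step`, `llm.step`, and `prm.score` are assumed interfaces, and the threshold-based fallback is an illustrative simplification of reward-guided collaboration.

```python
def g_boost_generate(slm, llm, prm, prompt, max_steps=32, threshold=0.5):
    """Hypothetical sketch of reward-guided collaborative inference:
    the private SLM drafts each reasoning step, a process reward model
    scores it, and low-reward steps are redone by the general LLM."""
    steps = []
    for _ in range(max_steps):
        draft = slm.step(prompt, steps)        # SLM proposes the next step
        if prm.score(prompt, steps, draft) < threshold:
            draft = llm.step(prompt, steps)    # defer to the general LLM
        steps.append(draft)
        if draft.endswith("<eos>"):            # assumed end-of-answer marker
            break
    return "".join(steps)
```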
AoECR: AI-ization of Elderly Care Robot
Zhou, Linkun, Li, Jian, Mo, Yadong, Zhang, Xiangyan, Zhang, Ying, Wei, Shimin
Autonomous interaction is crucial for the effective use of elderly care robots. However, developing universal AI architectures is extremely challenging due to the diversity of robot configurations and a lack of datasets. We propose a universal architecture for the AI-ization of elderly care robots, called AoECR. Specifically, based on a nursing bed, we developed a patient-nurse interaction dataset tailored for elderly care scenarios and fine-tuned a large language model to enable it to perform nursing manipulations. Additionally, the inference process includes a self-check chain to ensure the security of control commands. An expert optimization process further enhances the humanization and personalization of the interactive responses. Physical experiments demonstrated that AoECR exhibits zero-shot generalization across diverse scenarios, understands patients' instructions, implements secure control commands, and delivers humanized and personalized interactive responses. In general, our research provides a valuable dataset reference and AI-ization solutions for elderly care robots.
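The self-check chain can be illustrated with a small sketch in which a generated control command must pass explicit safety checks before execution; the command names and limits below are hypothetical, not from the paper.

```python
WHITELIST = {"raise_backrest", "lower_backrest", "raise_legrest", "lower_legrest"}
LIMITS = {"raise_backrest": (0, 60)}   # allowed angle range in degrees (assumed)

def self_check(command, angle):
    """Hypothetical self-check: an LLM-generated control command is executed
    only if it is whitelisted and its parameters are within safe bounds."""
    if command not in WHITELIST:
        return False, f"unknown command: {command}"
    lo, hi = LIMITS.get(command, (0, 90))
    if not lo <= angle <= hi:
        return False, f"angle {angle} outside safe range [{lo}, {hi}]"
    return True, "ok"
```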
Unlocking Multi-Modal Potentials for Dynamic Text-Attributed Graph Representation
Xu, Yuanyuan, Zhang, Wenjie, Zhang, Ying, Lin, Xuemin, Xu, Xiwei
Dynamic Text-Attributed Graphs (DyTAGs) are a novel graph paradigm that captures evolving temporal edges alongside rich textual attributes. A prior approach to representing DyTAGs leverages pre-trained language models to encode text attributes and subsequently integrates them into dynamic graph models. However, it follows edge-centric modeling, as in dynamic graph learning, which is limited to local structures and fails to exploit the unique characteristics of DyTAGs, leading to suboptimal performance. We observe that DyTAGs inherently comprise three distinct modalities (temporal, textual, and structural) that often exhibit dispersed or even orthogonal distributions, with the first two largely overlooked in existing research. Building on this insight, we propose MoMent, a model-agnostic multi-modal framework that can seamlessly integrate with dynamic graph models for structural modality learning. The core idea is to shift from edge-centric to node-centric modeling, fully leveraging all three modalities for node representation. Specifically, MoMent introduces non-shared node-centric encoders based on the attention mechanism to capture global temporal and semantic contexts from the temporal and textual modalities, together with local structure learning, thus generating modality-specific tokens. To prevent disjoint latent spaces, we propose a symmetric alignment loss, an auxiliary objective that aligns temporal and textual tokens, ensuring global temporal-semantic consistency with a theoretical guarantee. Last, we design a lightweight adaptor to fuse these tokens, generating comprehensive and cohesive node representations. We theoretically demonstrate that MoMent enhances discriminative power over exclusively edge-centric modeling. Extensive experiments across seven datasets and two downstream tasks show that MoMent achieves up to a 33.62% improvement over the baseline with four dynamic graph models.
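One plausible instantiation of the symmetric alignment loss is a CLIP-style contrastive objective applied in both directions; the sketch below assumes matched batches of temporal and textual tokens and is not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def symmetric_alignment_loss(temporal_tok, textual_tok, tau=0.1):
    """Symmetric contrastive loss (an assumed instantiation): the temporal
    and textual tokens of the same node are pulled together, those of
    different nodes pushed apart. Both inputs have shape (N, d)."""
    t = F.normalize(temporal_tok, dim=-1)
    x = F.normalize(textual_tok, dim=-1)
    logits = t @ x.T / tau                      # (N, N) similarity matrix
    target = torch.arange(t.size(0), device=t.device)
    return 0.5 * (F.cross_entropy(logits, target) +
                  F.cross_entropy(logits.T, target))
```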
UniDyG: A Unified and Effective Representation Learning Approach for Large Dynamic Graphs
Xu, Yuanyuan, Zhang, Wenjie, Lin, Xuemin, Zhang, Ying
Dynamic graphs, which capture time-evolving edges between nodes, are formulated as continuous-time or discrete-time dynamic graphs. They differ in temporal granularity: Continuous-Time Dynamic Graphs (CTDGs) exhibit rapid, localized changes, while Discrete-Time Dynamic Graphs (DTDGs) show gradual, global updates. This difference has led to isolated developments in representation learning for each type. To advance dynamic graph representation learning, recent research attempts to design a unified model capable of handling both CTDGs and DTDGs, achieving promising results. However, it typically focuses on local dynamic propagation for temporal structure learning in the time domain, failing to accurately capture the underlying structural evolution associated with each temporal granularity and thus compromising model effectiveness. In addition, existing works, whether specific or unified, often overlook the issue of temporal noise, compromising the model's robustness. To better model both types of dynamic graphs, we propose UniDyG, a unified and effective representation learning approach that can scale to large dynamic graphs. Specifically, we first propose a novel Fourier Graph Attention (FGAT) mechanism that can model local and global structural correlations based on recent neighbors and complex-number selective aggregation, while theoretically ensuring consistent representations of dynamic graphs over time. Based on approximation theory, we demonstrate that FGAT is well-suited to capture the underlying structures in both CTDGs and DTDGs. We further enhance FGAT to resist temporal noise by designing an energy-gated unit, which adaptively filters out high-frequency noise according to spectral energy. Last, we leverage our proposed FGAT mechanism for temporal structure learning and employ a frequency-enhanced linear function for node-level dynamic updates, facilitating the generation of high-quality temporal embeddings. Extensive experiments show that UniDyG achieves an average improvement of 14.4% over sixteen baselines across nine dynamic graphs while exhibiting superior robustness in noisy scenarios.
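A minimal sketch of what an energy-gated high-frequency filter could look like, assuming the unit operates on per-node temporal signals; the keep-ratio heuristic is illustrative, not the paper's exact gating rule.

```python
import torch

def energy_gated_filter(x, keep_ratio=0.9):
    """Illustrative energy gate: keep the lowest-frequency bins that carry
    keep_ratio of the spectral energy and zero out the rest. x: (N, T)."""
    spec = torch.fft.rfft(x, dim=-1)
    energy = spec.abs().pow(2)
    frac = energy.cumsum(dim=-1) / energy.sum(dim=-1, keepdim=True).clamp_min(1e-12)
    mask = frac <= keep_ratio                   # low-frequency bins to keep
    mask[..., 0] = True                         # always keep the DC component
    return torch.fft.irfft(spec * mask.to(spec.dtype), n=x.size(-1), dim=-1)
```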
Improving the Stability of GNN Force Field Models by Reducing Feature Correlation
Zeng, Yujie, He, Wenlong, Vasyltsov, Ihor, Wei, Jiaxin, Zhang, Ying, Chen, Lin, Dai, Yuehua
Recently, Graph Neural Network based Force Field (GNNFF) models have been widely used in Molecular Dynamics (MD) simulation, one of the most cost-effective tools in semiconductor material research. However, even though such models achieve low energy and force Mean Absolute Error (MAE) on trained (in-distribution) datasets, they often become unstable during long MD simulations on out-of-distribution data. In this paper, we propose a feature-correlation-based method for GNNFF models to enhance the stability of MD simulation. We reveal a negative relationship between feature correlation and the stability of GNNFF models, and design a loss function with a dynamic loss coefficient scheduler that reduces edge feature correlation and can be applied to general GNNFF training. We also propose an empirical metric to evaluate stability in MD simulation. Experiments show our method significantly improves the stability of GNNFF models, especially on out-of-distribution data, with less than 3% computational overhead. For example, we extend the stable MD simulation time from 0.03 ps to 10 ps for the Allegro model.
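A hedged sketch of the auxiliary objective: penalizing off-diagonal correlations between edge feature channels, weighted by a simple warm-up scheduler; both the penalty form and the scheduler are assumptions, not the paper's exact design.

```python
import torch

def correlation_loss(edge_feats):
    """Penalize off-diagonal entries of the edge-feature correlation matrix
    so that feature channels decorrelate. edge_feats: (num_edges, d)."""
    z = (edge_feats - edge_feats.mean(0)) / (edge_feats.std(0) + 1e-8)
    corr = (z.T @ z) / edge_feats.size(0)       # (d, d) correlation matrix
    off_diag = corr - torch.diag(torch.diag(corr))
    return off_diag.pow(2).mean()

def loss_coefficient(epoch, warmup=10, max_coef=0.1):
    """Assumed linear warm-up schedule for the auxiliary loss weight."""
    return max_coef * min(1.0, epoch / warmup)

# total_loss = force_mae + loss_coefficient(epoch) * correlation_loss(edge_feats)
```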
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Gao, Hao, Chen, Shaoyu, Jiang, Bo, Liao, Bencheng, Shi, Yiang, Guo, Xiaoyang, Pu, Yuechuan, Yin, Haoran, Li, Xiangyu, Zhang, Xinbang, Zhang, Ying, Liu, Wenyu, Zhang, Qian, Wang, Xinggang
Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm, RAD. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to effectively respond to safety-critical events and understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance on most closed-loop metrics, notably a 3x lower collision rate. Abundant closed-loop results are presented at https://hgao-cv.github.io/RAD.
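Schematically, the training signal combines shaped safety rewards with an IL regularizer, roughly as below; the reward terms and the weight `beta` are illustrative assumptions, not the paper's exact formulation.

```python
def step_reward(collision, off_road, progress,
                w_col=5.0, w_off=2.0, w_prog=1.0):
    """Illustrative safety-shaped reward: penalize safety-critical events
    and reward forward progress. All weights are assumptions."""
    return w_prog * progress - w_col * float(collision) - w_off * float(off_road)

def policy_loss(rl_loss, il_loss, beta=0.1):
    """IL incorporated into RL training as a regularization term
    (beta is an assumed trade-off weight)."""
    return rl_loss + beta * il_loss
```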
Revisiting Dynamic Graph Clustering via Matrix Factorization
Li, Dongyuan, Kosugi, Satoshi, Zhang, Ying, Okumura, Manabu, Xia, Feng, Jiang, Renhe
Dynamic graph clustering aims to detect and track time-varying clusters in dynamic graphs, revealing the evolutionary mechanisms of complex real-world dynamic systems. Matrix factorization-based methods are promising approaches for this task; however, they often struggle with scalability and can be time-consuming when applied to large-scale dynamic graphs. Moreover, they tend to lack robustness and are vulnerable to real-world noisy data. To address these issues, we make three key contributions. First, to improve scalability, we propose temporal separated matrix factorization, where a single matrix is divided into multiple smaller matrices for independent factorization, resulting in faster computation. Second, to improve robustness, we introduce bi-clustering regularization, which jointly optimizes graph embedding and clustering, thereby filtering out noisy features from the graph embeddings. Third, to further enhance effectiveness and efficiency, we propose selective embedding updating, where we update only the embeddings of dynamic nodes while the embeddings of static nodes are fixed across timestamps. Experimental results on six synthetic and five real-world benchmarks demonstrate the scalability, robustness, and effectiveness of our proposed method. Source code is available at https://github.com/Clearloveyuan/DyG-MF.
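A rough sketch of the two scalability ideas, using truncated SVD as a stand-in for the paper's factorization: each timestamp's matrix is factorized independently, and only dynamic nodes get their embeddings refreshed.

```python
import numpy as np

def temporal_separated_mf(adjacency_per_t, rank=32):
    """Factorize each timestamp's (smaller) matrix independently; truncated
    SVD is used here as a stand-in for the paper's factorization."""
    embeddings = []
    for a in adjacency_per_t:
        u, s, _ = np.linalg.svd(a, full_matrices=False)
        embeddings.append(u[:, :rank] * np.sqrt(s[:rank]))
    return embeddings

def selective_update(prev_emb, new_emb, dynamic_nodes):
    """Refresh only the rows of dynamic nodes; static nodes keep their
    previous embeddings across timestamps."""
    out = prev_emb.copy()
    out[dynamic_nodes] = new_emb[dynamic_nodes]
    return out
```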
ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models
Zhang, Ying, Yin, Maoliang, Bi, Wenfu, Yan, Haibao, Bian, Shaohan, Zhang, Cui-Hua, Hua, Changchun
Service robots operating in unstructured environments must effectively recognize and segment unknown objects to enhance their functionality. Traditional supervised learning-based segmentation techniques require extensive annotated datasets, which are impractical for the diversity of objects encountered in real-world scenarios. Unseen Object Instance Segmentation (UOIS) methods aim to address this by training models on synthetic data to generalize to novel objects, but they often suffer from the simulation-to-reality gap. This paper proposes a novel approach (ZISVFM) for solving UOIS by leveraging the powerful zero-shot capability of the Segment Anything Model (SAM) and explicit visual representations from a self-supervised vision transformer (ViT). The proposed framework operates in three stages: (1) generating object-agnostic mask proposals from colorized depth images using SAM, (2) refining these proposals using attention-based features from the self-supervised ViT to filter out non-object masks, and (3) applying K-Medoids clustering to generate point prompts that guide SAM towards precise object segmentation. Experimental validation on two benchmark datasets and a self-collected dataset demonstrates the superior performance of ZISVFM in complex environments, including hierarchical settings such as cabinets, drawers, and handheld objects. Our source code is available at https://github.com/Yinmlmaoliang/zisvfm.
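The three-stage pipeline can be sketched as follows; `sam.propose`, `vit.objectness`, and `sam.segment` are assumed interfaces rather than the actual ZISVFM API, and K-Medoids comes from scikit-learn-extra here as one possible backend.

```python
import numpy as np
from sklearn_extra.cluster import KMedoids  # one possible K-Medoids backend

def zisvfm_segment(sam, vit, image, k=5, score_thresh=0.5):
    """Hypothetical sketch of the three-stage pipeline on a colorized
    depth image; thresholds and interfaces are assumptions."""
    # Stage 1: object-agnostic mask proposals from the colorized depth image
    proposals = sam.propose(image)                     # list of boolean masks
    # Stage 2: keep proposals the self-supervised ViT scores as object-like
    objects = [m for m in proposals if vit.objectness(image, m) > score_thresh]
    # Stage 3: K-Medoids centers inside each mask become point prompts for SAM
    results = []
    for mask in objects:
        pts = np.argwhere(mask)                        # (y, x) pixel coordinates
        medoids = KMedoids(n_clusters=min(k, len(pts))).fit(pts).cluster_centers_
        results.append(sam.segment(image, point_prompts=medoids[:, ::-1]))
    return results
```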