Cheng, Yao
Intelligent Spatial Perception by Building Hierarchical 3D Scene Graphs for Indoor Scenarios with the Help of LLMs
Cheng, Yao, Han, Zhe, Jiang, Fengyang, Wang, Huaizhen, Zhou, Fengyu, Yin, Qingshan, Wei, Lei
This paper addresses the high demand in advanced intelligent robot navigation for a more holistic understanding of spatial environments, by introducing a novel system that harnesses the capabilities of Large Language Models (LLMs) to construct hierarchical 3D Scene Graphs (3DSGs) for indoor scenarios. The proposed framework constructs 3DSGs consisting of a fundamental layer with rich metric-semantic information, an object layer featuring precise point-cloud representation of object nodes as well as visual descriptors, and higher layers of room, floor, and building nodes. Thanks to the innovative application of LLMs, not only object nodes but also nodes of higher layers, e.g., room nodes, are annotated in an intelligent and accurate manner. A polling mechanism for room classification using LLMs is proposed to enhance the accuracy and reliability of the room node annotation. Thorough numerical experiments demonstrate the system's ability to integrate semantic descriptions with geometric data, creating an accurate and comprehensive representation of the environment instrumental for context-aware navigation and task planning.
Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving
Cheng, Yao, Zhao, Yibo, Zhu, Jiapeng, Liu, Yao, Sun, Xing, Li, Xiang
Large language models (LLMs) have demonstrated transformative potential across various domains, yet they face significant challenges in knowledge integration and complex problem reasoning, often leading to hallucinations and unreliable outputs. Retrieval-Augmented Generation (RAG) has emerged as a promising solution to enhance LLMs accuracy by incorporating external knowledge. However, traditional RAG systems struggle with processing complex relational information and multi-step reasoning, limiting their effectiveness in advanced problem-solving tasks. To address these limitations, we propose CogGRAG, a cognition inspired graph-based RAG framework, designed to improve LLMs performance in Knowledge Graph Question Answering (KGQA). Inspired by the human cognitive process of decomposing complex problems and performing self-verification, our framework introduces a three-stage methodology: decomposition, retrieval, and reasoning with self-verification. By integrating these components, CogGRAG enhances the accuracy of LLMs in complex problem solving. We conduct systematic experiments with three LLM backbones on four benchmark datasets, where CogGRAG outperforms the baselines.
FullStack Bench: Evaluating LLMs as Full Stack Coders
Bytedance-Seed-Foundation-Code-Team, null, :, null, Cheng, Yao, Chen, Jianfeng, Chen, Jie, Chen, Li, Chen, Liyu, Chen, Wentao, Chen, Zhengyu, Geng, Shijie, Li, Aoyan, Li, Bo, Li, Bowen, Li, Linyi, Liu, Boyi, Liu, Jerry, Liu, Kaibo, Liu, Qi, Liu, Shukai, Liu, Siyao, Liu, Tianyi, Liu, Tingkai, Liu, Yongfei, Long, Rui, Mai, Jing, Ning, Guanghan, Peng, Z. Y., Shen, Kai, Su, Jiahao, Su, Jing, Sun, Tao, Sun, Yifan, Tao, Yunzhe, Wang, Guoyin, Wang, Siwei, Wang, Xuwu, Wang, Yite, Wang, Zihan, Xia, Jinxiang, Xiang, Liang, Xiao, Xia, Xiao, Yongsheng, Xi, Chenguang, Xin, Shulin, Xu, Jingjing, Xu, Shikun, Yang, Hongxia, Yang, Jack, Yang, Yingxiang, Yuan, Jianbo, Zhang, Jun, Zhang, Yufeng, Zhang, Yuyu, Zheng, Shen, Zhu, He, Zhu, Ming
As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing. However, most existing datasets only evaluate limited application domains. To address this gap, we have developed a comprehensive code evaluation dataset FullStack Bench focusing on full-stack programming, which encompasses a wide range of application domains (e.g., basic programming, data analysis, software engineering, mathematics, and machine learning). Besides, to assess multilingual programming capabilities, in FullStack Bench, we design real-world instructions and corresponding unit test cases from 16 widely-used programming languages to reflect real-world usage scenarios rather than simple translations. Moreover, we also release an effective code sandbox execution tool (i.e., SandboxFusion) supporting various programming languages and packages to evaluate the performance of our FullStack Bench efficiently. Comprehensive experimental results on our FullStack Bench demonstrate the necessity and effectiveness of our FullStack Bench and SandboxFusion.
Can Large Language Models Act as Ensembler for Multi-GNNs?
Duan, Hanqi, Cheng, Yao, Yu, Jianxiang, Li, Xiang
Graph Neural Networks (GNNs) have emerged as powerful models for learning from graph-structured data. However, GNNs lack the inherent semantic understanding capability of rich textual node attributes, limiting their effectiveness in applications. On the other hand, we empirically observe that for existing GNN models, no one can consistently outperforms others across diverse datasets. In this paper, we study whether LLMs can act as an ensembler for multi-GNNs and propose the LensGNN model. The model first aligns multiple GNNs, mapping the representations of different GNNs into the same space. Then, through LoRA fine-tuning, it aligns the space between the GNN and the LLM, injecting graph tokens and textual information into LLMs. This allows LensGNN to ensemble multiple GNNs and take advantage of the strengths of LLM, leading to a deeper understanding of both textual semantic information and graph structural information. The experimental results show that LensGNN outperforms existing models. This research advances text-attributed graph ensemble learning by providing a robust and superior solution for integrating semantic and structural information. We provide our code and data here: https://anonymous.4open.science/r/EnsemGNN-E267/.
SEAGraph: Unveiling the Whole Story of Paper Review Comments
Yu, Jianxiang, Tan, Jiaqi, Ding, Zichen, Zhu, Jiapeng, Li, Jiahao, Cheng, Yao, Cui, Qier, Lan, Yunshi, Li, Xiang
Peer review, as a cornerstone of scientific research, ensures the integrity and quality of scholarly work by providing authors with objective feedback for refinement. However, in the traditional peer review process, authors often receive vague or insufficiently detailed feedback, which provides limited assistance and leads to a more time-consuming review cycle. If authors can identify some specific weaknesses in their paper, they can not only address the reviewer's concerns but also improve their work. This raises the critical question of how to enhance authors' comprehension of review comments. In this paper, we present SEAGraph, a novel framework developed to clarify review comments by uncovering the underlying intentions behind them. We construct two types of graphs for each paper: the semantic mind graph, which captures the author's thought process, and the hierarchical background graph, which delineates the research domains related to the paper. A retrieval method is then designed to extract relevant content from both graphs, facilitating coherent explanations for the review comments. Extensive experiments show that SEAGraph excels in review comment understanding tasks, offering significant benefits to authors.
Variational Graph Autoencoder for Heterogeneous Information Networks with Missing and Inaccurate Attributes
Zhao, Yige, Yu, Jianxiang, Cheng, Yao, Yu, Chengcheng, Liu, Yiding, Li, Xiang, Wang, Shuaiqiang
Heterogeneous Information Networks (HINs), which consist of various types of nodes and edges, have recently demonstrated excellent performance in graph mining. However, most existing heterogeneous graph neural networks (HGNNs) ignore the problems of missing attributes, inaccurate attributes and scarce labels for nodes, which limits their expressiveness. In this paper, we propose a generative self-supervised model GraMI to address these issues simultaneously. Specifically, GraMI first initializes all the nodes in the graph with a low-dimensional representation matrix. After that, based on the variational graph autoencoder framework, GraMI learns both node-level and attribute-level embeddings in the encoder, which can provide fine-grained semantic information to construct node attributes. In the decoder, GraMI reconstructs both links and attributes. Instead of directly reconstructing raw features for attributed nodes, GraMI generates the initial low-dimensional representation matrix for all the nodes, based on which raw features of attributed nodes are further reconstructed to leverage accurate attributes. In this way, GraMI can not only complete informative features for non-attributed nodes, but rectify inaccurate ones for attributed nodes. Finally, we conduct extensive experiments to show the superiority of GraMI in tackling HINs with missing and inaccurate attributes.
Query-Efficient Adversarial Attack Against Vertical Federated Graph Learning
Chen, Jinyin, Mu, Wenbo, Zhang, Luxin, Huang, Guohan, Zheng, Haibin, Cheng, Yao
Graph neural network (GNN) has captured wide attention due to its capability of graph representation learning for graph-structured data. However, the distributed data silos limit the performance of GNN. Vertical federated learning (VFL), an emerging technique to process distributed data, successfully makes GNN possible to handle the distributed graph-structured data. Despite the prosperous development of vertical federated graph learning (VFGL), the robustness of VFGL against the adversarial attack has not been explored yet. Although numerous adversarial attacks against centralized GNNs are proposed, their attack performance is challenged in the VFGL scenario. To the best of our knowledge, this is the first work to explore the adversarial attack against VFGL. A query-efficient hybrid adversarial attack framework is proposed to significantly improve the centralized adversarial attacks against VFGL, denoted as NA2, short for Neuron-based Adversarial Attack. Specifically, a malicious client manipulates its local training data to improve its contribution in a stealthy fashion. Then a shadow model is established based on the manipulated data to simulate the behavior of the server model in VFGL. As a result, the shadow model can improve the attack success rate of various centralized attacks with a few queries. Extensive experiments on five real-world benchmarks demonstrate that NA2 improves the performance of the centralized adversarial attacks against VFGL, achieving state-of-the-art performance even under potential adaptive defense where the defender knows the attack method. Additionally, we provide interpretable experiments of the effectiveness of NA2 via sensitive neurons identification and visualization of t-SNE.
ROADFIRST: A Comprehensive Enhancement of the Systemic Approach to Safety for Improved Risk Factor Identification and Evaluation
Reyya, Shriyan, Cheng, Yao
Many agencies have adopted the FHWA-recommended systemic approach to traffic safety, an essential supplement to the traditional hotspot crash analysis which develops region-wide safety projects based on identified risk factors. However, this approach narrows analysis to specific crash and facility types. This specification causes inefficient use of crash and inventory data as well as non-comprehensive risk evaluation and countermeasure selection for each location. To improve the comprehensiveness of the systemic approach to safety, we develop an enhanced process, ROADFIRST, that allows users to identify potential crash types and contributing factors at any location. As the knowledge base for such a process, crash types and contributing factors are analyzed with respect to features of interest, including both dynamic and static traffic-related features, using Random Forest and analyzed with the SHapley Additive exPlanations (SHAP) analysis. We identify and rank features impacting the likelihood of three sample contributing factors, namely alcohol-impaired driving, distracted driving, and speeding, according to crash and road inventory data from North Carolina, and quantify state-wide road segment risk for each contributing factor. The introduced models and methods serve as a sample for the further development of ROADFIRST by state and local agencies, which benefits the planning of more comprehensive region-wide safety improvement projects.
Improving Graph Out-of-distribution Generalization on Real-world Data
Xu, Can, Cheng, Yao, Yu, Jianxiang, Wang, Haosen, Lv, Jingsong, Li, Xiang
Existing methods for graph out-of-distribution (OOD) generalization primarily rely on empirical studies on synthetic datasets. Such approaches tend to overemphasize the causal relationships between invariant sub-graphs and labels, thereby neglecting the non-negligible role of environment in real-world scenarios. In contrast to previous studies that impose rigid independence assumptions on environments and invariant sub-graphs, this paper presents the theorems of environment-label dependency and mutable rationale invariance, where the former characterizes the usefulness of environments in determining graph labels while the latter refers to the mutable importance of graph rationales. Based on analytic investigations, a novel variational inference based method named ``Probability Dependency on Environments and Rationales for OOD Graphs on Real-world Data'' (DEROG) is introduced. To alleviate the adverse effect of unknown prior knowledge on environments and rationales, DEROG utilizes generalized Bayesian inference. Further, DEROG employs an EM-based algorithm for optimization. Finally, extensive experiments on real-world datasets under different distribution shifts are conducted to show the superiority of DEROG. Our code is publicly available at https://anonymous.4open.science/r/DEROG-536B.
G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning
Dai, Ruiting, Tan, Yuqiao, Mo, Lisi, Liang, Shuang, Huo, Guohao, Luo, Jiayi, Cheng, Yao
Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models(LM) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability. Some studies have explored combining LMs with Knowledge Graphs(KGs) by coarsely fusing the two modalities to perform Graph Neural Network(GNN)-based reasoning that lacks a profound interaction between heterogeneous modalities. In this paper, we propose a novel Graph-based Structure-Aware Prompt Learning Model for commonsense reasoning, named G-SAP, aiming to maintain a balance between heterogeneous knowledge and enhance the cross-modal interaction within the LM+GNNs model. In particular, an evidence graph is constructed by integrating multiple knowledge sources, i.e. ConceptNet, Wikipedia, and Cambridge Dictionary to boost the performance. Afterward, a structure-aware frozen PLM is employed to fully incorporate the structured and textual information from the evidence graph, where the generation of prompts is driven by graph entities and relations. Finally, a heterogeneous message-passing reasoning module is used to facilitate deep interaction of knowledge between the LM and graph-based networks. Empirical validation, conducted through extensive experiments on three benchmark datasets, demonstrates the notable performance of the proposed model. The results reveal a significant advancement over the existing models, especially, with 6.12% improvement over the SoTA LM+GNNs model on the OpenbookQA dataset.