
Collaborating Authors

 Pan, Zhenyu


MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse

arXiv.org Artificial Intelligence

We present MetaSpatial, the first reinforcement learning (RL)-based framework designed to enhance 3D spatial reasoning in vision-language models (VLMs), enabling real-time 3D scene generation without the need for hard-coded optimizations. MetaSpatial addresses two core challenges: (i) the lack of internalized 3D spatial reasoning in VLMs, which limits their ability to generate realistic layouts, and (ii) the inefficiency of traditional supervised fine-tuning (SFT) for layout generation tasks, as perfect ground-truth annotations are unavailable. Our key innovation is a multi-turn RL-based optimization mechanism that integrates physics-aware constraints and rendered-image evaluations, ensuring generated 3D layouts are coherent, physically plausible, and aesthetically consistent. Methodologically, MetaSpatial introduces an adaptive, iterative reasoning process in which the VLM refines spatial arrangements over multiple turns by analyzing rendered outputs, progressively improving scene coherence. Empirical evaluations demonstrate that MetaSpatial significantly enhances the spatial consistency and formatting stability of models at various scales. Post-training, object placements are more realistic, aligned, and functionally coherent, validating the effectiveness of RL for 3D spatial reasoning in metaverse, AR/VR, digital-twin, and game-development applications.
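
To make the multi-turn refinement idea concrete, here is a minimal, hypothetical sketch: `propose_layout` stands in for the VLM call, and the physics-aware reward is reduced to collision and room-boundary penalties over axis-aligned bounding boxes. None of the names, weights, or the random proposal step come from the paper.

```python
# Minimal sketch (not the authors' code): a multi-turn refinement loop where a
# physics-aware score over a proposed 3D layout is fed back to the generator.
import random
from dataclasses import dataclass

@dataclass
class Box:                        # axis-aligned footprint of one placed object
    x: float; y: float; w: float; d: float

def overlaps(a: Box, b: Box) -> bool:
    return abs(a.x - b.x) < (a.w + b.w) / 2 and abs(a.y - b.y) < (a.d + b.d) / 2

def physics_reward(layout: list[Box], room: float = 10.0) -> float:
    """Penalize object-object collisions and out-of-room placements."""
    collisions = sum(overlaps(a, b) for i, a in enumerate(layout) for b in layout[i + 1:])
    outside = sum(not (0.0 <= b.x <= room and 0.0 <= b.y <= room) for b in layout)
    return -1.0 * collisions - 1.0 * outside

def propose_layout(n_objects: int, feedback: float | None = None) -> list[Box]:
    # Placeholder for the VLM: random placements instead of model output.
    return [Box(random.uniform(0, 10), random.uniform(0, 10), 1.0, 1.0)
            for _ in range(n_objects)]

best_layout, best_reward = None, float("-inf")
for turn in range(5):                          # multi-turn refinement loop
    layout = propose_layout(6, feedback=best_reward)
    reward = physics_reward(layout)
    if reward > best_reward:
        best_layout, best_reward = layout, reward
print("best physics-aware reward:", best_reward)
```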


Retrieval-Augmented Generation with Hierarchical Knowledge

arXiv.org Artificial Intelligence

Graph-based Retrieval-Augmented Generation (RAG) methods have significantly enhanced the performance of large language models (LLMs) in domain-specific tasks. However, existing RAG methods do not adequately exploit the hierarchical knowledge that is naturally inherent in human cognition, which limits the capabilities of RAG systems. In this paper, we introduce a new RAG approach, called HiRAG, which utilizes hierarchical knowledge to enhance the semantic understanding and structure-capturing capabilities of RAG systems in the indexing and retrieval processes. Our extensive experiments demonstrate that HiRAG achieves significant performance improvements over state-of-the-art baseline methods. The code of our proposed method is available at https://github.com/hhy-huang/HiRAG.
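
As a rough illustration of the hierarchical indexing-and-retrieval idea (not HiRAG's actual pipeline), the sketch below scores coarse summary nodes first and only searches the chunks grouped under the best-matching summaries; the word-overlap similarity and the two-level structure are toy assumptions.

```python
# Minimal sketch: two-level hierarchical retrieval with a toy similarity.
def sim(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (len(wa | wb) or 1)

hierarchy = {                                  # summary node -> member chunks (assumed)
    "graph neural networks and message passing": [
        "GCN layers aggregate neighbor features",
        "GAT uses attention over edges",
    ],
    "retrieval augmented generation pipelines": [
        "RAG retrieves documents before generation",
        "indexing builds a searchable corpus",
    ],
}

def hierarchical_retrieve(query: str, top_summaries: int = 1, top_chunks: int = 2) -> list[str]:
    # Level 1: rank summary nodes; Level 2: rank chunks under the winners.
    best = sorted(hierarchy, key=lambda s: sim(query, s), reverse=True)[:top_summaries]
    candidates = [c for s in best for c in hierarchy[s]]
    return sorted(candidates, key=lambda c: sim(query, c), reverse=True)[:top_chunks]

print(hierarchical_retrieve("how does retrieval work in RAG indexing"))
```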


Do Code LLMs Understand Design Patterns?

arXiv.org Artificial Intelligence

Code Large Language Models (LLMs) demonstrate great versatility in adapting to various downstream tasks, including code generation and completion, as well as bug detection and fixing. However, Code LLMs often fail to capture existing coding standards, leading to generated code that conflicts with the design patterns required by a given project. As a result, developers must post-process the generated code to adapt it to the project's design norms. In this work, we empirically investigate the biases of Code LLMs in software development. Through carefully designed experiments, we assess the models' understanding of design patterns across recognition, comprehension, and generation. Our findings reveal that biases in Code LLMs significantly affect the reliability of downstream tasks.
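
A minimal, hypothetical sketch of a recognition-style probe in the spirit described above: labeled snippets are shown to a model and the predicted pattern name is checked against the label. `ask_model` is a placeholder for any completion API, and the snippets and metric are assumptions rather than the paper's benchmark.

```python
# Toy probe for design-pattern recognition accuracy (illustrative only).
SNIPPETS = {
    "singleton": "class Config:\n    _inst = None\n    def __new__(cls):\n"
                 "        if cls._inst is None:\n            cls._inst = super().__new__(cls)\n"
                 "        return cls._inst",
    "observer": "class Subject:\n    def __init__(self): self.subs = []\n"
                "    def notify(self, msg): [s(msg) for s in self.subs]",
}

def ask_model(prompt: str) -> str:
    return "singleton"            # placeholder answer standing in for an LLM call

def recognition_accuracy() -> float:
    correct = 0
    for label, code in SNIPPETS.items():
        pred = ask_model(f"Which design pattern does this code implement?\n{code}")
        correct += int(label in pred.lower())
    return correct / len(SNIPPETS)

print(f"recognition accuracy: {recognition_accuracy():.2f}")
```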


Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?

arXiv.org Artificial Intelligence

Code completion, a key downstream task in code generation, is one of the most frequent and impactful ways to enhance developer productivity in software development. As intelligent completion tools evolve, we need a robust evaluation benchmark that enables meaningful comparisons between products and guides future advancements. However, existing benchmarks focus on coarse-grained tasks that resemble general code generation rather than the real-world scenarios developers encounter, and they lack analysis grounded in industrial usage. Moreover, these benchmarks often rely on costly and time-consuming human annotation, and their standalone test cases fail to leverage minimal tests for maximal repository-level understanding and code coverage. To address these limitations, we first analyze business data from an industrial code completion tool and redefine the evaluation criteria to better align with the developer's intent and desired completion behavior throughout the coding process. Based on these insights, we introduce Codev-Agent, an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage, ensuring fair and effective comparisons. Using Codev-Agent, we present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Bench assesses whether a code completion tool can capture a developer's immediate intent and suggest appropriate code across diverse contexts, providing a more realistic benchmark for code completion in modern software development.
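
The Codev-Agent pipeline is only described at a high level above; as a small sketch under heavy assumptions (it is not the authors' tooling), the snippet below uses Python's standard `ast` module to recover which repository functions a unit test calls, a crude "calling chain", and turns one of them into a fill-in completion sample by cutting its body.

```python
# Toy calling-chain extraction and completion-sample construction.
import ast

test_src = '''
def test_total():
    assert total([1, 2, 3]) == 6
'''
repo_src = '''
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
'''

# Functions directly called from the unit test (a one-hop "calling chain").
called = {n.func.id for n in ast.walk(ast.parse(test_src))
          if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}

samples = []
for node in ast.walk(ast.parse(repo_src)):
    if isinstance(node, ast.FunctionDef) and node.name in called:
        lines = repo_src.splitlines()
        cut = node.body[len(node.body) // 2].lineno - 1   # keep the first half of the body
        samples.append({"prefix": "\n".join(lines[:cut]),
                        "ground_truth": "\n".join(lines[cut:]),
                        "oracle_test": test_src})

print(samples[0]["prefix"])
```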


Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action

arXiv.org Artificial Intelligence

We present a Conversational Chain-of-Action (Conv-CoA) framework for Open-domain Conversational Question Answering (OCQA). Compared with the literature, Conv-CoA addresses three major challenges: (i) unfaithful hallucination that is inconsistent with real-time or domain facts, (ii) weak reasoning performance in conversational scenarios, and (iii) unsatisfactory performance in conversational information retrieval. Our key contribution is a dynamic reasoning-retrieval mechanism that extracts the intent of the question and decomposes it into a reasoning chain to be solved via systematic prompting, pre-designed actions, updates to the Contextual Knowledge Set (CKS), and a novel Hopfield-based retriever. Methodologically, we propose a resource-efficient Hopfield retriever to enhance the efficiency and accuracy of conversational information retrieval within our actions. Additionally, we propose a conversational multi-reference faith score (Conv-MRFS) to verify and resolve conflicts between retrieved knowledge and answers in conversations. Empirically, we compare our framework with 23 state-of-the-art methods across five different research directions and two public benchmarks. These comparisons demonstrate that Conv-CoA outperforms the other methods in both accuracy and efficiency.
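
For intuition only (the paper's retriever is more elaborate), the sketch below shows a standard modern-Hopfield-style readout sometimes used for dense retrieval: a softmax over query-memory similarities followed by a weighted readout of stored passage embeddings. The embeddings here are random placeholders.

```python
# One modern-Hopfield-style retrieval step over stored passage embeddings.
import numpy as np

def hopfield_retrieve(query: np.ndarray, memory: np.ndarray, beta: float = 4.0):
    """memory: (num_passages, dim) stored patterns; query: (dim,)."""
    scores = beta * memory @ query                    # similarity to each stored pattern
    p = np.exp(scores - scores.max())
    p /= p.sum()                                      # softmax attention over memory
    return p, p @ memory                              # retrieval weights, retrieved pattern

rng = np.random.default_rng(0)
memory = rng.normal(size=(5, 8))
query = memory[2] + 0.1 * rng.normal(size=8)          # noisy version of passage 2
weights, _ = hopfield_retrieve(query, memory)
print("most relevant passage index:", int(weights.argmax()))
```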


HeteGraph-Mamba: Heterogeneous Graph Learning via Selective State Space Model

arXiv.org Artificial Intelligence

We propose a heterogeneous graph mamba network (HGMN) as the first exploration of leveraging selective state space models (SSSMs) for heterogeneous graph learning. Compared with the literature, HGMN overcomes two major challenges: (i) capturing long-range dependencies among heterogeneous nodes and (ii) adapting SSSMs to heterogeneous graph data. Our key contribution is a general graph architecture that can handle heterogeneous nodes in real-world scenarios, followed by an efficient processing flow. Methodologically, we introduce a two-level, efficient tokenization approach that first captures long-range dependencies within identical node types and subsequently across all node types. Empirically, we compare our framework with 19 state-of-the-art methods on heterogeneous graph benchmarks. The extensive comparisons demonstrate that our framework outperforms the other methods in both accuracy and efficiency.
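
A toy sketch of the two-level tokenization idea (assumptions only; HGMN's actual ordering, features, and state space model differ): node tokens are first ordered within each node type, the per-type blocks are then concatenated into one sequence, and a trivial recurrence stands in for the selective state space model.

```python
# Two-level tokenization of a heterogeneous graph into one token sequence.
from collections import defaultdict

nodes = {                        # node -> (node type, scalar feature)
    "p1": ("paper", 0.9), "p2": ("paper", 0.4),
    "a1": ("author", 0.7), "a2": ("author", 0.2), "v1": ("venue", 0.5),
}
edges = [("p1", "a1"), ("p1", "v1"), ("p2", "a1"), ("p2", "a2")]

degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

by_type = defaultdict(list)
for n, (t, _) in nodes.items():
    by_type[t].append(n)

token_order = []
for t in sorted(by_type):                                        # level 2: across node types
    token_order += sorted(by_type[t], key=lambda n: -degree[n])  # level 1: within one type

h = 0.0                                   # toy recurrence standing in for the SSM scan
for n in token_order:
    h = 0.9 * h + nodes[n][1]
print(token_order, round(h, 3))
```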


Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models

arXiv.org Artificial Intelligence

We present a Chain-of-Action (CoA) framework for multimodal and retrieval-augmented Question-Answering (QA). Compared to the literature, CoA overcomes two major challenges of current QA applications: (i) unfaithful hallucination that is inconsistent with real-time or domain facts and (ii) weak reasoning performance over compositional information. Our key contribution is a novel reasoning-retrieval mechanism that decomposes a complex question into a reasoning chain via systematic prompting and pre-designed actions. Methodologically, we propose three types of domain-adaptable "Plug-and-Play" actions for retrieving real-time information from heterogeneous sources. We also propose a multi-reference faith score (MRFS) to verify and resolve conflicts in the answers. Empirically, we exploit both public benchmarks and a Web3 case study to demonstrate the capability of CoA over other methods.
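
The multi-reference faith score (MRFS) is defined in the paper; below is only a toy stand-in for the general idea of checking a candidate answer against several retrieved references and flagging it when no reference supports it strongly. The word-overlap score and the threshold are purely illustrative.

```python
# Toy faithfulness check against multiple retrieved references.
def support(answer: str, reference: str) -> float:
    a, r = set(answer.lower().split()), set(reference.lower().split())
    return len(a & r) / (len(a) or 1)

def faith_check(answer: str, references: list[str], threshold: float = 0.5):
    scores = [support(answer, ref) for ref in references]
    return max(scores), max(scores) >= threshold      # best support, pass/fail flag

answer = "the token launched in 2021"
refs = ["The token was launched in 2021 on mainnet.", "Staking opened in 2022."]
score, faithful = faith_check(answer, refs)
print(f"max support={score:.2f}, faithful={faithful}")
```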


CoRMF: Criticality-Ordered Recurrent Mean Field Ising Solver

arXiv.org Machine Learning

We propose an RNN-based efficient Ising model solver, the Criticality-ordered Recurrent Mean Field (CoRMF), for forward Ising problems. At its core, a criticality-ordered spin sequence of an N-spin Ising model is introduced by sorting mission-critical edges with a greedy algorithm, such that an autoregressive mean-field factorization can be utilized and optimized with Recurrent Neural Networks (RNNs). Our method has two notable characteristics: (i) by leveraging the approximated tree structure of the underlying Ising graph, the newly obtained criticality order enables the unification between variational mean-field and RNNs, allowing the generally intractable Ising model to be efficiently probed with probabilistic inference; (ii) it is well-modularized and model-independent while at the same time expressive enough, and hence fully applicable to any forward Ising inference problem with minimal effort. Computationally, by using a variance-reduced Monte Carlo gradient estimator, CoRMF solves Ising problems in a self-training fashion without data/evidence, and inference tasks can be executed by directly sampling from the RNN. Theoretically, we establish a provably tighter error bound than naive mean-field.

On one hand, the connection between NP problems and Ising models has resulted in strong physics intuitions [Kirkpatrick et al., 1983] that the hardness of these problems emerges through the lens of complex energy landscapes over discrete random variables with multiple local minima [Barahona, 1982, Chowdhury, 2014]. On the other hand, the computational difficulty on the Ising side resonates with the difficulties of numerous significant scientific problems, including numerous other combinatorial decision-making and optimization problems [Benati and Rizzi, 2007, Ngo et al., 1994, Garey and Johnson, 1979]. As the opposite of conventional inverse Ising problems [Nguyen et al., 2017, Reneau et al., 2023] that reconstruct graphical structure from data, we refer to these problems, which have pre-specified graphical structures, as forward Ising problems (combinatorial inference and optimization problems in Ising formulations [De las Cuevas and Cubitt, 2016, Lucas, 2014, Pan et al., 2023]), and any efficient computational method or hardware solver [Mohseni et al., 2022] for Ising models can potentially benefit them. To describe the Ising model, we first introduce some notation here. We consider an Ising model of N spins as an exponential family model for binary N-spin data with up to quadratic sufficient statistics, taking the Boltzmann form.
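
For readers unfamiliar with the setup, the Boltzmann form and the autoregressive factorization alluded to above can be written in standard Ising-model notation as follows; the exact sign, temperature, and field conventions used in the paper may differ, and the permutation denotes the criticality-based spin ordering.

```latex
% Standard notation (assumed, not copied from the paper): the target Boltzmann
% distribution of an N-spin Ising model and the autoregressive mean-field
% surrogate ordered by a criticality-based permutation \sigma.
\begin{aligned}
  p(\mathbf{x}) &= \frac{e^{-\beta E(\mathbf{x})}}{Z}, \qquad
  E(\mathbf{x}) = -\sum_{(i,j)\in\mathcal{E}} J_{ij}\, x_i x_j - \sum_{i=1}^{N} h_i x_i,
  \qquad x_i \in \{-1, +1\},\\[4pt]
  q_\theta(\mathbf{x}) &= \prod_{i=1}^{N}
    q_\theta\!\left(x_{\sigma(i)} \,\middle|\, x_{\sigma(1)}, \dots, x_{\sigma(i-1)}\right).
\end{aligned}
```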