 Expert Systems


Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments

Neural Information Processing Systems

The ability of Language Models (LMs) to understand natural language makes them a powerful tool for parsing human instructions into task plans for autonomous robots. Unlike traditional planning methods that rely on domain-specific knowledge and handcrafted rules, LMs generalize from diverse data and adapt to various tasks with minimal tuning, acting as a compressed knowledge base. We propose an LM-based Long-Horizon Planner for Multi-Agent Robotics (LLaMAR), a cognitive architecture for planning that achieves state-of-the-art results in long-horizon tasks within partially observable environments. LLaMAR employs a plan-act-correct-verify framework, allowing self-correction from action execution feedback without relying on oracles or simulators. Experiments show that LLaMAR achieves a 30% higher success rate than other state-of-the-art LM-based multi-agent planners in MAP-THOR and Search & Rescue tasks.
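The plan-act-correct-verify cycle can be pictured as a control loop. The sketch below is a hypothetical, single-agent simplification, not LLaMAR's implementation: the `plan`, `act`, `correct`, and `verify` callables stand in for LM modules, and `execute` returns only local execution feedback (a success flag and a message), mirroring the claim that no oracle or simulator is consulted.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    success: bool
    feedback: str  # natural-language execution feedback, e.g. "fridge is locked"


@dataclass
class Memory:
    history: list = field(default_factory=list)  # (subtask, action, feedback) tuples


def plan_act_correct_verify(
    task: str,
    plan: Callable[[str, Memory], list],         # hypothetical LM planner: task -> subtasks
    act: Callable[[str, Memory], str],           # hypothetical LM actor: subtask -> action
    execute: Callable[[str], StepResult],        # environment: runs action, reports feedback only
    correct: Callable[[str, str, Memory], str],  # hypothetical LM corrector: failed action -> fix
    verify: Callable[[str, Memory], bool],       # hypothetical LM verifier: is subtask done?
    max_steps: int = 20,
) -> bool:
    """Return True if every planned subtask is verified within max_steps actions."""
    memory = Memory()
    subtasks = plan(task, memory)
    steps = 0
    while subtasks and steps < max_steps:
        subtask = subtasks[0]
        action = act(subtask, memory)
        result = execute(action)
        steps += 1
        memory.history.append((subtask, action, result.feedback))
        if not result.success:
            # Self-correct from execution feedback: schedule a corrective subtask first.
            subtasks.insert(0, correct(action, result.feedback, memory))
            continue
        if verify(subtask, memory):
            subtasks.pop(0)  # subtask confirmed complete; move on
    return not subtasks
```

In a multi-agent setting, the `act` step would presumably assign actions to several robots at once; the loop structure is kept single-agent here for brevity.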


From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents

arXiv.org Artificial Intelligence

We introduce CTIM-Rover, an AI agent for Software Engineering (SE) built on top of AutoCodeRover (Zhang et al., 2024) that extends agentic reasoning frameworks with an episodic memory, more specifically, a general and a repository-level Cross-Task-Instance Memory (CTIM). While existing open-source SE agents mostly rely on ReAct (Yao et al., 2023b), Reflexion (Shinn et al., 2023), or Code-Act (Wang et al., 2024), all of these reasoning and planning frameworks inefficiently discard their long-term memory after a single task instance. As repository-level understanding is pivotal for identifying all locations requiring a patch for fixing a bug, we hypothesize that SE is particularly well positioned to benefit from CTIM. To this end, we build on the Experiential Learning (EL) approach ExpeL (Zhao et al., 2024), proposing a Mixture-of-Experts (MoE) inspired approach to create both a general-purpose and a repository-level CTIM. We find that CTIM-Rover does not outperform AutoCodeRover in any configuration and thus conclude that neither ExpeL nor DoT-Bank (Lingam et al., 2024) scales to real-world SE problems. Our analysis indicates that noise introduced by distracting CTIM items or exemplar trajectories is the likely source of the performance degradation.
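To make the idea of a cross-task-instance memory concrete, here is a minimal sketch, assuming insights are plain strings kept in one general pool and in per-repository pools, loosely echoing the MoE-inspired split into general and repository-level "experts". The class name, the routing rule (repository items first, then general ones), and the `max_items` cap are hypothetical choices, not CTIM-Rover's or ExpeL's actual procedure.

```python
from collections import defaultdict
from typing import Optional


class CrossTaskMemory:
    """Hypothetical cross-task-instance memory: insights persist across task
    instances in a general pool and in per-repository pools, instead of being
    discarded after each task."""

    def __init__(self) -> None:
        self.general = []
        self.by_repo = defaultdict(list)

    def add(self, insight: str, repo: Optional[str] = None) -> None:
        """Store in the repository-level pool if `repo` is given, else in the general pool."""
        if repo is None:
            self.general.append(insight)
        else:
            self.by_repo[repo].append(insight)

    def context_for(self, repo: str, max_items: int = 5) -> list:
        """Repository-level items first, then general ones, capped at `max_items`.
        The cap is one simple guard against the distracting-item noise the abstract reports."""
        return (self.by_repo.get(repo, []) + self.general)[:max_items]
```

The retrieved items would then be prepended to the agent's prompt for the next task instance in the same repository.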


Graph Random Walk with Feature-Label Space Alignment: A Multi-Label Feature Selection Method

arXiv.org Artificial Intelligence

The rapid growth in feature dimensionality may introduce implicit associations between features and labels in multi-label datasets, making the relationships between them increasingly complex. Moreover, existing methods often adopt low-dimensional linear decomposition to explore feature-label associations. However, linear decomposition struggles to capture complex nonlinear associations and may lead to misalignment between the feature space and the label space. To address these two challenges, we propose two corresponding solutions. First, we design a random walk graph that integrates feature-feature, label-label, and feature-label relationships to capture nonlinear and implicit indirect associations, while optimizing the latent representations of feature-label associations after low-rank decomposition. Second, we align the variable spaces by leveraging low-dimensional representation coefficients, while preserving the manifold structure between the original high-dimensional multi-label data and the low-dimensional representation space. Extensive experiments and ablation studies on seven benchmark datasets and three representative datasets, using various evaluation metrics, demonstrate the superiority of the proposed method (code: https://github.com/Heilong623/-GRW-).
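The abstract does not spell out the graph construction or the walk, so the following is a hedged sketch using a standard random walk with restart (RWR) over a joint feature-label graph. It illustrates how indirect associations arise: a feature with no direct edge to a label can still receive probability mass through feature-feature and label-label edges. The affinity blocks `A_ff`, `A_ll`, `A_fl` and the restart probability are illustrative assumptions.

```python
import numpy as np


def build_joint_graph(A_ff, A_ll, A_fl):
    """Stack feature-feature (d x d), label-label (q x q) and feature-label (d x q)
    affinities into one symmetric (d + q) x (d + q) adjacency matrix."""
    top = np.hstack([A_ff, A_fl])
    bottom = np.hstack([A_fl.T, A_ll])
    return np.vstack([top, bottom])


def random_walk_with_restart(A, restart=0.15, tol=1e-10, max_iter=1000):
    """Return R where column j is the stationary visiting distribution of a walk
    that jumps back to node j with probability `restart` at every step."""
    n = A.shape[0]
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    P = A / col_sums               # column-stochastic transition matrix
    E = np.eye(n)
    R = E.copy()
    for _ in range(max_iter):
        R_next = (1 - restart) * P @ R + restart * E
        if np.abs(R_next - R).max() < tol:
            return R_next
        R = R_next
    return R
```

With `d` features, the block `R[:d, d:]` gives, for each label, the visiting probability of every feature, which can be read as an indirect feature-label association score.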


Knowledge Base Construction for Knowledge-Augmented Text-to-SQL

arXiv.org Artificial Intelligence

Text-to-SQL aims to translate natural language queries into SQL statements, which is practical because it enables anyone to easily retrieve the desired information from databases. Recently, many approaches have tackled this problem with Large Language Models (LLMs), leveraging their strong capability in understanding user queries and generating the corresponding SQL code. Yet the parametric knowledge in LLMs may be insufficient to cover all the diverse and domain-specific queries that require grounding in various database schemas, which often makes the generated SQL less accurate. To tackle this, we propose constructing a knowledge base for text-to-SQL, a foundational source of knowledge from which we retrieve and generate the necessary knowledge for given queries. In particular, unlike existing approaches that either manually annotate knowledge or generate only a few pieces of knowledge for each query, our knowledge base is comprehensive: it is constructed from a combination of all the available questions, their associated database schemas, and their relevant knowledge, and it can be reused for unseen databases from different datasets and domains. We validate our approach on multiple text-to-SQL datasets, considering both overlapping and non-overlapping database scenarios, where it substantially outperforms relevant baselines.
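The retrieval step could look like the following sketch, under stated assumptions: each knowledge-base entry pairs a past question and its schema with a piece of knowledge, and token-overlap (Jaccard) similarity stands in for whatever retriever the authors actually use. `KnowledgeEntry` and `retrieve` are hypothetical names, and the knowledge-generation step is omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class KnowledgeEntry:
    question: str   # a previously seen natural-language question
    schema: str     # serialized schema, e.g. "users(id, age)"
    knowledge: str  # e.g. "adult means age >= 18"


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(kb, query: str, schema: str, k: int = 2):
    """Rank entries by Jaccard overlap between (query + schema) and each entry's
    (question + schema). A real system would more likely use dense embeddings."""
    q = tokens(query) | tokens(schema)

    def score(entry):
        t = tokens(entry.question) | tokens(entry.schema)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(kb, key=score, reverse=True)[:k]
```

Because entries are matched on question and schema content rather than on database identity, the same knowledge base can in principle serve databases it has not seen before.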


Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

arXiv.org Artificial Intelligence

Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge during generation. Existing MRAG methods typically adopt a static retrieval pipeline that fetches relevant information from multiple Knowledge Bases (KBs), followed by a refinement step. However, these approaches overlook the reasoning and planning capabilities of MLLMs to dynamically determine how to interact with different KBs during the reasoning process. To address this limitation, we propose R1-Router, a novel MRAG framework that learns to decide when and where to retrieve knowledge based on the evolving reasoning state. Specifically, R1-Router generates follow-up queries according to the current reasoning step, routes these intermediate queries to the most suitable KB, and integrates external knowledge into a coherent reasoning trajectory to answer the original query. Furthermore, we introduce Step-wise Group Relative Policy Optimization (Step-GRPO), a tailored reinforcement learning algorithm that assigns step-specific rewards to optimize the reasoning behavior of MLLMs. Experimental results on various open-domain QA benchmarks across multiple modalities demonstrate that R1-Router outperforms baseline models by over 7%. Further analysis shows that R1-Router can adaptively and effectively leverage diverse KBs, reducing unnecessary retrievals and improving both efficiency and accuracy.
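In GRPO, each sampled response's advantage is its reward standardized against the group of responses sampled for the same prompt, so no learned value function is needed. The sketch below shows that standard computation plus a hypothetical step-wise extension that standardizes rewards separately at each reasoning step. Aligning steps by index and the per-step reward values are assumptions; the abstract does not give Step-GRPO's exact formulation.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each rollout's reward by the mean and
    standard deviation of its group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def stepwise_advantages(step_rewards, eps=1e-8):
    """Hypothetical step-wise variant. step_rewards[i][t] is the reward of step t in
    rollout i; at each step index, standardize across the rollouts that reached it."""
    max_len = max(len(traj) for traj in step_rewards)
    advantages = [[0.0] * len(traj) for traj in step_rewards]
    for t in range(max_len):
        idx = [i for i, traj in enumerate(step_rewards) if len(traj) > t]
        adv = group_relative_advantages([step_rewards[i][t] for i in idx], eps)
        for i, a in zip(idx, adv):
            advantages[i][t] = float(a)
    return advantages
```

A step reached by only one rollout gets an advantage of zero, since there is no group to compare against.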


A Human-Centric Approach to Explainable AI for Personalized Education

arXiv.org Artificial Intelligence

Deep neural networks form the backbone of artificial intelligence research, with the potential to transform the human experience in areas ranging from autonomous driving to personal assistants, and from healthcare to education. However, their integration into the daily routines of real-world classrooms remains limited. It is not yet common for a teacher to assign students individualized homework targeting their specific weaknesses, provide students with instant feedback, or simulate student responses to a new exam question. While these models excel in predictive performance, their limited adoption can be attributed to a significant weakness: the lack of explainability of model decisions, which leads to a lack of trust from students, parents, and teachers. This thesis aims to bring human needs to the forefront of eXplainable AI (XAI) research, grounded in the concrete use case of personalized learning and teaching. We frame the contributions along two verticals: technical advances in XAI and their aligned human studies. We investigate explainability in AI for education, revealing systematic disagreements between post-hoc explainers and identifying a need for inherently interpretable model architectures. We propose four novel technical contributions in interpretability: a multimodal modular architecture (MultiModN), an interpretable mixture-of-experts model (InterpretCC), adversarial training for explainer stability, and a theory-driven LLM-XAI framework to present explanations to students (iLLuMinaTE), which we evaluate in diverse settings with professors, teachers, learning scientists, and university students. By combining empirical evaluations of existing explainers with novel architectural designs and human studies, our work lays a foundation for human-centric AI systems that balance state-of-the-art performance with built-in transparency and trust.


From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications

arXiv.org Artificial Intelligence

With the advent of 6G communications, intelligent communication systems face multiple challenges, including constrained perception and response capabilities, limited scalability, and low adaptability in dynamic environments. This tutorial provides a systematic introduction to the principles, design, and applications of Large Artificial Intelligence Models (LAMs) and Agentic AI technologies in intelligent communication systems, aiming to offer researchers a comprehensive overview of cutting-edge technologies and practical guidance. First, we outline the background of 6G communications, review the technological evolution from LAMs to Agentic AI, and clarify the tutorial's motivation and main contributions. Subsequently, we present a comprehensive review of the key components required for constructing LAMs. We further categorize LAMs and analyze their applicability, covering Large Language Models (LLMs), Large Vision Models (LVMs), Large Multimodal Models (LMMs), Large Reasoning Models (LRMs), and lightweight LAMs. Next, we propose a LAM-centric design paradigm tailored for communications, encompassing dataset construction and both internal and external learning approaches. Building upon this, we develop a LAM-based Agentic AI system for intelligent communications, clarifying its core components, such as planners, knowledge bases, tools, and memory modules, as well as its interaction mechanisms. We also introduce a multi-agent framework with data retrieval, collaborative planning, and reflective evaluation for 6G. We then provide a detailed overview of the applications of LAMs and Agentic AI in communication scenarios. Finally, we summarize the research challenges and future directions in current studies, aiming to support the development of efficient, secure, and sustainable next-generation intelligent communication systems.


Chinese Cyberbullying Detection: Dataset, Method, and Validation

arXiv.org Artificial Intelligence

Existing cyberbullying detection benchmarks are organized by the polarity of speech, such as "offensive" and "non-offensive", which essentially amounts to hate speech detection. However, in the real world, cyberbullying often attracts widespread social attention through incidents. To address this problem, we propose a novel annotation method to construct a cyberbullying dataset that is organized by incidents. The constructed CHNCI is the first Chinese cyberbullying incident detection dataset, consisting of 220,676 comments in 91 incidents. Specifically, we first combine three cyberbullying detection methods based on explanation generation into an ensemble to generate pseudo labels, and then let human annotators judge these labels. We then propose evaluation criteria for validating whether an incident constitutes a cyberbullying incident. Experimental results demonstrate that the constructed dataset can serve as a benchmark for the tasks of cyberbullying detection and incident prediction. To the best of our knowledge, this is the first study of the Chinese cyberbullying incident detection task.
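The abstract does not say how the three detectors' outputs are combined into pseudo labels; majority voting is one plausible rule and is assumed in this sketch. The detectors are hypothetical callables returning 1 (bullying) or 0, and the per-comment agreement score is an illustrative extra that annotators could use when judging the pseudo labels.

```python
from collections import Counter
from typing import Callable, Sequence

Detector = Callable[[str], int]  # hypothetical: 1 = bullying, 0 = not bullying


def ensemble_pseudo_labels(comments: Sequence[str], detectors: Sequence[Detector]):
    """Majority-vote pseudo label per comment, with the fraction of detectors that agree."""
    results = []
    for text in comments:
        votes = [detect(text) for detect in detectors]
        label, count = Counter(votes).most_common(1)[0]
        results.append({
            "text": text,
            "label": label,
            "agreement": count / len(votes),
            "needs_review": count < len(votes),  # detectors disagree
        })
    return results
```

With three binary detectors a tie cannot occur, so the majority label is always well defined.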


RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models

arXiv.org Artificial Intelligence

Legal Judgment Prediction (LJP) is a pivotal task in legal AI. Existing semantic-enhanced LJP models integrate judicial precedents and legal knowledge for high performance. However, they neglect legal reasoning logic, a critical component of legal judgments that requires rigorous logical analysis. Although some approaches utilize legal reasoning logic for high-quality predictions, their rigid logic hinders adaptation to case-specific logical frameworks, particularly in complex cases that are lengthy and detailed. This paper proposes a rule-enhanced legal judgment prediction framework based on first-order logic (FOL) formalism and comparative learning (CL) to develop an adaptive adjustment mechanism for legal judgment logic and further enhance performance in LJP. Inspired by the process of human exam preparation, our method follows a three-stage approach: first, we initialize judgment rules using the FOL formalism to capture complex reasoning logic accurately; next, we propose Confusion-aware Contrastive Learning (CACL) to dynamically optimize the judgment rules through a quiz consisting of confusable cases; finally, we utilize the optimized judgment rules to predict legal judgments. Experimental results on two public datasets show superior performance across all metrics. The code is publicly available at https://anonymous.4open.science/r/RLJP-FDF1.
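To make "confusable cases" concrete, consider two hypothetical FOL judgment rules (the predicates are illustrative, not taken from the paper) for charges that differ in a single condition. Cases that satisfy the shared premise but are ambiguous on the distinguishing predicate are plausibly the kind of case a confusion-aware quiz would target.

```latex
\begin{aligned}
&\forall x\,\big(\mathrm{TakesProperty}(x) \land \mathrm{UsesForce}(x) \rightarrow \mathrm{Robbery}(x)\big)\\
&\forall x\,\big(\mathrm{TakesProperty}(x) \land \lnot\mathrm{UsesForce}(x) \rightarrow \mathrm{Theft}(x)\big)
\end{aligned}
```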


TeroSeek: An AI-Powered Knowledge Base and Retrieval Generation Platform for Terpenoid Research

arXiv.org Artificial Intelligence

Terpenoids represent a pivotal class of natural products that have garnered sustained scientific interest for over 150 years. However, the inherently interdisciplinary nature of terpenoid research, spanning fields such as chemistry, pharmacology, and biology, poses significant challenges in integrating and communicating domain-specific knowledge across disciplines. To bridge this gap, we present TeroSeek, built by first systematically extracting key scientific data and findings from terpenoid-related literature published over the past two decades to construct a curated knowledge base (KB), and then developing an intelligent question-answering chatbot and web service powered by an AI-accelerated retrieval-augmented generation (RAG) framework. TeroSeek enables rapid access to structured, high-quality information and accurately responds to a wide range of terpenoid-related queries, demonstrating superior performance over general-purpose large language models (LLMs) in various application scenarios. Therefore, we believe that TeroSeek serves as a powerful domain-specific expert model to support the multidisciplinary terpenoid research community. The TeroSeek web service is publicly accessible at http://teroseek.qmclab.com.