Goto

Collaborating Authors

 Problem Solving


Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases

arXiv.org Artificial Intelligence

Text-rich Graph Knowledge Bases (TG-KBs) have become increasingly crucial for answering queries by providing textual and structural knowledge. However, current retrieval methods often retrieve these two types of knowledge in isolation without considering their mutual reinforcement and some hybrid methods even bypass structural retrieval entirely after neighboring aggregation. To fill in this gap, we propose a Mixture of Structural-and-Textual Retrieval (MoR) to retrieve these two types of knowledge via a Planning-Reasoning-Organizing framework. In the Planning stage, MoR generates textual planning graphs delineating the logic for answering queries. Following planning graphs, in the Reasoning stage, MoR interweaves structural traversal and textual matching to obtain candidates from TG-KBs. In the Organizing stage, MoR further reranks fetched candidates based on their structural trajectory. Extensive experiments demonstrate the superiority of MoR in harmonizing structural and textual retrieval with insights, including uneven retrieving performance across different query logics and the benefits of integrating structural trajectories for candidate reranking. Our code is available at https://github.com/Yoega/MoR.


Unlocking Generalization for Robotics via Modularity and Scale

arXiv.org Artificial Intelligence

How can we build generalist robot systems? Scale may not be enough due to the significant multimodality of robotics tasks, lack of easily accessible data and the challenges of deploying on physical hardware. Meanwhile, most deployed robotic systems today are inherently modular and can leverage the independent generalization capabilities of each module to perform well. Therefore, this thesis seeks to tackle the task of building generalist robot agents by integrating these components into one: combining modularity with large-scale learning for general purpose robot control. The first question we consider is: how can we build modularity and hierarchy into learning systems? Our key insight is that rather than having the agent learn hierarchy and low-level control end-to-end, we can enforce modularity via planning to enable more efficient and capable robot learners. Next, we come to the role of scale in building generalist robot systems. To scale, neural networks require vast amounts of diverse data, expressive architectures to fit the data and a source of supervision to generate the data. We leverage a powerful supervision source: classical planning, which can generalize, but is expensive to run and requires access to privileged information to perform well in practice. We use these planners to supervise large-scale policy learning in simulation to produce generalist agents. Finally, we consider how to unify modularity with large-scale policy learning to build real-world robot systems capable of performing zero-shot manipulation. We do so by tightly integrating key ingredients of modular high and mid-level planning, learned local control, procedural scene generation and large-scale policy learning for sim2real transfer. We demonstrate that this recipe can produce a single, generalist agent that can solve challenging long-horizon manipulation tasks in the real world.


Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

arXiv.org Artificial Intelligence

In this paper, we address the challenging task of multimodal mathematical reasoning by incorporating the ability of "slow thinking" into multimodal large language models (MLLMs). Our core idea is that different levels of reasoning abilities can be combined dynamically to tackle questions with different complexity. To this end, we propose a paradigm of Self-structured Chain of Thought (SCoT), which is composed of minimal semantic atomic steps. Different from existing methods that rely on structured templates or free-form paradigms, our method can not only generate cognitive CoT structures for various complex tasks but also mitigates the phenomenon of overthinking. To introduce structured reasoning capabilities into visual understanding models, we further design a novel AtomThink framework with four key modules, including (i) a data engine to generate high-quality multimodal reasoning paths; (ii) a supervised fine-tuning process with serialized inference data; (iii) a policy-guided multi-turn inference method; and (iv) an atomic capability metric to evaluate the single step utilization rate. We conduct extensive experiments to show that the proposed AtomThink significantly improves the performance of baseline MLLMs, achieving more than 10\% average accuracy gains on MathVista and MathVerse. Compared to state-of-the-art structured CoT approaches, our method not only achieves higher accuracy but also improves data utilization by 5 times and boosts inference efficiency by 85.3\%. Our code is now public available in https://github.com/Quinn777/AtomThink.


Pre-Training Meta-Rule Selection Policy for Visual Generative Abductive Learning

arXiv.org Artificial Intelligence

Visual generative abductive learning studies jointly training symbol-grounded neural visual generator and inducing logic rules from data, such that after learning, the visual generation process is guided by the induced logic rules. A major challenge for this task is to reduce the time cost of logic abduction during learning, an essential step when the logic symbol set is large and the logic rule to induce is complicated. To address this challenge, we propose a pre-training method for obtaining meta-rule selection policy for the recently proposed visual generative learning approach AbdGen [Peng et al., 2023], aiming at significantly reducing the candidate meta-rule set and pruning the search space. The selection model is built based on the embedding representation of both symbol grounding of cases and meta-rules, which can be effectively integrated with both neural model and logic reasoning system. The pre-training process is done on pure symbol data, not involving symbol grounding learning of raw visual inputs, making the entire learning process low-cost. An additional interesting observation is that the selection policy can rectify symbol grounding errors unseen during pre-training, which is resulted from the memorization ability of attention mechanism and the relative stability of symbolic patterns. Experimental results show that our method is able to effectively address the meta-rule selection problem for visual abduction, boosting the efficiency of visual generative abductive learning.


GenAI for Simulation Model in Model-Based Systems Engineering

arXiv.org Artificial Intelligence

Generative AI (GenAI) has demonstrated remarkable capabilities in code generation, and its integration into complex product modeling and simulation code generation can significantly enhance the efficiency of the system design phase in Model-Based Systems Engineering (MBSE). In this study, we introduce a generative system design methodology framework for MBSE, offering a practical approach for the intelligent generation of simulation models for system physical properties. First, we employ inference techniques, generative models, and integrated modeling and simulation languages to construct simulation models for system physical properties based on product design documents. Subsequently, we fine-tune the language model used for simulation model generation on an existing library of simulation models and additional datasets generated through generative modeling. Finally, we introduce evaluation metrics for the generated simulation models for system physical properties. Our proposed approach to simulation model generation presents the innovative concept of scalable templates for simulation models. Using these templates, GenAI generates simulation models for system physical properties through code completion. The experimental results demonstrate that, for mainstream open-source Transformer-based models, the quality of the simulation model is significantly improved using the simulation model generation method proposed in this paper.


Towards Conversational AI for Disease Management

arXiv.org Artificial Intelligence

While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic system optimised for clinical management and dialogue, incorporating reasoning over the evolution of disease and multiple patient visit encounters, response to therapy, and professional competence in medication prescription. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini's long-context capabilities, combining in-context retrieval with structured reasoning to align its output with relevant and up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialist physicians and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding of management plans in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. While AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE's strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.


STAR: A Foundation Model-driven Framework for Robust Task Planning and Failure Recovery in Robotic Systems

arXiv.org Artificial Intelligence

Modern robotic systems, deployed across domains from industrial automation to domestic assistance, face a critical challenge: executing tasks with precision and adaptability in dynamic, unpredictable environments. To address this, we propose STAR (Smart Task Adaptation and Recovery), a novel framework that synergizes Foundation Models (FMs) with dynamically expanding Knowledge Graphs (KGs) to enable resilient task planning and autonomous failure recovery. While FMs offer remarkable generalization and contextual reasoning, their limitations, including computational inefficiency, hallucinations, and output inconsistencies hinder reliable deployment. STAR mitigates these issues by embedding learned knowledge into structured, reusable KGs, which streamline information retrieval, reduce redundant FM computations, and provide precise, scenario-specific insights. The framework leverages FM-driven reasoning to diagnose failures, generate context-aware recovery strategies, and execute corrective actions without human intervention or system restarts. Unlike conventional approaches that rely on rigid protocols, STAR dynamically expands its KG with experiential knowledge, ensuring continuous adaptation to novel scenarios. To evaluate the effectiveness of this approach, we developed a comprehensive dataset that includes various robotic tasks and failure scenarios. Through extensive experimentation, STAR demonstrated an 86% task planning accuracy and 78% recovery success rate, showing significant improvements over baseline methods. The framework's ability to continuously learn from experience while maintaining structured knowledge representation makes it particularly suitable for long-term deployment in real-world applications.


Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

arXiv.org Artificial Intelligence

Recent advances in large language models have demonstrated remarkable reasoning capabilities through Chain of Thought (CoT) prompting, but often at the cost of excessive verbosity in their intermediate outputs, which increases computational overhead. We introduce Sketch-of-Thought (SoT), a novel prompting framework that combines cognitive-inspired reasoning paradigms with linguistic constraints to minimize token usage while preserving reasoning accuracy. SoT is designed as a flexible framework that can incorporate any custom reasoning paradigms based on cognitive science, and we instantiate it with three such paradigms - Conceptual Chaining, Chunked Symbolism, and Expert Lexicons - each tailored to different reasoning tasks and selected dynamically via a lightweight routing model. Through comprehensive evaluation across 15 reasoning datasets with multiple languages and multimodal scenarios, we demonstrate that SoT achieves token reductions of 76% with negligible accuracy impact. In certain domains like mathematical and multi-hop reasoning, it even improves accuracy while using significantly fewer tokens. Our code is publicly available: https://www.github.com/SimonAytes/SoT.


Speculative Decoding for Multi-Sample Inference

arXiv.org Artificial Intelligence

We propose a novel speculative decoding method tailored for multi-sample reasoning scenarios, such as self-consistency and Best-of-N sampling. Our method exploits the intrinsic consensus of parallel generation paths to synthesize high-quality draft tokens without requiring auxiliary models or external databases. By dynamically analyzing structural patterns across parallel reasoning paths through a probabilistic aggregation mechanism, it identifies consensus token sequences that align with the decoding distribution. Evaluations on mathematical reasoning benchmarks demonstrate a substantial improvement in draft acceptance rates over baselines, while reducing the latency in draft token construction. This work establishes a paradigm shift for efficient multi-sample inference, enabling seamless integration of speculative decoding with sampling-based reasoning techniques.


Reward-Centered ReST-MCTS: A Robust Decision-Making Framework for Robotic Manipulation in High Uncertainty Environments

arXiv.org Artificial Intelligence

Monte Carlo Tree Search (MCTS) has emerged as a powerful tool for decision-making in robotics, enabling efficient exploration of large search spaces. However, traditional MCTS methods struggle in environments characterized by high uncertainty and noisy data due to their reliance on final-step reward evaluation. The lack of intermediate feedback during search often results in suboptimal decision-making and computational inefficiencies. This paper introduces Reward-Centered ReST-MCTS, a novel framework that enhances MCTS by incorporating intermediate reward shaping. The core of our approach is the Rewarding Center, which refines search trajectories by dynamically assigning partial rewards using rule-based validation, heuristic guidance, and neural estimation. By integrating these mechanisms, our method enables real-time optimization of search paths, mitigating the effects of error propagation. We evaluate Reward-Centered ReST-MCTS in robotic manipulation tasks under high uncertainty, demonstrating consistent improvements in decision accuracy. Compared to baseline methods, including Chain-of-Thought (CoT) prompting and Vanilla ReST-MCTS, our framework achieves a 2-4% accuracy improvement while maintaining computational feasibility. Ablation studies confirm the effectiveness of intermediate feedback in search refinement, particularly in pruning incorrect decision paths early. Furthermore, robustness tests show that our method retains high performance across varying levels of uncertainty.