red block
CIVIL: Causal and Intuitive Visual Imitation Learning
Dai, Yinlong, Sanchez, Robert Ramirez, Jeronimus, Ryan, Sagheb, Shahabedin, Nunez, Cara M., Nemlekar, Heramb, Losey, Dylan P.
Today's robots attempt to learn new tasks by imitating human examples. These robots watch the human complete the task, and then try to match the actions taken by the human expert. However, this standard approach to visual imitation learning is fundamentally limited: the robot observes what the human does, but not why the human chooses those behaviors. Without understanding which features of the system or environment factor into the human's decisions, robot learners often misinterpret the human's examples. In practice, this results in causal confusion, inefficient learning, and robot policies that fail when the environment changes. We therefore propose a shift in perspective: instead of asking human teachers just to show what actions the robot should take, we also enable humans to intuitively indicate why they made those decisions. Under our paradigm human teachers attach markers to task-relevant objects and use natural language prompts to describe their state representation. Our proposed algorithm, CIVIL, leverages this augmented demonstration data to filter the robot's visual observations and extract a feature representation that aligns with the human teacher. CIVIL then applies these causal features to train a transformer-based policy that -- when tested on the robot -- is able to emulate human behaviors without being confused by visual distractors or irrelevant items. Our simulations and real-world experiments demonstrate that robots trained with CIVIL learn both what actions to take and why to take those actions, resulting in better performance than state-of-the-art baselines. From the human's perspective, our user study reveals that this new training paradigm actually reduces the total time required for the robot to learn the task, and also improves the robot's performance in previously unseen scenarios. See videos at our project website: https://civil2025.github.io
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
Deliberate Planning in Language Models with Symbolic Representation
Xiong, Siheng, Liu, Zhangding, Zhou, Jieyu, Su, Yusen
Planning remains a core challenge for large language models (LLMs), particularly in domains that require coherent multi-step action sequences grounded in external constraints. We introduce SymPlanner, a novel framework that equips LLMs with structured planning capabilities by interfacing them with a symbolic environment that serves as an explicit world model. Rather than relying purely on natural language reasoning, SymPlanner grounds the planning process in a symbolic state space, where a policy model proposes actions and a symbolic environment deterministically executes and verifies their effects. To enhance exploration and improve robustness, we introduce Iterative Correction (IC), which refines previously proposed actions by leveraging feedback from the symbolic environment to eliminate invalid decisions and guide the model toward valid alternatives. Additionally, Contrastive Ranking (CR) enables fine-grained comparison of candidate plans by evaluating them jointly. Conceptually, SymPlanner operationalizes two cognitive faculties: (i) error monitoring and repair via externalized feedback (IC) and (ii) preference formation among alternatives via pairwise comparison (CR), advancing cognitively plausible, symbol-grounded planning aligned with the rich structure in intelligent systems. We evaluate SymPlanner on PlanBench, demonstrating that it produces more coherent, diverse, and verifiable plans than pure natural language baselines.
Learning Compositional Behaviors from Demonstration and Language
Liu, Weiyu, Nie, Neil, Zhang, Ruohan, Mao, Jiayuan, Wu, Jiajun
We introduce Behavior from Language and Demonstration (BLADE), a framework for long-horizon robotic manipulation by integrating imitation learning and model-based planning. BLADE leverages language-annotated demonstrations, extracts abstract action knowledge from large language models (LLMs), and constructs a library of structured, high-level action representations. These representations include preconditions and effects grounded in visual perception for each high-level action, along with corresponding controllers implemented as neural network-based policies. BLADE can recover such structured representations automatically, without manually labeled states or symbolic definitions. BLADE shows significant capabilities in generalizing to novel situations, including novel initial states, external state perturbations, and novel goals. We validate the effectiveness of our approach both in simulation and on real robots with a diverse set of objects with articulated parts, partial observability, and geometric constraints.
- Research Report (0.50)
- Workflow (0.46)
Logic-of-Thought: Empowering Large Language Models with Logic Programs for Solving Puzzles in Natural Language
Li, Naiqi, Liu, Peiyuan, Liu, Zheng, Dai, Tao, Jiang, Yong, Xia, Shu-Tao
Solving puzzles in natural language poses a long-standing challenge in AI. While large language models (LLMs) have recently shown impressive capabilities in a variety of tasks, they continue to struggle with complex puzzles that demand precise reasoning and exhaustive search. In this paper, we propose Logic-of-Thought (Logot), a novel framework that bridges LLMs with logic programming to address this problem. Our method leverages LLMs to translate puzzle rules and states into answer set programs (ASPs), the solution of which are then accurately and efficiently inferred by an ASP interpreter. This hybrid approach combines the natural language understanding of LLMs with the precise reasoning capabilities of logic programs. We evaluate our method on various grid puzzles and dynamic puzzles involving actions, demonstrating near-perfect accuracy across all tasks. Our code and data are available at: https://github.com/naiqili/Logic-of-Thought.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities
Zhou, Xueyang, Tie, Guiyao, Zhang, Guowen, Wang, Weidong, Zuo, Zhigang, Wu, Di, Chu, Duanfeng, Zhou, Pan, Sun, Lichao, Gong, Neil Zhenqiang
The rise of Large Reasoning Models (LRMs) signifies a paradigm shift toward advanced computational reasoning. Yet, this progress disrupts traditional agent frameworks, traditionally anchored by execution-oriented Large Language Models (LLMs). To explore this transformation, we propose the LaRMA framework, encompassing nine tasks across Tool Usage, Plan Design, and Problem Solving, assessed with three top LLMs (e.g., Claude3.5-sonnet) and five leading LRMs (e.g., DeepSeek-R1). Our findings address four research questions: LRMs surpass LLMs in reasoning-intensive tasks like Plan Design, leveraging iterative reflection for superior outcomes; LLMs excel in execution-driven tasks such as Tool Usage, prioritizing efficiency; hybrid LLM-LRM configurations, pairing LLMs as actors with LRMs as reflectors, optimize agent performance by blending execution speed with reasoning depth; and LRMs' enhanced reasoning incurs higher computational costs, prolonged processing, and behavioral challenges, including overthinking and fact-ignoring tendencies. This study fosters deeper inquiry into LRMs' balance of deep thinking and overthinking, laying a critical foundation for future agent design advancements.
- North America > United States > Colorado (0.04)
- Europe > France (0.04)
- North America > United States > New York (0.04)
- (7 more...)
- Health & Medicine (0.67)
- Leisure & Entertainment (0.67)
- Consumer Products & Services (0.46)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Complex LLM Planning via Automated Heuristics Discovery
Ling, Hongyi, Parashar, Shubham, Khurana, Sambhav, Olson, Blake, Basu, Anwesha, Sinha, Gaurangi, Tu, Zhengzhong, Caverlee, James, Ji, Shuiwang
We consider enhancing large language models (LLMs) for complex planning tasks. While existing methods allow LLMs to explore intermediate steps to make plans, they either depend on unreliable self-verification or external verifiers to evaluate these steps, which demand significant data and computations. Here, we propose automated heuristics discovery (AutoHD), a novel approach that enables LLMs to explicitly generate heuristic functions to guide inference-time search, allowing accurate evaluation of intermediate states. These heuristic functions are further refined through a heuristic evolution process, improving their robustness and effectiveness. Our proposed method requires no additional model training or fine-tuning, and the explicit definition of heuristic functions generated by the LLMs provides interpretability and insights into the reasoning process. Extensive experiments across diverse benchmarks demonstrate significant gains over multiple baselines, including nearly twice the accuracy on some datasets, establishing our approach as a reliable and interpretable solution for complex planning tasks.
- North America > United States > Texas (0.04)
- Asia > Middle East > Jordan (0.04)