Goto

Collaborating Authors

 Planning & Scheduling


The Formula for the Perfect Vacation Length

Slate

In this special episode of Slate Money, Felix Salmon, Elizabeth Spiers, and Emily Peck discuss how long is too long for a vacation. They dig into how time zones should factor into travel plans, how much recovery time you need on the back end of a vacation, and why you should sometimes plan a trip to do absolutely nothing. Want to hear that discussion and hear more Slate Money? Join Slate Plus to unlock weekly bonus episodes. You can subscribe directly from the Slate Money show page on Apple Podcasts and Spotify.


Imitation Learning-Based Path Generation for the Complex Assembly of Deformable Objects

arXiv.org Artificial Intelligence

This paper investigates how learning can be used to ease the design of high-quality paths for the assembly of deformable objects. Object dynamics plays an important role when manipulating deformable objects; thus, detailed models are often used when conducting motion planning for deformable objects. We propose to use human demonstrations and learning to enable motion planning of deformable objects with only simple dynamical models of the objects. In particular, we use the offline collision-free path planning, to generate a large number of reference paths based on a simple model of the deformable object. Subsequently, we execute the collision-free paths on a robot with a compliant control such that a human can slightly modify the path to complete the task successfully. Finally, based on the virtual path data sets and the human corrected ones, we use behavior cloning (BC) to create a dexterous policy that follows one reference path to finish a given task.


Don't Just Follow MLLM Plans: Robust and Efficient Planning for Open-world Agents

arXiv.org Artificial Intelligence

Developing autonomous agents capable of mastering complex, multi-step tasks in unpredictable, interactive environments presents a significant challenge. While Large Language Models (LLMs) offer promise for planning, existing approaches often rely on problematic internal knowledge or make unrealistic environmental assumptions. Although recent work explores learning planning knowledge, they still retain limitations due to partial reliance on external knowledge or impractical setups. Indeed, prior research has largely overlooked developing agents capable of acquiring planning knowledge from scratch, directly in realistic settings. While realizing this capability is necessary, it presents significant challenges, primarily achieving robustness given the substantial risk of incorporating LLMs' inaccurate knowledge. Moreover, efficiency is crucial for practicality as learning can demand prohibitive exploration. In response, we introduce Robust and Efficient Planning for Open-world Agents (REPOA), a novel framework designed to tackle these issues. REPOA features three key components: adaptive dependency learning and fine-grained failure-aware operation memory to enhance robustness to knowledge inaccuracies, and difficulty-based exploration to improve learning efficiency. Our evaluation in two established open-world testbeds demonstrates REPOA's robust and efficient planning, showcasing its capability to successfully obtain challenging late-game items that were beyond the reach of prior approaches.


Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization

arXiv.org Artificial Intelligence

Abstract--Despite over a decade of development, autonomous driving trajectory planning in complex urban environments continues to encounter significant challenges. These challenges include the difficulty in accommodating the multi-modal nature of trajectories, the limitations of the single expert model in managing diverse scenarios, and insufficient consideration of environmental interactions. T o address these issues, this paper introduces the EMoE-Planner, which incorporates three innovative approaches. Firstly, the Explicit MoE (Mixture of Experts) dynamically selects specialized experts based on scenario-specific information through a shared scene router . Secondly, the planner utilizes scene-specific queries to provide multi-modal priors, directing the model's focus towards relevant target areas. Lastly, it enhances the prediction model and loss calculation by considering the interactions between the ego vehicle and other agents, thereby significantly boosting planning performance. Comparative experiments were conducted on the Nuplan dataset against the state-of-the-art methods. The simulation results demonstrate that our model consistently outperforms SOT A models across nearly all test scenarios. Our model is the first pure learning model to achieve performance surpassing rule-based algorithms in almost all Nuplan closed-loop simulations. UTONOMOUS driving trajectory planning has evolved over decades, with rule-based methods [1]-[3] providing fundamental safety assurances via predefined logic and heuristics. However, in complex urban settings, three significant limitations become apparent: (1) The manual construction of rules struggles to accommodate dynamic interactions and abrupt changes in road topology, resulting in unaddressed long-tail scenarios; (2) Rigid trajectory generation fails to mimic the adaptive behaviors of human drivers, such as dynamically adjusting following distances; (3) An exponential increase in maintenance costs arises from the "combinatorial explosion" of accumulating rules. Conversely, data-driven approaches, including imitation learning [4]-[6], address edge cases like extreme weather and complex traffic, capturing human-like driving behaviors from expert data. Reinforcement learning [7], [8] enables dynamic optimization through advanced reward mechanisms. These systems offer lower costs and faster iterations compared to rule-based alternatives.


VLM-RRT: Vision Language Model Guided RRT Search for Autonomous UAV Navigation

arXiv.org Artificial Intelligence

-- Path planning is a fundamental capability of autonomous Unmanned Aerial V ehicles (UA Vs), enabling them to efficiently navigate toward a target region or explore complex environments while avoiding obstacles. Traditional path-planning methods, such as Rapidly-exploring Random Trees (RRT), have proven effective but often encounter significant challenges. These include high search space complexity, sub-optimal path quality, and slow convergence, issues that are particularly problematic in high-stakes applications like disaster response, where rapid and efficient planning is critical. T o address these limitations and enhance path-planning efficiency, we propose Vision Language Model RRT (VLM-RRT), a hybrid approach that integrates the pattern recognition capabilities of Vision Language Models (VLMs) with the path-planning strengths of RRT . By leveraging VLMs to provide initial directional guidance based on environmental snapshots, our method biases sampling toward regions more likely to contain feasible paths, significantly improving sampling efficiency and path quality. Extensive quantitative and qualitative experiments with various state-of-the-art VLMs demonstrate the effectiveness of this proposed approach. As Unmanned Aerial V ehicles (UA Vs) operate in increasingly dynamic and complex environments, the demand for reliable navigation [1], including efficient and adaptive path-planning strategies [2], has grown significantly.


Make Planning Research Rigorous Again!

arXiv.org Artificial Intelligence

In over sixty years since its inception, the field of planning has made significant contributions to both the theory and practice of building planning software that can solve a never-before-seen planning problem. This was done through established practices of rigorous design and evaluation of planning systems. It is our position that this rigor should be applied to the current trend of work on planning with large language models. One way to do so is by correctly incorporating the insights, tools, and data from the automated planning community into the design and evaluation of LLM-based planners. The experience and expertise of the planning community are not just important from a historical perspective; the lessons learned could play a crucial role in accelerating the development of LLM-based planners. This position is particularly important in light of the abundance of recent works that replicate and propagate the same pitfalls that the planning community has encountered and learned from. We believe that avoiding such known pitfalls will contribute greatly to the progress in building LLM-based planners and to planning in general.


From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications

arXiv.org Artificial Intelligence

With the advent of 6G communications, intelligent communication systems face multiple challenges, including constrained perception and response capabilities, limited scalability, and low adaptability in dynamic environments. This tutorial provides a systematic introduction to the principles, design, and applications of Large Artificial Intelligence Models (LAMs) and Agentic AI technologies in intelligent communication systems, aiming to offer researchers a comprehensive overview of cutting-edge technologies and practical guidance. First, we outline the background of 6G communications, review the technological evolution from LAMs to Agentic AI, and clarify the tutorial's motivation and main contributions. Subsequently, we present a comprehensive review of the key components required for constructing LAMs. We further categorize LAMs and analyze their applicability, covering Large Language Models (LLMs), Large Vision Models (LVMs), Large Multimodal Models (LMMs), Large Reasoning Models (LRMs), and lightweight LAMs. Next, we propose a LAM-centric design paradigm tailored for communications, encompassing dataset construction and both internal and external learning approaches. Building upon this, we develop an LAM-based Agentic AI system for intelligent communications, clarifying its core components such as planners, knowledge bases, tools, and memory modules, as well as its interaction mechanisms. We also introduce a multi-agent framework with data retrieval, collaborative planning, and reflective evaluation for 6G. Subsequently, we provide a detailed overview of the applications of LAMs and Agentic AI in communication scenarios. Finally, we summarize the research challenges and future directions in current studies, aiming to support the development of efficient, secure, and sustainable next-generation intelligent communication systems.


Enhanced SIRRT*: A Structure-Aware RRT* for 2D Path Planning with Hybrid Smoothing and Bidirectional Rewiring

arXiv.org Artificial Intelligence

-- Sampling-based motion planners such as Rapidly-exploring Random Tree* (RRT*) and its informed variant IRRT* are widely used for optimal path planning in complex environments. However, these methods often suffer from slow convergence and high variance due to their reliance on random sampling, particularly when initial solution discovery is delayed. This paper presents Enhanced SIRRT* (E-SIRRT*), a structure-aware planner that improves upon the original SIRRT* framework by introducing two key enhancements: hybrid path smoothing and bidirectional rewiring. Hybrid path smoothing refines the initial path through spline fitting and collision-aware correction, while bidirectional rewiring locally optimizes tree connectivity around the smoothed path to improve cost propagation. Experimental results demonstrate that E-SIRRT* consistently outperforms IRRT* and SIRRT* in terms of initial path quality, convergence rate, and robustness across 100 trials. Unlike IRRT*, which exhibits high variability due to stochastic initialization, E-SIRRT* achieves repeatable and efficient performance through deterministic skeleton-based initialization and structural refinement.


FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation

arXiv.org Artificial Intelligence

Path planning is a critical component in autonomous drone operations, enabling safe and efficient navigation through complex environments. Recent advances in foundation models, particularly large language models (LLMs) and vision-language models (VLMs), have opened new opportunities for enhanced perception and intelligent decision-making in robotics. However, their practical applicability and effectiveness in global path planning remain relatively unexplored. This paper proposes foundation model-guided path planners (FM-Planner) and presents a comprehensive benchmarking study and practical validation for drone path planning. Specifically, we first systematically evaluate eight representative LLM and VLM approaches using standardized simulation scenarios. To enable effective real-time navigation, we then design an integrated LLM-Vision planner that combines semantic reasoning with visual perception. Furthermore, we deploy and validate the proposed path planner through real-world experiments under multiple configurations. Our findings provide valuable insights into the strengths, limitations, and feasibility of deploying foundation models in real-world drone applications and providing practical implementations in autonomous flight. Project site: https://github.com/NTU-ICG/FM-Planner.


Efficient Robotic Policy Learning via Latent Space Backward Planning

arXiv.org Artificial Intelligence

Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grained subgoals partially alleviates efficiency issues. However, their forward planning schemes can still result in off-task predictions due to accumulation errors, leading to misalignment with long-term goals. This raises a critical question: Can robotic planning be both efficient and accurate enough for real-time control in long-horizon, multi-stage tasks? To address this, we propose a Latent Space Backward Planning scheme (LBP), which begins by grounding the task into final latent goals, followed by recursively predicting intermediate subgoals closer to the current state. The grounded final goal enables backward subgoal planning to always remain aware of task completion, facilitating on-task prediction along the entire planning horizon. The subgoal-conditioned policy incorporates a learnable token to summarize the subgoal sequences and determines how each subgoal guides action extraction. Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. Project Page: https://lbp-authors.github.io