Goto

Collaborating Authors

 Planning & Scheduling


Time-Optimized Safe Navigation in Unstructured Environments through Learning Based Depth Completion

arXiv.org Artificial Intelligence

Quadrotors hold significant promise for several applications such as agriculture, search and rescue, and infrastructure inspection. Achieving autonomous operation requires systems to navigate safely through complex and unfamiliar environments. This level of autonomy is particularly challenging due to the complexity of such environments and the need for real-time decision making especially for platforms constrained by size, weight, and power (SWaP), which limits flight time and precludes the use of bulky sensors like Light Detection and Ranging (LiDAR) for mapping. Furthermore, computing globally optimal, collision-free paths and translating them into time-optimized, safe trajectories in real time adds significant computational complexity. To address these challenges, we present a fully onboard, real-time navigation system that relies solely on lightweight onboard sensors. Our system constructs a dense 3D map of the environment using a novel visual depth estimation approach that fuses stereo and monocular learning-based depth, yielding longer-range, denser, and less noisy depth maps than conventional stereo methods. Building on this map, we introduce a novel planning and trajectory generation framework capable of rapidly computing time-optimal global trajectories. As the map is incrementally updated with new depth information, our system continuously refines the trajectory to maintain safety and optimality. Both our planner and trajectory generator outperforms state-of-the-art methods in terms of computational efficiency and guarantee obstacle-free trajectories. We validate our system through robust autonomous flight experiments in diverse indoor and outdoor environments, demonstrating its effectiveness for safe navigation in previously unknown settings.


Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning

arXiv.org Artificial Intelligence

Vision Language Models (VLMs) show strong potential for visual planning but struggle with precise spatial and long-horizon reasoning. In contrast, Planning Domain Definition Language (PDDL) planners excel at long-horizon formal planning, but cannot interpret visual inputs. Recent works combine these complementary advantages by enabling VLMs to turn visual planning problems into PDDL files for formal planning. However, while VLMs can generate PDDL problem files satisfactorily, they struggle to accurately generate the PDDL domain files, which describe all the planning rules. As a result, prior methods rely on human experts to predefine domain files or on constant environment access for refinement. We propose VLMFP, a Dual-VLM-guided framework that can autonomously generate both PDDL problem and domain files for formal visual planning. VLMFP introduces two VLMs to ensure reliable PDDL file generation: A SimVLM that simulates action consequences based on input rule descriptions, and a GenVLM that generates and iteratively refines PDDL files by comparing the PDDL and SimVLM execution results. VLMFP unleashes multiple levels of generalizability: The same generated PDDL domain file works for all the different instances under the same problem, and VLMs generalize to different problems with varied appearances and rules. We evaluate VLMFP with 6 grid-world domains and test its generalization to unseen instances, appearance, and game rules. On average, SimVLM accurately describes 95.5%, 82.6% of scenarios, simulates 85.5%, 87.8% of action sequence, and judges 82.4%, 85.6% goal reaching for seen and unseen appearances, respectively. With the guidance of SimVLM, VLMFP can generate PDDL files to reach 70.0%, 54.1% valid plans for unseen instances in seen and unseen appearances, respectively. Project page: https://sites.google.com/view/vlmfp.


Optimal Smooth Coverage Trajectory Planning for Quadrotors in Cluttered Environment

arXiv.org Artificial Intelligence

In recent years, with the rapid development of manufacturing industries, unmanned systems have found widespread applications across various fields. Among them, quadro-tors have been increasingly utilized in industrial applications such as aerial photography and surveying [1]. As electricity consumption continues to rise, the frequency of power grid maintenance has also increased. Given the high risks and costs associated with manual inspections, the importance of utilizing unmanned systems for autonomous power grid inspections has become increasingly evident [2], as shown in Fig 1. Substations, as critical components of the power grid system, play an essential role in ensuring seamless inspection across modules within the same facility or between different facilities. The units scheduled for inspection can be abstracted as a series of access points, with drones acting as agents tasked with visiting these points.


Metrics vs Surveys: Can Quantitative Measures Replace Human Surveys in Social Robot Navigation? A Correlation Analysis

arXiv.org Artificial Intelligence

Abstract-- Social, also called human-aware, navigation is a key challenge for the integration of mobile robots into human environments. The evaluation of such systems is complex, as factors such as comfort, safety, and legibility must be considered. Human-centered assessments, typically conducted through surveys, provide reliable insights but are costly, resource-intensive, and difficult to reproduce or compare across systems. Alternatively, numerical social navigation metrics are easy to compute and facilitate comparisons, yet the community lacks consensus on a standard set of metrics. This work explores the relationship between numerical metrics and human-centered evaluations to identify potential correlations. If specific quantitative measures align with human perceptions, they could serve as standardized evaluation tools, reducing the dependency on surveys. Our results indicate that while current metrics capture some aspects of robot navigation behavior, important subjective factors remain insufficiently represented and new metrics are necessary. Human-aware robot navigation is a key research area for integrating mobile robots into human environments [1], [2]. Beyond the classical challenges of path planning and obstacle avoidance, human-aware navigation must address qualitative aspects of social interaction, such as comfort, predictability, and personal space, which are difficult to capture with mathematical models [3], [4].


ERUPT: An Open Toolkit for Interfacing with Robot Motion Planners in Extended Reality

arXiv.org Artificial Intelligence

We propose the Extended Reality Universal Planning Toolkit (ERUPT), an extended reality (XR) system for interactive motion planning. Our system allows users to create and dynamically reconfigure environments while they plan robot paths. In immersive three-dimensional XR environments, users gain a greater spatial understanding. XR also unlocks a broader range of natural interaction capabilities, allowing users to grab and adjust objects in the environment similarly to the real world, rather than using a mouse and keyboard with the scene projected onto a two-dimensional computer screen. Our system integrates with MoveIt, a manipulation planning framework, allowing users to send motion planning requests and visualize the resulting robot paths in virtual or augmented reality. We provide a broad range of interaction modalities, allowing users to modify objects in the environment and interact with a virtual robot. Our system allows operators to visualize robot motions, ensuring desired behavior as it moves throughout the environment, without risk of collisions within a virtual space, and to then deploy planned paths on physical robots in the real world.


Orchestrating Human-AI Teams: The Manager Agent as a Unifying Research Challenge

arXiv.org Artificial Intelligence

While agentic AI has advanced in automating individual tasks, managing complex multi-agent workflows remains a challenging problem. This paper presents a research vision for autonomous agentic systems that orchestrate collaboration within dynamic human-AI teams. We propose the Autonomous Manager Agent as a core challenge: an agent that decomposes complex goals into task graphs, allocates tasks to human and AI workers, monitors progress, adapts to changing conditions, and maintains transparent stakeholder communication. We formalize workflow management as a Partially Observable Stochastic Game and identify four foundational challenges: (1) compositional reasoning for hierarchical decomposition, (2) multi-objective optimization under shifting preferences, (3) coordination and planning in ad hoc teams, and (4) governance and compliance by design. To advance this agenda, we release MA-Gym, an open-source simulation and evaluation framework for multi-agent workflow orchestration. Evaluating GPT-5-based Manager Agents across 20 workflows, we find they struggle to jointly optimize for goal completion, constraint adherence, and workflow runtime - underscoring workflow management as a difficult open problem. We conclude with organizational and ethical implications of autonomous management systems.



1e1d184167ca7676cf665225e236a3d2-Reviews.html

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper presented a method to enable the generation of robust plans with partially specified domain models. The motivation of this research topic is well stated. The main contribution of this work is the formalization of the notion of plan robustness with respect to an incomplete domain model. The paper is clearly written and should the general interest for the broad NIPS audience.


1579779b98ce9edb98dd85606f2c119d-Reviews.html

Neural Information Processing Systems

"NIPS 2013 Neural Information Processing Systems December 5 - 10, Lake Tahoe, Nevada, USA",,, "Paper ID:","1046" "Title:","Convergence of Monte Carlo Tree Search in Simultaneous Move Games" Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper studies Monte Carlo tree search in zero-sum extensive form games with perfect information and simultaneous moves. It is proved that the MCTS algorithm converges to an approximate Nash equilibrium under certain conditions. Empirical study confirms the formal result. The detailed comments are as follows. The result is useful and the presentation is clear.