Planning & Scheduling
Planning with affordances: Integrating learned affordance models and symbolic planning
Intelligent agents working in real-world environments must be able to learn about the environment and its capabilities which enable them to take actions to change to the state of the world to complete a complex multi-step task in a photorealistic environment. Learning about the environment is especially important to perform various multiple-step tasks without having to redefine an agent's action set for different tasks or environment settings. In our work, we augment an existing task and motion planning framework with learned affordance models of objects in the world to enable planning and executing multi-step tasks using learned models. Each task can be seen as changing the current state of the world to a given goal state. The affordance models provide us with what actions are possible and how to perform those actions in any given state. A symbolic planning algorithm uses this information and the starting and goal state to create a feasible plan to reach the desired goal state to complete a given task. We demonstrate our approach in a virtual 3D photorealistic environment, AI2-Thor, and evaluate it on real-world tasks. Our results show that our agent quickly learns how to interact with the environment and is well prepared to perform tasks such as "Moving an object out of the way to reach the desired location." In real-world environments, the ability to come up with multi-step plans for a particular task is an important skill for an intelligent agent. Moreover, it is equally important that the agent be able to interact with the environment to execute this plan. For example, for efficient navigation of an environment, the agent must generate multi-step plans, including navigating through different rooms while clearing out any objects blocking the path. However, the agent must also be able to interact with the environment correctly for actions such as opening the door or picking up an object that is blocking the way.
DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
Sharma, Shashank, Hoffmann, Janina, Namboodiri, Vinay
In this paper, we address the challenge of long-horizon visual planning tasks using Hierarchical Reinforcement Learning (HRL). Our key contribution is a Discrete Hierarchical Planning (DHP) method, an alternative to traditional distance-based approaches. We provide theoretical foundations for the method and demonstrate its effectiveness through extensive empirical evaluations. Our agent recursively predicts subgoals in the context of a long-term goal and receives discrete rewards for constructing plans as compositions of abstract actions. The method introduces a novel advantage estimation strategy for tree trajectories, which inherently encourages shorter plans and enables generalization beyond the maximum tree depth. The learned policy function allows the agent to plan efficiently, requiring only $\log N$ computational steps, making re-planning highly efficient. The agent, based on a soft-actor critic (SAC) framework, is trained using on-policy imagination data. Additionally, we propose a novel exploration strategy that enables the agent to generate relevant training examples for the planning modules. We evaluate our method on long-horizon visual planning tasks in a 25-room environment, where it significantly outperforms previous benchmarks at success rate and average episode length. Furthermore, an ablation study highlights the individual contributions of key modules to the overall performance.
Rethinking Energy Management for Autonomous Ground Robots on a Budget
Chavan, Akshar, Joshi, Rudra, Brocanelli, Marco
Autonomous Ground Robots (AGRs) face significant challenges due to limited energy reserve, which restricts their overall performance and availability. Prior research has focused separately on energy-efficient approaches and fleet management strategies for task allocation to extend operational time. A fleet-level scheduler, however, assumes a specific energy consumption during task allocation, requiring the AGR to fully utilize the energy for maximum performance, which contrasts with energy-efficient practices. This paper addresses this gap by investigating the combined impact of computing frequency and locomotion speed on energy consumption and performance. We analyze these variables through experiments on our prototype AGR, laying the foundation for an integrated approach that optimizes cyber-physical resources within the constraints of a specified energy budget. To tackle this challenge, we introduce PECC (Predictable Energy Consumption Controller), a framework designed to optimize computing frequency and locomotion speed to maximize performance while ensuring the system operates within the specified energy budget. We conducted extensive experiments with PECC using a real AGR and in simulations, comparing it to an energy-efficient baseline. Our results show that the AGR travels up to 17\% faster than the baseline in real-world tests and up to 31\% faster in simulations, while consuming 95\% and 91\% of the given energy budget, respectively. These results prove that PECC can effectively enhance AGR performance in scenarios where prioritizing the energy budget outweighs the need for energy efficiency.
Wake-Informed 3D Path Planning for Autonomous Underwater Vehicles Using A* and Neural Network Approximations
Cooper-Baldock, Zachary, Turnock, Stephen, Sammut, Karl
Autonomous Underwater Vehicles (AUVs) encounter significant energy, control and navigation challenges in complex underwater environments, particularly during close-proximity operations, such as launch and recovery (LAR), where fluid interactions and wake effects present additional navigational and energy challenges. Traditional path planning methods fail to incorporate these detailed wake structures, resulting in increased energy consumption, reduced control stability, and heightened safety risks. This paper presents a novel wake-informed, 3D path planning approach that fully integrates localized wake effects and global currents into the planning algorithm. Two variants of the A* algorithm - a current-informed planner and a wake-informed planner - are created to assess its validity and two neural network models are then trained to approximate these planners for real-time applications. Both the A* planners and NN models are evaluated using important metrics such as energy expenditure, path length, and encounters with high-velocity and turbulent regions. The results demonstrate a wake-informed A* planner consistently achieves the lowest energy expenditure and minimizes encounters with high-velocity regions, reducing energy consumption by up to 11.3%. The neural network models are observed to offer computational speedup of 6 orders of magnitude, but exhibit 4.51 - 19.79% higher energy expenditures and 9.81 - 24.38% less optimal paths. These findings underscore the importance of incorporating detailed wake structures into traditional path planning algorithms and the benefits of neural network approximations to enhance energy efficiency and operational safety for AUVs in complex 3D domains.
Learning Human Perception Dynamics for Informative Robot Communication
Chen, Shenghui, Zhao, Ruihan, Chinchali, Sandeep, Topcu, Ufuk
Human-robot cooperative navigation is challenging in environments with incomplete information. We introduce CoNav-Maze, a simulated robotics environment where a robot navigates using local perception while a human operator provides guidance based on an inaccurate map. The robot can share its camera views to improve the operator's understanding of the environment. To enable efficient human-robot cooperation, we propose Information Gain Monte Carlo Tree Search (IG-MCTS), an online planning algorithm that balances autonomous movement and informative communication. Central to IG-MCTS is a neural human perception dynamics model that estimates how humans distill information from robot communications. We collect a dataset through a crowdsourced mapping task in CoNav-Maze and train this model using a fully convolutional architecture with data augmentation. User studies show that IG-MCTS outperforms teleoperation and instruction-following baselines, achieving comparable task performance with significantly less communication and lower human cognitive load, as evidenced by eye-tracking metrics.
Synergistic Traffic Assignment
Blรคsius, Thomas, Feilhauer, Adrian, Jung, Markus, Laupichler, Moritz, Sanders, Peter, Zรผndorf, Michael
Traffic assignment analyzes traffic flows in road networks that emerge due to traveler interaction. Traditionally, travelers are assumed to use private cars, so road costs grow with the number of users due to congestion. However, in sustainable transit systems, travelers share vehicles s.t. more users on a road lead to higher sharing potential and reduced cost per user. Thus, we invert the usual avoidant traffic assignment (ATA) and instead consider synergistic traffic assignment (STA) where road costs decrease with use. We find that STA is significantly different from ATA from a game-theoretical point of view. We show that a simple iterative best-response method with simultaneous updates converges to an equilibrium state. This enables efficient computation of equilibria using optimized speedup techniques for shortest-path queries. In contrast, ATA requires slower sequential updates or more complicated iteration schemes that only approximate an equilibrium. Experiments with a realistic scenario for the city of Stuttgart indicate that STA indeed quickly converges to an equilibrium. We envision STA as a part of software-defined transportation systems that dynamically adapt to current travel demand. As a first demonstration, we show that an STA equilibrium can be used to incorporate traveler synergism in a simple bus line planning algorithm to potentially greatly reduce the required vehicle resources.
ConditionNET: Learning Preconditions and Effects for Execution Monitoring
Sliwowski, Daniel, Lee, Dongheui
The introduction of robots into everyday scenarios necessitates algorithms capable of monitoring the execution of tasks. In this paper, we propose ConditionNET, an approach for learning the preconditions and effects of actions in a fully data-driven manner. We develop an efficient vision-language model and introduce additional optimization objectives during training to optimize for consistent feature representations. ConditionNET explicitly models the dependencies between actions, preconditions, and effects, leading to improved performance. We evaluate our model on two robotic datasets, one of which we collected for this paper, containing 406 successful and 138 failed teleoperated demonstrations of a Franka Emika Panda robot performing tasks like pouring and cleaning the counter. We show in our experiments that ConditionNET outperforms all baselines on both anomaly detection and phase prediction tasks. Furthermore, we implement an action monitoring system on a real robot to demonstrate the practical applicability of the learned preconditions and effects. Our results highlight the potential of ConditionNET for enhancing the reliability and adaptability of robots in real-world environments. The data is available on the project website: https://dsliwowski1.github.io/ConditionNET_page.
Lipschitz Lifelong Monte Carlo Tree Search for Mastering Non-Stationary Tasks
Monte Carlo Tree Search (MCTS) has proven highly effective in solving complex planning tasks by balancing exploration and exploitation using Upper Confidence Bound for Trees (UCT). However, existing work have not considered MCTS-based lifelong planning, where an agent faces a non-stationary series of tasks -- e.g., with varying transition probabilities and rewards -- that are drawn sequentially throughout the operational lifetime. This paper presents LiZero for Lipschitz lifelong planning using MCTS. We propose a novel concept of adaptive UCT (aUCT) to transfer knowledge from a source task to the exploration/exploitation of a new task, depending on both the Lipschitz continuity between tasks and the confidence of knowledge in in Monte Carlo action sampling. We analyze LiZero's acceleration factor in terms of improved sampling efficiency and also develop efficient algorithms to compute aUCT in an online fashion by both data-driven and model-based approaches, whose sampling complexity and error bounds are also characterized. Experiment results show that LiZero significantly outperforms existing MCTS and lifelong learning baselines in terms of much faster convergence (3$\sim$4x) to optimal rewards. Our results highlight the potential of LiZero to advance decision-making and planning in dynamic real-world environments.
Doubly Robust Monte Carlo Tree Search
We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions. Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS's superior performance over standard MCTS. In Tic-Tac-Toe, DR-MCTS achieves an 88% win rate compared to a 10% win rate for standard MCTS. In compound VirtualHome tasks, DR-MCTS attains a 20.7% success rate versus 10.3% for standard MCTS. Our scaling analysis reveals that DR-MCTS exhibits better sample efficiency, notably outperforming standard MCTS with larger language models while using a smaller model. These results underscore DR-MCTS's potential for efficient decision-making in complex, real-world scenarios where sample efficiency is paramount.
Virtual airways heatmaps to optimize point of entry location in lung biopsy planning systems
Gil, Debora, Lloret, Pere, Diez-Ferrer, Marta, Sanchez, Carles
Purpose: We present a virtual model to optimize point of entry (POE) in lung biopsy planning systems. Our model allows to compute the quality of a biopsy sample taken from potential POE, taking into account the margin of error that arises from discrepancies between the orientation in the planning simulation and the actual orientation during the operation. Additionally, the study examines the impact of the characteristics of the lesion. Methods: The quality of the biopsy is given by a heatmap projected onto the skeleton of a patient-specific model of airways. The skeleton provides a 3D representation of airways structure, while the heatmap intensity represents the potential amount of tissue that it could be extracted from each POE. This amount of tissue is determined by the intersection of the lesion with a cone that represents the uncertainty area in the introduction of biopsy instruments. The cone, lesion, and skeleton are modelled as graphical objects that define a 3D scene of the intervention. Results: We have simulated different settings of the intervention scene from a single anatomy extracted from a CT scan and two lesions with regular and irregular shapes. The different scenarios are simulated by systematic rotation of each lesion placed at different distances from airways. Analysis of the heatmaps for the different settings show a strong impact of lesion orientation for irregular shape and the distance for both shapes. Conclusion: The proposed heatmaps help to visually assess the optimal POE and identify whether multiple optimal POEs exist in different zones of the bronchi. They also allow us to model the maximum allowable error in navigation systems and study which variables have the greatest influence on the success of the operation. Additionally, they help determine at what point this influence could potentially jeopardize the operation.