POMCP


POrTAL: Plan-Orchestrated Tree Assembly for Lookahead

Conway, Evan, Porfirio, David, Chan, David, Roberts, Mark, Hiatt, Laura M.

arXiv.org Artificial Intelligence

Abstract-- Assigning tasks to robots often involves supplying the robot with an overarching goal, such as through natural language, and then relying on the robot to uncover and execute a plan to achieve that goal. In many settings common to human-robot interaction, however, the world is only partially observable to the robot, requiring it to plan under uncertainty. Although many probabilistic planning algorithms exist for this purpose, they can be inefficient when run with the robot's limited computational resources, or may require more steps than expected to achieve the goal. We therefore created a new, lightweight, probabilistic planning algorithm, Plan-Orchestrated Tree Assembly for Lookahead (POrTAL), that combines the strengths of two baseline planning algorithms, FF-Replan and POMCP. In a series of case studies, we demonstrate POrTAL's ability to quickly arrive at solutions that outperform these baselines in terms of number of steps. We additionally demonstrate how POrTAL performs under varying temporal constraints.

The ability of modern robots to respond to arbitrary user requests has advanced considerably in recent years, in large part due to robots' ability to autonomously plan their own actions. When receiving a goal such as "bring me a cup of coffee," for example, a robot can calculate the minimum number of steps required to achieve it: obtain the coffee grounds, proceed to the coffee maker, load the grounds, and so on. In many scenarios common to human-robot interaction, however, this planning must be performed under considerable uncertainty.
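POrTAL's internals aren't given in this abstract, but the FF-Replan baseline it builds on is well known: determinize the stochastic problem, solve it with a classical planner, and replan whenever execution diverges from the plan's prediction. A minimal sketch of that determinize-and-replan loop (the toy graph, `step` dynamics, and failure probability are all hypothetical, not from the paper):

```python
import random
from collections import deque

def bfs_plan(graph, start, goal):
    """Classical shortest-path plan in the determinized model."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, plan = queue.popleft()
        if state == goal:
            return plan
        for action, nxt in graph.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None

def ff_replan(det_graph, step, start, goal, rng, max_steps=50):
    """Plan in the determinized model; replan whenever the stochastic
    outcome diverges from the plan's prediction."""
    state, steps = start, 0
    while state != goal and steps < max_steps:
        plan = bfs_plan(det_graph, state, goal)
        if plan is None:
            return None, steps
        for action in plan:
            expected = det_graph[state][action]
            state = step(state, action, rng)  # stochastic outcome
            steps += 1
            if state != expected or state == goal:
                break  # replan from the state we actually reached
    return state, steps

# Hypothetical two-step task with a 30% chance that an action fails.
det_graph = {"A": {"go": "B"}, "B": {"go": "G"}}

def step(s, a, rng):
    return det_graph[s][a] if rng.random() > 0.3 else s

final, n = ff_replan(det_graph, step, "A", "G", random.Random(0))
```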




Joint Multi-Target Detection-Tracking in Cognitive Massive MIMO Radar via POMCP

Bouhou, Imad, Fortunati, Stefano, Gharsalli, Leila, Renaux, Alexandre

arXiv.org Artificial Intelligence

This correspondence presents a power-aware cognitive radar framework for joint detection and tracking of multiple targets in a massive multiple-input multiple-output (MIMO) radar environment. Building on a previous single-target algorithm based on Partially Observable Monte Carlo Planning (POMCP), we extend it to the multi-target case by assigning each target an independent POMCP tree, enabling scalable and efficient planning. Departing from uniform power allocation, which is often suboptimal with varying signal-to-noise ratios (SNRs), our approach predicts each target's future angular position and expected received power based on its expected range. These predictions guide adaptive waveform design via a constrained optimization problem that allocates transmit energy to enhance the detectability of weaker or distant targets, while ensuring sufficient power for high-SNR targets. Simulations involving multiple targets with different SNRs confirm the effectiveness of our method. The proposed framework for the cognitive radar improves detection probability for low-SNR targets and achieves more accurate tracking compared to approaches using uniform or orthogonal waveforms. These results demonstrate the potential of the POMCP-based framework for adaptive, efficient multi-target radar systems.
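The exact constrained optimization used for waveform design isn't spelled out in the abstract; as a rough illustration of the idea of favoring weaker targets under a total-power budget, here is a heuristic sketch (the function name, inverse-SNR weighting, and floor fraction are all assumptions, not the paper's method):

```python
def allocate_power(pred_snr, total_power, floor_frac=0.1):
    """Heuristic sketch: give low-SNR targets more power, while
    guaranteeing each target a floor fraction of a uniform share."""
    n = len(pred_snr)
    floor = floor_frac * total_power / n        # guaranteed minimum
    weights = [1.0 / s for s in pred_snr]       # favor weak targets
    spare = total_power - n * floor             # power left to distribute
    wsum = sum(weights)
    return [floor + spare * w / wsum for w in weights]

alloc = allocate_power([1.0, 4.0], total_power=10.0)
```

The budget constraint holds by construction, and the weaker target (SNR 1.0) receives the larger share.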


Assistive Decision-Making for Right of Way Navigation at Uncontrolled Intersections

Tiwari, Navya, Vazhaeparampil, Joseph, Preston, Victoria

arXiv.org Artificial Intelligence

Uncontrolled intersections account for a significant fraction of roadway crashes due to ambiguous right-of-way rules, occlusions, and unpredictable driver behavior. While autonomous vehicle research has explored uncertainty-aware decision making, few systems exist to retrofit human-operated vehicles with assistive navigation support. We present a driver-assist framework for right-of-way reasoning at uncontrolled intersections, formulated as a Partially Observable Markov Decision Process (POMDP). Using a custom simulation testbed with stochastic traffic agents, pedestrians, occlusions, and adversarial scenarios, we evaluate four decision-making approaches: a deterministic finite state machine (FSM), and three probabilistic planners: QMDP, POMCP, and DESPOT. Results show that probabilistic planners outperform the rule-based baseline, achieving up to 97.5 percent collision-free navigation under partial observability, with POMCP prioritizing safety and DESPOT balancing efficiency and runtime feasibility. Our findings highlight the importance of uncertainty-aware planning for driver assistance and motivate future integration of sensor fusion and environment perception modules for real-time deployment in realistic traffic environments.
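Of the planners compared, QMDP has the simplest decision rule: score each action by its belief-weighted fully observable Q-value. It is optimistic because it assumes all uncertainty resolves after one step, so it never takes purely information-gathering actions. A minimal sketch with a hypothetical two-state intersection model (the states, actions, and Q-values are illustrative only):

```python
def qmdp_action(belief, Q):
    """QMDP rule: pick the action maximizing the belief-weighted
    MDP Q-value, ignoring future observations."""
    n_actions = len(next(iter(Q.values())))
    scores = [sum(b * Q[s][a] for s, b in belief.items())
              for a in range(n_actions)]
    return max(range(n_actions), key=scores.__getitem__)

# Hypothetical intersection: actions 0 = proceed, 1 = wait.
Q = {"clear": [5.0, 1.0], "occupied": [-10.0, 0.0]}

confident = qmdp_action({"clear": 0.9, "occupied": 0.1}, Q)  # proceed
uncertain = qmdp_action({"clear": 0.5, "occupied": 0.5}, Q)  # wait
```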


Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time

Veronese, Celeste, Meli, Daniele, Farinelli, Alessandro

arXiv.org Artificial Intelligence

The most popular and effective approaches to solving Partially Observable Markov Decision Processes (POMDPs, Kaelbling et al. (1998)) online, e.g., Partially Observable Monte Carlo Planning (POMCP) by Silver and Veness (2010) and Determinized Sparse Partially Observable Tree (DESPOT) by Ye et al. (2017), rely on Monte Carlo Tree Search (MCTS). These approaches estimate the value of actions through online simulations performed in a simulation environment (i.e., a black-box twin of the real POMDP environment). For efficient exploration, however, they require domain-specific policy heuristics that suggest the best actions at each state. Macro-actions (He et al. (2011); Bertolucci et al. (2021)) are popular policy heuristics that are particularly efficient over long planning horizons. A macro-action is essentially a sequence of suggested actions from a given state that can effectively guide the simulation phase towards actions with high utility. Such heuristics, however, depend heavily on domain features and are typically handcrafted for each specific domain; defining them is an arduous process that requires significant domain knowledge, especially in complex domains. An alternative approach, like the one by Cai and Hsu (2022), is to learn such heuristics via neural networks, which are, however, uninterpretable and data-inefficient. This paper extends the methodology proposed by Meli et al. (2024) to the learning, via Inductive Logic Programming (ILP, Muggleton (1991)), of Event Calculus (EC) theories.
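A macro-action heuristic plugs naturally into the rollout phase of MCTS: when the simulated state has an associated action sequence, follow it to the end; otherwise fall back to a random action. A toy sketch (the corridor dynamics and macro table are illustrative, not from the paper):

```python
import random

def macro_rollout(state, macros, actions, step, rng, depth=20, gamma=0.95):
    """Rollout sketch: follow a macro-action's remaining sequence when one
    is defined for the current state, else pick a random action."""
    total, discount, pending = 0.0, 1.0, []
    for _ in range(depth):
        if not pending:
            pending = list(macros.get(state, [])) or [rng.choice(actions)]
        a = pending.pop(0)
        state, r = step(state, a, rng)
        total += discount * r
        discount *= gamma
    return total

# Hypothetical 4-cell corridor; reward 1 for reaching/staying at cell 3.
def step(s, a, rng):
    ns = min(3, s + 1) if a == "right" else max(0, s - 1)
    return ns, (1.0 if ns == 3 else 0.0)

macros = {0: ["right", "right", "right"]}  # suggested sequence from cell 0
v = macro_rollout(0, macros, ["left", "right"], step, random.Random(1))
```

With the macro, the rollout reaches the rewarding cell within three steps regardless of the random fallback afterwards.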


Towards Intention Recognition for Robotic Assistants Through Online POMDP Planning

Saborio, Juan Carlos, Hertzberg, Joachim

arXiv.org Artificial Intelligence

Intention recognition, or the ability to anticipate the actions of another agent, plays a vital role in the design and development of automated assistants that can support humans in their daily tasks. In particular, industrial settings pose interesting challenges that include potential distractions for a decision-maker as well as noisy or incomplete observations. In such a setting, a robotic assistant tasked with helping and supporting a human worker must interleave information gathering actions with proactive tasks of its own, an approach that has been referred to as active goal recognition. In this paper we describe a partially observable model for online intention recognition, show some preliminary experimental results and discuss some of the challenges present in this family of problems.


Shrinking POMCP: A Framework for Real-Time UAV Search and Rescue

Zhang, Yunuo, Luo, Baiting, Mukhopadhyay, Ayan, Stojcsics, Daniel, Elenius, Daniel, Roy, Anirban, Jha, Susmit, Maroti, Miklos, Koutsoukos, Xenofon, Karsai, Gabor, Dubey, Abhishek

arXiv.org Artificial Intelligence

Efficient path optimization for drones in search and rescue operations faces challenges including limited visibility, time constraints, and complex information gathering in urban environments. We present a comprehensive approach to optimize UAV-based search and rescue operations in neighborhood areas, utilizing both a 3D AirSim-ROS2 simulator and a 2D simulator. The path planning problem is formulated as a partially observable Markov decision process (POMDP), and we propose a novel "Shrinking POMCP" approach to address time constraints. In the AirSim environment, we integrate our approach with a probabilistic world model for belief maintenance and a neurosymbolic navigator for obstacle avoidance. The 2D simulator employs surrogate ROS2 nodes with equivalent functionality. We compare trajectories generated by different approaches in the 2D simulator and evaluate performance across various belief types in the 3D AirSim-ROS2 simulator. Experimental results from both simulators demonstrate that our proposed Shrinking POMCP solution achieves significant improvements in search times compared to alternative methods, showcasing its potential for enhancing the efficiency of UAV-assisted search and rescue operations.

Search and rescue (SAR) operations are critical, time-sensitive missions conducted in challenging environments like neighborhoods, wilderness [1], or maritime settings [2]. These resource-intensive operations require efficient path planning and optimal routing [3]. In recent years, Unmanned Aerial Vehicles (UAVs) have become valuable SAR assets, offering advantages such as rapid deployment, extended flight times, and access to hard-to-reach areas. Equipped with sensors and cameras, UAVs can detect heat signatures, identify objects, and provide real-time aerial imagery to search teams [4]. However, the use of UAVs in SAR operations presents unique challenges, particularly in path planning and decision-making under uncertainty.
Factors such as limited battery life, changing weather conditions, and incomplete information about the search area complicate the task of efficiently coordinating UA V movements to maximize the probability of locating targets [3].
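The paper's belief-maintenance details aren't given in the abstract, but for a grid search problem the standard ingredient is a Bayes update of the per-cell target probability after each scan, accounting for missed detections and false alarms. A minimal sketch (the detection and false-alarm rates are placeholder values, not the paper's sensor model):

```python
def update_belief(belief, cell, detected, p_detect=0.9, p_false=0.05):
    """Bayes update of a grid search belief after scanning `cell`.
    belief[i] = P(target in cell i); the sensor can miss the target
    (1 - p_detect) and raise false alarms (p_false)."""
    post = []
    for i, p in enumerate(belief):
        if detected:
            like = p_detect if i == cell else p_false
        else:
            like = (1 - p_detect) if i == cell else (1 - p_false)
        post.append(like * p)
    z = sum(post)                      # normalizing constant
    return [p / z for p in post]

# A negative scan of cell 0 shifts probability mass to the other cells.
b = update_belief([0.25, 0.25, 0.25, 0.25], cell=0, detected=False)
```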


POMDP-Driven Cognitive Massive MIMO Radar: Joint Target Detection-Tracking In Unknown Disturbances

Bouhou, Imad, Fortunati, Stefano, Gharsalli, Leila, Renaux, Alexandre

arXiv.org Artificial Intelligence

The joint detection and tracking of a moving target embedded in an unknown disturbance represents a key feature that motivates the development of the cognitive radar paradigm. Building upon recent advancements in robust target detection with multiple-input multiple-output (MIMO) radars, this work explores the application of a Partially Observable Markov Decision Process (POMDP) framework to enhance the tracking and detection tasks in a statistically unknown environment. In the POMDP setup, the radar system is considered as an intelligent agent that continuously senses the surrounding environment, optimizing its actions to maximize the probability of detection $(P_D)$ and improve the target position and velocity estimation, all while keeping a constant probability of false alarm $(P_{FA})$. The proposed approach employs an online algorithm that does not require any a priori knowledge of the noise statistics, and it relies on a much more general observation model than the traditional range-azimuth-elevation model employed by conventional tracking algorithms. Simulation results clearly show substantial performance improvement of the POMDP-based algorithm compared to the State-Action-Reward-State-Action (SARSA)-based one that has been recently investigated in the context of massive MIMO (MMIMO) radar systems.
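As context for the constant-$P_{FA}$ requirement: in the simplest textbook case of a square-law detector in exponentially distributed noise of known power, the threshold follows by inverting $P_{FA} = \exp(-T/\sigma^2)$. The paper's point is precisely that its disturbance is statistically unknown, so this closed form is only the idealized baseline, not the proposed detector:

```python
import math

def cfar_threshold(p_fa, noise_power=1.0):
    """Textbook threshold for a square-law detector in exponential
    noise of known power: P_FA = exp(-T / noise_power)."""
    return -noise_power * math.log(p_fa)

T = cfar_threshold(1e-4)
```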


Increasing the Value of Information During Planning in Uncertain Environments

Pokharel, Gaurab

arXiv.org Artificial Intelligence

However, on an important set of problems where there is a large time delay between when the agent can gather information and when it needs to use that information, these solutions fail to adequately consider the value of information. As a result, information-gathering actions, even when they are critical to the optimal policy, will be ignored by existing solutions, leading to sub-optimal decisions by the agent. In this research, we develop a novel solution that rectifies this problem by introducing a new algorithm that improves upon state-of-the-art online planning by better reflecting on the value of actions that gather information. We do this by adding an entropy term to the UCB1 heuristic in the POMCP algorithm. We test this solution on the hallway problem. Results indicate that our new algorithm performs significantly better than POMCP. As humans, we instinctively gather information or ask clarifying questions when faced with task completion in uncertain situations. We know to do this because, even though we are delaying the task at hand, it is ultimately in our favour to work with complete information. Ideally, online planning algorithms like POMCP [10], whose sole job is to make plans for agents acting in uncertain situations, would know to do the same: they would strategically pick actions that provide the information that best guides the agent's decision making. However, unlike humans, who can easily correlate information gain with the ease of task accomplishment, these algorithms cannot.
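The abstract does not specify how the entropy term enters UCB1, so the following is only one plausible shape: add a bonus proportional to each action's estimated belief-entropy reduction to the usual exploitation and exploration terms (the `entropy_reduction` statistic and the weight `beta` are assumptions, not the paper's formula):

```python
import math
from collections import Counter

def belief_entropy(particles):
    """Shannon entropy of a particle-represented belief."""
    counts, n = Counter(particles), len(particles)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def ucb1_entropy(children, c=1.0, beta=0.5):
    """Select the action maximizing UCB1 plus a hypothetical
    information bonus for actions that shrink belief entropy."""
    log_n = math.log(sum(ch["visits"] for ch in children.values()))
    def score(ch):
        exploit = ch["value"]
        explore = c * math.sqrt(log_n / ch["visits"])
        info = beta * ch["entropy_reduction"]  # assumed statistic
        return exploit + explore + info
    return max(children, key=lambda a: score(children[a]))

# Hypothetical node: "listen" gathers information, "open" exploits.
node = {"listen": {"visits": 5, "value": 0.0, "entropy_reduction": 0.6},
        "open":   {"visits": 5, "value": 0.1, "entropy_reduction": 0.0}}
with_bonus = ucb1_entropy(node)             # information wins
plain_ucb1 = ucb1_entropy(node, beta=0.0)   # exploitation wins
```

With the bonus active, the information-gathering action is preferred even though its estimated value is lower; with `beta=0` the rule reduces to plain UCB1.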