Online POMDP Planning
Online POMDP Planning with Anytime Deterministic Guarantees
Autonomous agents operating in real-world scenarios frequently encounter uncertainty and make decisions based on incomplete information. Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs). However, finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks. In recent years, approximate algorithms, such as tree search and sample-based methodologies, have emerged as state-of-the-art POMDP solvers for larger problems. Despite their effectiveness, these algorithms offer only probabilistic and often asymptotic guarantees toward the optimal solution due to their dependence on sampling.
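For readers who want the formal backdrop, the standard textbook POMDP belief update and value recursion that such planners approximate can be written as follows; this is generic notation, not a construction specific to this paper.

```latex
% Standard POMDP formalization (textbook notation): a POMDP is a tuple
% (S, A, O, T, Z, r, \gamma); the agent maintains a belief b over states.
\[
  b'(s') \;=\; \eta \, Z(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s),
  \qquad
  \eta \;=\; \Big( \sum_{s' \in S} Z(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s) \Big)^{-1}
\]
\[
  V^*(b) \;=\; \max_{a \in A} \Big[ r(b, a) + \gamma \sum_{o \in O} \Pr(o \mid b, a)\, V^*(b^{a,o}) \Big]
\]
```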
Sampling Networks and Aggregate Simulation for Online POMDP Planning
The paper introduces a new algorithm for planning in partially observable Markov decision processes (POMDPs) based on the idea of aggregate simulation. The algorithm uses product distributions to approximate the belief state and shows how to build a representation graph of an approximate action-value function over belief space.
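To make the product-distribution idea concrete, below is a minimal sketch in which a belief over binary state variables is stored as per-variable marginals and rolled forward in time. The toy dynamics and all names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a product-distribution (mean-field) belief, assuming a
# factored POMDP with binary state variables. The toy transition model is
# illustrative only.
import numpy as np

def project_belief(marginals, trans_probs, action):
    """One-step forward projection of a factored belief.

    marginals:   array of shape (n,), marginals[i] = P(x_i = 1).
    trans_probs: function (i, p, action) -> new P(x_i = 1); a stand-in for a
                 factored transition model where each variable depends on its
                 own current marginal.
    """
    return np.array([trans_probs(i, marginals[i], action)
                     for i in range(len(marginals))])

def toy_trans(i, p, action):
    stay = 0.9                       # probability the variable keeps its value
    drive = 0.8 if action == 1 else 0.2
    return stay * p + (1.0 - stay) * drive

belief = np.full(4, 0.5)             # uniform product belief over 4 binary vars
for _ in range(3):                   # aggregate simulation: roll beliefs forward
    belief = project_belief(belief, toy_trans, action=1)
print(belief)                        # marginals drift toward the action's drive
```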
Online POMDP Planning with Anytime Deterministic Guarantees - Supplementary
We restate some of the definitions from the paper for convenience. Moreover, depending on the algorithm implementation, the number of iterations can be finite. In the Tag problem, after the allowable time steps ended, the simulation was reset to its initial state. The main goal is to tag as many opponents as possible within a given time frame; the states reflect the agent's current position and the opponents' positions. The Baby POMDP is a classic problem that represents the scenario of a baby and a caregiver. The states in this problem represent the baby's needs, such as whether the baby is hungry, and the observations are binary: either the baby is crying or not.
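The Baby POMDP described above is small enough to write out directly. Below is an illustrative encoding; the transition, observation, and reward probabilities are placeholders chosen for the sketch, not the values used in the paper.

```python
# Illustrative encoding of the Baby POMDP described above; all numerical
# probabilities and rewards are placeholders.
import random

STATES = ["hungry", "sated"]         # the baby's needs
ACTIONS = ["feed", "ignore"]
OBS = ["crying", "quiet"]            # binary observation

def transition(state, action):
    if action == "feed":
        return "sated"
    # An ignored sated baby may become hungry.
    return "hungry" if state == "hungry" or random.random() < 0.2 else "sated"

def observe(state):
    p_cry = 0.8 if state == "hungry" else 0.1   # hungry babies cry more often
    return "crying" if random.random() < p_cry else "quiet"

def reward(state, action):
    return (-10.0 if state == "hungry" else 0.0) + (-5.0 if action == "feed" else 0.0)

s = "hungry"
for _ in range(3):
    a = "feed" if observe(s) == "crying" else "ignore"   # naive reactive policy
    print(s, a, reward(s, a))
    s = transition(s, a)
```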
HyPlan: Hybrid Learning-Assisted Planning Under Uncertainty for Safe Autonomous Driving
Pfaffmann, Donald, Klusch, Matthias, Steinmetz, Marcel
Abstract: We present a novel hybrid learning-assisted planning method, named HyPlan, for solving the collision-free navigation problem for self-driving cars in partially observable traffic environments. HyPlan combines methods for multi-agent behavior prediction, deep reinforcement learning with proximal policy optimization, and approximated online POMDP planning with heuristic confidence-based vertical pruning to reduce its execution time without compromising driving safety. Our experimental performance analysis on the CARLA-CTS2 benchmark of critical traffic scenarios with pedestrians revealed that HyPlan may navigate more safely than selected relevant baselines and performs significantly faster than the considered alternative online POMDP planners.

In general, the collision-free navigation (CFN) problem for a self-driving car is to minimize the time to a given goal while avoiding collisions with other objects such as other cars, pedestrians, and cyclists in a partially observable traffic environment. This constrained optimization problem can be modeled as a POMDP and solved online by the autonomous vehicle (AV). While hybrid neuro-explicit planning methods such as LEADER [9], LFGnav [28] and HyLEAP [27] may navigate reasonably safely with provably correct action planning under uncertainty, they still suffer from significantly slower execution times compared to deep learning-based methods.
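The CFN objective described above (minimize time to goal, never collide) is often encoded as a POMDP reward. The sketch below is a hedged illustration of that encoding; the constants and the dictionary-based state are hypothetical, not HyPlan's actual model.

```python
# Hedged sketch of a CFN-style objective as a POMDP reward: a per-step cost
# pushes toward reaching the goal quickly, and collisions carry a dominating
# penalty. State encoding and constants are hypothetical.
def cfn_reward(state):
    if state.get("collision", False):
        return -1000.0        # safety: collisions dominate every other term
    if state.get("at_goal", False):
        return 100.0          # goal bonus; episode ends here
    return -1.0               # time penalty: minimize steps to the goal

print(cfn_reward({"collision": False}))   # -1.0: an ordinary driving step
```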
DESPOT: Online POMDP Planning with Regularization
POMDPs provide a principled framework for planning under uncertainty, but are computationally intractable, due to the "curse of dimensionality" and the "curse of history". This paper presents an online lookahead search algorithm that alleviates these difficulties by limiting the search to a set of sampled scenarios. The execution of all policies on the sampled scenarios is summarized using a Determinized Sparse Partially Observable Tree (DESPOT), which is a sparsely sampled belief tree. Our algorithm, named Regularized DESPOT (R-DESPOT), searches the DESPOT for a policy that optimally balances the size of the policy and the accuracy on its value estimate obtained through sampling. We give an output-sensitive performance bound for all policies derived from the DESPOT, and show that R-DESPOT works well if a small optimal policy exists. We also give an anytime approximation to R-DESPOT. Experiments show strong results, compared with two of the fastest online POMDP algorithms.
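The abstract's notion of optimally balancing policy size against the accuracy of its value estimate can be summarized as a regularized objective. The formulation below is a sketch consistent with that description, with $\lambda$ as the regularization weight.

```latex
% Regularized objective sketched from the abstract's description: R-DESPOT
% searches the DESPOT for a policy \pi balancing estimated value against size.
\[
  \pi^* \;=\; \arg\max_{\pi \in \Pi_{\text{DESPOT}}}
      \Big( \hat{V}(\pi) \;-\; \lambda\, |\pi| \Big)
\]
% Here \hat{V}(\pi) is the empirical value of \pi on the sampled scenarios,
% |\pi| is the policy size, and \lambda > 0 guards against overfitting to the
% sampled scenarios.
```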
Reviews: Sampling Networks and Aggregate Simulation for Online POMDP Planning
Author feedback: I thank the authors for the feedback. The feedback was of high quality and satisfied my concerns. I suggest that a version, perhaps compressed, of "Explaining limitations of our work" from the author feedback, which I enjoyed reading, be added to the final version of the paper.

The paper "Sampling Networks and Aggregate Simulation for Online POMDP Planning" proposes a new solution for computing policies for large POMDP problems that is based on factorizing the belief distribution using a mean-field approximation during planning and execution, and on extending aggregate simulation to POMDPs. In short, the proposed POMDP planner projects factorized beliefs forward in time, forming a computational graph along the way, and then computes gradients backwards in time over the graph to improve the policy.
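The reviewer's summary suggests a simple mental model: a differentiable rollout of factorized beliefs followed by gradient-based improvement of policy parameters. The sketch below illustrates that loop with toy dynamics; finite differences stand in for the paper's gradient computation over the graph, and every name and constant is illustrative.

```python
# Sketch of the described planner loop: roll a factorized belief forward under
# policy parameters, then improve the parameters using gradients of the
# resulting value. Finite differences replace the paper's backward pass.
import numpy as np

def rollout_value(theta, belief, horizon=5):
    """Project the product belief forward; return accumulated expected reward."""
    total, b = 0.0, belief.copy()
    for t in range(horizon):
        a = 1.0 / (1.0 + np.exp(-theta[t]))   # soft per-step "action intensity"
        b = 0.9 * b + 0.1 * a                  # toy factored belief dynamics
        total += b.mean()                      # toy reward: mean marginal
    return total

def improve(theta, belief, lr=0.5, eps=1e-5):
    grad = np.zeros_like(theta)
    for i in range(len(theta)):                # backward-pass stand-in
        d = np.zeros_like(theta); d[i] = eps
        grad[i] = (rollout_value(theta + d, belief)
                   - rollout_value(theta - d, belief)) / (2 * eps)
    return theta + lr * grad                   # gradient ascent on the value

theta, belief = np.zeros(5), np.full(4, 0.5)
for _ in range(50):
    theta = improve(theta, belief)
print(rollout_value(theta, belief))            # value increases over updates
```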
Reviews: Sampling Networks and Aggregate Simulation for Online POMDP Planning
All reviewers appreciate a practical approach to tackling POMDPs with large state and observation spaces using factorized beliefs and aggregate simulation. Reviewers had some concerns regarding the limitations imposed by the factorization assumption, but these concerns are addressed in the author feedback. Reviewers are particularly happy about the quality of the rebuttal and encourage the authors to incorporate the discussion of the algorithm's limitations into the final draft.
Scaling Long-Horizon Online POMDP Planning via Rapid State Space Sampling
Liang, Yuanchu, Kim, Edward, Thomason, Wil, Kingston, Zachary, Kurniawati, Hanna, Kavraki, Lydia E.
Partially Observable Markov Decision Processes (POMDPs) are a general and principled framework for motion planning under uncertainty. Despite tremendous improvement in the scalability of POMDP solvers, long-horizon POMDPs (e.g., $\geq 15$ steps) remain difficult to solve. This paper proposes a new approximate online POMDP solver, called Reference-Based Online POMDP Planning via Rapid State Space Sampling (ROP-RaS3). ROP-RaS3 uses novel extremely fast sampling-based motion planning techniques to sample the state space and generate a diverse set of macro actions online, which are then used to bias belief-space sampling and infer high-quality policies without requiring exhaustive enumeration of the action space -- a fundamental constraint for modern online POMDP solvers. ROP-RaS3 is evaluated on various long-horizon POMDPs, including a problem with a planning horizon of more than 100 steps and a problem with a 15-dimensional state space that requires more than 20 lookahead steps. In all of these problems, ROP-RaS3 substantially outperforms other state-of-the-art methods, in some cases by multiple fold.
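The abstract's key idea, evaluating a small sampled set of macro actions instead of enumerating primitive actions, can be sketched as follows. All helpers here are hypothetical stand-ins for the fast sampling-based motion planner the paper uses; the 1-D instantiation at the bottom exists only to make the sketch runnable.

```python
# Hedged sketch: draw k macro actions (action sequences steering toward
# sampled state-space waypoints) and score only those on belief particles,
# rather than enumerating the primitive action space.
import random

def sample_macro_actions(state_sampler, steer, k=8, length=10):
    """Generate k macro actions by steering toward k sampled states."""
    macros = []
    for _ in range(k):
        waypoint = state_sampler()
        macros.append([steer(waypoint, t) for t in range(length)])
    return macros

def plan_step(belief_particles, state_sampler, steer, simulate):
    macros = sample_macro_actions(state_sampler, steer)
    # Score each macro action by Monte Carlo rollouts from the belief.
    def score(macro):
        return sum(simulate(s, macro) for s in belief_particles) / len(belief_particles)
    return max(macros, key=score)

# Toy 1-D instantiation of the hypothetical helpers.
best = plan_step(
    belief_particles=[random.gauss(0.0, 1.0) for _ in range(20)],
    state_sampler=lambda: random.uniform(-10, 10),
    steer=lambda w, t: 1.0 if w > 0 else -1.0,      # push toward the waypoint
    simulate=lambda s, macro: -abs(s + sum(macro)),  # reward: end near origin
)
print(len(best))                                     # chosen macro, 10 steps
```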