adaops
Adaptive Online Packing-guided Search for POMDPs
The partially observable Markov decision process (POMDP) provides a general framework for modeling an agent's decision process under state uncertainty, and online planning plays a pivotal role in solving it. A belief is a distribution over states that represents this uncertainty. Methods for large-scale POMDP problems rely on the same idea of sampling both states and observations.
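To make the notion of a sampled belief concrete, here is a minimal sketch of a weighted-particle belief update, assuming a generic particle-filter formulation rather than the specific procedure used in AdaOPS. The function names, the toy transition model, and the toy observation likelihood are all hypothetical and chosen only for illustration.

```python
import random


def update_belief(particles, weights, action, observation,
                  transition, obs_likelihood, rng):
    """One weighted-particle belief update: propagate, reweight, normalize.

    `transition` and `obs_likelihood` are hypothetical model callbacks,
    not part of any AdaOPS interface.
    """
    # Propagate every sampled state through the (stochastic) dynamics.
    new_particles = [transition(s, action, rng) for s in particles]
    # Reweight each particle by how well it explains the observation.
    new_weights = [w * obs_likelihood(observation, s2, action)
                   for w, s2 in zip(weights, new_particles)]
    total = sum(new_weights)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under all particles")
    return new_particles, [w / total for w in new_weights]


def toy_transition(state, action, rng):
    # Assumed toy dynamics: integer state shifted by the action plus noise.
    return state + action + rng.choice((-1, 0, 1))


def toy_obs_likelihood(observation, state, action):
    # Assumed toy sensor: a matching reading is far more likely than a miss.
    return 0.8 if observation == state else 0.1


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [0] * 100
    weights = [1.0 / 100] * 100
    particles, weights = update_belief(
        particles, weights, action=1, observation=1,
        transition=toy_transition, obs_likelihood=toy_obs_likelihood, rng=rng)
    print(sum(weights))
```

The belief here is just the pair of lists `(particles, weights)`; any expectation under the belief is approximated by a weighted average over the particles.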
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
Appendix A for AdaOPS
According to Alg. 2, at least one leaf node is expanded in each exploration. Thus, AdaOPS is guaranteed to terminate. First, we demonstrate that the value of any belief can be formulated as an integral. This lemma is a concentration inequality for the self-normalized importance sampling estimator. The effective sample size (ESS) threshold µ for adaptive resampling is set to .
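The two quantities named above, the self-normalized importance sampling estimator and the ESS test for adaptive resampling, can be sketched as follows. This is a generic illustration under stated assumptions, not the AdaOPS implementation: the function names are hypothetical, the threshold is passed in as a parameter because its value is not given here, treating it as a fraction of the particle count is an assumed convention, and multinomial resampling is a swapped-in standard choice.

```python
import random


def effective_sample_size(weights):
    """ESS = (sum w)^2 / sum(w^2); equals N for uniform weights."""
    total = sum(weights)
    squares = sum(w * w for w in weights)
    if squares == 0.0:
        raise ValueError("all weights are zero")
    return total * total / squares


def snis_estimate(values, weights):
    """Self-normalized importance sampling estimate of an expectation."""
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights are zero")
    return sum(w * v for w, v in zip(weights, values)) / total


def resample_if_needed(particles, weights, threshold, rng):
    """Resample only when ESS drops below threshold * N.

    Assumption: `threshold` is a fraction in (0, 1]; the source does not
    state how its threshold is scaled.
    """
    n = len(particles)
    if effective_sample_size(weights) >= threshold * n:
        return particles, weights
    # Multinomial resampling: draw N particles in proportion to weight.
    resampled = rng.choices(particles, weights=weights, k=n)
    return resampled, [1.0 / n] * n


if __name__ == "__main__":
    rng = random.Random(0)
    particles = ["a", "b", "c", "d"]
    weights = [0.97, 0.01, 0.01, 0.01]
    print(effective_sample_size(weights))
    print(resample_if_needed(particles, weights, threshold=0.5, rng=rng))
```

Resampling only when the ESS is low avoids discarding particle diversity while the weights are still well balanced.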