Particle Filter-based Policy Gradient in POMDPs

Coquelin, Pierre-Arnaud, Deguest, Romain, Munos, Rémi

Feb-15-2020, 01:28:11 GMT–Neural Information Processing Systems

Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the belief state given past observations. We consider a policy gradient approach for parameterized policy optimization. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the parameters of the policy, focusing on Finite Difference (FD) techniques. We show that the naive FD is subject to variance explosion because of the non-smoothness of the resampling procedure.
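
The non-smoothness of resampling can be illustrated with a small sketch. This is not the paper's algorithm: it is a toy, one-step weight-and-resample filter over scalar particles, and the names `resample`, `resampled_mean`, `central_fd`, the Gaussian likelihood, and the parameter `theta` are all hypothetical choices made for illustration. With the random numbers held fixed, the resampled belief mean is piecewise constant in `theta`, so a central finite difference is either exactly zero or a jump of order 1/h.

```python
import math


def resample(particles, weights, uniforms):
    """Multinomial resampling: invert the weight CDF at each given uniform."""
    total = sum(weights)
    out = []
    for u in uniforms:
        target = u * total
        acc = 0.0
        chosen = particles[-1]  # fallback against floating-point round-off
        for p, w in zip(particles, weights):
            acc += w
            if target <= acc:
                chosen = p
                break
        out.append(chosen)
    return out


def resampled_mean(theta, particles, uniforms, obs=0.5):
    """Belief mean after one weight-and-resample step.

    Assumption: theta shifts a unit-variance Gaussian likelihood; this stands
    in for the dependence of the filter on the policy parameters.
    """
    weights = [math.exp(-0.5 * (obs - theta - x) ** 2) for x in particles]
    new_particles = resample(particles, weights, uniforms)
    return sum(new_particles) / len(new_particles)


def central_fd(f, theta, h):
    """Naive central finite difference of f at theta with step h."""
    return (f(theta + h) - f(theta - h)) / (2 * h)


particles = [0.0, 1.0]
h = 1e-3

# Uniforms far from the CDF breakpoint: the same particles are selected for
# theta + h and theta - h, so the finite difference is exactly zero.
flat = central_fd(lambda th: resampled_mean(th, particles, [0.25, 0.9]), 0.0, h)

# A uniform sitting on the breakpoint: the selected particle flips between
# theta + h and theta - h, so the finite difference is about -1 / (2 * h).
jump = central_fd(lambda th: resampled_mean(th, particles, [0.5]), 0.0, h)

print(flat, jump)
```

In this toy setting the estimate is usually zero and occasionally of magnitude 1/(2h), which is the kind of behavior that makes the variance of a naive FD estimator grow as h shrinks.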

particle filter-based policy gradient, pomdp
