Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information

Genevieve Flaspohler (1,2), Nicholas Roy (1), and John W. Fisher III (1)

(1) Massachusetts Institute of Technology

Neural Information Processing Systems

Finally, Section D provides additional visualizations and discussion of experimental results.

The following section contains the detailed derivation of Lemmas 5.3-5.5 presented in the main text. S22 and S25 follow from the triangle inequality, Eq. […]. We can bound the final term in Eq. […]. Plugging this expression into Eq. S23, we have the recursion: […]

Despite its continuous nature, the value function for any discrete, finite-horizon POMDP can be […]. This property can be observed directly from Eq. 2 when integration is replaced by summation.
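The closing remark about finite-horizon value functions can be checked numerically on a toy problem. This sketch assumes the property meant is the standard piecewise-linear and convex (PWLC) structure of discrete, finite-horizon POMDP value functions, i.e. V_t(b) = max over α in a finite set Γ_t of α·b. It builds Γ_t by exact enumeration and compares the result against a direct Bellman recursion over beliefs, in which the expectation over observations is a finite sum rather than an integral. The tiger-style model, its numbers, and the helper names `backup`, `value_alpha`, and `value_recursive` are hypothetical illustrations, not the authors' code, and Eq. 2 itself is not reproduced here.

```python
import itertools

import numpy as np

# Hypothetical tiger-style POMDP; all numbers are assumptions for this sketch.
# States: 0 = tiger-left, 1 = tiger-right.
# Actions: 0 = listen, 1 = open-left, 2 = open-right. Observations: 0 = hear-left, 1 = hear-right.
# T[a, s, s'] = P(s' | s, a);  Z[a, s', o] = P(o | s', a);  R[a, s] = immediate reward.
T = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # listening leaves the state unchanged
    [[0.5, 0.5], [0.5, 0.5]],  # opening a door resets the problem uniformly
    [[0.5, 0.5], [0.5, 0.5]],
])
Z = np.array([
    [[0.85, 0.15], [0.15, 0.85]],  # listening reports the true side w.p. 0.85
    [[0.5, 0.5], [0.5, 0.5]],      # after opening, observations are uninformative
    [[0.5, 0.5], [0.5, 0.5]],
])
R = np.array([
    [-1.0, -1.0],    # small cost to listen
    [-100.0, 10.0],  # open-left: penalized if the tiger is on the left
    [10.0, -100.0],  # open-right: penalized if the tiger is on the right
])
GAMMA = 0.95
N_A, N_S, N_O = Z.shape


def backup(alphas):
    """One exact Bellman backup by enumeration (dominated vectors are not pruned)."""
    new_alphas = []
    for a in range(N_A):
        # g[o][k](s) = GAMMA * sum_{s'} T[a, s, s'] * Z[a, s', o] * alphas[k](s')
        g = [[GAMMA * (T[a] @ (Z[a, :, o] * alpha)) for alpha in alphas]
             for o in range(N_O)]
        # One candidate vector per assignment of a successor vector to each observation.
        for choice in itertools.product(range(len(alphas)), repeat=N_O):
            new_alphas.append(R[a] + sum(g[o][k] for o, k in enumerate(choice)))
    return new_alphas


def value_alpha(alphas, b):
    """PWLC form: V(b) is the maximum over alpha vectors of alpha . b."""
    return max(float(alpha @ b) for alpha in alphas)


def value_recursive(b, horizon):
    """Direct recursion over beliefs; the expectation over observations is a finite sum."""
    if horizon == 0:
        return 0.0
    best = -np.inf
    for a in range(N_A):
        predicted = b @ T[a]  # P(s' | b, a)
        total = float(b @ R[a])
        for o in range(N_O):
            joint = predicted * Z[a, :, o]  # P(s', o | b, a)
            p_o = joint.sum()
            if p_o > 0:
                total += GAMMA * p_o * value_recursive(joint / p_o, horizon - 1)
        best = max(best, total)
    return best


HORIZON = 3
alphas = [np.zeros(N_S)]  # V_0 = 0 is represented by the single zero vector
for _ in range(HORIZON):
    alphas = backup(alphas)

for p in np.linspace(0.0, 1.0, 11):
    b = np.array([p, 1.0 - p])
    gap = abs(value_alpha(alphas, b) - value_recursive(b, HORIZON))
    print(f"b = {b}, |V_alpha - V_recursive| = {gap:.2e}")
```

Because each backup enumerates every assignment of successor vectors to observations, the set grows as |A|·|Γ|^|O| (3, then 27, then 2187 vectors here); practical solvers prune dominated vectors, which this sketch omits for brevity.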