Entropy-regularized Point-based Value Iteration

Delecki, Harrison; Vazquez-Chanlatte, Marcell; Yel, Esen; Wray, Kyle; Arnon, Tomer; Witwicki, Stefan; Kochenderfer, Mykel J.

Feb-14-2024–arXiv.org Artificial Intelligence

Model-based planners for partially observable problems must accommodate both model uncertainty during planning and goal uncertainty during objective inference. However, such planners can be brittle under these types of uncertainty because they rely on an exact model and tend to commit to a single optimal behavior. Inspired by results in the model-free setting, we propose an entropy-regularized model-based planner for partially observable problems. Entropy regularization promotes policy robustness for planning and objective inference by encouraging policies to be no more committed to a single action than necessary. We evaluate the robustness and objective inference performance of entropy-regularized policies in three problem domains. Our results show that entropy-regularized policies outperform non-entropy-regularized baselines, achieving higher expected returns under modeling errors and higher accuracy during objective inference.
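
To illustrate what "no more committed to a single action than necessary" can mean, here is a minimal sketch of the standard softmax (log-sum-exp) form of entropy regularization over a fixed list of action values. It is not the paper's point-based algorithm. The function names, the `temperature` parameter, and the example action values are assumptions made for illustration only.

```python
import math


def soft_policy(q_values, temperature):
    """Softmax policy over action values (hypothetical helper).

    Higher temperature spreads probability over near-optimal actions;
    as temperature approaches zero the policy approaches the greedy one.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def soft_value(q_values, temperature):
    """Entropy-regularized value: temperature * log(sum(exp(Q / temperature)))."""
    m = max(q_values)
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Illustrative action values at a single belief (assumed, not from the paper).
q = [1.0, 0.9, -2.0]
for temperature in (0.01, 0.5):
    pi = soft_policy(q, temperature)
    print(temperature, [round(p, 3) for p in pi], round(entropy(pi), 3))
```

In this sketch, two nearly tied actions share probability mass at a moderate temperature, while the clearly worse action stays unlikely. The soft value equals the expected action value under the softmax policy plus `temperature` times the policy's entropy, which is the usual sense in which the entropy term acts as a regularizer.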

Keywords: agent, alpha vector, POMDP

Country:
- North America > United States > California > Santa Clara County
  - Stanford (0.05)
  - Palo Alto (0.05)
  - Santa Clara (0.05)

Genre:
- Research Report > New Finding (0.54)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Belief Revision (0.71)
    - Agents (0.68)
  - Machine Learning
    - Performance Analysis > Accuracy (0.57)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.79)