Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Jun-1-2025, 23:54:00 GMT–Neural Information Processing Systems

This paper introduces a new complexity measure for MDPs called maximum expected hitting cost. Unlike the diameter measure is only a function of the transition dynamics, this new measure takes into account the reward dynamics as well. The authors show theoretically that under the same assumptions as previous authors who introduced diameter, this new measure is a tighter upper bound. Furthermore, they show the usefulness of this measure by showing that it can be used to better understand the informativeness of rewards when using potential based reward shaping and they prove theoretically that in a large class of MDPs potential based reward shaping is bounded by a multiplicative factor of 2 on their maximum expected hitting costs. I enjoyed reading this paper. I appreciated the structure that the authors used in this paper which first introduced all the necessary prior work (related to diameter) cosily but thoroughly enough before introducing their contributions.

contribution, diameter, markov decision process and informativeness, (8 more...)

Neural Information Processing Systems

Jun-1-2025, 23:54:00 GMT

Conferences Web Page

Add feedback

Genre:
- Research Report > New Finding (0.52)

Technology:
- Information Technology
  - Decision Support Systems (0.40)
  - Artificial Intelligence > Machine Learning
    - Learning Graphical Models > Undirected Networks > Markov Models (0.40)