Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
–Neural Information Processing Systems
This paper introduces a new complexity measure for MDPs called maximum expected hitting cost. Unlike the diameter measure is only a function of the transition dynamics, this new measure takes into account the reward dynamics as well. The authors show theoretically that under the same assumptions as previous authors who introduced diameter, this new measure is a tighter upper bound. Furthermore, they show the usefulness of this measure by showing that it can be used to better understand the informativeness of rewards when using potential based reward shaping and they prove theoretically that in a large class of MDPs potential based reward shaping is bounded by a multiplicative factor of 2 on their maximum expected hitting costs. I enjoyed reading this paper. I appreciated the structure that the authors used in this paper which first introduced all the necessary prior work (related to diameter) cosily but thoroughly enough before introducing their contributions.
Neural Information Processing Systems
Jun-1-2025, 23:54:00 GMT