Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
Neural Information Processing Systems
However, strikingly, this measure is independent of rewards and is a function of only the transitions.
Aug-20-2025, 09:33:14 GMT