Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs
Arsenii Mustafin, Xinyi Sheng, Dominik Baumann
arXiv.org Artificial Intelligence
The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases: the average-reward case and the discounted-reward case. Although the two cases share similarities, they are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs from the discounted-reward case to the average-reward case, thereby unifying both. This allows us to carry a major result known for the discounted-reward case over to the average-reward case: under a unique and ergodic optimal policy, the Value Iteration algorithm achieves a geometric convergence rate.
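To make the average-reward claim concrete, the following is a minimal sketch of classical (relative) value iteration on a small, hypothetical 2-state, 2-action MDP. It does not reproduce the paper's geometric framework. The transition matrices `P`, rewards `R`, and the function names `bellman` and `relative_value_iteration` are illustrative assumptions. Every transition probability is strictly positive, so every stationary policy is ergodic. The unique optimal policy takes action 1 in both states: its stationary distribution is (3/11, 8/11), which gives an optimal gain of 16/11. The sketch tracks the span (max minus min) of successive value increments, which should shrink geometrically.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the paper).
# P[a, s, t]: probability of moving from state s to state t under action a.
# R[a, s]: expected one-step reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.2, 0.8], [0.3, 0.7]],  # action 1
])
R = np.array([
    [1.0, 0.0],  # action 0
    [0.0, 2.0],  # action 1
])


def bellman(h):
    """Undiscounted Bellman operator: returns (T h, greedy policy w.r.t. h)."""
    q = R + P @ h  # q[a, s] = R[a, s] + sum_t P[a, s, t] * h[t]
    return q.max(axis=0), q.argmax(axis=0)


def relative_value_iteration(n_iter=200, ref=0):
    """Relative value iteration for the average-reward criterion."""
    h = np.zeros(P.shape[1])
    spans = []
    for _ in range(n_iter):
        th, policy = bellman(h)
        diff = th - h  # same as V_{k+1} - V_k of plain (unnormalized) VI
        spans.append(diff.max() - diff.min())  # span seminorm of the increment
        h = th - th[ref]  # re-center at a reference state so h stays bounded
    # Standard bound for unichain MDPs like this one: the optimal gain lies
    # in [min(diff), max(diff)], so take the midpoint as the estimate.
    gain = 0.5 * (diff.max() + diff.min())
    return gain, h, policy, spans


gain, h, policy, spans = relative_value_iteration()
print("gain estimate:", gain)  # the optimal gain of this MDP is 16/11
print("greedy policy:", policy.tolist())
print("span ratios:", [spans[k + 1] / spans[k] for k in range(5)])
```

Because all transition rows here are strictly positive, a standard span-contraction argument bounds each span ratio by the ergodicity coefficient of the MDP, which is 0.7 for these numbers. The paper's result applies under the weaker condition of a unique, ergodic optimal policy, and this toy example does not test that condition.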
Oct-29-2025