Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor
–Neural Information Processing Systems
We introduce the Blackwell discount factor for Markov Decision Processes (MDPs). Classical objectives for MDPs include discounted, average, and Blackwell optimality. Many existing approaches to computing average-optimal policies solve for discount-optimal policies with a discount factor close to 1, but they only work under strong or hard-to-verify assumptions on the MDP structure such as unichain or ergodicity. We are the first to highlight the shortcomings of the classical definition of Blackwell optimality, which does not lead to simple algorithms for computing Blackwell-optimal policies and overlooks the pathological behaviors of optimal policies as regards the discount factors. To resolve this issue, in this paper, we show that when the discount factor is larger than the Blackwell discount factor \gamma_{\sf bw}, all discount-optimal policies become Blackwell- and average-optimal, and we derive a general upper bound on \gamma_{\sf bw} .
Neural Information Processing Systems
Jan-19-2025, 18:11:40 GMT
- Technology: