Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor

Jan-19-2025, 18:11:40 GMT–Neural Information Processing Systems

We introduce the Blackwell discount factor for Markov Decision Processes (MDPs). Classical objectives for MDPs include discounted, average, and Blackwell optimality. Many existing approaches to computing average-optimal policies solve for discount-optimal policies with a discount factor close to 1, but they only work under strong or hard-to-verify assumptions on the MDP structure such as unichain or ergodicity. We are the first to highlight the shortcomings of the classical definition of Blackwell optimality, which does not lead to simple algorithms for computing Blackwell-optimal policies and overlooks the pathological behaviors of optimal policies as regards the discount factors. To resolve this issue, in this paper, we show that when the discount factor is larger than the Blackwell discount factor \gamma_{\sf bw}, all discount-optimal policies become Blackwell- and average-optimal, and we derive a general upper bound on \gamma_{\sf bw} .

blackwell and average optimality, blackwell discount factor, blackwell optimality, (5 more...)

Neural Information Processing Systems

Jan-19-2025, 18:11:40 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.42)