Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

Harm Van Seijen, Mehdi Fatemi, Arash Tavakoli

Neural Information Processing Systems 

We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods.