Biasing Approximate Dynamic Programming with a Lower Discount Factor

Feb-15-2020, 02:58:22 GMT–Neural Information Processing Systems

Most algorithms for solving Markov decision processes rely on a discount factor, which ensures their convergence. In fact, it is often used in problems where there is no intrinsic motivation for it. In this paper, we show that when used in approximate dynamic programming, an artificially low discount factor may significantly improve performance on some problems, such as Tetris. We propose two explanations for this phenomenon. Our first justification follows directly from the standard approximation error bounds: using a lower discount factor may decrease the approximation error bounds.
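To make the first justification concrete, the sketch below evaluates one widely cited bound of this kind: if a value function v satisfies ||v - v*||_inf <= eps, then the policy that is greedy with respect to v loses at most 2 * gamma * eps / (1 - gamma). This is an illustrative assumption, since the abstract does not say which bounds the paper uses, and the function name `greedy_loss_bound` is hypothetical.

```python
import math


def greedy_loss_bound(gamma: float, eps: float) -> float:
    """Upper bound on ||v* - v^pi||_inf for pi greedy w.r.t. v,
    given ||v - v*||_inf <= eps. Assumed illustrative bound, not
    necessarily the one used in the paper."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 2.0 * gamma * eps / (1.0 - gamma)


if __name__ == "__main__":
    # Hold the approximation error eps fixed and vary only the discount factor.
    for gamma in (0.99, 0.95, 0.9, 0.8):
        print(f"gamma={gamma:.2f}  bound={greedy_loss_bound(gamma, 1.0):.1f}")
```

The bound shrinks quickly as gamma moves away from 1, but this comparison holds eps fixed. In practice the achievable eps also depends on gamma, and a lower gamma changes the objective being optimized, so a smaller bound does not by itself guarantee a better policy.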

biasing approximate dynamic programming, discount factor, lower discount factor

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.66)
  - Machine Learning > Reinforcement Learning (0.66)