Biasing Approximate Dynamic Programming with a Lower Discount Factor
Marek Petrik, Bruno Scherrer
Neural Information Processing Systems
Most algorithms for solving Markov decision processes rely on a discount factor, which ensures their convergence. In fact, a discount factor is often used even in problems where there is no intrinsic motivation for one. In this paper, we show that when used in approximate dynamic programming, an artificially low discount factor may significantly improve performance on some problems, such as Tetris. We propose two explanations for this phenomenon. Our first justification follows directly from the standard approximation error bounds: using a lower discount factor may decrease them.
Feb-15-2020, 02:58:22 GMT