Approximate Solutions to Optimal Stopping Problems
Tsitsiklis, John N., Roy, Benjamin Van
–Neural Information Processing Systems
We propose and analyze an algorithm that approximates solutions to the problem of optimal stopping in a discounted irreducible aperiodic Markovchain. The scheme involves the use of linear combinations offixed basis functions to approximate a Q-function. The weights of the linear combination are incrementally updated through an iterative process similar to Q-Iearning, involving simulation ofthe underlying Markov chain. Due to space limitations, we only provide an overview of a proof of convergence (with probability 1)and bounds on the approximation error. This is the first theoretical result that establishes the soundness of a Q-Iearninglike algorithmwhen combined with arbitrary linear function approximators tosolve a sequential decision problem.
Neural Information Processing Systems
Dec-31-1997
- Country:
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.14)
- North America > United States
- Massachusetts > Middlesex County > Cambridge (0.15)
- Europe > United Kingdom