Approximate Solutions to Optimal Stopping Problems
Tsitsiklis, John N., Roy, Benjamin Van
–Neural Information Processing Systems
We propose and analyze an algorithm that approximates solutions to the problem of optimal stopping in a discounted irreducible aperiodic Markov chain. The scheme involves the use of linear combinations of fixed basis functions to approximate a Q-function. The weights of the linear combination are incrementally updated through an iterative process similar to Q-Iearning, involving simulation of the underlying Markov chain. Due to space limitations, we only provide an overview of a proof of convergence (with probability 1) and bounds on the approximation error. This is the first theoretical result that establishes the soundness of a Q-Iearninglike algorithm when combined with arbitrary linear function approximators to solve a sequential decision problem.
Neural Information Processing Systems
Dec-31-1997
- Country:
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.14)
- North America > United States
- Massachusetts > Middlesex County > Cambridge (0.15)
- Europe > United Kingdom