Two Timescale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal Difference Learning

Prasenjit Karmakar, Shalabh Bhatnagar

arXiv.org Artificial Intelligence 

Stochastic approximation algorithms are sequential nonparametric methods for finding a zero or a minimum of a function when only noisy observations of the function values are available. Two timescale stochastic approximation algorithms constitute one of the most general subclasses of stochastic approximation methods. These algorithms consist of two coupled recursions updated with different step sizes (one considerably smaller than the other), which in turn facilitates convergence. Two timescale stochastic approximation algorithms [19] have been successfully applied to several complex problems arising in reinforcement learning, signal processing, and admission control in communication networks. In many reinforcement learning applications (specifically, those where the value function is parameterized), non-additive Markov noise is present in one or both iterates, which requires the current two timescale framework to be extended to include Markov noise (for example, [13, p. 5] notes that generalizing the analysis to Markov noise requires the theory of two timescale stochastic approximation to cover it).
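To make the two-coupled-recursion structure concrete, here is a minimal sketch (not from the paper; the functions, step-size exponents, and noise model are illustrative assumptions). The fast iterate y is driven by a larger step size and tracks a quantity that depends on the current slow iterate x, while x moves on a slower schedule, so from x's perspective y has effectively equilibrated:

```python
import numpy as np

# Illustrative two timescale stochastic approximation sketch.
# Fast recursion: y tracks y*(x) = x/2 for the current x.
# Slow recursion: x seeks the root of x + y*(x) - 3 = 0, i.e. x* = 2, y* = 1.
rng = np.random.default_rng(0)
x, y = 5.0, 5.0
for n in range(1, 200001):
    a = 1.0 / n ** 0.6   # fast step size
    b = 1.0 / n          # slow step size; b/a -> 0 separates the timescales
    # noisy observations of the two drift functions (additive noise here,
    # in contrast to the Markov noise the paper addresses)
    y += a * ((x / 2.0 - y) + rng.normal(scale=0.1))
    x += b * (-(x + y - 3.0) + rng.normal(scale=0.1))
```

With these step sizes both conditions sum(a_n) = sum(b_n) = infinity and b_n / a_n -> 0 hold, and the coupled iterates converge to the fixed point (x, y) = (2, 1).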
