Stabilizing Value Iteration with and without Approximation Errors

Heydari, Ali

arXiv.org Machine Learning 

Intelligent control using adaptive/approximate dynamic programming (ADP), sometimes referred to as reinforcement learning (RL) or neuro-dynamic programming (NDP), is a set of powerful tools for obtaining approximate solutions to difficult, mathematically intractable optimization problems, sometimes even when no knowledge of the system model/dynamics is available. The dramatic practical potential of these tools has attracted many researchers over the last few decades [1]-[13]. The multitude of published papers and success stories on applying ADP to different problems, however, has intensified the need for rigorous mathematical analyses that guarantee the convergence of the learning processes and the stability of the results. Besides classifications such as heuristic dynamic programming (HDP) and dual heuristic programming (DHP) [7], which are defined by the variables subject to approximation and their dependencies, the learning algorithms are typically based on either value iteration (VI) or policy iteration (PI) [3], [14]. These algorithms have been investigated extensively, both by computer scientists in machine learning [3] and by control scientists for the feedback control of dynamical systems [14].
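
As a minimal illustration of the VI/PI distinction (a sketch under stated assumptions, not the author's formulation), consider a hypothetical scalar linear system x_{k+1} = a x_k + b u_k with cost sum_k (q x_k^2 + r u_k^2). Restricting the value function to V(x) = p x^2, VI starts from V_0 = 0 and repeatedly applies the Bellman minimization, which reduces to p_{i+1} = q + a^2 p_i r / (r + b^2 p_i). PI instead starts from a stabilizing gain, evaluates its cost exactly, and then improves the gain greedily. The parameter values (a = 1.2, b = q = r = 1) and all function names are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical scalar example (not from the paper):
#   x_{k+1} = A*x_k + B*u_k,  cost = sum_k (Q*x_k**2 + R*u_k**2)
# A > 1 makes the open-loop system unstable.
A, B, Q, R = 1.2, 1.0, 1.0, 1.0


def greedy_gain(p):
    """Gain k of the policy u = -k*x that is greedy w.r.t. V(x) = p*x**2."""
    return A * B * p / (R + B * B * p)


def value_iteration(p0=0.0, tol=1e-12, max_iter=1000):
    """VI on V_i(x) = p_i*x**2, starting from V_0 = 0 by default.

    min_u [Q*x^2 + R*u^2 + p*(A*x + B*u)^2] = (Q + A^2*p*R/(R + B^2*p)) * x^2
    """
    p = p0
    history = [p]
    for _ in range(max_iter):
        p_next = Q + A * A * p * R / (R + B * B * p)
        history.append(p_next)
        if abs(p_next - p) < tol:
            return p_next, history
        p = p_next
    raise RuntimeError("value iteration did not converge")


def policy_evaluation(k):
    """Exact cost coefficient p of u = -k*x; finite only if |A - B*k| < 1."""
    closed_loop = A - B * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("gain is not stabilizing; its cost is unbounded")
    # V(x) = sum_t (Q + R*k^2) * (closed_loop**t * x)**2
    return (Q + R * k * k) / (1.0 - closed_loop ** 2)


def policy_iteration(k0, tol=1e-12, max_iter=100):
    """PI: evaluate the current gain, then improve it greedily."""
    k = k0
    for _ in range(max_iter):
        k_new = greedy_gain(policy_evaluation(k))
        if abs(k_new - k) < tol:
            return policy_evaluation(k_new), k_new
        k = k_new
    raise RuntimeError("policy iteration did not converge")


def riccati_solution():
    """Positive root of B^2 p^2 + (R - Q*B^2 - A^2*R) p - Q*R = 0."""
    c = R - Q * B * B - A * A * R
    return (-c + math.sqrt(c * c + 4.0 * B * B * Q * R)) / (2.0 * B * B)


p_vi, history = value_iteration()
p_pi, k_pi = policy_iteration(k0=1.0)  # k0 = 1.0 gives |A - B*k0| = 0.2 < 1
p_star = riccati_solution()
print("VI:", p_vi, "PI:", p_pi, "Riccati:", p_star)

# Is the greedy policy of each early VI iterate stabilizing?
for i, p in enumerate(history[:4]):
    print(i, abs(A - B * greedy_gain(p)) < 1.0)
```

With these values both iterations approach the positive root of the scalar algebraic Riccati equation. PI needs a stabilizing initial gain for its policy evaluation to be finite, whereas VI started from V_0 = 0 does not; in this example, however, the greedy policy of the very first VI iterate (k = 0) leaves the unstable open loop unchanged, which illustrates why the stability of intermediate VI results can deserve separate attention.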
