Stabilizing Value Iteration with and without Approximation Errors

May-15-2015–arXiv.org Machine Learning

Intelligent control using adaptive/approximate dynamic programming (ADP), sometimes referred to as reinforcement learning (RL) or neuro-dynamic programming (NDP), is a set of powerful tools for obtaining approximate solutions to difficult, mathematically intractable optimization problems, sometimes even when no knowledge of the system model or dynamics is available. The practical potential of these tools has attracted many researchers over the last few decades [1]-[13]. The multitude of published papers and success stories on applications of ADP, however, has intensified the need for rigorous mathematical analyses that guarantee the convergence of the learning processes and the stability of the results. Besides classifications such as heuristic dynamic programming (HDP) and dual heuristic programming (DHP) [7], which are based on the variables subject to approximation and their dependencies, the learning algorithms are typically based on either value iteration (VI) or policy iteration (PI) [3], [14]. These algorithms have been well investigated both by computer scientists for machine learning [3] and by control scientists for feedback control of dynamical systems [14].
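For readers unfamiliar with value iteration, the following is a minimal tabular sketch of the basic idea, not the algorithm analyzed in the paper. The five-state chain, the unit cost per step, the zero initial guess, and the names `value_iteration`, `step`, and `cost` are all assumptions made purely for illustration.

```python
def value_iteration(n_states, actions, step, cost, gamma=1.0,
                    tol=1e-9, max_iters=1000):
    """Repeat the Bellman backup V(s) <- min_a [cost(s, a) + gamma * V(step(s, a))]."""
    V = [0.0] * n_states  # assumed initial guess: zero cost-to-go everywhere
    for _ in range(max_iters):
        V_new = [
            min(cost(s, a) + gamma * V[step(s, a)] for a in actions)
            for s in range(n_states)
        ]
        converged = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    policy = [
        min(actions, key=lambda a: cost(s, a) + gamma * V[step(s, a)])
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical deterministic chain: states 0..4, state 0 is an absorbing goal.
N = 5
ACTIONS = (-1, +1)


def step(s, a):
    if s == 0:                        # the goal state is absorbing
        return 0
    return max(0, min(N - 1, s + a))  # clamp to the ends of the chain


def cost(s, a):
    return 0.0 if s == 0 else 1.0     # unit cost per step until the goal


V, policy = value_iteration(N, ACTIONS, step, cost)
print(V)       # cost-to-go estimate for each state
print(policy)  # greedy action for each state
```

In this toy problem the value estimate settles at the number of steps to the goal, and the greedy policy moves left from every non-goal state. Whether the intermediate policies produced along the way are stabilizing, and how approximation errors affect this, is the kind of question the paper addresses.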

artificial intelligence, machine learning, reinforcement learning
