Temporal-difference learning for nonlinear value function approximation in the lazy training regime

Jun-5-2019–arXiv.org Machine Learning

In recent years, deep reinforcement learning has pushed the boundaries of artificial intelligence to an unprecedented level, achieving results that were expected to be a decade away and outperforming humans in a number of highly complex tasks. Prominent examples have appeared over the past few years, with such algorithms mastering games and tasks of increasing complexity, from playing Atari games to learning to walk and beating world grandmasters at the game of Go [16, 23, 24, 31-33]. This success would not have been possible without using neural networks to approximate value functions and/or policy functions in reinforcement learning algorithms. While neural networks, in particular deep neural networks, provide a powerful and versatile tool for approximating high-dimensional functions [4, 12, 17], their intrinsic nonlinearity can also make training difficult, particularly in reinforcement learning. For example, it is well known that nonlinear approximation of the value function can cause classical temporal-difference learning to diverge due to instability [40].
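The divergence concern above refers to the classical (semi-gradient) TD(0) update applied to a nonlinear approximator: with TD error delta = r + gamma * V(s') - V(s), the parameters move as theta <- theta + alpha * delta * grad V(s), and the bootstrapped target V(s') is not differentiated. As a minimal sketch under stated assumptions, the Python below implements that update for a one-hidden-layer tanh network on a hypothetical 5-state random walk. The class `TanhValueNet`, the helpers `td_error`, `semi_gradient_td0_step`, `feature` and `train_random_walk`, the 1/sqrt(width) output scaling, the state encoding and the step size are all illustrative choices, not the method analyzed in this paper. On this small on-policy problem the iteration behaves well, so the sketch shows the update whose stability is at issue rather than reproducing a divergent case.

```python
import math
import random


class TanhValueNet:
    """Hypothetical network V(x) = (1/sqrt(m)) * sum_j a_j * tanh(w_j * x + b_j).

    The 1/sqrt(m) output scaling is an illustrative choice, not taken from the paper.
    """

    def __init__(self, width, seed=0):
        rng = random.Random(seed)
        self.scale = 1.0 / math.sqrt(width)
        self.a = [rng.gauss(0.0, 1.0) for _ in range(width)]
        self.w = [rng.gauss(0.0, 1.0) for _ in range(width)]
        self.b = [rng.gauss(0.0, 1.0) for _ in range(width)]

    def value(self, x):
        return self.scale * sum(
            a * math.tanh(w * x + b) for a, w, b in zip(self.a, self.w, self.b)
        )

    def gradient(self, x):
        """Return (dV/da, dV/dw, dV/db) evaluated at input x."""
        ga, gw, gb = [], [], []
        for a, w, b in zip(self.a, self.w, self.b):
            h = math.tanh(w * x + b)
            dh = 1.0 - h * h  # derivative of tanh
            ga.append(self.scale * h)
            gw.append(self.scale * a * dh * x)
            gb.append(self.scale * a * dh)
        return ga, gw, gb


def td_error(net, x, reward, x_next, gamma, terminal):
    """delta = r + gamma * V(s') - V(s), with V(terminal) taken to be 0."""
    bootstrap = 0.0 if terminal else gamma * net.value(x_next)
    return reward + bootstrap - net.value(x)


def semi_gradient_td0_step(net, x, reward, x_next, gamma, terminal, alpha):
    """theta <- theta + alpha * delta * grad V(s); the target is held fixed."""
    delta = td_error(net, x, reward, x_next, gamma, terminal)
    ga, gw, gb = net.gradient(x)
    net.a = [p + alpha * delta * g for p, g in zip(net.a, ga)]
    net.w = [p + alpha * delta * g for p, g in zip(net.w, gw)]
    net.b = [p + alpha * delta * g for p, g in zip(net.b, gb)]
    return delta


def feature(state):
    """Map random-walk states 1..5 to scalar inputs in [-1, 1] (hypothetical encoding)."""
    return (state - 3) / 2.0


def train_random_walk(episodes=2000, width=20, alpha=0.02, seed=0):
    """Hypothetical 5-state random walk: start in state 3 and step left or right
    uniformly; exiting on the right (state 6) gives reward 1, on the left (state 0) 0."""
    rng = random.Random(seed)
    net = TanhValueNet(width, seed=seed)
    for _ in range(episodes):
        s = 3
        while True:
            s_next = s + rng.choice((-1, 1))
            terminal = s_next in (0, 6)
            reward = 1.0 if s_next == 6 else 0.0
            x_next = 0.0 if terminal else feature(s_next)  # ignored when terminal
            semi_gradient_td0_step(net, feature(s), reward, x_next, 1.0, terminal, alpha)
            if terminal:
                break
            s = s_next
    return net
```

Calling `train_random_walk()` and evaluating `net.value(feature(s))` for s = 1..5 should give estimates near the true values 1/6, 2/6, ..., 5/6 of this undiscounted walk, although a constant step size leaves some residual noise.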

artificial intelligence, machine learning, reinforcement learning
