Reinforcement learning is supervised learning on optimized data

Nov-5-2020, 09:52:00 GMT–AIHub

The two most common perspectives on reinforcement learning (RL) are optimization and dynamic programming. Methods that compute the gradients of the non-differentiable expected reward objective, such as the REINFORCE trick, are commonly grouped into the optimization perspective, whereas methods that employ TD-learning or Q-learning are dynamic programming methods. While these methods have shown considerable success in recent years, they remain quite challenging to apply to new problems. In contrast, deep supervised learning has been extremely successful, so we may ask: can we use supervised learning to perform RL? In this blog post we discuss a mental model for RL based on the idea that RL can be viewed as doing supervised learning on the "good data".
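To make the "supervised learning on good data" idea concrete, here is a minimal sketch on a toy multi-armed bandit. It is an illustration under stated assumptions, not the method from the post: the function name `filtered_imitation`, the deterministic per-arm rewards, the keep fraction, and the smoothing constant are all hypothetical choices. The loop samples actions from the current policy, keeps only the highest-reward samples, and then fits the policy to them by maximum likelihood, which for a categorical policy is just counting.

```python
import random
from collections import Counter


def filtered_imitation(rewards, iterations=20, samples=200, keep_frac=0.2, seed=0):
    """Toy bandit: sample actions, keep the best-rewarded ones, and refit
    the policy to them by maximum likelihood (a supervised learning step).

    Assumption: rewards[a] is a fixed, deterministic reward for arm a.
    """
    rng = random.Random(seed)
    n = len(rewards)
    policy = [1.0 / n] * n  # start from a uniform policy

    for _ in range(iterations):
        # 1. Collect data with the current policy.
        actions = rng.choices(range(n), weights=policy, k=samples)

        # 2. "Optimize the data": keep only the highest-reward samples.
        actions.sort(key=lambda a: rewards[a], reverse=True)
        good = actions[: max(1, int(keep_frac * samples))]

        # 3. Supervised step: the maximum-likelihood categorical policy is
        #    the empirical action frequency (lightly smoothed so that every
        #    arm keeps a nonzero probability).
        counts = Counter(good)
        policy = [(counts[a] + 0.01) / (len(good) + 0.01 * n) for a in range(n)]

    return policy


policy = filtered_imitation(rewards=[0.1, 0.5, 0.9, 0.3])
best_action = max(range(len(policy)), key=policy.__getitem__)
```

In this sketch no reward gradient is ever computed. The reward only decides which samples are kept as training data, and the policy update itself is ordinary maximum-likelihood fitting.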



