[D] Why are non-linear approximators such as neural networks unstable for reinforcement learning?

Oct-14-2020, 19:01:30 GMT–#artificialintelligence

As you know, in supervised learning it is important for the training data to be i.i.d. (independent and identically distributed). In RL, the training data is sampled from the parts of the state space that the agent chooses to explore. These tend to be highly correlated with the agent's current preferences and cover only a small subset of the total state space. Q-learning's greedy policy selects the action with the highest estimated expected reward. So if a1 has an expected reward of 0.49 and a2 has an expected reward of 0.51, a small parameter change can make the agent switch from picking a2 100% of the time to picking a1 100% of the time, causing a significant shift in the distribution of data being trained on. At a higher conceptual level, you can think of RL as supervised learning where, instead of having clearly defined labels, you "guess" what the label is from an often noisy reward signal, and the quality of your guess depends on how accurate your policy is.
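The greedy-action flip described above can be illustrated with a minimal sketch. It assumes a linear Q-function over a single hypothetical state feature (a neural network would behave the same way at the argmax step), and the names `q_value`, `greedy_action` and `action_fractions` are illustrative, not from any library. A 0.03 change to one weight moves the collected data from 100% a2 to 100% a1.

```python
from typing import List, Sequence


def q_value(w: Sequence[float], phi: Sequence[float]) -> float:
    # Illustrative linear approximator (an assumption): Q(s, a) = w_a . phi(s)
    return sum(wi * xi for wi, xi in zip(w, phi))


def greedy_action(weights: Sequence[Sequence[float]], phi: Sequence[float]) -> int:
    # Greedy policy: pick the action with the highest estimated value
    qs = [q_value(w, phi) for w in weights]
    return max(range(len(qs)), key=lambda a: qs[a])


def action_fractions(weights: Sequence[Sequence[float]], phi: Sequence[float],
                     n_steps: int) -> List[float]:
    # Fraction of collected samples that use each action when the agent
    # acts greedily in the same state for n_steps steps
    counts = [0] * len(weights)
    for _ in range(n_steps):
        counts[greedy_action(weights, phi)] += 1
    return [c / n_steps for c in counts]


phi = [1.0]                        # hypothetical single state feature
w_before = [[0.49], [0.51]]        # Q(s, a1) = 0.49, Q(s, a2) = 0.51
w_after = [[0.49 + 0.03], [0.51]]  # small update raises Q(s, a1) to about 0.52

print(action_fractions(w_before, phi, 100))  # [0.0, 1.0]  (all a2)
print(action_fractions(w_after, phi, 100))   # [1.0, 0.0]  (all a1)
```

Because the greedy policy is a hard argmax, the change in the training data is discontinuous even though the parameter change is tiny; the next batch of updates is then computed on a completely different action distribution.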
