Learning Policies For Learning Policies -- Meta Reinforcement Learning (RL²) in TensorFlow

#artificialintelligence 

Reinforcement learning provides a framework for training agents to solve problems in the world. One limitation of these agents, however, is their inflexibility once trained. They can learn a policy that solves a specific problem (formalized as an MDP, a Markov decision process), but that policy is often useless on new problems, even closely related ones. Imagine the simplest possible agent: one trained on a two-armed bandit task in which one arm always yields a positive reward and the other always yields none. Using a standard RL algorithm such as Q-learning or policy gradient, the agent quickly learns to always choose the rewarding arm.
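
To make the bandit example concrete, here is a minimal sketch in plain Python rather than TensorFlow. It is not the article's implementation: it assumes a tabular, epsilon-greedy value learner (a one-state simplification of Q-learning), and the names `train_bandit_agent`, `greedy_action`, and `arm_rewards`, along with the hyperparameter values, are hypothetical choices made for illustration.

```python
import random


def train_bandit_agent(arm_rewards=(1.0, 0.0), episodes=500,
                       epsilon=0.1, lr=0.1, seed=0):
    """Learn one value estimate per arm of a deterministic bandit.

    arm_rewards[i] is the fixed reward for pulling arm i (an assumption
    matching the "always positive / always zero" task in the text).
    """
    rng = random.Random(seed)
    q = [0.0] * len(arm_rewards)
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            action = rng.randrange(len(q))
        else:
            # Exploit: pick the best-valued arm, breaking ties at random.
            best = max(q)
            action = rng.choice([i for i, v in enumerate(q) if v == best])
        reward = arm_rewards[action]
        # Move the estimate for the chosen arm toward the observed reward.
        q[action] += lr * (reward - q[action])
    return q


def greedy_action(q):
    """Return the index of the arm with the highest estimated value."""
    return max(range(len(q)), key=lambda i: q[i])


# Train on a task where arm 0 is the rewarding arm.
q_values = train_bandit_agent(arm_rewards=(1.0, 0.0))
learned_arm = greedy_action(q_values)

# Reuse the same fixed policy on a task where the arms are swapped.
swapped_rewards = (0.0, 1.0)
reward_on_new_task = swapped_rewards[learned_arm]
print(learned_arm, reward_on_new_task)
```

The last three lines illustrate the inflexibility described above: the policy learned on the first bandit keeps pulling the arm that used to pay, so it earns nothing on the swapped task unless the agent is trained again.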
