How to fix reinforcement learning
"Value functions are a core component of [RL] systems. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g." Here is a rigorous, mathematical formulation of RL that treats goals (the high-level objective of the skill being learned, whose achievement should yield good rewards) as a fundamental and necessary input rather than something to be discovered from the reward signal alone. The agent is told what it is supposed to do, just as in zero-shot learning and in actual human learning. It has been three years since this was published, and how many papers have cited it since?
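To make the notation V(s, g; θ) concrete, here is a minimal sketch of a goal-conditioned value approximator. It is not the paper's implementation: the class name TinyUVFA, the linear embeddings, the dot-product form V = φ(s)·ψ(g), and the squared-error update are all assumptions chosen for brevity. The only point it illustrates is that the goal g is an explicit input alongside the state s.

```python
import numpy as np


class TinyUVFA:
    """Hypothetical goal-conditioned approximator: V(s, g; theta) = phi(s) . psi(g)."""

    def __init__(self, state_dim, goal_dim, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        # theta is two linear embedding matrices (an illustrative assumption).
        self.W_s = rng.normal(scale=0.1, size=(embed_dim, state_dim))
        self.W_g = rng.normal(scale=0.1, size=(embed_dim, goal_dim))

    def value(self, s, g):
        # Embed state and goal separately, then combine with a dot product.
        return float((self.W_s @ s) @ (self.W_g @ g))

    def update(self, s, g, target, lr=0.1):
        # One gradient step on 0.5 * (V(s, g) - target)^2.
        phi, psi = self.W_s @ s, self.W_g @ g
        err = float(phi @ psi) - target
        self.W_s -= lr * err * np.outer(psi, s)
        self.W_g -= lr * err * np.outer(phi, g)
        return err


# Usage: one-hot states and goals on a 5-cell chain (a made-up toy setting).
model = TinyUVFA(state_dim=5, goal_dim=5)
s, g = np.eye(5)[0], np.eye(5)[2]
# The target -2.0 is a stand-in for "negative distance to the goal".
model.update(s, g, target=-2.0)
print(model.value(s, g))
```

Because g enters as an argument, the same parameters can be queried for a different goal without retraining a separate value function, which is the kind of generalisation over goals that the quoted abstract describes.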
Apr-20-2020, 00:27:58 GMT