[R] Reinforcement Learning with Unsupervised Auxiliary Tasks • /r/MachineLearning
Can someone explain how the loss function works out in the model's favor in Section 3.4 (UNREAL Agent)? The authors first combine the losses into a single objective: the primary policy is trained with A3C, and the auxiliary tasks are trained on very recent sequences. Then it says, "In practice, the loss is broken down into separate components that are applied either on-policy, directly from experience; or off-policy, on replayed transitions." What determines which components are applied on-policy and which off-policy?
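For context, here is a rough sketch of how I read the combined objective in that section: the A3C loss plus weighted value-replay, pixel-control, and reward-prediction terms. As far as I can tell, the section assigns the A3C term to on-policy data and the other three to replayed data (with the reward-prediction term using a rebalanced replay sample). The names `LossWeights`, `unreal_loss`, and `DATA_SOURCE` are my own, and the default weights of 1.0 are placeholders, not values from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LossWeights:
    # Placeholder weights (assumption); the paper treats these as hyperparameters.
    vr: float = 1.0  # weight on the replayed value loss
    pc: float = 1.0  # weight on the summed auxiliary control losses
    rp: float = 1.0  # weight on the reward prediction loss


def unreal_loss(
    a3c_loss: float,
    vr_loss: float,
    pc_losses: Sequence[float],
    rp_loss: float,
    w: LossWeights = LossWeights(),
) -> float:
    """Weighted sum: L_A3C + w.vr * L_VR + w.pc * sum_c L_Q^(c) + w.rp * L_RP."""
    return a3c_loss + w.vr * vr_loss + w.pc * sum(pc_losses) + w.rp * rp_loss


# Which data each term is computed from, as I understand the section.
DATA_SOURCE = {
    "a3c": "on-policy, directly from the current rollout",
    "vr": "off-policy, replayed transitions",
    "pc": "off-policy, replayed transitions (n-step Q-learning)",
    "rp": "off-policy, rebalanced replay sample",
}

if __name__ == "__main__":
    total = unreal_loss(1.0, 2.0, [0.5, 0.5], 3.0)
    print(total)
```

So the sum is one objective on paper, but each term would be evaluated on its own batch before the gradients are accumulated into the shared parameters.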
Nov-17-2016, 05:25:29 GMT