[R] GRAC: Self-Guided and Self-Regularized Actor-Critic
Abstract: Deep reinforcement learning (DRL) algorithms have been successfully demonstrated on a range of challenging decision-making and control tasks. One dominant component of recent DRL algorithms is the target network, which mitigates divergence when learning the Q function. However, target networks can slow down learning because of delayed function updates. Another dominant component, especially in continuous domains, is the policy gradient method, which models and optimizes the policy directly. However, when Q functions are approximated with neural networks, their landscapes can be complex and can therefore mislead the local gradient.
machine learning, reinforcement learning, self-guided and self-regularized actor-critic
Sep-22-2020, 04:02:07 GMT