A Minimal Working Example for Continuous Policy Gradients
By defining a custom loss function and using TensorFlow's GradientTape functionality, the actor network can be trained in only a few lines of code. At the root of the sophisticated actor-critic algorithms in use today is the vanilla policy gradient algorithm, which is essentially an actor-only algorithm. Nowadays, the actor that learns the decision-making policy is often represented by a neural network. With so many deep reinforcement learning algorithms in circulation, you would expect plug-and-play TensorFlow implementations of a basic actor network for continuous control to be easy to find, but this is hardly the case. Various reasons may exist for this.
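To make the idea of a custom loss for a continuous policy concrete, here is a minimal sketch in plain Python rather than TensorFlow. It is not the article's implementation. It assumes a Gaussian policy with a single learnable mean `mu` and a fixed standard deviation `sigma` (no neural network), a hypothetical one-step task whose reward is the negative squared distance to a `target`, and a hand-derived gradient standing in for what automatic differentiation such as GradientTape would return. All function names and hyperparameters are illustrative.

```python
import math
import random


def log_prob(action, mu, sigma):
    """Log-density of the Gaussian policy N(mu, sigma^2) at `action`."""
    return (-0.5 * ((action - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))


def policy_loss(action, reward, mu, sigma):
    """Custom loss: negative reward-weighted log-probability of the action."""
    return -reward * log_prob(action, mu, sigma)


def policy_loss_grad_mu(action, reward, mu, sigma):
    """Hand-derived d(loss)/d(mu), treating action and reward as constants."""
    return -reward * (action - mu) / sigma ** 2


def train(target=1.0, sigma=0.5, lr=0.02, steps=1000, batch=32, seed=0):
    """Vanilla policy gradient on a hypothetical one-step task."""
    rng = random.Random(seed)
    mu = 0.0  # initial policy mean (assumed starting point)
    for _ in range(steps):
        grad = 0.0
        for _ in range(batch):
            action = rng.gauss(mu, sigma)       # sample from the policy
            reward = -(action - target) ** 2    # assumed reward function
            grad += policy_loss_grad_mu(action, reward, mu, sigma)
        mu -= lr * grad / batch                 # gradient descent on the loss
    return mu


if __name__ == "__main__":
    print("learned mean:", train())
```

Under these assumptions the expected gradient of the loss with respect to `mu` is `2 * (mu - target)`, so the update pulls the policy mean toward the target. In a TensorFlow version, `mu` would be the output of the actor network and the gradient would come from automatic differentiation instead of the closed-form expression.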
Aug-29-2020, 09:15:43 GMT