Fractal Landscapes in Policy Optimization

Neural Information Processing Systems 

The understanding of such failure cases is still limited. For instance, the training process of reinforcement learning is unstable and the learning curve can fluctuate during training in ways that are hard to predict. The probability of obtaining satisfactory policies can also be inherently low in reward-sparse or highly nonlinear control tasks.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found