Fractal Landscapes in Policy Optimization
–Neural Information Processing Systems
The understanding of such failure cases is still limited. For instance, reinforcement learning training is unstable, and the learning curve can fluctuate in ways that are hard to predict. The probability of obtaining a satisfactory policy can also be inherently low in reward-sparse or highly nonlinear control tasks.
Feb-7-2026, 19:43:34 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.04)
- North America > United States
- California > San Diego County > San Diego (0.04)