Fractal Landscapes in Policy Optimization
–Neural Information Processing Systems
The understanding of such failure cases is still limited. For instance, the training process of reinforcement learning is unstable and the learning curve can fluctuate during training in ways that are hard to predict. The probability of obtaining satisfactory policies can also be inherently low in reward-sparse or highly nonlinear control tasks.
Neural Information Processing Systems
Feb-7-2026, 19:43:34 GMT
- Country:
- North America > United States
- California > San Diego County > San Diego (0.04)
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.04)
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Technology: