Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Jan-18-2025, 19:40:19 GMT–Neural Information Processing Systems

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy.
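To make the distributional view concrete, here is a minimal sketch of sampling the returns that follow a single noisy update from a fixed policy. It is not the paper's experimental setup: the landscape `toy_return`, the helper `post_update_returns`, the step size, and the noise model are all hypothetical and chosen only for illustration.

```python
import numpy as np

def toy_return(theta):
    """Hypothetical return landscape (an assumption, not from the paper):
    a smooth hill peaked at 0.8 with a sharp cliff where theta[0] > 1."""
    smooth = -np.sum((theta - 0.8) ** 2)
    cliff = -10.0 if theta[0] > 1.0 else 0.0
    return smooth + cliff

def post_update_returns(theta, step_size=0.3, noise_scale=1.0,
                        n_updates=1000, seed=0):
    """Apply many independent single noisy gradient steps from the same
    starting parameters and record the return after each one."""
    rng = np.random.default_rng(seed)
    true_grad = -2.0 * (theta - 0.8)  # gradient of the smooth part only
    returns = np.empty(n_updates)
    for i in range(n_updates):
        # Assumed noise model: Gaussian noise added to the gradient estimate.
        noisy_grad = true_grad + noise_scale * rng.standard_normal(theta.shape)
        returns[i] = toy_return(theta + step_size * noisy_grad)
    return returns

# Start at the top of the smooth hill, which sits close to the cliff.
theta = np.array([0.8, 0.8])
returns = post_update_returns(theta)

# Summarize the post-update return distribution, not just its mean.
summary = {
    "mean": returns.mean(),
    "std": returns.std(),
    "p05": np.percentile(returns, 5),  # lower tail of the distribution
}
print(summary)
```

In this toy setting the starting point has the best possible return, yet a sizable fraction of single updates land past the cliff. The mean alone hides this; the spread and lower tail of the sampled returns expose it, which is the kind of information a distributional view of the landscape is meant to capture.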

continuous control, noisy neighborhood, policy optimization, (2 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)