Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.
Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.
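To make the abstract's central measurement concrete, the sketch below shows one way to estimate the distribution of returns one update away from a given policy. It is not the authors' procedure: the environment is replaced by a toy return function with a failure-prone "cliff" region, and a single policy update is approximated by an isotropic random parameter step rather than a real gradient step. `evaluate_return` and `post_update_return_distribution` are hypothetical names introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_return(theta: np.ndarray) -> float:
    """Placeholder for one rollout; real code would step an environment.

    A smooth bowl plus a toy 'cliff' stands in for a return landscape with a
    failure-prone region, as described in the abstract.
    """
    base = -np.sum((theta - 1.0) ** 2)           # smooth component
    cliff = -50.0 if theta[0] > 1.2 else 0.0     # occasional drastic failure
    return float(base + cliff + rng.normal(scale=0.1))

def post_update_return_distribution(theta, step_size=0.05, n_updates=100):
    """Sample returns of policies one (random) step away from theta.

    The paper studies returns induced by a single policy update; here that
    update is approximated by a random perturbation, since this sketch does
    not reproduce the authors' actual optimizer.
    """
    returns = []
    for _ in range(n_updates):
        perturbed = theta + step_size * rng.standard_normal(theta.shape)
        returns.append(evaluate_return(perturbed))
    return np.asarray(returns)

theta = np.array([1.1, 1.0, 1.0])                # a policy near the toy cliff
dist = post_update_return_distribution(theta)

# A low quantile far below the mean flags a noisy neighborhood: most updates
# are benign, but a few lead to large drops in return.
print(f"mean return:     {dist.mean():.2f}")
print(f"5th percentile:  {np.percentile(dist, 5):.2f}")
```

Under these assumptions, comparing a low quantile of the sampled returns against their mean gives a simple distributional summary of how noisy the policy's neighborhood is, in the spirit of the "hidden dimension of policy quality" the abstract refers to.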