6191ab7080c840f67eaf5dff7d5edfcb-Supplemental-Conference.pdf
Neural Information Processing Systems
Diversity in equally-performing policies. We show that different neighborhoods correspond to different post-update return distributions and agent behaviors. We discover that at equal average returns, different policies obtained by the same deep RL algorithm may in fact have substantially different distributional profiles, as measured by statistics of the post-update return distribution.
Feb-12-2026, 16:10:47 GMT
- Country:
- North America
- United States > Louisiana (0.04)
- Canada > Quebec (0.04)
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Technology: