Iteratively Learn Diverse Strategies with State Distance Information
Neural Information Processing Systems
In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge. First, we find that with existing diversity measures, visually indistinguishable policies can still yield high diversity scores. To accurately capture the behavioral difference, we propose to incorporate state-space distance information into the diversity measure.
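To make the point about diversity measures concrete, the following is a minimal illustrative sketch, not the measure proposed in the paper. It assumes a hypothetical 2D grid agent and compares two stand-in scores: an action-mismatch rate (a proxy for measures that ignore state geometry) and a mean Euclidean distance between time-aligned states. The function names, the grid world, and both scores are assumptions made for illustration only.

```python
import math

# Hypothetical 2D grid moves; not taken from the paper.
MOVES = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}


def rollout(actions, start=(0, 0)):
    """Return the list of states visited after each action."""
    x, y = start
    states = []
    for a in actions:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
        states.append((x, y))
    return states


def action_mismatch_rate(actions_a, actions_b):
    """Fraction of time steps at which the two action sequences differ."""
    if len(actions_a) != len(actions_b):
        raise ValueError("action sequences must have equal length")
    return sum(a != b for a, b in zip(actions_a, actions_b)) / len(actions_a)


def mean_state_distance(states_a, states_b):
    """Average Euclidean distance between time-aligned states."""
    if len(states_a) != len(states_b):
        raise ValueError("trajectories must have equal length")
    return sum(math.dist(s, t) for s, t in zip(states_a, states_b)) / len(states_a)


# A and B zigzag along the same diagonal; C walks in the opposite direction.
acts_a = ["R", "U", "R", "U"]
acts_b = ["U", "R", "U", "R"]
acts_c = ["L", "L", "L", "L"]

traj_a, traj_b, traj_c = rollout(acts_a), rollout(acts_b), rollout(acts_c)

# Both pairs disagree on every action, so the action-based score cannot
# tell them apart, while the state-based score separates them clearly.
mismatch_ab = action_mismatch_rate(acts_a, acts_b)
mismatch_ac = action_mismatch_rate(acts_a, acts_c)
dist_ab = mean_state_distance(traj_a, traj_b)
dist_ac = mean_state_distance(traj_a, traj_c)
```

In this toy setting the pair (A, B) reaches the same goal along nearly the same path, yet scores as maximally different under the action-based proxy; only the state-distance score reflects that (A, C) is the behaviorally distinct pair.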
Dec-24-2025, 23:43:07 GMT