Rethinking Exploration in Reinforcement Learning with Effective Metric-Based Exploration Bonus

Neural Information Processing Systems 

Additionally, methods that utilize the bisimulation metric for evaluating state discrepancies face a theory-practice gap due to improper approximations in metric learning, particularly struggling with hard exploration tasks.