Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks
Neural Information Processing Systems
Our work applies DreamerV3's tricks to PPO and is the first such empirical study outside of the original work.
Dec-27-2025, 22:22:05 GMT
- Genre:
- Research Report (0.46)