Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks

Neural Information Processing Systems 

Our work applies DreamerV3's tricks to PPO and is the first empirical study of these tricks outside the original work.
