Review for NeurIPS paper: Improving Generalization in Reinforcement Learning with Mixture Regularization
–Neural Information Processing Systems
Additional Feedback: Although I believe the arguments for mixup style regularization make sense, I do have some concerns about potential bias from the ProcGen benchmark. Many of the games in ProcGen are 2D games with a fixed camera (a skim of videos in the envs gives 8 of 16 envs have a fixed cameras and 7 of those 8 have a static image background.) We would expect a mixup style method to do better on these environments, because averaging 2 images together naturally exposes what parts of the image are static, and what parts of the image are not. So I have some concerns over how well this will generalize to other settings. Based on the training curves, mixup is simply more efficient than PPO on the train-time environments.
Neural Information Processing Systems
Jan-24-2025, 17:12:19 GMT
- Technology: