Review for NeurIPS paper: Meta-Gradient Reinforcement Learning with an Objective Discovered Online
–Neural Information Processing Systems
Strengths: The idea of formulating the inner loss for meta-RL as an objective that the algorithm discovers on its own is interesting and novel. In general, having the algorithm discover its own objective moves it one step closer to automated machine intelligence than conventional meta-RL methods, which rely heavily on expert design choices, such as hyperparameters, to perform learning-to-learn. The authors present extensive experimental results to evaluate the proposed method. It is evaluated on three task domains: a catch game, to demonstrate that the method can effectively learn bootstrapping; a 5-state random walk, to demonstrate that the method works in non-stationary environments; and ALE, a large-scale RL testbed. In all three domains, the proposed method achieves a noticeable performance improvement over the compared baselines.
Jan-27-2025, 14:31:58 GMT