Review for NeurIPS paper: Discovering Reinforcement Learning Algorithms
–Neural Information Processing Systems
Additional Feedback:

Page 2: In your related work, you have missed several important works, for example those of Francis Maes, who proposes approaches for learning fundamental learning rules for RL algorithms (especially for playing bandit problems; see https://scholar.google.be/citations?hl=fr&user=h8kelPwAAAAJ). His approach is very close to yours (same type of objective function).

Page 3: Finding an optimal update policy is in some sense expressed as a Bayesian RL problem (you assume a known probability distribution over environments as a prior), but you never make the connection with this field of research. In the work of Maes, it is more or less formalized as such. Your approach can be considered a gradient-based direct policy search approach, with formula (1) as the evaluation metric, \eta \times \theta as the search space, and a gradient-based method as the optimization method. The main contribution of this paper is how to define the candidate space for \eta, something you never define very well.
Jan-21-2025, 11:08:25 GMT