Discovering Reinforcement Learning Algorithms
Reinforcement learning (RL) algorithms update an agent's parameters according to one of several possible rules, discovered manually through years of research. Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments. Although there have been prior attempts at addressing this significant scientific challenge, it remains an open question whether it is feasible to discover alternatives to fundamental concepts of RL such as value functions and temporal-difference learning. This paper introduces a new meta-learning approach that discovers an entire update rule, which includes both "what to predict" and "how to learn from it". The output of this method is an RL algorithm that we call Learned Policy Gradient (LPG).
Review for NeurIPS paper: Discovering Reinforcement Learning Algorithms
Additional Feedback:
Page 2: Your related work misses several important contributions, such as those of Francis Maes, who proposes approaches for learning fundamental learning rules for RL algorithms (especially for bandit problems); see https://scholar.google.be/citations?hl=fr&user=h8kelPwAAAAJ. His approach is very close to yours (same type of objective function).
Page 3: Finding an optimal update policy is in some sense expressed as a Bayesian RL problem (you assume a known probability distribution over environments as a prior), but you never make the connection with this field of research. In the work of Maes, it is somehow formalized as such. Your approach can be considered a gradient-based direct policy search approach, with formula (1) as the evaluation metric, \eta \times \theta as the search space, and a gradient-based method as the optimizer. The main contribution of this paper is how to define the candidate space of your \eta, something you never define very well.
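To make the "direct policy search over update rules" reading concrete, here is a minimal toy sketch. It is not the paper's LPG and does not reproduce formula (1). Everything in it is an assumption made for illustration: the update rule is reduced to a single scalar step size `eta`, the environments are two-armed bandits with uniformly sampled arm means, the evaluation metric is the agent's expected reward after a fixed number of updates, and the meta-gradient is estimated by central finite differences rather than by backpropagation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lifetime_return(eta, arm_means, theta0, num_updates=20):
    """Expected reward of the agent after applying the update rule num_updates times.

    Hypothetical update rule: an exact policy-gradient step scaled by eta.
    """
    theta = list(theta0)
    for _ in range(num_updates):
        pi = softmax(theta)
        value = sum(p * m for p, m in zip(pi, arm_means))
        # Gradient of expected reward w.r.t. softmax logits: pi_i * (mu_i - value).
        theta = [t + eta * p * (m - value) for t, p, m in zip(theta, pi, arm_means)]
    pi = softmax(theta)
    return sum(p * m for p, m in zip(pi, arm_means))


def meta_objective(eta, tasks):
    """Average final performance over sampled environments and initial parameters."""
    return sum(lifetime_return(eta, means, theta0) for means, theta0 in tasks) / len(tasks)


def sample_tasks(num_tasks, seed=0):
    """Draw (arm_means, theta0) pairs; this plays the role of the prior over environments."""
    rng = random.Random(seed)
    tasks = []
    for _ in range(num_tasks):
        means = [rng.random(), rng.random()]
        theta0 = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        tasks.append((means, theta0))
    return tasks


def meta_train(tasks, eta=0.1, meta_lr=0.5, meta_steps=50, eps=1e-3):
    """Gradient ascent on eta using a central finite-difference meta-gradient."""
    for _ in range(meta_steps):
        grad = (meta_objective(eta + eps, tasks) - meta_objective(eta - eps, tasks)) / (2 * eps)
        eta += meta_lr * grad
    return eta


tasks = sample_tasks(32)
eta_before = 0.1
eta_after = meta_train(tasks, eta=eta_before)
print("eta:", eta_before, "->", eta_after)
print("objective:", meta_objective(eta_before, tasks), "->", meta_objective(eta_after, tasks))
```

In this toy setting the search space for `eta` is one-dimensional, which is exactly what makes it easy to state. The reviewer's point is that once `eta` parameterizes a richer family of update rules, the definition of that candidate space becomes the central design decision and deserves an explicit description.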