Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary: The paper presents a sample-efficient policy search algorithm for large, continuous reinforcement learning problems. In contrast to existing model-based policy search algorithms, the approach presented in this paper learns local models in the form of linear Gaussian controllers. Given the information (rollouts) from these local linear models, a global, nonlinear policy can then be learned using an arbitrary parametrization scheme. The so-called Guided Policy Search approach alternates between (local) trajectory optimization and (global) policy search in an iterative fashion. In their experiments, the authors show that the approach outperforms various state-of-the-art policy search methods, e.g., REPS and PILCO. Experiments were conducted in (mostly 2D) dynamics simulations involving the continuous control of multi-linked agents.


Minimax

Neural Information Processing Systems

We thank reviewers for appreciating the originality of our work and providing constructive feedback. We address specific concerns below. Random selection in Alg. 1 means sampling uniformly The intuition behind Thm. 2 is explained But to interpret Thm. 2 alone: for any algorithm considered, if There is no missing factor of 2 in Eq.(28) and Eq.(26) Thm. 3 is as follows: for any Pareto optimal rate Alg. 1 is thus Pareto optimal. Eq. after line 115 defines the hardness level of a given problem, Alg. 1 is different from the Distilled Note that we are also comparing to an algorithm, i.e., QRM2, that allows the reuse of statistics [12]. The lower bound in Section 2 is in the minimax sense, so it suffices to reduce to the single-best arm case.



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper introduces a new approach to sampling from continuous probability distributions. The method extends prior work on using a combination of Gumbel perturbations and optimization to the continuous case. This is technically challenging, and they devise several interesting ideas to deal with continuous spaces, e.g. to produce an exponentially large or even infinite number of random variables (one per point of the continuous/discrete space) with the right distribution in an implicit way. Finally, they highlight an interesting connection with adaptive rejection sampling.
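
For readers unfamiliar with the Gumbel perturbation idea mentioned in this review, the following is a minimal sketch of the standard discrete Gumbel-max trick, which is the kind of prior work the review refers to. It is not the paper's continuous-space algorithm. The function names (`gumbel_noise`, `gumbel_max_sample`, `empirical_frequencies`) and the example weights are illustrative assumptions, not taken from the paper or the review.

```python
import math
import random
from collections import Counter


def gumbel_noise(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse-CDF sampling."""
    u = rng.random()
    # rng.random() may return exactly 0.0, for which log(u) is undefined.
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(log_weights, rng):
    """Return index i with probability proportional to exp(log_weights[i]).

    Each unnormalized log-weight is perturbed with independent Gumbel noise;
    the argmax of the perturbed values is an exact sample.
    """
    perturbed = [lw + gumbel_noise(rng) for lw in log_weights]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def empirical_frequencies(log_weights, n_samples, seed=0):
    """Sample repeatedly and return the observed frequency of each index."""
    rng = random.Random(seed)
    counts = Counter(
        gumbel_max_sample(log_weights, rng) for _ in range(n_samples)
    )
    return [counts[i] / n_samples for i in range(len(log_weights))]


if __name__ == "__main__":
    # Hypothetical unnormalized weights 1 : 2 : 7, so the target
    # probabilities are 0.1, 0.2 and 0.7.
    weights = [1.0, 2.0, 7.0]
    print(empirical_frequencies([math.log(w) for w in weights], 20000))
```

The sketch draws one noise variable per outcome explicitly, which is only feasible for a small discrete space. According to the review, the paper's contribution is handling exponentially large or infinite spaces, where the perturbations have to be generated implicitly.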


743c41a921516b04afde48bb48e28ce6-AuthorFeedback.pdf

Neural Information Processing Systems

HOOF is robust to settings within this range. We could not present results for Ant and Walker due to space constraints. Thus we are restricted to zero order optimisers. For natural gradients like TNPG, HOOF does not add any new hyperparameters beyond those used by grid search - i.e. Other methods like PBT introduce more hyperparameters than these.


Response to Reviewer 1: 3

Neural Information Processing Systems

We thank all reviewers for their comments and acknowledgement of our contribution. Below we address each reviewer's comments separately. The reviewer raised a very good point. We will add this clarification in the revised version. Our gradient-based method is much more efficient but only finds a stationary point.