38af86134b65d0f10fe33d30dd76442e-Reviews.html
–Neural Information Processing Systems
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper under review, Variational Guided Policy Search introduces a new approach of how classical policy search can be combined and improved with trajectory optimization methods serving as exploration strategy. An optimization criteria with the goal of finding optimal policy parameters is decomposed with a variational approach. The variational distribution is approximated as Gaussian distribution which allows a solution with the iterative LQR algorithm. The overall algorithm uses expectation maximization to iterate between minimizing the KL divergence of the variational decomposition and maximizing the lower bound with respect to the policy parameters.
Neural Information Processing Systems
Oct-3-2025, 09:08:02 GMT