Reviews: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

Neural Information Processing Systems 

The manuscript addresses an important topic: optimization in deep reinforcement learning. The authors extend the use of Kronecker-factored approximation to develop a second-order optimization method for deep reinforcement learning. The method uses a Kronecker-factored approximation to the Fisher matrix to estimate the curvature of the cost, yielding a scalable approximation to the natural gradient. The authors demonstrate the strength of the method (termed ACKTR) through the performance of agents in the Atari and MuJoCo RL environments, and compare it with two previous methods (A2C and TRPO). Overall, the manuscript is well written and, to my knowledge, the methodology is a novel application of Kronecker-factored approximation.
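To make the scalability argument concrete, here is a minimal sketch of the Kronecker-factored natural gradient for a single fully connected layer. It is not the authors' ACKTR implementation: the function name `kfac_natural_gradient`, the simple per-factor `damping` term, and the random data are illustrative assumptions, and the trust-region step-size control used by ACKTR is omitted. The Fisher block for the weight matrix is approximated as `A ⊗ S`, with `A = E[a aᵀ]` over layer inputs and `S = E[g gᵀ]` over back-propagated gradients, so its inverse applied to the gradient reduces to `S⁻¹ G A⁻¹`. Only two small matrices are inverted instead of one matrix whose size is the product of theirs.

```python
import numpy as np

def kfac_natural_gradient(grad_w, acts, grad_out, damping=1e-3):
    """Hypothetical helper: K-FAC-style natural gradient for one dense layer.

    grad_w:   (out, in) gradient of the loss w.r.t. the weight matrix W
    acts:     (batch, in) layer inputs a
    grad_out: (batch, out) gradients g w.r.t. the pre-activations s = W a
    """
    n = acts.shape[0]
    # Kronecker factors of the approximate Fisher block F ~= A kron S.
    # The damping added to each factor is an assumed simplification.
    A = acts.T @ acts / n + damping * np.eye(acts.shape[1])
    S = grad_out.T @ grad_out / n + damping * np.eye(grad_out.shape[1])
    # (A kron S)^{-1} vec(G) = vec(S^{-1} G A^{-1}) for column-major vec,
    # using the symmetry of A.
    return np.linalg.solve(S, grad_w) @ np.linalg.inv(A)

# Usage on random data: 64 samples, 5 inputs, 3 outputs.
rng = np.random.default_rng(0)
acts = rng.standard_normal((64, 5))
grad_out = rng.standard_normal((64, 3))
grad_w = grad_out.T @ acts / 64  # batch average of g a^T
nat = kfac_natural_gradient(grad_w, acts, grad_out)
```

Here the factored form inverts a 3×3 and a 5×5 matrix, whereas the explicit Kronecker product would be 15×15; for realistic layer widths this difference is what makes the approximation tractable.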