Review for NeurIPS paper: Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Neural Information Processing Systems 

Additional Feedback: The authors' response has addressed my questions. I will keep my score. This is a natural question to ask, so it could be worth an explanation somewhere. However, this paper suggests a slower rate by a factor of (1-\gamma)^{-2}. What could cause the difference and how could the theory here guide development of deep RL algorithms?