Supplementary Materials

A Experiment

As suggested by one reviewer, we conduct the following experiment over CartPole in OpenAI Gym to

Neural Information Processing Systems 

The following lemma justifies item 3 in Assumption 1.

Consider the following two cases:
1. The density function of the policy is smooth, i.e.

We then show how Theorem 4 implies Theorem 1.

Assumption 3. For all x ∈ X, there exist constants such that the following hold:
1. For all x, we have null A

Now we proceed to prove the main theorem.

Then, given the above convergence result on the gradient norm, we proceed to prove the convergence of NAC in terms of the function value.