Neural Information Processing Systems 

We thank the reviewers for their thoughtful comments and feedback. Below we respond to the reviewers' concerns.

In the proof of Lemma A.5, on lines 429-430, we are using the Taylor expansion of ...

We plan to expand our experimental results in multiple directions. 1) We have already ...

If accepted, we plan to include it in the final submission.

We apologize for the confusion. Note that the use of SGD or Adam does not change the overall takeaways of the experiments.