A Proofs
Neural Information Processing Systems
A.1 Nonconvex stochastic optimization

We give the proofs of the theorems in Section 3, beginning with some lemmas. Following the proofs in [60], we introduce the definition of a supermartingale.

Since $r \in (0.5, 1)$, it follows that the number of iterations $N$ needed is at most $O(\dots)$

To prove Theorem 5, we first prove the following lemma. Suppose that Assumptions 1 and 2 hold. When neither regularization nor damping is used, i.e.,
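For reference, the supermartingale notion mentioned above (following [60]) is sketched below in its standard textbook form. This is an assumption about the form used: the exact variant in [60] may differ, for example a nonnegative "almost supermartingale" with additional summable terms. The block also assumes a `definition` environment declared with amsthm's `\newtheorem`.

```latex
% Standard definition of a (discrete-time) supermartingale.
% Assumes \newtheorem{definition}{Definition} is declared in the preamble.
\begin{definition}[Supermartingale]
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let
$\{\mathcal{F}_k\}_{k \ge 0}$ be a filtration, i.e.\
$\mathcal{F}_k \subseteq \mathcal{F}_{k+1} \subseteq \mathcal{F}$ for all $k$.
A sequence of random variables $\{X_k\}_{k \ge 0}$ is a supermartingale
with respect to $\{\mathcal{F}_k\}$ if, for all $k \ge 0$,
\begin{enumerate}
  \item $X_k$ is $\mathcal{F}_k$-measurable;
  \item $\mathbb{E}\,|X_k| < \infty$;
  \item $\mathbb{E}[X_{k+1} \mid \mathcal{F}_k] \le X_k$ almost surely.
\end{enumerate}
\end{definition}
```

Condition (iii) states that, conditioned on the information available at step $k$, the process is not expected to increase; this is the property that convergence arguments of this type typically exploit.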
Aug-17-2025, 05:24:30 GMT