A Proofs
Neural Information Processing Systems
A.1 Nonconvex stochastic optimization
We give proofs of the theorems in Section 3. We first give some lemmas. Following the proofs in [60], we introduce the definition of a supermartingale. Since r ∈ (0.5, 1), it follows that the number of iterations N needed is at most O(…). To prove Theorem 5, we first prove the following lemma. Suppose that Assumptions 1 and 2 hold. When neither regularization nor damping is used, i.e.
Nov-15-2025, 14:34:06 GMT