A Proofs

Neural Information Processing Systems 

A.1 Nonconvex stochastic optimization

We give proofs of the theorems in Section 3. We first state some lemmas.

Lemma 1. Suppose that {x

Lemma 2. Suppose that Assumption 2 holds for {x

I, the result holds for k = 0. Define ε

From the definition of the Moore-Penrose inverse, we know that Equation (41) holds. Here C > 0 is a constant. Substituting it into (34b), we obtain (47b).

Using Corollary 1, we obtain the descent property of SAM:

Lemma 3. Suppose that Assumptions 1 and 2 hold for {x

Here C > 0 is a constant. Following the proofs in [60], we introduce the definition of a supermartingale.

Proposition 1 (Supermartingale convergence theorem; see, e.g., Theorem 4.2.12 in [13]).

Since the diminishing condition (18a) holds, we obtain (20).
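For reference, a standard Robbins-Siegmund-type form of the supermartingale convergence theorem (stated here in a common textbook form; the precise version used is Theorem 4.2.12 in [13] and should be consulted) reads:

```latex
% A standard supermartingale (Robbins-Siegmund-type) convergence statement;
% the exact form used in the proofs is Theorem 4.2.12 in [13].
Let $(X_k)_{k\ge 0}$, $(Y_k)_{k\ge 0}$, $(Z_k)_{k\ge 0}$ be nonnegative
random sequences adapted to a filtration $(\mathcal{F}_k)_{k\ge 0}$ such that
\[
  \mathbb{E}\!\left[X_{k+1} \mid \mathcal{F}_k\right] \le X_k - Y_k + Z_k,
  \qquad \sum_{k=0}^{\infty} Z_k < \infty \quad \text{a.s.}
\]
Then $X_k$ converges almost surely to a finite limit, and
$\sum_{k=0}^{\infty} Y_k < \infty$ almost surely.
```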
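As a side illustration (not part of the proof), the Moore-Penrose inverse invoked above is characterized by four defining conditions, which can be checked numerically; the matrix below is an arbitrary rank-deficient example:

```python
import numpy as np

# Build a (possibly rank-deficient) 4x5 matrix A of rank <= 3,
# then verify the four Moore-Penrose conditions for A^+ = pinv(A).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 5))

A_pinv = np.linalg.pinv(A)

# 1) A A^+ A = A
assert np.allclose(A @ A_pinv @ A, A)
# 2) A^+ A A^+ = A^+
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)
# 3) A A^+ is symmetric
assert np.allclose((A @ A_pinv).T, A @ A_pinv)
# 4) A^+ A is symmetric
assert np.allclose((A_pinv @ A).T, A_pinv @ A)
```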