A Detailed Formulations of the Algorithms

Neural Information Processing Systems 

Note that we can assume a = 0 without loss of generality.

... L(q, λ), which implies that (q, λ) is a saddle point of L(q, λ). Combining the two cases yields the result.

Therefore, Assumption 2.2 indicates that Φ is v...

Introducing a Lagrange multiplier λ ≥ 0: since the problem is convex in φ, we can solve the dual of the problem, which is max min ...

This leads to the first claim by Gronwall's inequality.

Following our derivation in Theorem 2.7, we find that ...

Note that by Theorem 2.7, ...

We run 300 iterations for all four methods.
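As a reminder of the duality structure the fragments above refer to, the following is a generic sketch of a Lagrangian saddle point and dual, together with the integral form of Gronwall's inequality; the symbols f and g are placeholders and not taken from the paper, whose actual objective and constraint are not recoverable here.

```latex
% Generic Lagrangian for a constrained problem \min_q f(q) s.t. g(q) \le 0
% (f, g are assumed placeholders, not the paper's actual functions):
L(q, \lambda) = f(q) + \lambda\, g(q), \qquad \lambda \ge 0.

% (q^*, \lambda^*) is a saddle point of L if
L(q^*, \lambda) \le L(q^*, \lambda^*) \le L(q, \lambda^*)
\quad \text{for all } q \text{ and all } \lambda \ge 0,

% and the dual problem mentioned in the text takes the form
\max_{\lambda \ge 0} \; \min_{q} \; L(q, \lambda).

% Gronwall's inequality (integral form), as invoked for the first claim:
% if u(t) \le \alpha + \int_0^t \beta(s)\, u(s)\, \mathrm{d}s with \beta \ge 0,
% then
u(t) \le \alpha \exp\!\Big( \int_0^t \beta(s)\, \mathrm{d}s \Big).
```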