002262941c9edfd472a79298b2ac5e17-Supplemental-Conference.pdf
–Neural Information Processing Systems
A.1 Proof Sketch We first introduce the following lemma: Lemma 1. Lemma 2. For matrices A,B 2Mn, if A B, then we have min(A) min(B)and max(A) max(B), where max() (resp., min()) denotes taking the maximum (resp., minimum) eigenvalue.. Proof of Lemma 2. For any matrix P 2Mn with P> = P, we have max(P) = max We first consider the condition number of ˆH when X is in a locally convex area. By equations 3 and 4, we have M1 H M2. Rearranging the terms yields H M1 0 and M2 H 0. Therefore, for any vector x 2RM, we have We next consider the minimum singular value of H and ˆH with min(H)= p min(H2) and min(ˆH)= q min(ˆH2) in any case. Under Assumption 1 and equation 4, we have H M2. Similarly, we can obtain H M2. By Lemma 2, we further have max(H) max(M2)= nmax 2 C.1 kr ˆf(ˆX) k2 vs. krf(X) k2 In this section, we explain why we use kr ˆf(ˆX) k2 rather than kr f(X) k2 to characterize the convergence rate. In general, it is hard to develop a convergence rate for objective values. However, when the global model is in a locally convex area of f, we can obtain the relationship between the gradient and the local optimum.
Neural Information Processing Systems
Apr-24-2026, 07:14:54 GMT
- Technology: