A Complete Algorithms

Neural Information Processing Systems

In Section B, we provide preliminaries. In Section C, we give the sparsity analysis, and in Section D the convergence analysis. In Section E, we show how to combine the sparsity, convergence, and running-time results. In Section F, we show the correlation between sparsity and the spectral gap of the Hessian in the neural tangent kernel. In Section G, we discuss how to generalize our result to the quantum setting.







A Baseline algorithms

Neural Information Processing Systems

The following theorem is a more general version of Theorem 5.1. Assume that Assumptions 1 to 3 hold. Note that the only difference between Theorem B.1 and Theorem 5.1 lies in [...]. That is, the "oldest" response used to update [...]. By Jensen's inequality and L-smoothness, we have [...].

In order for the paper to be self-contained, we restate the proof here. The following lemma is slightly modified from Lemma 8 in [18]. By Lemma B.1, we have [...]. Combining Appendix B.3.1 and Appendix B.3.2, we have [...].

B.4 Deriving the convergence bound

In this subsection, we obtain Theorem B.1 based on the descent lemma.
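
For reference, the two standard facts that the excerpt invokes (L-smoothness and Jensen's inequality) can be written in their generic textbook form. This is only a sketch under assumptions: f is taken to be differentiable with an L-Lipschitz gradient, and the symbols x, y, a_i, and n are placeholders that do not come from the paper. The exact inequalities used in the original proof are not recoverable from this excerpt.

```latex
% L-smoothness: quadratic upper bound (the usual starting point of a descent lemma)
f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\,\| y - x \|^{2}
\qquad \text{for all } x, y .

% Jensen's inequality applied to the squared norm of an average
\Bigl\| \frac{1}{n} \sum_{i=1}^{n} a_i \Bigr\|^{2}
\;\le\; \frac{1}{n} \sum_{i=1}^{n} \| a_i \|^{2} .
```

In proofs of this kind, the second inequality is typically applied to an average of gradients, which would explain why the two facts are cited together.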