2a91de02871011d0090e662ffd6f2328-Supplemental-Conference.pdf
–Neural Information Processing Systems
The structure of the appendix mainly follows the roadmap of the proof described in Section 4.4. In Appendix A, we define the characterizable population risk function in (31) to approximate the objective function. Appendix A also introduces notation that simplifies the analysis, and we recommend that readers refer to Table 3 for the major notation used in the proofs. In this paper, we consider the multi-layer case and therefore need to derive a lower bound on the Hessian matrix for all layers. Moreover, the input of an intermediate layer cannot be proved to be Gaussian; it instead belongs to a sub-Gaussian distribution.
Feb-9-2026, 09:34:53 GMT