Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning (Supplementary File)

Feb-11-2026, 02:37:24 GMT–Neural Information Processing Systems

This supplementary file is structured as follows. Appendix C summarizes the notation used throughout this document and provides the auxiliary theories and lemmas for the subsequent analysis, whose proofs are deferred to Appendix E. Appendix D then gives the proofs of the main results in Sec. Next, we introduce the two types of randomness in the SDE of ADAM. Finally, we run experiments to investigate the validity of the constructed SDEs of ADAM and SGD, and we further investigate the second-order moment of the gradient noise.
