Neural Information Processing Systems
A.2 Derivations for Section 3.1

We begin with a formal derivation of the formulas in Section 3.1. Recall that we consider a function $F(\theta)$ whose parameters can be split into $n$ scale-invariant (SI) groups: $\theta = (\theta_1, \dots, \theta_n)$. We solve optimization problem (1) with projected gradient descent (2).

Remark 2. The above formulation apparently lacks the third (divergent) regime. If, conversely, $\eta > 1 / \sum_{i=1}^{n} \alpha_i$, then at each iteration at least one of the individual effective learning rates (ELRs) exceeds its convergence threshold: $\eta_i > 1/\alpha_i$.
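Remark 2 can be read as a pigeonhole argument under two assumptions that this excerpt does not state: that projected gradient descent keeps the full parameter vector on the unit sphere ($\sum_i \|\theta_i\|^2 = 1$), and that the individual ELR of group $i$ is $\eta_i = \eta / \|\theta_i\|^2$. Under these assumptions, if every $\eta_i \le 1/\alpha_i$, then $\|\theta_i\|^2 \ge \eta\alpha_i$ for all $i$, and summing over $i$ gives $1 \ge \eta \sum_i \alpha_i$, which contradicts $\eta > 1/\sum_i \alpha_i$. The following minimal Python sketch checks this numerically. All helper names, group sizes, $\alpha_i$ values, and the random vectors that stand in for $\nabla F$ are hypothetical.

```python
import math
import random


def sq_norm(v):
    """Squared Euclidean norm of a flat list of floats."""
    return sum(x * x for x in v)


def project_to_unit_sphere(groups):
    """Rescale all groups by one common factor so that sum_i ||theta_i||^2 == 1."""
    total = math.sqrt(sum(sq_norm(g) for g in groups))
    return [[x / total for x in g] for g in groups]


def projected_gd_step(groups, grads, eta):
    """Gradient step on every group, then project the full vector back onto the unit sphere."""
    stepped = [[x - eta * gx for x, gx in zip(g, gg)] for g, gg in zip(groups, grads)]
    return project_to_unit_sphere(stepped)


def individual_elrs(groups, eta):
    """Hypothetical per-group ELR, ASSUMED here to be eta_i = eta / ||theta_i||^2."""
    return [eta / sq_norm(g) for g in groups]


def some_group_diverges(groups, eta, alphas):
    """True if at least one group's ELR exceeds its threshold 1 / alpha_i."""
    return any(e > 1.0 / a for e, a in zip(individual_elrs(groups, eta), alphas))


if __name__ == "__main__":
    random.seed(0)
    sizes = [3, 5, 2]          # hypothetical group sizes
    alphas = [0.5, 2.0, 1.5]   # hypothetical per-group alpha_i > 0
    eta = 1.1 / sum(alphas)    # just above the bound 1 / sum_i alpha_i
    theta = project_to_unit_sphere([[random.gauss(0, 1) for _ in range(k)] for k in sizes])
    for step in range(5):
        # Random vectors stand in for the true gradients of F.
        grads = [[random.gauss(0, 1) for _ in g] for g in theta]
        theta = projected_gd_step(theta, grads, eta)
        print(step, some_group_diverges(theta, eta, alphas))
```

Under the same assumptions the bound is tight: choosing $\|\theta_i\|^2 = \alpha_i / \sum_j \alpha_j$ gives $\eta_i \alpha_i = \eta \sum_j \alpha_j$ for every group, so all groups reach their thresholds simultaneously exactly when $\eta = 1/\sum_j \alpha_j$.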
Feb-9-2026, 05:45:17 GMT