A Theory

Neural Information Processing Systems 

In this section, we provide proofs and additional details for Section 3.

A.1 Norm constraint: total vs. individual

We begin with a formal derivation of the formulas in Section 3.1.

Then the following results hold:

1. η <

The above formulation allegedly lacks the third (divergent) regime.

For the second statement, based on eq.

A.4 More formally on the results of Section 3.2

In this section, we provide a more formal argument on the results of Section 3.2. According to the results of Section 3.1, solving it with the projected gradient method

Here we provide additional plots depicting the behavior of individual ELRs in the toy example at the end of Section 3.2.
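
As a rough illustration of the projected gradient method referred to in Section A.4, the following minimal sketch performs gradient steps followed by a projection back onto a sphere of fixed radius. It is not the formulation derived in this appendix: the quadratic objective, the matrix `A`, the step size `lr`, and the helper names `project_to_sphere` and `projected_gradient_descent` are all assumptions introduced here for illustration only.

```python
import numpy as np


def project_to_sphere(w, radius=1.0):
    """Rescale w so that its Euclidean norm equals `radius` (hypothetical helper)."""
    return w * (radius / np.linalg.norm(w))


def projected_gradient_descent(grad_fn, w0, lr=0.1, radius=1.0, steps=200):
    """Alternate a plain gradient step with a projection onto the sphere."""
    w = project_to_sphere(np.asarray(w0, dtype=float), radius)
    for _ in range(steps):
        w = w - lr * grad_fn(w)           # unconstrained gradient step
        w = project_to_sphere(w, radius)  # restore the norm constraint
    return w


# Assumed toy objective f(w) = 0.5 * w^T A w; it is not taken from the paper.
A = np.diag([3.0, 1.0])


def grad(w):
    """Gradient of the assumed quadratic objective."""
    return A @ w


w_star = projected_gradient_descent(grad, w0=[1.0, 1.0])
```

On this toy problem the iterate stays on the unit sphere at every step and moves toward the direction of the smallest eigenvalue of `A`, which is the constrained minimizer of the assumed quadratic.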