A.1 Proof of Thm. 1

We will assume without loss of generality that the condition $\inf_{\delta \in (0,1)} \ldots \sigma(\delta) + \sigma(\delta) \ldots$
–Neural Information Processing Systems
… $V \operatorname{diag}(y) X$ … $B$. To that end, we let $V$ be any $\pm 1$-valued $n \times m$ matrix which satisfies Eq. (4) as well as $\|V\| \le c(\sqrt{n} + \sqrt{m})$, where $c \ge 1$ is some universal constant.

In other words, this class is a composition of all linear functions of norm at most 1 and all univariate $L$-Lipschitz functions passing through the origin. Fortunately, the Rademacher complexity of such composed classes was analyzed by Golowich et al. [2017], albeit for a different purpose.

We now wish to choose the free parameters $p, a, \ldots$ to ensure that all these conditions are met (so that we indeed shatter the dataset), while allowing the size $m$ of the shattered set to be as large as possible.

Thus, we can upper bound the expression above by taking the supremum over all vectors $w$ such that $\|w\| \le B$ (and not just those whose corresponding matrix has spectral norm at most $B$). Thus, we have $\|W\| \le B$.

Let $x \in \mathbb{R}^d$ be such that $\phi_1(x) = w / \|w\|$ and $x_k = 0$ for every coordinate $k$ that does not appear in $\phi_1$.
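The spectral-norm requirement $\|V\| \le c(\sqrt{n} + \sqrt{m})$ on the $\pm 1$-valued matrix $V$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it draws $V$ with i.i.d. uniform random signs (it does not enforce Eq. (4), which is not reproduced in this excerpt), estimates $\|V\|$ by power iteration on $V^\top V$, and reports the ratio to $\sqrt{n} + \sqrt{m}$. The helper names `random_sign_matrix` and `spectral_norm` are hypothetical.

```python
import math
import random


def random_sign_matrix(n, m, rng):
    """Return an n x m matrix (list of rows) with i.i.d. uniform +/-1 entries."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(m)] for _ in range(n)]


def spectral_norm(V, iters=100, seed=0):
    """Estimate the largest singular value of V by power iteration on V^T V.

    The returned value is ||V v|| for a unit vector v, so it never exceeds
    the true spectral norm and approaches it as the iterations proceed.
    """
    n, m = len(V), len(V[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        u = [sum(V[i][j] * v[j] for j in range(m)) for i in range(n)]  # u = V v
        w = [sum(V[i][j] * u[i] for i in range(n)) for j in range(m)]  # w = V^T u
        norm_w = math.sqrt(sum(x * x for x in w))
        v = [x / norm_w for x in w]  # renormalize to a unit vector
    u = [sum(V[i][j] * v[j] for j in range(m)) for i in range(n)]
    return math.sqrt(sum(x * x for x in u))


# Illustrative sizes (assumptions, not values from the paper).
rng = random.Random(1)
n, m = 120, 30
V = random_sign_matrix(n, m, rng)
est = spectral_norm(V)
ratio = est / (math.sqrt(n) + math.sqrt(m))
print(f"||V|| ~ {est:.2f}, ratio to sqrt(n) + sqrt(m): {ratio:.2f}")
```

For random sign matrices the ratio typically stays close to 1, which is consistent with taking a modest universal constant $c$; a deterministic construction satisfying Eq. (4) would need its own check.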
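The composed class above starts from linear functions of norm at most 1. As a sketch of the inner layer only (the composed bound itself is the one attributed to Golowich et al. [2017] and is not reproduced here), the empirical Rademacher complexity of $\{x \mapsto \langle w, x\rangle : \|w\| \le 1\}$ has a closed-form supremum for each sign vector by Cauchy-Schwarz, namely $\|\sum_i \epsilon_i x_i\| / n$. The code estimates its expectation by Monte Carlo and compares it with the Jensen bound $\sqrt{\sum_i \|x_i\|^2}/n$. The function names, the `radius` parameter, and the sample data are illustrative assumptions.

```python
import math
import random


def linear_rademacher(xs, radius=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> <w, x> : ||w|| <= radius} on the points xs.

    For fixed signs eps, Cauchy-Schwarz gives the supremum in closed form:
    sup_{||w|| <= radius} (1/n) sum_i eps_i <w, x_i> = radius * ||sum_i eps_i x_i|| / n.
    """
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    total = 0.0
    for _ in range(trials):
        eps = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        s = [sum(eps[i] * xs[i][k] for i in range(n)) for k in range(d)]
        total += radius * math.sqrt(sum(c * c for c in s)) / n
    return total / trials


def linear_rademacher_bound(xs, radius=1.0):
    """Jensen upper bound: radius * sqrt(sum_i ||x_i||^2) / n."""
    n = len(xs)
    return radius * math.sqrt(sum(sum(c * c for c in x) for x in xs)) / n


# Illustrative data: 20 Gaussian points in R^5 (an assumption for the demo).
rng = random.Random(3)
pts = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
est = linear_rademacher(pts)
bound = linear_rademacher_bound(pts)
print(f"estimate {est:.4f} <= bound {bound:.4f}")
```

For orthonormal points the norm $\|\sum_i \epsilon_i x_i\|$ does not depend on the signs, so the estimate and the bound coincide; for generic points the estimate sits slightly below the bound.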
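The relaxation step (passing from vectors whose corresponding matrix has spectral norm at most $B$ to all $w$ with $\|w\| \le B$) only needs the first set to be contained in the second. Writing $M(w)$ for the matrix corresponding to $w$ and $F(w)$ for the expression being bounded (both notations introduced here for illustration), the step reads:

```latex
\{\, w : \|M(w)\| \le B \,\} \subseteq \{\, w : \|w\| \le B \,\}
\quad\Longrightarrow\quad
\sup_{w :\, \|M(w)\| \le B} F(w) \;\le\; \sup_{w :\, \|w\| \le B} F(w).
```

One sufficient condition for the inclusion, assumed here rather than taken from the excerpt, is that $w$ appears as a row of $M(w)$: then $\|w\| = \|M(w)^\top e_i\| \le \|M(w)\|$ for the corresponding standard basis vector $e_i$.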
Feb-8-2026, 10:37:57 GMT