Supplementary to Smooth Bilevel Programming for Sparse Regularization Clarice Poon, Gabriel Peyré APseudocode for gradient descent implementation
–Neural Information Processing Systems
Note that f(βt) = gt is computed either as in line 5 or line 9 of the algorithm and one can use these computations for any gradient based algorithm (e.g. Note also that this is simply gradient descent on a smooth function, and one can apply typical methods to choosing the stepsize γk, such as the Barzilai-Borwein stepsize [Barzilai and Borwein, 1988]. Algorithm 1: Gradient descent implementation of Ncvx-Pro for solving Lasso. 1 initialization v0 Rn (with no zero entries), stepsize γt > 0; Result: βt 2 while not converged do 3 if n6 mand λ>0 then 4 ut = diag(vt)X>Xdiag(vt) + λId To show that i) implies ii), recall that a convex, proper and lower semicontinuous function ϕ can be written in terms of its convex conjugate which has domain Rd . For the expression of ψwhen Ris a norm,from the above, we know that ψ = ( ϕ) ( z), and recall that for any norm, R(β) = maxR (w)61hw, βi. We derive some properties of the function h: Lemma 1.
Neural Information Processing Systems
May-1-2026, 01:41:33 GMT
- Technology: