Supplementary to Smooth Bilevel Programming for Sparse Regularization

Clarice Poon, Gabriel Peyré

A Pseudocode for gradient descent implementation

Neural Information Processing Systems

Note that ∇f(βt) = gt is computed either as in line 5 or line 9 of the algorithm, and one can use these computations for any gradient-based algorithm (e.g. …). Note also that this is simply gradient descent on a smooth function, so one can apply standard methods for choosing the stepsize γt, such as the Barzilai-Borwein stepsize [Barzilai and Borwein, 1988].

Algorithm 1: Gradient descent implementation of Ncvx-Pro for solving Lasso.
1 initialization v0 ∈ R^n (with no zero entries), stepsize γt > 0; Result: βt
2 while not converged do
3   if n ≤ m and λ > 0 then
4     ut = diag(vt) X^T X diag(vt) + λ Id …

To show that i) implies ii), recall that a convex, proper and lower semicontinuous function ϕ can be written in terms of its convex conjugate, which has domain R^d. For the expression of ψ when R is a norm, from the above we know that ψ = ( ϕ) ( z); recall also that for any norm, R(β) = max_{R*(w) ≤ 1} ⟨w, β⟩, where R* denotes the dual norm.

We derive some properties of the function h: Lemma 1.



Inversion of linear systems. As mentioned in Corollary 1, for the Lasso, when computing the gradient of f, one can either invert an n × n linear system or an m × m linear system.
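The two options can be sketched in NumPy as follows. This is a minimal sketch and not the authors' code. It assumes the Hadamard-parameterized Lasso objective f(v) = min_u ½‖u‖² + ½‖v‖² + (1/(2λ))‖X(u ⊙ v) − y‖² with X ∈ R^{m×n}, which this excerpt does not spell out. The function names are hypothetical, and the two formulas are derived from that assumed objective rather than quoted from the algorithm.

```python
import numpy as np


def grad_via_n_system(v, X, y, lam):
    """Gradient of the assumed f(v) using an n x n solve (hypothetical helper)."""
    n = X.shape[1]
    # Inner minimizer u solves (diag(v) X^T X diag(v) + lam * I) u = v * (X^T y).
    A = v[:, None] * (X.T @ X) * v[None, :] + lam * np.eye(n)
    u = np.linalg.solve(A, v * (X.T @ y))
    residual = X @ (u * v) - y
    # Partial derivative of the assumed objective in v, evaluated at u(v).
    return v + u * (X.T @ residual) / lam


def grad_via_m_system(v, X, y, lam):
    """Same gradient using an m x m solve (hypothetical helper)."""
    m = X.shape[0]
    # Dual variable alpha solves (X diag(v^2) X^T + lam * I) alpha = y.
    A = (X * v**2) @ X.T + lam * np.eye(m)
    alpha = np.linalg.solve(A, y)
    return v - v * (X.T @ alpha) ** 2


def grad_f(v, X, y, lam):
    """Pick the smaller of the two linear systems."""
    m, n = X.shape
    if n <= m:
        return grad_via_n_system(v, X, y, lam)
    return grad_via_m_system(v, X, y, lam)
```

Under the stated assumption both routes return the same vector up to rounding, so the choice only affects cost: the n × n solve is cheaper when n ≤ m, and the m × m solve is cheaper otherwise.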