Supplementary Material

A Stochastic Bilevel Optimizer PZOBO-S

Neural Information Processing Systems 

It can be checked that the strong-convexity and smoothness properties are satisfied.

First, DARTS estimates a matrix-vector product, whereas our method estimates the response Jacobian matrix. Second, the estimator in DARTS uses a difference of outer gradients evaluated at points separated by the inner gradient.

HOZOG [18]: a hyperparameter optimization algorithm that uses evolution strategies to estimate the entire hypergradient (both the direct and indirect components). We use our own implementation for this method.
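To make the distinction between estimating the response Jacobian and estimating the entire hypergradient concrete, the following is a minimal sketch on a toy quadratic bilevel problem. Everything in it is an assumption made for illustration and is not taken from the paper or from HOZOG [18]: the problem (`A`, `b`), the Gaussian-smoothing finite-difference form of both estimators, and the function names `estimate_jacobian`, `hypergrad_via_jacobian`, and `hypergrad_via_es`. The first route estimates only the response Jacobian and combines it with exact outer gradients; the second uses outer function values alone, so it covers the direct and indirect components together.

```python
import random

# Toy bilevel problem (hypothetical, chosen so the exact answers are known):
#   inner: g(x, y) = 0.5 * ||y - A x||^2  ->  y*(x) = A x, response Jacobian = A
#   outer: f(x, y) = 0.5 * ||y - b||^2 + 0.5 * ||x||^2
A = [[1.0, 0.5], [-0.5, 1.0]]
b = [1.0, -1.0]


def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def inner_solve(x, steps=30, lr=0.5):
    """Approximate y*(x) with `steps` gradient-descent steps on g, starting at y = 0."""
    target = matvec(A, x)
    y = [0.0] * len(target)
    for _ in range(steps):
        y = [yi - lr * (yi - ti) for yi, ti in zip(y, target)]
    return y


def outer_value(x):
    """F(x) = f(x, y^N(x)), the outer objective at the approximate inner solution."""
    y = inner_solve(x)
    fit = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, b))
    return fit + 0.5 * sum(xi ** 2 for xi in x)


def estimate_jacobian(x, num_samples, mu=1e-3, rng=random):
    """Zeroth-order estimate of dy/dx: average of (finite difference of y) * u^T."""
    n_y, n_x = len(b), len(x)
    y0 = inner_solve(x)
    J = [[0.0] * n_x for _ in range(n_y)]
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(n_x)]
        y1 = inner_solve([xi + mu * ui for xi, ui in zip(x, u)])
        for i in range(n_y):
            d = (y1[i] - y0[i]) / mu
            for j in range(n_x):
                J[i][j] += d * u[j] / num_samples
    return J


def hypergrad_via_jacobian(x, J):
    """Direct part grad_x f = x, plus indirect part J^T grad_y f."""
    y = inner_solve(x)
    gy = [yi - bi for yi, bi in zip(y, b)]
    return [x[j] + sum(J[i][j] * gy[i] for i in range(len(gy)))
            for j in range(len(x))]


def hypergrad_via_es(x, num_samples, mu=1e-3, rng=random):
    """ES-style estimate of the whole hypergradient from outer values only."""
    f0 = outer_value(x)
    g = [0.0] * len(x)
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        d = (outer_value(x_pert) - f0) / mu
        g = [gi + d * ui / num_samples for gi, ui in zip(g, u)]
    return g
```

For this toy problem the exact hypergradient is x + Aᵀ(Ax − b), so both estimates can be checked against it. Both are Monte Carlo averages, so their accuracy depends on the number of sampled directions and on the smoothing parameter `mu`.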