Supplementary Material

A Stochastic Bilevel Optimizer PZOBO-S

Neural Information Processing Systems 

We present the algorithm specification for our proposed stochastic bilevel optimizer PZOBO-S.

Algorithm 2: Stochastic PZOBO algorithm (PZOBO-S)

For the experiments in Sections 4.1 and 4.4, the bilevel problems are relatively simple, with quadratic … It can be checked that the strong-convexity and smoothness properties are satisfied.

For the experiments that involve neural networks, e.g., deep hyper-representation (Section 4.2) and meta-learning (Section 4.3), the lower-level problem optimizes …

Second, the estimator in DARTS uses a difference of outer gradients evaluated at two points separated by a gap along the inner gradient. The batch size is set to 128 for both methods.

E.1 Specifications on Baseline Bilevel Approaches in Section 4.1

We compare our algorithm PZOBO with the following baseline methods: …

Figure 8: PZOBO with different choices of Q for hyper-representation (HR) with a two-layer net.

We use the following hyperparameters for all compared methods.
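The text above refers to Algorithm 2 (PZOBO-S) and, in Figure 8, to a parameter Q. Here is a minimal sketch that assumes PZOBO replaces the Jacobian of the inner solution with a Gaussian-smoothing (zeroth-order) estimate built from Q random directions. Under that assumption, the hypergradient estimate can be written as follows. The quadratic toy problem, the names `inner_solve` and `pzobo_style_hypergrad`, and the defaults for `Q`, `mu`, the step size and the number of inner steps are all hypothetical. The sketch also uses full inner gradients, whereas a stochastic variant would typically substitute minibatch gradients.

```python
import numpy as np

# Hypothetical linear-quadratic bilevel toy problem (not from the paper):
#   lower level: g(x, y) = 0.5 * y^T H y - x^T y   ->  y*(x) = H^{-1} x
#   upper level: f(x, y) = 0.5 * ||y - b||^2
H = np.array([[3.0, 0.5, 0.0],
              [0.5, 2.0, 0.3],
              [0.0, 0.3, 1.5]])
b = np.array([1.0, -2.0, 0.5])


def inner_grad(x, y):
    """Gradient of g w.r.t. y, applied row-wise to arrays of shape (..., d)."""
    return y @ H - x  # H is symmetric, so y @ H equals (H y)^T for each row


def inner_solve(x, y0, steps=300, lr=0.2):
    """Run `steps` gradient-descent steps on g(x, .), always starting from y0."""
    y = np.broadcast_to(y0, x.shape).copy()
    for _ in range(steps):
        y = y - lr * inner_grad(x, y)
    return y


def pzobo_style_hypergrad(x, y0, Q=20000, mu=1e-3, seed=0):
    """Hypothetical zeroth-order hypergradient estimate with Q Gaussian directions.

    The Jacobian-transpose product (dy_N/dx)^T grad_y f is replaced by
    (1/Q) * sum_q u_q * <(y_N(x + mu*u_q) - y_N(x)) / mu, grad_y f>.
    """
    rng = np.random.default_rng(seed)
    y_x = inner_solve(x, y0)                 # y_N(x)
    grad_y_f = y_x - b                       # grad_y f(x, y_N(x))
    grad_x_f = np.zeros_like(x)              # f has no direct x-dependence here
    U = rng.standard_normal((Q, x.size))     # Gaussian directions u_q, one per row
    Y_pert = inner_solve(x + mu * U, y0)     # y_N(x + mu*u_q), batched over q
    diffs = (Y_pert - y_x) / mu              # finite-difference Jacobian-vector products
    weights = diffs @ grad_y_f               # one scalar <diff_q, grad_y f> per direction
    return grad_x_f + (U * weights[:, None]).mean(axis=0)


x = np.array([0.5, 1.0, -1.0])
est = pzobo_style_hypergrad(x, y0=np.zeros(3))
H_inv = np.linalg.inv(H)
exact = H_inv @ (H_inv @ x - b)              # closed-form hypergradient of the toy problem
print("estimate:", est)
print("exact:   ", exact)
```

All perturbed inner runs start from the same `y0`, so the difference y_N(x + mu*u) - y_N(x) isolates the effect of the perturbation. On this toy problem, the estimate approaches the closed-form hypergradient H^{-1}(H^{-1}x - b) as Q grows.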
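The claim that the quadratic problems in Sections 4.1 and 4.4 satisfy strong convexity and smoothness can be checked numerically. For a lower-level objective that is quadratic in y, the Hessian with respect to y is a constant symmetric matrix H. The objective is therefore strongly convex with modulus lambda_min(H) when that value is positive, and its gradient is Lipschitz with constant max |lambda(H)|. The sketch below assumes this form. The function name `quadratic_constants` and the example Hessian are hypothetical and are not the matrices used in the experiments.

```python
import numpy as np


def quadratic_constants(H):
    """Return (mu, L) for y -> 0.5 * y^T H y + (terms linear in y).

    The Hessian in y is the constant matrix H, so the objective is
    mu-strongly convex with mu = lambda_min(H) whenever mu > 0, and its
    gradient is L-Lipschitz with L = max |eigenvalue of H|.
    """
    H = np.asarray(H, dtype=float)
    if not np.allclose(H, H.T):
        raise ValueError("H must be symmetric")
    eigvals = np.linalg.eigvalsh(H)   # real eigenvalues in ascending order
    mu = eigvals[0]                   # strong-convexity modulus (if positive)
    L = np.max(np.abs(eigvals))       # Lipschitz constant of the gradient
    return mu, L


# Hypothetical lower-level Hessian (not taken from the experiments).
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])
mu, L = quadratic_constants(H)
print(f"mu = {mu:.3f}, L = {L:.3f}, strongly convex: {mu > 0}, kappa = {L / mu:.3f}")
# prints: mu = 1.000, L = 3.000, strongly convex: True, kappa = 3.000
```

Coupling terms that are linear in y, such as x^T B y, do not change the Hessian in y. They therefore leave mu and L unaffected.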
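The DARTS estimator mentioned above avoids second-order derivatives by differencing first-order gradients. The following is a minimal sketch of the standard central-difference form. It assumes an outer variable x, an inner variable y, an inner objective g and a direction v. In a bilevel hypergradient, v would be the gradient of the upper-level objective with respect to y. The mixed second-order product is approximated as (grad_x g(x, y + eps*v) - grad_x g(x, y - eps*v)) / (2*eps). The toy objective, the function names and the default eps = 1e-2 are hypothetical.

```python
import numpy as np


def grad_x_g(x, y):
    """grad_x g for the hypothetical toy g(x, y) = sum(x * y**3) / 3 + 0.5 * ||y||^2."""
    return y**3 / 3.0


def mixed_product_exact(x, y, v):
    """Exact (d^2 g / dx dy) v for the toy g: d/dy of y**3 / 3 is diag(y**2)."""
    return y**2 * v


def darts_style_product(grad_x, x, y, v, eps=1e-2):
    """Central difference of the outer gradient at the two points y +/- eps * v."""
    return (grad_x(x, y + eps * v) - grad_x(x, y - eps * v)) / (2.0 * eps)


x = np.array([0.3, -1.2, 2.0])
y = np.array([1.0, -0.5, 2.0])
v = np.array([0.4, 1.0, -0.7])
approx = darts_style_product(grad_x_g, x, y, v)
exact = mixed_product_exact(x, y, v)
print("finite difference:", approx)
print("exact:            ", exact)
```

For this toy g, the error is exactly eps**2 * v**3 / 3 elementwise, so it shrinks quadratically in eps. In floating point, however, eps cannot be made arbitrarily small, because the gradient difference then suffers from cancellation.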