Semiparametric Efficient Bilevel Gradient Estimation
Khoury, Fares El, Zenati, Houssam, Kallus, Nathan, Arbel, Michael, Bibaut, Aurélien
Bilevel optimization provides a natural framework for problems in which one learning task is constrained by the solution of another. This hierarchical structure appears across machine learning, including hyperparameter optimization [43, 39, 36], meta-learning [20, 18, 45], inverse problems and optimal control [31, 1], reinforcement learning [25], domain adaptation [35], and instrumental variable regression [42, 50, 49]. In these applications, the outer parameter is typically updated using gradient-based methods, so the quality of the resulting bilevel gradient directly affects both optimization and statistical performance. Most existing theory for bilevel optimization has been developed in finite-dimensional parametric settings, often under strong convexity of the lower-level problem [21, 27, 29, 61]. This assumption gives a unique inner solution and makes implicit differentiation stable [43, 36]. It is also convenient for algorithmic convergence and stability analyses [9, 23, 40].
May-21-2026