Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates
Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo
Important examples include hyperparameter optimization and meta-learning (Franceschi et al., 2018; Lee et al., 2019), where (1) expresses the optimality conditions of a lower-level minimization problem. Further examples include learning a surrogate model for data poisoning attacks (Xiao et al., 2015; Muñoz-González et al., 2017), deep equilibrium models (Bai et al., 2019), and OptNet (Amos & Kolter, 2017). All of these problems may involve a nonsmooth mapping Φ. For instance, consider hyperparameter optimization or data poisoning attacks for SVMs, or meta-learning for image classification, where Φ is evaluated through the forward pass of a neural network with ReLU activations (Bertinetto et al., 2019; Lee et al., 2019; Rajeswaran et al., 2019). In addition, when such settings are applied to large datasets, evaluating the map Φ exactly can be too costly, but stochastic methods can usually be applied by exploiting the composite stochastic structure in (2), where only T involves a computation over the full training set (e.g., a gradient descent step).
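Since equations (1) and (2) are not reproduced in this excerpt, the JAX sketch below only illustrates the general recipe under standard assumptions: (1) is taken to be a fixed-point equation w(λ) = Φ(w(λ), λ), with Φ a gradient-descent step on a lower-level objective built from a ReLU feature map, and the hypergradient of an upper-level objective is approximated by iterating Φ and then solving the adjoint linear system with vector-Jacobian products. The toy data and all names (`inner_loss`, `outer_loss`, `alpha`, `K`, `J`) are illustrative assumptions, not the paper's setup.

```python
import jax
import jax.numpy as jnp

# Toy data (assumption: stands in for a training set).
X = jax.random.normal(jax.random.PRNGKey(0), (20, 5))
y = jax.random.normal(jax.random.PRNGKey(1), (20,))

def inner_loss(w, lam):
    # Lower-level objective; the ReLU makes Phi nonsmooth in w.
    feats = jax.nn.relu(X @ w)
    return jnp.mean((feats - y) ** 2) + lam * jnp.sum(w ** 2)

alpha = 0.1  # inner step size (illustrative)

def Phi(w, lam):
    # One gradient-descent step: the map whose fixed point is w(lam), as in (1).
    return w - alpha * jax.grad(inner_loss)(w, lam)

def outer_loss(w, lam):
    # Upper-level objective (illustrative validation-style loss).
    return jnp.sum((jax.nn.relu(X @ w) - y) ** 2)

def hypergrad(lam, K=100, J=50):
    # 1) Approximate the fixed point w(lam) by iterating Phi for K steps.
    w = jnp.zeros(X.shape[1])
    for _ in range(K):
        w = Phi(w, lam)
    # 2) Approximate implicit differentiation: solve v = (d_w Phi)^T v + d_w f
    #    by J fixed-point iterations using vector-Jacobian products.
    dw_f = jax.grad(outer_loss, argnums=0)(w, lam)
    dlam_f = jax.grad(outer_loss, argnums=1)(w, lam)
    _, vjp_w = jax.vjp(lambda u: Phi(u, lam), w)
    v = dw_f
    for _ in range(J):
        v = vjp_w(v)[0] + dw_f
    # 3) Assemble the hypergradient: d_lam f + (d_lam Phi)^T v.
    _, vjp_lam = jax.vjp(lambda l: Phi(w, l), lam)
    return dlam_f + vjp_lam(v)[0]

print(hypergrad(0.5))
```

The adjoint loop amounts to a Neumann-series approximation of (I − ∂_wΦ)^{-T} applied to ∂_w f, which converges when Φ is a contraction in w; for the nonsmooth ReLU map, the vector-Jacobian products returned by JAX correspond to one choice of generalized (Clarke-type) Jacobian.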
Mar-28-2024