Derivatives of Stochastic Gradient Descent
Franck Iutzeler, Edouard Pauwels, Samuel Vaiter
– arXiv.org Artificial Intelligence
The differentiation of iterative algorithms has been a subject of research since the 1990s (Gilbert, 1992; Christianson, 1994; Beck, 1994), and was succinctly described as "piggyback differentiation" by Griewank and Faure (2003). This idea has gained renewed interest within the machine learning community, particularly for applications such as hyperparameter optimization (Maclaurin et al., 2015; Franceschi et al., 2017), meta-learning (Finn et al., 2017; Rajeswaran et al., 2019), and learning discretizations of total variation (Chambolle and Pock, 2021; Bogensperger et al., 2022). When applied to an optimization problem, an important theoretical concern is whether the derivatives of the iterates converge to the derivative of the solution. Traditional guarantees focus on asymptotic convergence to the solution derivative, as described by the implicit function theorem (Gilbert, 1992; Christianson, 1994; Beck, 1994). This question has inspired recent works on smooth optimization algorithms (Mehmood and Ochs, 2020, 2022), generic nonsmooth iterations (Bolte et al., 2022), and second-order methods (Bolte et al., 2023).
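To make the piggyback idea concrete, here is a minimal JAX sketch (not taken from the paper) that differentiates the iterates of gradient descent on a ridge-regression problem with respect to the regularization weight and compares the result with the solution derivative given by the implicit function theorem. The objective, step size, iteration count, and all names (`f`, `solve_gd`, `lam`, `A`, `b`) are illustrative assumptions, not the authors' setup.

```python
import jax
import jax.numpy as jnp

A = jnp.array([[3.0, 1.0], [1.0, 2.0]])
b = jnp.array([1.0, -1.0])

def f(x, lam):
    # Ridge objective: 0.5 * ||A x - b||^2 + 0.5 * lam * ||x||^2
    return 0.5 * jnp.sum((A @ x - b) ** 2) + 0.5 * lam * jnp.sum(x ** 2)

def solve_gd(lam, n_iters=200, step=0.1):
    # Run gradient descent; autodiff "piggybacks" through every iteration.
    x = jnp.zeros(2)
    grad_f = jax.grad(f)  # gradient with respect to x
    for _ in range(n_iters):
        x = x - step * grad_f(x, lam)
    return x

lam0 = 0.5

# Derivative of the K-th iterate x_K(lam), obtained by differentiating through the loop.
piggyback_deriv = jax.jacobian(solve_gd)(lam0)

# Implicit function theorem at the exact minimizer x*(lam):
# (A^T A + lam I) x* = A^T b  implies  dx*/dlam = -(A^T A + lam I)^{-1} x*.
H = A.T @ A + lam0 * jnp.eye(2)
x_star = jnp.linalg.solve(H, A.T @ b)
ift_deriv = -jnp.linalg.solve(H, x_star)

print(piggyback_deriv, ift_deriv)  # the two derivatives should nearly coincide
```

With enough iterations and a suitable step size, the derivative of the iterates approaches the implicit-function-theorem derivative, which is the kind of asymptotic guarantee the cited works establish.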
May-24-2024