One-step differentiation of iterative algorithms
Bolte, Jérôme, Pauwels, Edouard, Vaiter, Samuel
–arXiv.org Artificial Intelligence
Differentiating the solution of a machine learning problem is an important task, e.g., in hyperparameter optimization [9], in neural architecture search [26], and when using convex layers [3]. There are two main ways to achieve this goal: automatic differentiation (AD) and implicit differentiation (ID). Automatic differentiation implements the idea of evaluating derivatives through the compositional rules of differential calculus in a user-transparent way. It is a mature concept [23] implemented in several machine learning frameworks [31, 16, 1]. However, its time and memory complexity may become prohibitive as soon as the computational graph grows large, a typical example being the unrolling of iterative optimization algorithms such as gradient descent [5]. The alternative, implicit differentiation, is not always accessible: it does not rely solely on the compositional rules of differential calculus and usually requires solving a linear system. The user either needs to implement custom rules in an automatic differentiation framework (as done, for example, in [4]) or use dedicated libraries such as [11, 3, 10] that implement these rules for given models. Provided that the implementation is carefully done, this is in most cases the gold standard for differentiating problem solutions.
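To make the contrast concrete, here is a minimal JAX sketch (not the paper's one-step method) on an assumed toy quadratic objective f(x, theta) = 0.5 x'Ax - theta'x with a hand-picked matrix A: the solution map theta -> x*(theta) is differentiated once by unrolling gradient descent with automatic differentiation, and once by implicit differentiation via a custom rule that solves a linear system at the solution.

import jax
import jax.numpy as jnp

A = jnp.array([[3.0, 1.0], [1.0, 2.0]])  # fixed positive-definite matrix (illustration only)

def grad_f(x, theta):
    # gradient in x of f(x, theta) = 0.5 x'Ax - theta'x
    return A @ x - theta

def solve_unrolled(theta, steps=200, lr=0.1):
    # gradient descent; AD differentiates through every iteration (unrolling)
    x = jnp.zeros_like(theta)
    for _ in range(steps):
        x = x - lr * grad_f(x, theta)
    return x

@jax.custom_vjp
def solve_implicit(theta):
    # stand-in for any black-box solver returning x*(theta)
    return jnp.linalg.solve(A, theta)

def solve_implicit_fwd(theta):
    return solve_implicit(theta), None

def solve_implicit_bwd(_, v):
    # optimality condition grad_f(x*, theta) = 0 gives A dx* = dtheta,
    # so the vector-Jacobian product is obtained from one linear solve: A' y = v
    return (jnp.linalg.solve(A.T, v),)

solve_implicit.defvjp(solve_implicit_fwd, solve_implicit_bwd)

theta = jnp.array([1.0, -2.0])
J_unrolled = jax.jacobian(solve_unrolled)(theta)  # backpropagates through the whole loop
J_implicit = jax.jacobian(solve_implicit)(theta)  # one linear solve per output direction
print(jnp.allclose(J_unrolled, J_implicit, atol=1e-4))  # True once gradient descent has converged

The unrolled version stores and traverses all iterations, while the implicit version only needs the solution and a linear solve, which is why it is usually the gold standard when a custom rule is available.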
May-23-2023