Provably Correct Automatic Sub-Differentiation for Qualified Programs

Kakade, Sham M., Lee, Jason D.

Neural Information Processing Systems

The \emph{Cheap Gradient Principle}~\citep{Griewank:2008:EDP:1455489} --- the computational cost of computing a $d$-dimensional vector of partial derivatives of a scalar function is nearly the same (often within a factor of $5$) as that of simply computing the scalar function itself --- is of central importance in optimization; it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions which are subsequently used in black-box gradient-based optimization procedures. The current state of affairs is markedly different with regard to computing sub-derivatives: widely used ML libraries, including TensorFlow and PyTorch, do \emph{not} correctly compute (generalized) sub-derivatives even on simple differentiable examples. This work considers the question: is there a \emph{Cheap Sub-gradient Principle}? Our main result shows that, under certain restrictions on our library of non-smooth functions (standard in non-linear programming), provably correct generalized sub-derivatives can be computed at a computational cost that is within a (dimension-free) factor of $6$ of the cost of computing the scalar function itself.
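
A minimal sketch of the kind of failure the abstract refers to (assuming PyTorch as the library; this may not be the paper's exact example): the function relu(x) - relu(-x) is identically equal to x, so its derivative at 0 is 1, yet reverse-mode autodiff propagates a subgradient of 0 through each relu at the kink and returns 0.

```python
import torch

# f(x) = relu(x) - relu(-x) is identically x, so f'(0) = 1.
x = torch.tensor(0.0, requires_grad=True)
f = torch.relu(x) - torch.relu(-x)
f.backward()
print(x.grad)  # tensor(0.) -- not the true derivative 1, and not a valid subgradient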


Reviews: Provably Correct Automatic Sub-Differentiation for Qualified Programs

Neural Information Processing Systems

In this submission, the authors consider the problem of automatically and correctly computing sub-derivatives for a class of non-smooth functions. They give a very nice example that illustrates problems with current automatic differentiation frameworks such as TensorFlow and PyTorch. The authors then prove a chain rule for the one-sided directional derivative of a composite non-smooth function satisfying certain assumptions. Based on this rule, they derive a (randomized) algorithm for computing such derivatives, for a particular class of programs, with only constant overhead. The algorithm is very similar to backward (reverse-mode) automatic differentiation, except that its forward computation is based on the newly proved chain rule from the submission rather than the standard chain rule for differentiation.
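
To make the review's description concrete, here is a minimal forward-mode sketch that propagates one-sided directional derivatives instead of ordinary derivatives. The names (`Dual`, `add`, `relu`, etc.) are illustrative assumptions rather than the paper's notation, and the randomization and qualification conditions on the program are omitted.

```python
from dataclasses import dataclass

# Hedged sketch: forward-mode propagation of one-sided directional derivatives.
# `Dual` pairs a primal value with its one-sided directional derivative along a
# fixed direction.

@dataclass
class Dual:
    val: float  # primal value
    dot: float  # one-sided directional derivative along the chosen direction

def add(a: Dual, b: Dual) -> Dual:
    return Dual(a.val + b.val, a.dot + b.dot)

def neg(a: Dual) -> Dual:
    return Dual(-a.val, -a.dot)

def relu(a: Dual) -> Dual:
    if a.val > 0:
        return Dual(a.val, a.dot)
    if a.val < 0:
        return Dual(0.0, 0.0)
    # At the kink (a.val == 0), relu's one-sided directional derivative in
    # direction d is max(d, 0), not an arbitrary subgradient choice.
    return Dual(0.0, max(a.dot, 0.0))

# f(x) = relu(x) - relu(-x) = x, evaluated at x = 0 in direction d = 1.
x = Dual(0.0, 1.0)
fx = add(relu(x), neg(relu(neg(x))))
print(fx.val, fx.dot)  # 0.0 1.0 -- the correct one-sided directional derivative
```

On the kinked example above, this propagation returns the correct one-sided directional derivative 1, whereas applying a fixed subgradient rule at each relu primitive (as in the PyTorch snippet earlier) returns 0.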
