
Automatic differentiation


Robust Automatic Differentiation of Square-Root Kalman Filters via Gramian Differentials

Corenflos, Adrien

arXiv.org Machine Learning

Square-root Kalman filters propagate state covariances in Cholesky-factor form for numerical stability, and are a natural target for gradient-based parameter learning in state-space models. Their core operation, triangularization of a matrix $M \in \mathbb{R}^{n \times m}$, is computed via a QR decomposition in practice, but naively differentiating through it causes two problems: the semi-orthogonal factor is non-unique when $m > n$, yielding undefined gradients; and the standard Jacobian formula involves inverses, which diverge when $M$ is rank-deficient. Both are resolved by the observation that all filter outputs relevant to learning depend on the input matrix only through the Gramian $MM^\top$, so the composite loss is smooth in $M$ even where the triangularization is not. We derive a closed-form chain rule directly from the differential of this Gramian identity, prove it exact for the Kalman log-marginal likelihood and filtered moments, and extend it to rank-deficient inputs via a two-component decomposition: a column-space term based on the Moore--Penrose pseudoinverse, and a null-space correction for perturbations outside the column space of $M$.
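
The following is a minimal JAX sketch of the full-rank case only, not the paper's implementation: the forward pass computes the triangularization with a QR decomposition, while the backward pass is routed through the Gramian $G = MM^\top$, here by reusing JAX's built-in Cholesky VJP as a stand-in for the paper's closed-form chain rule. The function name tria, the shape convention ($M$ with at least as many columns as rows), and the omission of the pseudoinverse/null-space treatment for rank-deficient inputs are simplifying assumptions.

```python
import jax
import jax.numpy as jnp

def _tria_qr(M):
    # Triangularization via QR of M^T: M^T = Q R, hence M @ M.T = R.T @ R.
    _, r = jnp.linalg.qr(M.T)
    s = jnp.where(jnp.diag(r) < 0, -1.0, 1.0)   # fix the QR sign ambiguity
    return (s[:, None] * r).T                   # lower triangular, non-negative diagonal

@jax.custom_vjp
def tria(M):
    """Lower-triangular L with L @ L.T == M @ M.T (square-root covariance update)."""
    return _tria_qr(M)

def _tria_fwd(M):
    return _tria_qr(M), M

def _tria_bwd(M, L_bar):
    # All downstream quantities depend on M only through G = M @ M.T, so pull the
    # cotangent back through L = chol(G), then use dG = dM M^T + M dM^T, i.e.
    #   M_bar = (G_bar + G_bar^T) @ M.
    G = M @ M.T
    _, chol_vjp = jax.vjp(jnp.linalg.cholesky, G)
    (G_bar,) = chol_vjp(L_bar)
    return ((G_bar + G_bar.T) @ M,)

tria.defvjp(_tria_fwd, _tria_bwd)

# Sanity check: gradients through the QR-based tria agree with the smooth Gramian route.
loss = lambda M: jnp.sum(tria(M) ** 2)
ref = lambda M: jnp.sum(jnp.linalg.cholesky(M @ M.T) ** 2)
M = jax.random.normal(jax.random.PRNGKey(0), (3, 5))
print(jnp.allclose(jax.grad(loss)(M), jax.grad(ref)(M), atol=1e-5))
```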


One-step differentiation of iterative algorithms

Neural Information Processing Systems

For iterative algorithms, implicit differentiation alleviates this issue but requires custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagation, a method as easy as automatic differentiation and as efficient as implicit differentiation for fast algorithms (e.g., superlinear optimization methods).
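
A hedged sketch of the idea in JAX (illustrative only, not the paper's code): run the iterations with gradients blocked, then differentiate through a single final application of the update map, so the backward pass costs one step rather than the full unroll and requires no custom Jacobian solve. The helper names solve and solve_one_step and the Newton toy problem are invented for illustration; the approximation is accurate precisely for fast (superlinearly convergent) iterations, where the update's dependence on the iterate vanishes at the fixed point.

```python
import jax
import jax.numpy as jnp

def solve(step, x0, theta, n_iters=200):
    # Full unroll: plain autodiff would backpropagate through every iteration.
    x = x0
    for _ in range(n_iters):
        x = step(x, theta)
    return x

def solve_one_step(step, x0, theta, n_iters=200):
    # One-step (Jacobian-free) differentiation: block gradients through the
    # iterations, then differentiate a single final step at the fixed point.
    x_star = jax.lax.stop_gradient(solve(step, x0, theta, n_iters))
    return step(x_star, theta)

# Toy example: Newton's method for x**3 = theta, so x*(theta) = theta ** (1/3).
newton_step = lambda x, theta: x - (x**3 - theta) / (3.0 * x**2)
x_star_fn = lambda theta: solve_one_step(newton_step, jnp.ones(()), theta, n_iters=20)
print(jax.grad(x_star_fn)(8.0))   # ~ 1/12, the exact derivative of theta**(1/3) at theta = 8
```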






bbc9d480a8257889d2af88983e8b126a-Paper-Conference.pdf

Neural Information Processing Systems

While existing automatic differentiation (AD) frameworks allow flexibly composing model architectures, they do not provide the same flexibility for composing learning algorithms--everything has to be implemented in terms of backpropagation.



Efficient Learning of Generative Models via Finite-Difference Score Matching

Neural Information Processing Systems

Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive with respect to memory and computation even with automatic differentiation.
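
As a small illustration of the trade-off this points to (a generic finite-difference stand-in, not this paper's score-matching estimator): a Hessian-vector product can be computed exactly by nesting automatic differentiation, which requires differentiating through a first backward pass, or approximated by a central finite difference of two first-order gradients.

```python
import jax
import jax.numpy as jnp

def hvp_nested_ad(f, x, v):
    # "Gradient of a gradient": exact, but differentiates through the first
    # backward pass, costing extra memory and compute.
    return jax.grad(lambda y: jnp.vdot(jax.grad(f)(y), v))(x)

def hvp_finite_difference(f, x, v, eps=1e-3):
    # Central finite difference of the gradient along v: two first-order
    # gradient evaluations, no second-order graph, at the price of an O(eps^2) bias.
    g = jax.grad(f)
    return (g(x + eps * v) - g(x - eps * v)) / (2.0 * eps)

f = lambda x: jnp.sum(jnp.sin(x) ** 2)
x, v = jnp.linspace(0.0, 1.0, 4), jnp.ones(4)
print(hvp_nested_ad(f, x, v), hvp_finite_difference(f, x, v))  # should nearly agree
```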