Oberman, Adam M
Adversarial Boot Camp: label free certified robustness in one epoch
Campbell, Ryan, Finlay, Chris, Oberman, Adam M
Machine learning models are vulnerable to adversarial attacks. One approach to addressing this vulnerability is certification, which focuses on models that are guaranteed to be robust for a given perturbation size. A drawback of recent certified models is that they are stochastic: they require multiple computationally expensive model evaluations with random noise added to a given input. In our work, we present a deterministic certification approach which results in a certifiably robust model. This approach is based on an equivalence between training with a particular regularized loss and the expected values of Gaussian averages. We achieve certified models on ImageNet-1k by retraining a model with this loss for one epoch, without the use of label information.
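As a toy illustration of what a deterministic certificate looks like, the sketch below computes a standard Lipschitz-margin radius: if the logits are L-Lipschitz in the input, the predicted class cannot change within the returned radius. This is a generic certificate shown for illustration only, not the construction used in this paper; `certified_radius`, the scores, and the Lipschitz constant are all hypothetical.

```python
import numpy as np

def certified_radius(scores, lip_const):
    # Generic Lipschitz-margin certificate: with logits L-Lipschitz in the
    # input, the top class is stable for perturbations up to margin/(sqrt(2)*L).
    s = np.sort(scores)[::-1]
    margin = s[0] - s[1]
    return margin / (np.sqrt(2) * lip_const)

scores = np.array([3.2, 1.0, -0.5])  # toy logits for a 3-class problem
print(certified_radius(scores, lip_const=2.0))  # margin / (sqrt(2) * L)
```

A single forward pass suffices to evaluate such a certificate, which is the practical appeal of deterministic approaches over sampling-based ones.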
Deterministic Gaussian Averaged Neural Networks
Campbell, Ryan, Finlay, Chris, Oberman, Adam M
We present a deterministic method to compute the Gaussian average of neural networks used in regression and classification. Our method is based on an equivalence between training with a particular regularized loss and the expected values of Gaussian averages. We use this equivalence to obtain certified robustness for models which perform well on clean data but are not themselves robust to adversarial perturbations. In terms of certified accuracy and adversarial robustness, our method is comparable to known stochastic methods such as randomized smoothing, but requires only a single model evaluation during inference.
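The stochastic methods mentioned above (e.g. randomized smoothing) approximate the Gaussian average of a classifier by Monte Carlo sampling, so each prediction costs many model evaluations; this is the cost the deterministic method avoids. A minimal sketch of the stochastic baseline, with a toy linear stand-in for the network (`base_classifier`, `sigma`, and the sample count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def base_classifier(x):
    # Toy stand-in for a trained network: logits for 3 classes.
    W = np.array([[1.0, -0.5], [-1.0, 0.5], [0.2, 1.0]])
    return W @ x

def gaussian_averaged_prediction(x, sigma=0.25, n_samples=1000):
    # Stochastic Gaussian averaging: average the base classifier's
    # outputs over Gaussian perturbations of the input, then predict.
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    scores = np.array([base_classifier(x + eps) for eps in noise])
    return scores.mean(axis=0).argmax()

x = np.array([0.8, -0.3])
print(gaussian_averaged_prediction(x))  # 1000 model evaluations for one input
```

For this linear toy model the averaged prediction coincides with the clean prediction; for a deep network the average must be estimated by sampling, which is what makes the single-evaluation deterministic alternative attractive.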
Learning normalizing flows from Entropy-Kantorovich potentials
Finlay, Chris, Gerolin, Augusto, Oberman, Adam M, Pooladian, Aram-Alexandre
We approach the problem of learning continuous normalizing flows from a dual perspective motivated by entropy-regularized optimal transport, in which continuous normalizing flows are cast as gradients of scalar potential functions. This formulation allows us to train with a dual objective composed only of the scalar potential functions, removing the burden of explicitly computing normalizing flows during training. After training, the normalizing flow is easily recovered from the potential functions.
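The recovery step described above amounts to taking a gradient of a scalar function. A minimal sketch, using a hypothetical quadratic potential in place of a learned entropy-Kantorovich potential, and central finite differences in place of automatic differentiation:

```python
import numpy as np

def potential(x):
    # Hypothetical scalar potential; in the paper's setting this would
    # be a learned entropy-Kantorovich potential.
    return 0.5 * np.dot(x, x)

def flow_from_potential(phi, x, h=1e-5):
    # Recover the flow's velocity field as the gradient of the scalar
    # potential, here via central finite differences per coordinate.
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (phi(x + e) - phi(x - e)) / (2 * h)
    return grad

x = np.array([1.0, -2.0, 0.5])
v = flow_from_potential(potential, x)
print(v)  # approximately x itself for this quadratic potential
```

In practice the gradient would be taken by automatic differentiation through the trained potential network; the finite-difference version above only illustrates that the flow is a by-product of the potential rather than a separately parameterized object.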
Partial differential equation regularization for supervised machine learning
Oberman, Adam M
This article is an overview of supervised machine learning problems for regression and classification. Topics include: kernel methods, training by stochastic gradient descent, deep learning architectures, losses for classification, statistical learning theory, and dimension-independent generalization bounds. Examples of implicit regularization in deep learning are presented, including data augmentation, adversarial training, and additive noise. These methods are re-framed as explicit gradient regularization.
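The reframing of additive noise and adversarial training as explicit gradient regularization can be sketched with Taylor expansions (assuming a smooth per-example loss $\ell$; this is a standard heuristic derivation, not a quotation from the article):

```latex
\mathbb{E}_{\varepsilon \sim \mathcal{N}(0,\sigma^2 I)}\big[\ell(x+\varepsilon)\big]
  \approx \ell(x) + \frac{\sigma^2}{2}\,\Delta_x \ell(x),
\qquad
\max_{\|\varepsilon\|_2 \le \delta} \ell(x+\varepsilon)
  \approx \ell(x) + \delta\,\|\nabla_x \ell(x)\|_2 .
```

So, to leading order, averaging over additive Gaussian noise acts like a Laplacian penalty on the loss, while adversarial training acts like an input-gradient-norm penalty.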
Scaleable input gradient regularization for adversarial robustness
Finlay, Chris, Oberman, Adam M
Input gradient regularization is not thought to be an effective means of promoting adversarial robustness. In this work we revisit this regularization scheme with some new ingredients. First, we derive new per-image theoretical robustness bounds based on local gradient information, and on curvature information when available. These bounds strongly motivate input gradient regularization. Second, we implement a scalable version of input gradient regularization which avoids double backpropagation: adversarially robust ImageNet models are trained in 33 hours on four consumer-grade GPUs. Finally, we show experimentally that input gradient regularization is competitive with adversarial training.
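The double-backpropagation issue can be sketched as follows: the penalty is the (squared) input-gradient norm, and differentiating it exactly during training requires a second backward pass. The norm can instead be estimated with a single finite difference along the normalized gradient direction. The toy loss, its analytic gradient (standing in for backprop), `lam`, and `h` below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def loss(x):
    # Toy per-example loss standing in for the training loss at input x.
    return np.sum(x ** 2) + np.sin(x[0])

def loss_grad(x):
    # Analytic input gradient of the toy loss (a network would use backprop).
    g = 2 * x
    g[0] += np.cos(x[0])
    return g

def regularized_loss(x, lam=0.1, h=1e-2):
    # Input gradient regularization: penalize the squared input-gradient norm.
    # The norm is estimated by one finite difference along the normalized
    # gradient direction d, since (loss(x + h*d) - loss(x)) / h ~ ||grad||.
    g = loss_grad(x)
    d = g / (np.linalg.norm(g) + 1e-12)
    grad_norm_est = (loss(x + h * d) - loss(x)) / h
    return loss(x) + lam * grad_norm_est ** 2

x = np.array([0.5, -1.0])
print(regularized_loss(x))
```

The finite-difference estimate agrees with the exact penalty up to O(h), while keeping the training graph free of second derivatives with respect to the input.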
Lipschitz regularized Deep Neural Networks converge and generalize
Oberman, Adam M, Calder, Jeff
Lipschitz regularized neural networks augment the usual fidelity term used in training with a regularization term corresponding to the excess Lipschitz constant of the network compared to the Lipschitz constant of the data. We prove that Lipschitz regularized neural networks converge, and provide a rate, in the limit as the number of data points $n\to\infty$. We consider the regime where perfect fitting of the data is possible, which means the size of the network grows with $n$. There are two cases: for perfect labels, we prove convergence to the label function, which corresponds to zero loss. For corrupted labels, where the Lipschitz constant of the data blows up, we prove convergence to a regularized label function which is the solution of a limiting variational problem.
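The regularized objective described above can be sketched numerically: a fidelity term plus the excess of the model's Lipschitz constant over that of the data, with both constants estimated empirically over data pairs. The toy data, model, and `lam` below are illustrative assumptions:

```python
import numpy as np

def empirical_lipschitz(f, X):
    # Largest slope |f(x_i) - f(x_j)| / ||x_i - x_j|| over data pairs.
    best = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            num = abs(f(X[i]) - f(X[j]))
            den = np.linalg.norm(X[i] - X[j])
            best = max(best, num / den)
    return best

def lipschitz_regularized_loss(f, X, y, lip_data, lam=1.0):
    # Fidelity term plus the excess of the model's (empirical) Lipschitz
    # constant over the Lipschitz constant of the data.
    fidelity = np.mean([(f(x) - t) ** 2 for x, t in zip(X, y)])
    excess = max(empirical_lipschitz(f, X) - lip_data, 0.0)
    return fidelity + lam * excess

X = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
y = [0.0, 1.0, 2.0]          # label function with Lipschitz constant 1
f = lambda x: 2.0 * x[0]     # toy model with Lipschitz constant 2
print(lipschitz_regularized_loss(f, X, y, lip_data=1.0))
```

A model whose Lipschitz constant does not exceed that of the data pays no penalty, so the regularizer only discourages the network from being steeper than the labels require.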