pprs
Certified Adversarial Robustness via Partition-based Randomized Smoothing
A reliable application of deep neural network classifiers requires robustness certificates against adversarial perturbations. Gaussian smoothing is a widely analyzed approach to certifying robustness against norm-bounded perturbations, where the certified prediction radius depends on the variance of the Gaussian noise and the confidence level of the neural net's prediction under the additive Gaussian noise. However, in application to high-dimensional image datasets, the certified radius of the plain Gaussian smoothing could be relatively small, since Gaussian noise with high variances can significantly harm the visibility of an image. In this work, we propose the Pixel Partitioning-based Randomized Smoothing (PPRS) methodology to boost the neural net's confidence score and thus the robustness radius of the certified prediction. We demonstrate that the proposed PPRS algorithm improves the visibility of the images under additive Gaussian noise. We discuss the numerical results of applying PPRS to standard computer vision datasets and neural network architectures. Our empirical findings indicate a considerable improvement in the certified accuracy and stability of the prediction model to the additive Gaussian noise in randomized smoothing.
Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
Colin, Igor, Santos, Ludovic Dos, Scaman, Kevin
We investigate the theoretical limits of pipeline parallel learning of deep learning architectures, a distributed setup in which the computation is distributed per layer instead of per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal. For non-smooth convex functions, we provide a novel algorithm coined Pipeline Parallel Random Smoothing (PPRS) that is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension. While the convergence rate still obeys a slow $\varepsilon^{-2}$ convergence rate, the depth-dependent part is accelerated, resulting in a near-linear speed-up and convergence time that only slightly depends on the depth of the deep learning architecture. Finally, we perform an empirical analysis of the non-smooth non-convex case and show that, for difficult and highly non-smooth problems, PPRS outperforms more traditional optimization algorithms such as gradient descent and Nesterov's accelerated gradient descent for problems where the sample size is limited, such as few-shot or adversarial learning.