
 Stock, Pierre


Training with Quantization Noise for Extreme Model Compression

arXiv.org Machine Learning

We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training (Jacob et al., 2018), where the weights are quantized during training and the gradients are approximated with the Straight-Through Estimator (STE) (Bengio et al., 2013). In this paper, we extend this approach to work beyond int8 fixed-point quantization, with extreme compression methods such as Product Quantization, where the approximations introduced by STE are severe. Our proposal is to quantize only a different random subset of weights during each forward pass, allowing unbiased gradients to flow through the other weights. Controlling the amount of noise and its form allows for extreme compression rates while maintaining the performance of the original model. As a result, we establish new state-of-the-art compromises between accuracy and model size in both natural language processing and image classification. For example, applying our method to state-of-the-art Transformer and ConvNet architectures, we can achieve 82.5% accuracy on MNLI by compressing RoBERTa to 14 MB and 80.0% top-1 accuracy on ImageNet by compressing an EfficientNet-B3 to 3.3 MB.

Many of the best-performing neural network architectures in real-world applications have a large number of parameters. For example, the current standard machine translation architecture, Transformer (Vaswani et al., 2017), has layers that contain millions of parameters. Even models that are designed to jointly optimize performance and parameter efficiency, such as EfficientNets (Tan & Le, 2019), still require dozens to hundreds of megabytes, which limits their use in domains such as robotics or virtual assistants. Model compression schemes reduce the memory footprint of overparametrized models. Pruning (LeCun et al., 1990) and distillation (Hinton et al., 2015) remove parameters, reducing the number of network weights. In contrast, quantization focuses on reducing the bits per weight.
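The idea of quantizing only a random subset of weights on each forward pass can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`quantize_int8`, `quant_noise_forward`), the noise rate `p`, the per-weight selection, and the symmetric int8 scale are all assumptions made for illustration. Gradients are not modeled here: in a training framework, the selected weights would rely on the STE, while the untouched weights would receive exact gradients.

```python
import random


def quantize_int8(w, scale):
    """Snap w to the nearest point of a symmetric int8 grid, then map back to float."""
    q = max(-128, min(127, round(w / scale)))
    return q * scale


def quant_noise_forward(weights, p, rng):
    """Return (noisy_weights, mask) for one forward pass.

    Each weight is selected independently with probability p (an assumed,
    simplified selection rule). Selected weights are replaced by their int8
    approximation; the others are passed through unchanged.
    """
    # Assumed scale: map the largest magnitude onto the int8 value 127.
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0

    noisy, mask = [], []
    for w in weights:
        selected = rng.random() < p
        mask.append(selected)
        noisy.append(quantize_int8(w, scale) if selected else w)
    return noisy, mask


# A fresh random subset is drawn on every call, i.e. on every forward pass.
rng = random.Random(0)
weights = [0.31, -1.27, 0.05, 0.88]
for step in range(3):
    noisy, mask = quant_noise_forward(weights, p=0.5, rng=rng)
    print(step, mask, noisy)
```

With `p = 1.0` every weight is quantized on every pass, which corresponds to ordinary Quantization Aware Training; with `p = 0.0` the network is trained without any quantization noise.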


ConvNets and ImageNet Beyond Accuracy: Explanations, Bias Detection, Adversarial Examples and Model Criticism

arXiv.org Machine Learning

ConvNets and ImageNet have driven the recent success of deep learning for image classification. However, the marked slowdown in performance improvement, recent studies on the lack of robustness of neural networks to adversarial examples, and their tendency to exhibit undesirable biases (e.g., racial biases) have called into question the reliability and the sustained development of these methods. This work investigates these questions from the perspective of the end-user by using human subject studies and explanations. We experimentally demonstrate that the accuracy and robustness of ConvNets measured on ImageNet are underestimated. We show that explanations can mitigate the impact of misclassified adversarial examples from the perspective of the end-user, and we introduce a novel tool for uncovering the undesirable biases learned by a model. These contributions also show that explanations are a promising tool for improving our understanding of ConvNets' predictions and for designing more reliable models.