Baader, Maximilian
The Fundamental Limits of Interval Arithmetic for Neural Networks
Mirman, Matthew, Baader, Maximilian, Vechev, Martin
Interval analysis (or interval bound propagation, IBP) is a popular technique for verifying and training provably robust deep neural networks, a fundamental challenge in the area of reliable machine learning. However, despite substantial efforts, progress on addressing this key challenge has stagnated, calling into question whether interval arithmetic is a viable path forward. In this paper we present two fundamental results on the limitations of interval arithmetic for analyzing neural networks. Our main impossibility theorem states that for any neural network classifying just three points, there is a valid specification over these points that interval analysis cannot prove. Further, in the restricted case of one-hidden-layer neural networks we show a stronger impossibility result: given any radius $\alpha < 1$, there is a set of $O(\alpha^{-1})$ points with robust radius $\alpha$, separated by distance $2$, that no one-hidden-layer network can be proven to classify robustly via interval analysis.
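As a purely illustrative aside (not taken from the paper), the imprecision that these impossibility results formalize already appears in the classic dependency problem of interval arithmetic: bounds are propagated as if every occurrence of a variable were independent. A minimal Python sketch:

```python
# Illustrative sketch (not from the paper): the "dependency problem" of
# interval arithmetic, the basic source of imprecision that interval
# analysis of neural networks inherits.

from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __sub__(self, other):
        # Sound interval subtraction: it cannot exploit that both operands
        # may refer to the same underlying variable.
        return Interval(self.lo - other.hi, self.hi - other.lo)

x = Interval(-1.0, 1.0)
print(x - x)  # Interval(lo=-2.0, hi=2.0), although x - x is exactly 0
```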
Latent Space Smoothing for Individually Fair Representations
Peychev, Momchil, Ruoss, Anian, Balunović, Mislav, Baader, Maximilian, Vechev, Martin
Fair representation learning encodes user data to ensure fairness and utility, regardless of the downstream application. However, learning individually fair representations, i.e., guaranteeing that similar individuals are treated similarly, remains challenging in high-dimensional settings such as computer vision. In this work, we introduce LASSI, the first representation learning method for certifying individual fairness of high-dimensional data. Our key insight is to leverage recent advances in generative modeling to capture the set of similar individuals in the generative latent space. This allows learning individually fair representations where similar individuals are mapped close together, by using adversarial training to minimize the distance between their representations. Finally, we employ randomized smoothing to provably map similar individuals close together, in turn ensuring that local robustness verification of the downstream application results in end-to-end fairness certification. Our experimental evaluation on challenging real-world image data demonstrates that our method increases certified individual fairness by up to 60%, without significantly affecting task utility.
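For intuition only, the sketch below illustrates standard randomized smoothing in the style of Cohen et al. (2019) on a toy classifier over latent codes. The function name, the stand-in classifier, the noise level `sigma`, and the sample count are illustrative assumptions, not the paper's actual pipeline, which additionally learns the fair representation with adversarial training.

```python
# Illustrative randomized-smoothing sketch on a toy classifier over
# 2-dimensional latent codes; all names and parameters are assumptions.

import numpy as np
from scipy.stats import norm

def smoothed_prediction(f, z, sigma=0.5, n_samples=1000, seed=0):
    """Monte Carlo estimate of the smoothed class at z and the l2 radius
    sigma * Phi^{-1}(p_top). A paper-grade certificate would use a lower
    confidence bound on p_top instead of this point estimate."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n_samples,) + z.shape)
    preds = np.array([f(z + eps) for eps in noise])
    counts = np.bincount(preds)
    top = int(counts.argmax())
    p_top = counts[top] / n_samples
    if p_top <= 0.5:
        return top, 0.0                    # abstain: no certified radius
    return top, sigma * norm.ppf(p_top)

w = np.array([1.0, -1.0])
toy_classifier = lambda z: int(z @ w > 0)  # stand-in for encoder + head
print(smoothed_prediction(toy_classifier, np.array([0.6, 0.0])))
```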
Certified Defenses: Why Tighter Relaxations May Hurt Training?
Jovanović, Nikola, Balunović, Mislav, Baader, Maximilian, Vechev, Martin
Certified defenses based on convex relaxations are an established technique for training provably robust models. The key component is the choice of relaxation, varying from simple intervals to tight polyhedra. Paradoxically, however, it was empirically observed that training with tighter relaxations can worsen certified robustness. While several methods were designed to partially mitigate this issue, the underlying causes are poorly understood. In this work we investigate the above phenomenon and show that tightness may not be the determining factor for reduced certified robustness. Concretely, we identify two key features of relaxations that impact training dynamics: continuity and sensitivity. We then experimentally demonstrate that these two factors explain the drop in certified robustness when using popular relaxations. Further, we show, for the first time, that it is possible to successfully train with tighter relaxations (i.e., triangle), a result supported by our two properties. Overall, we believe the insights of this work can help drive the systematic discovery of new effective certified defenses.
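To make the notion of tightness concrete, the following sketch (illustrative only, not from the paper's experiments) compares the area in (input, output) space of the interval relaxation and the triangle relaxation of a single unstable ReLU with pre-activation bounds $[l, u]$, $l < 0 < u$:

```python
# Illustrative comparison of two ReLU relaxations for a neuron with
# pre-activation bounds [l, u], l < 0 < u.

def box_area(l: float, u: float) -> float:
    # Interval (IBP) relaxation: x in [l, u] and y in [0, u], a full box.
    return (u - l) * u

def triangle_area(l: float, u: float) -> float:
    # Triangle relaxation: convex hull of {(x, relu(x)) : x in [l, u]} with
    # lower bounds y >= 0, y >= x and upper bound y <= u * (x - l) / (u - l).
    return 0.5 * (-l) * u

l, u = -1.0, 2.0
print(f"box: {box_area(l, u):.2f}, triangle: {triangle_area(l, u):.2f}")
# box: 6.00, triangle: 1.00 -- the triangle relaxation is far tighter, yet
# the paper argues tighter does not automatically mean better for training.
```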
Efficient Certification of Spatial Robustness
Ruoss, Anian, Baader, Maximilian, Balunović, Mislav, Vechev, Martin
Spatial adversarial attacks can be modeled by smooth vector fields that describe the displacement of every pixel. Common geometric transformations, e.g., rotation and translation, are particular instances of these smooth vector fields, which indicates that they capture a wide range of naturally occurring image transformations. Since the vulnerability of neural networks to spatially transformed adversarial examples can pose a security threat to computer vision systems relying on such models, it is critical to quantify their robustness against spatial transformations. A common approach to estimating neural network robustness is to measure the success rate of strong attacks (Carlini & Wagner, 2017; Madry et al., 2018). However, many networks that appeared robust against these attacks were later broken by even more sophisticated attacks (Athalye & Carlini, 2018; Athalye et al., 2018; Engstrom et al., 2018; Tramer et al., 2020). The key issue is that such attacks do not provide provable robustness guarantees.
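For illustration, a per-pixel displacement field can be applied to an image by backward warping with bilinear interpolation; the helper `apply_vector_field` and the chosen flow are assumptions made for this sketch, not the paper's certification method:

```python
# Illustrative sketch: applying a smooth per-pixel displacement field to an
# image by backward warping with bilinear interpolation.

import numpy as np
from scipy.ndimage import map_coordinates

def apply_vector_field(image, flow):
    """image: (H, W) grayscale array; flow: (2, H, W) displacements (dy, dx)."""
    h, w = image.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([yy + flow[0], xx + flow[1]])  # where each pixel samples from
    return map_coordinates(image, coords, order=1, mode="nearest")

img = np.random.default_rng(0).random((8, 8))
flow = np.zeros((2, 8, 8))
flow[1] = 0.5            # constant vector field: sample half a pixel to the right
warped = apply_vector_field(img, flow)
```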
Universal Approximation with Certified Networks
Baader, Maximilian, Mirman, Matthew, Vechev, Martin
Training neural networks to be certifiably robust is a powerful defense against adversarial attacks. However, while promising, state-of-the-art results with certified training are far from satisfactory. Currently, it is very difficult to train a neural network that is both accurate and certified on realistic datasets and specifications (e.g., robustness). Given this difficulty, a pressing existential question is: given a dataset and a specification, does a network exist that is both certified and accurate with respect to them? While the evidence suggests "no", we prove that for realistic datasets and specifications, such a network does exist and its certification can be established by propagating lower and upper bounds of each neuron through the network (interval analysis), the most relaxed yet computationally efficient convex relaxation. Our result can be seen as a Universal Approximation Theorem for interval-certified ReLU networks. To the best of our knowledge, this is the first work to prove the existence of accurate, interval-certified networks.
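The certification mechanism referenced here, propagating lower and upper bounds of each neuron through the network, can be sketched as follows; the one-hidden-layer architecture, random weights, and helper names are illustrative assumptions, not the paper's construction:

```python
# Illustrative interval bound propagation (IBP) through a one-hidden-layer
# ReLU network, followed by the certification check that the lower bound of
# the target logit exceeds the upper bounds of all other logits.

import numpy as np

def ibp_affine(l, u, W, b):
    # Propagate [l, u] through x -> W x + b using the center/radius form.
    c, r = (l + u) / 2, (u - l) / 2
    c_out = W @ c + b
    r_out = np.abs(W) @ r
    return c_out - r_out, c_out + r_out

def ibp_certify(x, eps, W1, b1, W2, b2, target):
    l, u = x - eps, x + eps                      # l_inf ball around the input
    l, u = ibp_affine(l, u, W1, b1)
    l, u = np.maximum(l, 0), np.maximum(u, 0)    # intervals through ReLU
    l, u = ibp_affine(l, u, W2, b2)
    others = np.delete(u, target)
    return l[target] > others.max()              # certified iff this holds

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)
x = np.array([0.3, -0.1])
print(ibp_certify(x, eps=0.01, W1=W1, b1=b1, W2=W2, b2=b2, target=0))
```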