Tursynbek, Nurislam
Smoothed Embeddings for Certified Few-Shot Learning
Pautov, Mikhail, Kuznetsova, Olesya, Tursynbek, Nurislam, Petiushko, Aleksandr, Oseledets, Ivan
Randomized smoothing is considered to be the state-of-the-art provable defense against adversarial perturbations. However, it heavily relies on the fact that classifiers map input objects to class probabilities, and it does not address models that learn a metric space in which classification is performed by computing distances to the embeddings of class prototypes. In this work, we extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings. We provide an analysis of the Lipschitz continuity of such models and derive a robustness certificate against $\ell_2$-bounded perturbations that may be useful in few-shot learning scenarios. Our theoretical results are confirmed by experiments on different datasets.
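To make the idea concrete, the sketch below shows one way to smooth an embedding model with Gaussian input noise and classify by the nearest class prototype. It is a minimal PyTorch-style illustration under simplifying assumptions, not the paper's exact procedure; `encoder`, `prototypes`, and all parameter values are hypothetical placeholders.

```python
# Minimal sketch (illustration only, not the paper's exact procedure):
# Monte Carlo estimate of a smoothed, normalized embedding under Gaussian
# input noise, followed by nearest-prototype classification.
import torch
import torch.nn.functional as F

def smoothed_embedding(encoder, x, sigma=0.25, n_samples=100):
    """Average unit-norm embeddings of x under N(0, sigma^2 I) input noise."""
    with torch.no_grad():
        xs = x.unsqueeze(0).repeat(n_samples, 1, 1, 1)   # x assumed to be (C, H, W)
        noisy = xs + sigma * torch.randn_like(xs)        # Gaussian smoothing noise
        z = F.normalize(encoder(noisy), dim=1)           # per-sample unit-norm embeddings
        return F.normalize(z.mean(dim=0), dim=0)         # smoothed, re-normalized embedding

def predict_few_shot(encoder, x, prototypes, sigma=0.25):
    """Classify by the l2-closest class prototype to the smoothed embedding."""
    z = smoothed_embedding(encoder, x, sigma)
    dists = torch.cdist(z.unsqueeze(0), prototypes)      # prototypes: (n_classes, d)
    return dists.argmin(dim=1).item()
```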
CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks
Pautov, Mikhail, Tursynbek, Nurislam, Munkhoeva, Marina, Muravev, Nikita, Petiushko, Aleksandr, Oseledets, Ivan
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks -- small modifications of the input that change the predictions. Besides the rigorously studied $\ell_p$-bounded additive perturbations, recently proposed semantic perturbations (e.g., rotation, translation) raise serious concerns about deploying ML systems in the real world. It is therefore important to provide provable guarantees for deep learning models against semantically meaningful input transformations. In this paper, we propose a new universal probabilistic certification approach based on Chernoff-Cramér bounds that can be used in general attack settings. We estimate the probability that a model fails if the attack is sampled from a certain distribution. Our theoretical findings are supported by experimental results on different datasets.
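As a rough illustration of probabilistic certification against sampled semantic attacks, the sketch below Monte Carlo estimates the failure probability under random rotations and upper-bounds it with a Hoeffding-style confidence bound, which is a standard consequence of the Chernoff bounding method. This is not CC-Cert's certificate; `model`, the rotation attack, and all parameters are assumptions made for the example.

```python
# Minimal sketch (not CC-Cert's exact certificate): Monte Carlo estimate of the
# probability that a model's prediction changes under randomly sampled semantic
# perturbations (here: rotations), plus a Hoeffding-style upper confidence bound.
import math
import torch
import torchvision.transforms.functional as TF

def failure_probability_bound(model, x, y, n_samples=1000, max_angle=30.0, delta=0.001):
    """Upper-bound P(model misclassifies a randomly rotated x) with confidence 1 - delta."""
    failures = 0
    with torch.no_grad():
        for _ in range(n_samples):
            angle = float(torch.empty(1).uniform_(-max_angle, max_angle))
            x_rot = TF.rotate(x, angle)                   # sampled semantic perturbation
            pred = model(x_rot.unsqueeze(0)).argmax(dim=1).item()
            failures += int(pred != y)
    p_hat = failures / n_samples
    # One-sided Hoeffding bound: true failure prob <= p_hat + sqrt(ln(1/delta) / (2n)).
    return p_hat + math.sqrt(math.log(1.0 / delta) / (2 * n_samples))
```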
Adversarial Turing Patterns from Cellular Automata
Tursynbek, Nurislam, Vilkoviskiy, Ilya, Sindeeva, Maria, Oseledets, Ivan
State-of-the-art deep classifiers are intriguingly vulnerable to universal adversarial perturbations: single disturbances of small magnitude that lead to misclassification of most inputs. This phenomenon may result in serious security problems. Despite extensive research in this area, there is a lack of theoretical understanding of the structure of these perturbations. In the image domain, there is a certain visual similarity between the patterns that represent these perturbations and classical Turing patterns, which appear as solutions of non-linear partial differential equations and underlie many processes in nature. In this paper, we provide a theoretical bridge between these two theories by mapping a simplified algorithm for crafting universal perturbations to (inhomogeneous) cellular automata, which are known to generate Turing patterns. Furthermore, we propose to use Turing patterns generated by cellular automata as universal perturbations, and we experimentally show that they significantly degrade the performance of deep learning models. We find this method to be a fast and efficient way to create data-agnostic, quasi-imperceptible perturbations in the black-box scenario.
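As an illustration of the general recipe (generate a Turing-like pattern, rescale it to a small budget, and add it to every input), the sketch below uses a Gray-Scott reaction-diffusion simulation as the pattern generator. The paper's construction is based on cellular automata, so this is only a hedged stand-in, and all parameter values are illustrative.

```python
# Minimal sketch (illustrative; the paper uses cellular automata, here a
# Gray-Scott reaction-diffusion simulation stands in as the Turing-pattern
# generator). The pattern is rescaled to epsilon and added to a batch of images.
import numpy as np

def laplacian(Z):
    """5-point discrete Laplacian with periodic boundaries."""
    return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
            np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

def turing_pattern(size=224, steps=3000, Du=0.16, Dv=0.08, F=0.035, k=0.060, seed=0):
    """Run Gray-Scott dynamics from a seeded square and return the V field."""
    rng = np.random.default_rng(seed)
    U = np.ones((size, size))
    V = np.zeros((size, size))
    s = size // 2
    U[s - 10:s + 10, s - 10:s + 10] = 0.50
    V[s - 10:s + 10, s - 10:s + 10] = 0.25 + 0.05 * rng.random((20, 20))
    for _ in range(steps):
        UVV = U * V * V
        U += Du * laplacian(U) - UVV + F * (1 - U)
        V += Dv * laplacian(V) + UVV - (F + k) * V
    return V

def perturb(images, eps=8 / 255):
    """Add the same epsilon-scaled Turing pattern to every image (values in [0, 1])."""
    p = turing_pattern(size=images.shape[-1])
    p = (p - p.mean()) / (np.abs(p - p.mean()).max() + 1e-12)  # normalize to roughly [-1, 1]
    return np.clip(images + eps * p, 0.0, 1.0)
```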