Merrer, Erwan Le
Queries, Representation & Detection: The Next 100 Model Fingerprinting Schemes
Godinot, Augustin, Merrer, Erwan Le, Penzo, Camilla, Taïani, François, Trédan, Gilles
The deployment of machine learning models in operational contexts represents a significant investment for any organisation. Consequently, the risk of these models being misappropriated by competitors needs to be addressed. In recent years, numerous proposals have been put forth to detect instances of model stealing. However, these proposals operate under implicit and disparate data and model access assumptions; as a consequence, it remains unclear how they can be effectively compared to one another. Our evaluation shows that a simple baseline we introduce performs on par with existing state-of-the-art fingerprints, which are considerably more complex. To uncover the reasons behind this intriguing result, this paper introduces a systematic approach to both the creation of model fingerprinting schemes and their evaluation benchmarks. By dividing model fingerprinting into three core components -- Query, Representation and Detection (QuRD) -- we are able to identify $\sim100$ previously unexplored QuRD combinations and gain insights into their performance. Finally, we introduce a set of metrics to compare and guide the creation of more representative model stealing detection benchmarks. Our approach reveals the need for more challenging benchmarks and a sound comparison with baselines. To foster the creation of new fingerprinting schemes and benchmarks, we open-source our fingerprinting toolbox.
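To make the decomposition concrete, here is a minimal sketch of a QuRD pipeline, assuming a toy label-agreement detector; the function names and the decision rule are illustrative and do not reflect the API of the open-sourced toolbox.

```python
# Minimal sketch of the Query / Representation / Detection (QuRD) decomposition.
# All names and the toy agreement-based detection rule are illustrative assumptions,
# not the API of the authors' open-source toolbox.
import numpy as np

def query(rng, n_queries, dim):
    """Q: draw a set of probe inputs (here: random vectors)."""
    return rng.normal(size=(n_queries, dim))

def representation(model, probes):
    """R: summarize the model's behaviour on the probes (here: predicted labels)."""
    return np.asarray([model(x) for x in probes])

def detection(fp_victim, fp_suspect, threshold=0.9):
    """D: flag the suspect if the two fingerprints agree often enough."""
    agreement = float(np.mean(fp_victim == fp_suspect))
    return agreement >= threshold, agreement

# Toy usage: a victim model, a near-copy, and an unrelated model.
rng = np.random.default_rng(0)
w = rng.normal(size=16)
w_copy = w + 0.01 * rng.normal(size=16)
w_other = rng.normal(size=16)
victim = lambda x: int(x @ w > 0)
near_copy = lambda x: int(x @ w_copy > 0)
unrelated = lambda x: int(x @ w_other > 0)

probes = query(rng, n_queries=256, dim=16)
fp_v = representation(victim, probes)
print(detection(fp_v, representation(near_copy, probes)))  # high agreement: flagged
print(detection(fp_v, representation(unrelated, probes)))  # low agreement: not flagged
```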
The 20 questions game to distinguish large language models
Richardeau, Gurvan, Merrer, Erwan Le, Penzo, Camilla, Tredan, Gilles
In a parallel with the 20 questions game, we present a method to determine whether two large language models (LLMs), placed in a black-box context, are the same or not. The goal is to use a small set of (benign) binary questions, typically under 20. We formalize the problem and first establish a baseline using a random selection of questions from known benchmark datasets, achieving an accuracy of nearly 100% within 20 questions. After showing optimal bounds for this problem, we introduce two effective questioning heuristics able to discriminate 22 LLMs by using half as many questions for the same task. These methods offer significant advantages in terms of stealth and are thus of interest to auditors or copyright owners facing suspicions of model leaks.
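As an illustration of the questioning protocol, the sketch below compares two black-box models on a handful of binary questions; the `ask` helper and the toy lookup-table models are assumptions standing in for real LLM API calls.

```python
# Hedged sketch of the 20-questions idea: ask both black-box models the same
# small set of binary (yes/no) questions and compare their answer patterns.
# `ask` is a placeholder for an actual LLM API call and is an assumption here.
def ask(model, question):
    """Return True/False; stands in for querying a black-box LLM."""
    return model(question)

def same_model(model_a, model_b, questions, tolerance=0):
    disagreements = sum(ask(model_a, q) != ask(model_b, q) for q in questions)
    return disagreements <= tolerance

# Toy usage: two deterministic 'models' answering from lookup tables.
qs = ["Is 7 prime?", "Is Paris in Italy?", "Does water boil at 100C at sea level?"]
m1 = {"Is 7 prime?": True, "Is Paris in Italy?": False,
      "Does water boil at 100C at sea level?": True}.get
m2 = {"Is 7 prime?": True, "Is Paris in Italy?": True,
      "Does water boil at 100C at sea level?": True}.get
print(same_model(m1, m1, qs))  # True: identical answer patterns
print(same_model(m1, m2, qs))  # False: the answers differ on at least one question
```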
Under manipulations, are some AI models harder to audit?
Godinot, Augustin, Tredan, Gilles, Merrer, Erwan Le, Penzo, Camilla, Taïani, Francois
Auditors need robust methods to assess the compliance of web platforms with the law. However, since they hardly ever have access to the algorithm, implementation, or training data used by a platform, the problem is harder than a simple metric estimation. Within the recent framework of manipulation-proof auditing, we study in this paper the feasibility of robust audits in realistic settings, in which models exhibit large capacities. We first prove a constraining result: if a web platform uses models that may fit any data, no audit strategy -- whether active or not -- can outperform random sampling when estimating properties such as demographic parity. To better understand the conditions under which state-of-the-art auditing techniques may remain competitive, we then relate the manipulability of audits to the capacity of the targeted models, using the Rademacher complexity. We empirically validate these results on popular models of increasing capacities, thus confirming experimentally that large-capacity models, which are commonly used in practice, are particularly hard to audit robustly. These results refine the limits of the auditing problem, and open up enticing questions on the connection between model capacity and the ability of platforms to manipulate audit attempts.
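For reference, the random-sampling baseline against which this result is stated can be sketched as a simple Monte-Carlo estimate of the demographic parity gap; the toy model and data below are illustrative only.

```python
# Sketch of the random-sampling baseline for estimating demographic parity:
# query the audited model on uniformly sampled points and compare
# positive-decision rates across the two groups. Model and data are toys.
import numpy as np

def demographic_parity_gap(model, inputs, groups):
    """|P(model(x)=1 | group 0) - P(model(x)=1 | group 1)| on the audit sample."""
    preds = np.asarray([model(x) for x in inputs])
    rate0 = preds[groups == 0].mean()
    rate1 = preds[groups == 1].mean()
    return abs(rate0 - rate1)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 8))                   # audit queries drawn at random
g = rng.integers(0, 2, size=2000)                # protected attribute of each query
model = lambda x: int(x[0] + 0.5 * x[1] > 0)     # toy audited model
print(demographic_parity_gap(model, X, g))
```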
Fairness Auditing with Multi-Agent Collaboration
de Vos, Martijn, Dhasade, Akash, Bourrée, Jade Garcia, Kermarrec, Anne-Marie, Merrer, Erwan Le, Rottembourg, Benoit, Tredan, Gilles
Existing work in fairness audits assumes that agents operate independently. In this paper, we consider the case of multiple agents auditing the same platform for different tasks. Agents have two levers: their collaboration strategy, with or without coordination beforehand, and their sampling method. We theoretically study their interplay when agents operate independently or collaborate.
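The sketch below illustrates the collaboration lever under simplifying assumptions (a toy platform model, one scalar estimate per task): with collaboration, every agent's estimate reuses the pooled query answers; without it, each agent only uses its own budget. It is not the paper's protocol.

```python
# Illustrative sketch, not the paper's protocol: three agents audit the same
# platform for different tasks (here, the positive-decision rate conditioned on
# a different feature each). Collaboration means sharing the platform's answers
# so every agent's estimate is computed on the pooled queries.
import numpy as np

rng = np.random.default_rng(2)
model = lambda x: int(x.sum() > 0)                       # toy platform model

def task(i, sample, answers):
    """Agent i's task: decision rate among points where feature i is positive."""
    mask = sample[:, i] > 0
    return answers[mask].mean()

budget, agents, dim = 200, 3, 4
samples = [rng.normal(size=(budget, dim)) for _ in range(agents)]
answers = [np.array([model(x) for x in s]) for s in samples]

independent = [task(i, samples[i], answers[i]) for i in range(agents)]
pooled_s, pooled_a = np.concatenate(samples), np.concatenate(answers)
collaborative = [task(i, pooled_s, pooled_a) for i in range(agents)]
print("independent:", independent)
print("collaborative:", collaborative)   # each estimate now uses 3x the queries
```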
On the relevance of APIs facing fairwashed audits
Bourrée, Jade Garcia, Merrer, Erwan Le, Tredan, Gilles, Rottembourg, Benoît
Recent legislation requires AI platforms to provide APIs so that regulators can assess their compliance with the law. Research has nevertheless shown that platforms can manipulate their API answers through fairwashing. Facing this threat to reliable auditing, this paper studies the benefits of jointly using platform scraping and APIs. In this setup, we elaborate on the use of scraping to detect manipulated answers: since fairwashing only manipulates API answers, exploiting scraps may reveal a manipulation. To abstract the wide range of specific API-scrap situations, we introduce a notion of proxy that captures the consistency an auditor might expect between both data sources. If the regulator has a good proxy of the consistency, then she can easily detect manipulation and even bypass the API to conduct her audit. On the other hand, without a good proxy, relying on the API is necessary, and the auditor cannot defend against fairwashing. We then simulate practical scenarios in which the auditor may mostly rely on the API to conveniently conduct the audit task, while maintaining her chances of detecting a potential manipulation. To highlight the tension between the audit task and the API fairwashing detection task, we identify Pareto-optimal strategies in a practical audit scenario. We believe this research sets the stage for reliable audits in practical and manipulation-prone setups.
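A minimal sketch of the detection idea follows, assuming the proxy reduces to a tolerance on the disagreement rate between the two data sources; all names and the tolerance value are illustrative.

```python
# Hedged sketch of using scraping to spot fairwashed API answers: the auditor's
# proxy is a bound on the disagreement she expects between the API and the
# scraped interface on the same inputs; exceeding it suggests manipulation.
# All names and the tolerance value are assumptions for illustration.
import numpy as np

def disagreement(api_answers, scraped_answers):
    return float(np.mean(np.asarray(api_answers) != np.asarray(scraped_answers)))

def detect_manipulation(api_answers, scraped_answers, proxy_tolerance=0.05):
    return disagreement(api_answers, scraped_answers) > proxy_tolerance

rng = np.random.default_rng(3)
true_model = lambda x: int(x[0] > 0)
fairwashed_api = lambda x: int(x[0] > 0.5)          # manipulated API answers
probes = rng.normal(size=(500, 2))

scraped = [true_model(x) for x in probes]            # answers seen by real users
honest_api = [true_model(x) for x in probes]
washed_api = [fairwashed_api(x) for x in probes]
print(detect_manipulation(honest_api, scraped))      # False: consistent sources
print(detect_manipulation(washed_api, scraped))      # True: API deviates too much
```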
RoBIC: A benchmark suite for assessing classifiers robustness
Maho, Thibault, Bonnet, Benoît, Furon, Teddy, Merrer, Erwan Le
Many defenses have emerged with the development of adversarial attacks. Models must be objectively evaluated accordingly. This paper systematically tackles this concern by proposing a new parameter-free benchmark we coin RoBIC. RoBIC fairly evaluates the robustness of image classifiers using a new half-distortion measure. It gauges the robustness of the network against white and black box attacks, independently of its accuracy. RoBIC is faster than the other available benchmarks. We present the significant differences in the robustness of 16 recent models as assessed by RoBIC.
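As a rough, assumed stand-in for a distortion-based, accuracy-independent robustness score (not RoBIC's exact half-distortion definition), one can summarize an attack's per-image adversarial distortions by their median:

```python
# Illustrative robustness summary in the spirit of an accuracy-independent,
# distortion-based score: the median L2 distortion an attack needs to fool the
# classifier on each image. This is an assumed stand-in, not RoBIC's exact
# half-distortion measure.
import numpy as np

def median_adversarial_distortion(distortions):
    """distortions: per-image L2 norm of the smallest adversarial perturbation
    found by an attack (np.inf when the attack failed on that image)."""
    return float(np.median(distortions))

# Toy usage with made-up per-image distortions from two hypothetical models.
robust_model = [1.8, 2.1, 1.5, 2.4, np.inf, 1.9]
fragile_model = [0.2, 0.4, 0.1, 0.3, 0.5, 0.2]
print(median_adversarial_distortion(robust_model))   # higher: harder to fool
print(median_adversarial_distortion(fragile_model))  # lower: easier to fool
```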
SurFree: a fast surrogate-free black-box attack
Maho, Thibault, Furon, Teddy, Merrer, Erwan Le
Machine learning classifiers are critically prone to evasion attacks. Adversarial examples are slightly modified inputs that are then misclassified, while remaining perceptively close to their originals. The last couple of years have witnessed a striking decrease in the number of queries a black-box attack submits to the target classifier in order to forge adversarials. This particularly concerns the black-box score-based setup, where the attacker has access to the top predicted probabilities: the number of queries went from millions to less than a thousand. This paper presents SurFree, a geometrical approach that achieves a similar drastic reduction in the number of queries in the hardest setup: black-box decision-based attacks (only the top-1 label is available). We first highlight that the most recent attacks in that setup, HSJA, QEBA and GeoDA, all perform costly gradient surrogate estimations. SurFree proposes to bypass these by instead focusing on careful trials along diverse directions, guided by precise indications of the geometrical properties of the classifier's decision boundaries. We motivate this geometric approach before performing a head-to-head comparison with previous attacks, with the number of queries as a first-class citizen. We exhibit a faster distortion decay under low query amounts (a few hundred to a thousand), while remaining competitive at higher query budgets.
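The toy sketch below captures the decision-based, surrogate-free flavour of the search (query only the top-1 label, keep improving moves along random directions); it is a simplified illustration, not SurFree's actual geometric construction.

```python
# Simplified sketch of a decision-based, surrogate-free search: starting from an
# adversarial point, repeatedly try random directions and keep any candidate that
# stays misclassified while lying closer to the original. This is a toy
# illustration of querying only the top-1 label, not SurFree's geometry.
import numpy as np

def decision_based_search(classify, original, adversarial, rng, n_trials=500, step=0.1):
    best = adversarial.copy()
    for _ in range(n_trials):
        direction = rng.normal(size=original.shape)
        direction /= np.linalg.norm(direction)
        # Move both toward the original and along a random direction.
        candidate = best + step * direction + step * (original - best)
        still_adversarial = classify(candidate) != classify(original)
        if still_adversarial and np.linalg.norm(candidate - original) < np.linalg.norm(best - original):
            best = candidate          # keep only improving, still-adversarial moves
    return best

# Toy 2D classifier with a linear boundary, and a starting adversarial point.
rng = np.random.default_rng(4)
classify = lambda x: int(x[0] + x[1] > 1.0)       # only the top-1 label is used
original = np.array([0.0, 0.0])                   # classified as 0
start = np.array([2.0, 2.0])                      # classified as 1 (adversarial)
adv = decision_based_search(classify, original, start, rng)
print(np.linalg.norm(start - original), "->", np.linalg.norm(adv - original))
```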
The Bouncer Problem: Challenges to Remote Explainability
Merrer, Erwan Le, Tredan, Gilles
The concept of explainability is envisioned to satisfy society's demands for transparency on machine learning decisions. The concept is simple: like humans, algorithms should explain the rationale behind their decisions so that their fairness can be assessed. While this approach is promising in a local context (e.g. to explain a model during debugging at training time), we argue that this reasoning cannot simply be transposed to a remote context, where a model trained by a service provider is only accessible through its API. This is problematic, as it constitutes precisely the target use-case requiring transparency from a societal perspective. Through an analogy with a club bouncer (who may provide untruthful explanations upon customer rejection), we show that providing explanations cannot prevent a remote service from lying about the true reasons leading to its decisions. More precisely, we prove the impossibility of remote explainability for single explanations, by constructing an attack on explanations that hides discriminatory features from the querying user. We provide an example implementation of this attack. We then show that the probability that an observer spots the attack, using several explanations to look for incoherences, is low in practical settings. This undermines the very concept of remote explainability in general.
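The following toy sketch, an assumption-laden illustration rather than the paper's construction, shows the kind of attack at stake: decisions come from a model that uses a protected feature, while explanations are computed on a sanitized surrogate that ignores it.

```python
# Toy illustration (an assumed sketch, not the paper's construction): the remote
# service decides with a model that uses a protected feature, but returns an
# explanation computed from a sanitized surrogate that ignores it, so any single
# explanation looks innocent to the querying user.
import numpy as np

FEATURES = ["income", "debt", "protected_attribute"]
w_true = np.array([1.0, -1.0, -2.0])      # decision actually uses the protected feature
w_shown = np.array([1.0, -1.0, 0.0])      # surrogate used only to build explanations

def decide(x):
    return int(x @ w_true > 0)

def explain(x):
    """Feature attributions of the sanitized surrogate (here: weight * value)."""
    return dict(zip(FEATURES, w_shown * x))

x = np.array([1.0, 0.2, 1.0])             # applicant with the protected attribute set
print(decide(x))                           # rejection influenced by the protected feature
print(explain(x))                          # explanation shows zero attribution for it
```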
MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets
Hardy, Corentin, Merrer, Erwan Le, Sericola, Bruno
A recent technical breakthrough in the domain of machine learning is the discovery and the multiple applications of Generative Adversarial Networks (GANs). Those generative models are computationally demanding, as a GAN is composed of two deep neural networks and trains on large datasets. A GAN is generally trained on a single server. In this paper, we address the problem of distributing GANs so that they are able to train over datasets that are spread across multiple workers. We propose MD-GAN as a first solution to this problem: a novel learning procedure that fits GANs to this distributed setup. We then compare the performance of MD-GAN to an adaptation of Federated Learning to GANs, using the MNIST and CIFAR10 datasets. MD-GAN reduces the learning complexity on each worker node by a factor of two, while providing better performance than federated learning on both datasets. We finally discuss the practical implications of distributing GANs.
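A structural sketch of this single-generator, many-discriminators layout is given below; the learning mechanics are deliberately stubbed out and all class and parameter names are illustrative.

```python
# Structural sketch of a single-generator / per-worker-discriminators layout:
# the server holds the generator, each worker keeps a discriminator trained only
# on its local shard, and the generator updates from the averaged feedback.
# Learning mechanics are stubbed out; names and the update rule are assumptions.
import numpy as np

class Worker:
    def __init__(self, local_data):
        self.local_data = local_data          # this shard never leaves the worker
        self.discriminator_score = lambda x: float(np.tanh(x.mean()))  # stub D

    def feedback(self, generated_batch):
        """Score generated samples against local data (stand-in for D's gradients)."""
        return self.discriminator_score(generated_batch) - self.discriminator_score(self.local_data)

class Server:
    def __init__(self, workers):
        self.workers = workers
        self.generator_param = 0.0            # stub generator

    def round(self, rng, batch_size=32):
        batch = self.generator_param + rng.normal(size=batch_size)
        feedback = np.mean([w.feedback(batch) for w in self.workers])
        self.generator_param -= 0.5 * feedback  # move against the averaged feedback

rng = np.random.default_rng(5)
workers = [Worker(rng.normal(loc=3.0, size=256)) for _ in range(4)]  # local data, mean 3
server = Server(workers)
for _ in range(200):
    server.round(rng)
print(server.generator_param)   # drifts toward the workers' data statistics (mean 3 here)
```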
zoNNscan: a boundary-entropy index for zone inspection of neural models
Jaouen, Adel, Merrer, Erwan Le
The training of deep neural network classifiers results in decision boundaries whose geometry is still not well understood. This is in direct relation with classification problems such as so-called adversarial examples. We introduce zoNNscan, an index intended to inform on the boundary uncertainty (in terms of the presence of other classes) around a given input datapoint. It is based on confidence entropy, and is implemented through sampling in the multidimensional ball surrounding that input. We detail the zoNNscan index, give an algorithm for approximating it, and finally illustrate its benefits on four applications, including two important problems for the adoption of deep networks in critical systems: adversarial examples and corner-case inputs. We highlight that zoNNscan exhibits significantly higher values for inputs in those two problem classes than for standard inputs.
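A minimal sketch in the spirit of this index follows, assuming uniform sampling in an L2 ball and normalized Shannon entropy of the predicted class probabilities; the exact sampling scheme and radius are assumptions.

```python
# Minimal sketch of a boundary-entropy index in the spirit of zoNNscan: sample
# points uniformly in a small ball around the input, get the classifier's class
# probabilities there, and average the normalized entropy of those outputs.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def zonnscan_like(predict_proba, x, radius=0.1, n_samples=512, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    d = x.shape[0]
    # Uniform sampling in the L2 ball of the given radius around x.
    directions = rng.normal(size=(n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    samples = x + radii * directions
    probs = np.array([predict_proba(s) for s in samples])
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return float(entropy.mean() / np.log(probs.shape[1]))   # normalized to [0, 1]

# Toy 2-class model: the index is higher near the decision boundary than far from it.
predict = lambda x: softmax(np.array([10 * x[0], -10 * x[0]]))
rng = np.random.default_rng(6)
print(zonnscan_like(predict, np.array([0.01, 0.0]), rng=rng))  # near boundary: high
print(zonnscan_like(predict, np.array([2.0, 0.0]), rng=rng))   # far away: low
```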