Yaghini, Mohammad
On the Privacy Risk of In-context Learning
Duan, Haonan, Dziedzic, Adam, Yaghini, Mohammad, Papernot, Nicolas, Boenisch, Franziska
Large language models (LLMs) are excellent few-shot learners. They can perform a wide variety of tasks purely based on natural language prompts provided to them. These prompts contain data of a specific downstream task -- often the private dataset of a party, e.g., a company that wants to leverage the LLM for their purposes. We show that deploying prompted models presents a significant privacy risk for the data used within the prompt by instantiating a highly effective membership inference attack. We also observe that the privacy risk of prompted models exceeds that of fine-tuned models at the same utility levels. After identifying the prompted models' sensitivity to their prompts -- in the form of a significantly higher prediction confidence on the prompted data -- as a cause for the increased risk, we propose ensembling as a mitigation strategy. Aggregating predictions over multiple different versions of a prompted model decreases the membership inference risk.
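As a rough illustration of the confidence gap the attack exploits, the following sketch flags candidates on which a prompted model is unusually confident, and shows the ensembling mitigation as an average over differently prompted models. `query_confidence` and `query_fns` are hypothetical helpers, not the paper's implementation.

```python
# Minimal sketch of a confidence-thresholding membership test and the ensembling
# mitigation; the helper functions are assumptions, not the authors' attack code.
import numpy as np

def membership_inference(query_confidence, candidates, threshold):
    """Flag a candidate as a likely prompt member if the model is unusually confident on it."""
    return [query_confidence(x) > threshold for x in candidates]

def ensemble_confidence(query_fns, x):
    """Mitigation sketch: average confidences over models prompted with different data,
    which dampens the per-example confidence signal the attack relies on."""
    return float(np.mean([q(x) for q in query_fns]))
```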
Regulation Games for Trustworthy Machine Learning
Yaghini, Mohammad, Liu, Patty, Boenisch, Franziska, Papernot, Nicolas
Existing work on trustworthy machine learning (ML) often concentrates on individual aspects of trust, such as fairness or privacy. Additionally, many techniques overlook the distinction between those who train ML models and those responsible for assessing their trustworthiness. To address these issues, we propose a framework that views trustworthy ML as a multi-objective multi-agent optimization problem. This naturally lends itself to a game-theoretic formulation we call regulation games. We illustrate a particular game instance, the SpecGame, in which we model the relationship between an ML model builder and fairness and privacy regulators. Regulators wish to design penalties that enforce compliance with their specifications, but do not want to discourage builders from participation. Seeking such socially optimal (i.e., efficient for all agents) solutions to the game, we introduce ParetoPlay. This novel equilibrium search algorithm ensures that agents remain on the Pareto frontier of their objectives and avoids the inefficiencies of other equilibria. Simulating SpecGame through ParetoPlay can provide policy guidance for ML regulation. For instance, we show that for a gender classification application, regulators can enforce a differential privacy budget that is on average 4.0 lower if they take the initiative to specify their desired guarantee first.
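The penalty-driven interaction can be pictured as a toy loop in which regulators raise penalties until their specifications are met and the builder best-responds over a given set of operating points. This is only an illustrative sketch under those assumptions, not the ParetoPlay algorithm itself.

```python
# Toy illustration of a builder/regulator penalty game; candidate operating points
# (utility, fairness gap, privacy budget) are assumed to be given, e.g. from training
# runs with different hyperparameters. Not the paper's SpecGame or ParetoPlay.
def builder_best_response(penalties, candidates):
    def cost(p):
        u, gap, eps = p
        return -u + penalties["fair"] * gap + penalties["priv"] * eps
    return min(candidates, key=cost)

def regulate(candidates, fair_spec, priv_spec, step=0.1, rounds=100):
    penalties = {"fair": 0.0, "priv": 0.0}
    for _ in range(rounds):
        u, gap, eps = builder_best_response(penalties, candidates)
        if gap <= fair_spec and eps <= priv_spec:
            return penalties, (u, gap, eps)   # specifications satisfied
        # Raise each penalty in proportion to how badly its specification is violated.
        penalties["fair"] += step * max(0.0, gap - fair_spec)
        penalties["priv"] += step * max(0.0, eps - priv_spec)
    return penalties, builder_best_response(penalties, candidates)
```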
Tubes Among Us: Analog Attack on Automatic Speaker Identification
Ahmed, Shimaa, Wani, Yash, Shamsabadi, Ali Shahin, Yaghini, Mohammad, Shumailov, Ilia, Papernot, Nicolas, Fawaz, Kassem
Recent years have seen a surge in the popularity of acoustics-enabled personal devices powered by machine learning. Yet, machine learning has proven to be vulnerable to adversarial examples. A large number of modern systems protect themselves against such attacks by targeting artificiality, i.e., they deploy mechanisms to detect the lack of human involvement in generating the adversarial examples. However, these defenses implicitly assume that humans are incapable of producing meaningful and targeted adversarial examples. In this paper, we show that this base assumption is wrong. In particular, we demonstrate that for tasks like speaker identification, a human is capable of producing analog adversarial examples directly with little cost and supervision: by simply speaking through a tube, an adversary reliably impersonates other speakers in the eyes of ML models for speaker identification. Our findings extend to a range of other acoustic-biometric tasks such as liveness detection, bringing into question their use in real-life security-critical settings such as phone banking.
Proof-of-Learning is Currently More Broken Than You Think
Fang, Congyu, Jia, Hengrui, Thudi, Anvith, Yaghini, Mohammad, Choquette-Choo, Christopher A., Dullerud, Natalie, Chandrasekaran, Varun, Papernot, Nicolas
Proof-of-Learning (PoL) proposes that a model owner logs training checkpoints to establish a proof of having expended the computation necessary for training. The authors of PoL forego cryptographic approaches and trade rigorous security guarantees for scalability to deep learning. They empirically argued the benefit of this approach by showing how spoofing--computing a proof for a stolen model--is as expensive as obtaining the proof honestly by training the model. However, recent work has provided a counter-example and thus has invalidated this observation. In this work we demonstrate, first, that while it is true that current PoL verification is not robust to adversaries, recent work has largely underestimated this lack of robustness. This is because existing spoofing strategies are either unreproducible or target weakened instantiations of PoL--meaning they are easily thwarted by changing hyperparameters of the verification. Instead, we introduce the first spoofing strategies that can be reproduced across different configurations of the PoL verification and can be done for a fraction of the cost of previous spoofing strategies. This is possible because we identify key vulnerabilities of PoL and systematically analyze the underlying assumptions needed for robust verification of a proof. On the theoretical side, we show how realizing these assumptions reduces to open problems in learning theory. We conclude that one cannot develop a provably robust PoL verification mechanism without further understanding of optimization in deep learning.
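For intuition on what the spoofing strategies target, here is a minimal sketch of a checkpoint-replay verification with a distance tolerance; `train_steps` is a hypothetical deterministic re-execution of the logged updates, and the protocol details differ from the paper's.

```python
# Sketch of tolerance-based checkpoint verification; a loose tolerance is the kind
# of weakness spoofing strategies exploit. Hypothetical helpers, not the PoL protocol.
import numpy as np

def dist(w1, w2):
    return float(np.linalg.norm(w1 - w2))

def verify_segment(train_steps, checkpoints, batches, tolerance):
    """Check that each logged checkpoint is reproducible from its predecessor."""
    for t in range(len(checkpoints) - 1):
        reproduced = train_steps(checkpoints[t], batches[t])
        if dist(reproduced, checkpoints[t + 1]) > tolerance:
            return False
    return True
```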
Learning with Impartiality to Walk on the Pareto Frontier of Fairness, Privacy, and Utility
Yaghini, Mohammad, Liu, Patty, Boenisch, Franziska, Papernot, Nicolas
Deploying machine learning (ML) models often requires both fairness and privacy guarantees. Both of these objectives present unique trade-offs with the utility (e.g., accuracy) of the model. However, the mutual interactions between fairness, privacy, and utility are less well-understood. As a result, often only one objective is optimized, while the others are tuned as hyper-parameters. Because they implicitly prioritize certain objectives, such designs bias the model in pernicious, undetectable ways. To address this, we adopt impartiality as a principle: design of ML pipelines should not favor one objective over another. We propose impartially-specified models, which provide us with accurate Pareto frontiers that show the inherent trade-offs between the objectives. Extending two canonical ML frameworks for privacy-preserving learning, we provide two methods (FairDP-SGD and FairPATE) to train impartially-specified models and recover the Pareto frontier. Through theoretical privacy analysis and a comprehensive empirical study, we provide an answer to the question of where fairness mitigation should be integrated within a privacy-aware ML pipeline.
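As a small illustration of the frontier-recovery step, the sketch below extracts the Pareto-optimal operating points from a set of (accuracy, fairness gap, privacy budget) triples; it is not FairDP-SGD or FairPATE themselves, only the kind of post-processing one might run over the models they produce.

```python
# Sketch of Pareto-frontier extraction over trained models' operating points.
# Higher accuracy is better; lower fairness gap and lower privacy budget are better.
def pareto_frontier(points):
    """points: list of (accuracy, fairness_gap, epsilon) tuples."""
    def dominates(a, b):
        return a != b and a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    return [p for p in points if not any(dominates(q, p) for q in points)]
```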
Dataset Inference: Ownership Resolution in Machine Learning
Maini, Pratyush, Yaghini, Mohammad, Papernot, Nicolas
With increasingly more data and computation involved in their training, machine learning models constitute valuable intellectual property. This has spurred interest in model stealing, which is made more practical by advances in learning with partial, little, or no supervision. Existing defenses focus on inserting unique watermarks in a model's decision surface, but this is insufficient: the watermarks are not sampled from the training distribution and thus are not always preserved during model stealing. In this paper, we make the key observation that knowledge contained in the stolen model's training set is what is common to all stolen copies. The adversary's goal, irrespective of the attack employed, is always to extract this knowledge or its by-products. This gives the original model's owner a strong advantage over the adversary: model owners have access to the original training data. We thus introduce *dataset inference*, the process of identifying whether a suspected model copy has private knowledge from the original model's dataset, as a defense against model stealing. We develop an approach for dataset inference that combines statistical testing with the ability to estimate the distance of multiple data points to the decision boundary. Our experiments on CIFAR10, SVHN, CIFAR100 and ImageNet show that model owners can claim with confidence greater than 99% that their model (or dataset, as a matter of fact) was stolen, despite exposing only 50 of the stolen model's training points. Dataset inference defends against state-of-the-art attacks even when the adversary is adaptive. Unlike prior work, it does not require retraining or overfitting the defended model.
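A simplified sketch of the statistical-testing step follows, assuming a recent SciPy and a hypothetical `margin` function that approximates a point's distance to the suspect model's decision boundary; the full method aggregates richer distance estimates.

```python
# Sketch of the hypothesis test behind dataset inference: do the owner's private
# points sit farther from the suspect model's boundary than fresh public points?
from scipy import stats

def dataset_inference(margin, suspect_model, private_points, public_points, alpha=0.01):
    private_margins = [margin(suspect_model, x, y) for x, y in private_points]
    public_margins = [margin(suspect_model, x, y) for x, y in public_points]
    # One-sided test: larger margins on private points suggest stolen knowledge.
    t_stat, p_value = stats.ttest_ind(private_margins, public_margins, alternative="greater")
    return p_value < alpha, p_value
```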
Proof-of-Learning: Definitions and Practice
Jia, Hengrui, Yaghini, Mohammad, Choquette-Choo, Christopher A., Dullerud, Natalie, Thudi, Anvith, Chandrasekaran, Varun, Papernot, Nicolas
Training machine learning (ML) models typically involves expensive iterative optimization. Once the model's final parameters are released, there is currently no mechanism for the entity which trained the model to prove that these parameters were indeed the result of this optimization procedure. Such a mechanism would support security of ML applications in several ways. For instance, it would simplify ownership resolution when multiple parties contest ownership of a specific model. It would also facilitate distributed training across untrusted workers, where Byzantine workers might otherwise mount a denial-of-service attack by returning incorrect model updates. In this paper, we remediate this problem by introducing the concept of proof-of-learning in ML. Inspired by research on both proof-of-work and verified computations, we observe how a seminal training algorithm, stochastic gradient descent, accumulates secret information due to its stochasticity. This produces a natural construction for a proof-of-learning which demonstrates that a party has expended the compute required to obtain a set of model parameters correctly. In particular, our analyses and experiments show that an adversary seeking to illegitimately manufacture a proof-of-learning needs to perform *at least* as much work as is needed for gradient descent itself. We also instantiate a concrete proof-of-learning mechanism in both of the scenarios described above. In model ownership resolution, it protects the intellectual property of models released publicly. In distributed training, it preserves availability of the training procedure. Our empirical evaluation validates that our proof-of-learning mechanism is robust to variance induced by the hardware (ML accelerators) and software stacks.
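The prover side can be sketched as logging periodic checkpoints together with the data ordering a verifier needs to replay each segment. The snippet below is a minimal illustration assuming a generic `train_step` function, not the paper's exact logging format.

```python
# Sketch of proof-of-learning logging during training: periodic checkpoints plus
# the batch order, so a verifier can later re-execute and check each segment.
def train_with_proof(weights, batches, train_step, checkpoint_every=100):
    proof = {"checkpoints": [(0, weights)], "batch_order": []}
    for step, batch in enumerate(batches, start=1):
        proof["batch_order"].append(batch)      # data ordering needed for replay
        weights = train_step(weights, batch)
        if step % checkpoint_every == 0:
            proof["checkpoints"].append((step, weights))
    return weights, proof
```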
Disparate Vulnerability: on the Unfairness of Privacy Attacks Against Machine Learning
Yaghini, Mohammad, Kulynych, Bogdan, Troncoso, Carmela
A membership inference attack (MIA) against a machine learning model enables an attacker to determine whether a given data record was part of the model's training dataset or not. Such attacks have been shown to be practical both in centralized and federated settings, and pose a threat in many privacy-sensitive domains such as medicine or law enforcement. In the literature, the effectiveness of these attacks is invariably reported using metrics computed across the whole population. In this paper, we take a closer look at the attack's performance across different subgroups present in the data distributions. We introduce a framework that enables us to efficiently analyze the vulnerability of machine learning models to MIA. We discover that even if the accuracy of MIA looks no better than random guessing over the whole population, subgroups are subject to disparate vulnerability, i.e., certain subgroups can be significantly more vulnerable than others. We provide a theoretical definition for MIA vulnerability, which we validate empirically on both synthetic and real data.
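The subgroup analysis reduces to comparing the attack's accuracy on each subgroup with its accuracy overall; the sketch below computes both from labelled attack outcomes. Names and inputs are illustrative rather than the paper's framework.

```python
# Sketch of per-subgroup membership inference accuracy versus the population-level figure.
from collections import defaultdict

def subgroup_mia_accuracy(records):
    """records: iterable of (subgroup, is_member, attack_says_member) triples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, is_member, guess in records:
        for key in (group, "overall"):
            totals[key] += 1
            hits[key] += int(guess == is_member)
    return {key: hits[key] / totals[key] for key in totals}
```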
Non-Discriminatory Machine Learning Through Convex Fairness Criteria
Goel, Naman (EPFL, Lausanne), Yaghini, Mohammad (EPFL, Lausanne), Faltings, Boi (EPFL, Lausanne)
Biased decision making by machine learning systems is increasingly recognized as an important issue. Recently, techniques have been proposed to learn non-discriminatory classifiers by enforcing constraints in the training phase. Such constraints are either non-convex in nature (posing computational difficulties) or don’t have a clear probabilistic interpretation. Moreover, the techniques offer little understanding of the more subjective notion of fairness. In this paper, we introduce a novel technique to achieve non-discrimination without sacrificing convexity and probabilistic interpretation. Our experimental analysis demonstrates the success of the method on popular real datasets including ProPublica’s COMPAS dataset. We also propose a new notion of fairness for machine learning and show that our technique satisfies this subjective fairness criterion.
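To illustrate how a convex fairness term can be folded into a convex training objective, the sketch below adds a covariance-style penalty to logistic loss; this is a generic relaxation for illustration, not the specific criterion proposed in the paper.

```python
# Sketch of a convex objective combining logistic loss with a convex fairness penalty.
import numpy as np

def fair_logistic_objective(w, X, y, s, lam):
    """X: features, y: labels in {0,1}, s: binary sensitive attribute, lam: penalty weight."""
    scores = X @ w
    log_loss = np.mean(np.log1p(np.exp(-(2 * y - 1) * scores)))
    # Squared covariance between the sensitive attribute and the decision scores:
    # the inner mean is linear in w, so its square (and the whole objective) is convex.
    fairness = (np.mean((s - s.mean()) * scores)) ** 2
    return log_loss + lam * fairness
```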