AITopics | Security & Privacy

Collaborating Authors

Security & Privacy

News Overviews Instructional Materials AI-Alerts Classics

Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers, Greg Yang

Neural Information Processing SystemsMay-31-2025, 14:08:11 GMT

In this paper, we employ adversarial training to improve the performance of randomized smoothing. We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers.

artificial intelligence, classifier, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Security & Privacy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

Add feedback

Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models Yuxin Wen

Neural Information Processing SystemsMay-31-2025, 13:49:16 GMT

It now common to produce domain-specific models by fine-tuning large pre-trained models using a small bespoke dataset. But selecting one of the many foundation models from the web poses considerable risks, including the potential that this model has been backdoored. In this paper, we introduce a new type of model backdoor: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Improving Alignment and Robustness with Circuit Breakers

Neural Information Processing SystemsMay-31-2025, 13:47:33 GMT

AI systems can take harmful actions and are highly vulnerable to adversarial attacks.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

Add feedback

An Algorithmic Framework For Differentially Private Data Analysis on Trusted Processors

Joshua Allen, Bolin Ding, Janardhan Kulkarni, Harsha Nori, Olga Ohrimenko, Sergey Yekhanin

Neural Information Processing SystemsMay-31-2025, 13:34:21 GMT

Differential privacy has emerged as the main definition for private data analysis and machine learning. The global model of differential privacy, which assumes that users trust the data collector, provides strong privacy guarantees and introduces small errors in the output.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.14)
North America > Canada (0.14)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Auditing Privacy Mechanisms via Label Inference Attacks

Neural Information Processing SystemsMay-31-2025, 13:22:16 GMT

We propose reconstruction advantage measures to audit label privatization mechanisms. A reconstruction advantage measure quantifies the increase in an attacker's ability to infer the true label of an unlabeled example when provided with a private version of the labels in a dataset (e.g., aggregate of labels from different users or noisy labels output by randomized response), compared to an attacker that only observes the feature vectors, but may have prior knowledge of the correlation between features and labels. We consider two such auditing measures: one additive, and one multiplicative. These incorporate previous approaches taken in the literature on empirical auditing and differential privacy. The measures allow us to place a variety of proposed privatization schemes--some differentially private, some not--on the same footing. We analyze these measures theoretically under a distributional model which encapsulates reasonable adversarial settings. We also quantify their behavior empirically on real and simulated prediction tasks. Across a range of experimental settings, we find that differentially private schemes dominate or match the privacy-utility tradeoff of more heuristic approaches.

artificial intelligence, deep learning, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Differentially Private Set Representations

Neural Information Processing SystemsMay-31-2025, 12:31:51 GMT

We study the problem of differentially private (DP) mechanisms for representing sets of size k from a large universe.

artificial intelligence, machine learning, probability, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Security & Privacy (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor Shaokui Wei Baoyuan Wu

Neural Information Processing SystemsMay-31-2025, 11:31:59 GMT

Data-poisoning backdoor attacks are serious security threats to machine learning models, where an adversary can manipulate the training dataset to inject backdoors into models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned. Unlike most existing methods that primarily detect and remove/unlearn suspicious samples to mitigate malicious backdoor attacks, we propose a novel defense approach called PDB (Proactive Defensive Backdoor).

artificial intelligence, backdoor attack, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Langevin Unlearning: A New Perspective of Noisy Gradient Descent for Machine Unlearning

Neural Information Processing SystemsMay-31-2025, 10:36:41 GMT

Machine unlearning has raised significant interest with the adoption of laws ensuring the "right to be forgotten". Researchers have provided a probabilistic notion of approximate unlearning under a similar definition of Differential Privacy (DP), where privacy is defined as statistical indistinguishability to retraining from scratch. We propose Langevin unlearning, an unlearning framework based on noisy gradient descent with privacy guarantees for approximate unlearning problems.

artificial intelligence, langevin, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation

Neural Information Processing SystemsMay-31-2025, 10:07:45 GMT

Image segmentation is a crucial vision task that groups pixels within an image into semantically meaningful segments, which is pivotal in obtaining a fine-grained understanding of real-world scenes. However, an increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data. In this work, we exploit the concept of unlearnable examples to make images unusable to model training by generating and adding unlearnable noise into the original images. Particularly, we propose a novel Unlearnable Segmentation (UnSeg) framework to train a universal unlearnable noise generator that is capable of transforming any downstream images into their unlearnable version. The unlearnable noise generator is finetuned from the Segment Anything Model (SAM) via bilevel optimization on an interactive segmentation dataset towards minimizing the training error of a surrogate model that shares the same architecture with SAM but is trained from scratch. We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures, and show that the unlearnable images can reduce the segmentation performance by a large margin. Our work provides useful insights into how to leverage foundation models in a data-efficient and computationally affordable manner to protect images against image segmentation models.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
(2 more...)

Add feedback

Black-box Backdoor Defense via Zero-shot Image Purification

Neural Information Processing SystemsMay-31-2025, 09:54:10 GMT

Backdoor attacks inject poisoned samples into the training data, resulting in the misclassification of the poisoned input during a model's deployment. Defending against such attacks is challenging, especially for real-world black-box models where only query access is permitted. In this paper, we propose a novel defense framework against backdoor attacks through Zero-shot Image Purification (ZIP). Our framework can be applied to poisoned models without requiring internal information about the model or any prior knowledge of the clean/poisoned samples. Our defense framework involves two steps.

attacked image, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Information Technology > Security & Privacy (1.00)

Technology: