self-adversarial attack
Due to the time constraints of the rebuttal, we limited …
We cannot thank the reviewers enough for their valuable feedback on our work. Reviewers 1 and 2: Combine guess loss with additive noise. Most recent advances in adversarial defense methods address "black-box attacks" performed by a … The latter incorporates adversarial examples during training to increase the model's robustness to the attack. Therefore, the reconstructed image can serve as an adversarially perturbed example of the non-adversarial input image. Reviewer 3: Novelty is not enough, as most of the proposed solutions or observations are already published.
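The snippet above gestures at using a defense model's reconstruction of a clean input as an adversarially perturbed training example. A minimal sketch of that idea, assuming hypothetical `classifier` and `reconstruct` models and a noise scale `sigma` (none of which are specified in the snippet), might look like this:

```python
# Sketch only: treat the reconstruction of a clean input as an adversarially
# perturbed example and train on it alongside the clean input. The "guess
# loss with additive noise" mentioned above is only named, not specified, so
# additive Gaussian noise stands in for it here.
import torch
import torch.nn.functional as F

def adversarial_training_step(classifier, reconstruct, optimizer, x, y, sigma=0.05):
    # Reconstructed image plus additive noise serves as the adversarial example.
    x_adv = reconstruct(x) + sigma * torch.randn_like(x)
    loss = F.cross_entropy(classifier(x), y) + F.cross_entropy(classifier(x_adv), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```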
Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts
Wu, Yuanwei; Li, Xiang; Liu, Yixin; Zhou, Pan; Sun, Lichao
Existing work on jailbreaking Multimodal Large Language Models (MLLMs) has focused primarily on adversarial examples in model inputs, with less attention to other vulnerabilities, especially in the model API. To fill this research gap, we carry out the following work: 1) We discover a system prompt leakage vulnerability in GPT-4V. Through carefully designed dialogue, we successfully extract the internal system prompt of GPT-4V, indicating potentially exploitable security risks in MLLMs; 2) Based on the acquired system prompt, we propose a novel MLLM jailbreaking method termed SASP (Self-Adversarial Attack via System Prompt). By employing GPT-4 as a red-teaming tool against itself, we search for potential jailbreak prompts that leverage the stolen system prompt. In pursuit of better performance, we further apply human modifications based on GPT-4's analysis, which raises the attack success rate to 98.7%; 3) We evaluate the effect of modifying system prompts to defend against jailbreaking attacks. Results show that appropriately designed system prompts can significantly reduce jailbreak success rates. Overall, our work provides new insights into MLLM security, demonstrating the important role of system prompts in jailbreaking: they can be leveraged to greatly increase jailbreak success rates, while also holding potential for defending against jailbreaks.
- Information Technology > Security & Privacy (1.00)
- Government > Military (0.71)
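To make the SASP loop described in the abstract concrete, here is a minimal sketch under stated assumptions: `query_model` is a hypothetical chat-completion wrapper, `is_jailbroken` a hypothetical success check, and the actual leaked GPT-4V system prompt and red-team prompts from the paper are not reproduced. This illustrates the described loop, not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]

def sasp_search(query_model: Callable[[List[Message]], str],
                leaked_system_prompt: str,
                harmful_request: str,
                is_jailbroken: Callable[[str], bool],
                max_rounds: int = 10) -> Optional[str]:
    """Use the model as a red team against itself: given its own leaked
    system prompt, ask it to draft a prompt that bypasses those rules,
    test the candidate on the target, and feed the refusal back as
    feedback for the next round."""
    feedback = "none"
    for _ in range(max_rounds):
        red_team_messages = [
            {"role": "system",
             "content": "You are a red-team assistant analyzing a target model."},
            {"role": "user",
             "content": (f"Target system prompt:\n{leaked_system_prompt}\n\n"
                         f"Goal: make the target answer: {harmful_request}\n"
                         f"Feedback from the last attempt: {feedback}\n"
                         "Propose one improved jailbreak prompt.")},
        ]
        candidate = query_model(red_team_messages)
        # Test the candidate against the target model (the same API here).
        response = query_model([{"role": "user", "content": candidate}])
        if is_jailbroken(response):
            return candidate  # candidate bypassed the system prompt's rules
        feedback = response[:500]  # the refusal text guides the next attempt
    return None  # no successful jailbreak within the round budget
```

The paper's human-modification step would slot in between rounds, with a person editing `candidate` based on GPT-4's analysis before retesting.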